spark-instrumented-optimizer

wm624@hotmail.com 9ac05225e8 [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k

## What changes were proposed in this pull request

When Kmeans using initMode = "random" and some random seed, it is possible the actual cluster size doesn't equal to the configured `k`.

In this case, summary(model) returns error due to the number of cols of coefficient matrix doesn't equal to k.

Example:

    > col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > cols <- as.data.frame(cbind(col1, col2, col3))
    > df <- createDataFrame(cols)
    >
    > model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10, initMode = "random", seed = 22222, tol = 1E-5)
    >
    > summary(model2)
    Error in `colnames<-`(`tmp`, value = c("col1", "col2", "col3")) :
      length of 'dimnames' [2] not equal to array extent
    In addition: Warning message:
    In matrix(coefficients, ncol = k) :
      data length [9] is not a sub-multiple or multiple of the number of rows [2]

Fix: Get the actual cluster size in the summary and use it to build the coefficient matrix.

## How was this patch tested?

Add unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16666 from wangmiao1981/kmeans.

2017-01-31 21:16:37 -08:00
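
The failure and the fix described in the commit message above can be reproduced in plain R, without a Spark session. The following is a minimal sketch, not the code from the patch: `build_centers`, `actual_k`, and the sample center values are hypothetical, and the actual cluster count is simply inferred from the length of the flattened vector, which is an assumption standing in for however the patch obtains it from the fitted model.

```r
# Plain-R sketch of the coefficient-matrix problem; no Spark required.
# All names and values below are illustrative, not taken from the patch.

features <- c("col1", "col2", "col3")
k <- 5  # configured number of clusters

# Flattened cluster centers as they might come back when only 3 clusters
# were actually formed: 3 centers x 3 features = 9 values.
coefficients <- c(0, 0, 0, 2, 2, 2, 4, 4, 4)

# Reshape the flat vector into one row per cluster and label the columns.
build_centers <- function(coefficients, n_clusters, features) {
  centers <- t(matrix(coefficients, ncol = n_clusters))
  colnames(centers) <- features
  rownames(centers) <- seq_len(n_clusters)
  centers
}

# Using the configured k: matrix() warns about the data length, and
# assigning 3 column names to a 5 x 2 matrix then raises an error.
failed <- tryCatch({
  suppressWarnings(build_centers(coefficients, k, features))
  FALSE
}, error = function(e) TRUE)

# Using the actual cluster count (assumed here to be recoverable from the
# vector length) yields a well-formed 3 x 3 matrix.
actual_k <- length(coefficients) / length(features)
centers <- build_centers(coefficients, actual_k, features)
print(centers)
```

With the configured `k`, `matrix()` pads the nine values into a 2 x 5 layout, so the transposed result has two columns and cannot take three feature names. Sizing the matrix by the actual cluster count keeps one row per cluster and one column per feature.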
jarTest.R	[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to SparkSubmitSuite	2016-07-19 19:28:08 -07:00
packageInAJarTest.R	[SPARKR][MINOR] R examples and test updates	2016-07-13 13:33:34 -07:00
test_binary_function.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_binaryFile.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_broadcast.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_client.R	[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader	2016-05-09 09:58:36 -07:00
test_context.R	[SPARK-17577][FOLLOW-UP][SPARKR] SparkR spark.addFile supports adding directory recursively	2016-09-26 16:47:57 -07:00
test_includePackage.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_jvm_api.R	[SPARK-16581][SPARKR] Fix JVM API tests in SparkR	2016-08-31 16:56:41 -07:00
test_mllib_classification.R	[SPARK-19395][SPARKR] Convert coefficients in summary to matrix	2017-01-31 12:20:43 -08:00
test_mllib_clustering.R	[SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k	2017-01-31 21:16:37 -08:00
test_mllib_recommendation.R	[SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files	2017-01-08 01:10:36 -08:00
test_mllib_regression.R	[SPARK-19395][SPARKR] Convert coefficients in summary to matrix	2017-01-31 12:20:43 -08:00
test_mllib_stat.R	[SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files	2017-01-08 01:10:36 -08:00
test_mllib_tree.R	[SPARK-19066][SPARKR] SparkR LDA doesn't set optimizer correctly	2017-01-16 06:05:59 -08:00
test_parallelize_collect.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_rdd.R	[SPARK-18788][SPARKR] Add API for getNumPartitions	2017-01-26 21:06:39 -08:00
test_Serde.R	[SPARK-16027][SPARKR] Fix R tests SparkSession init/stop	2016-07-17 19:02:21 -07:00
test_shuffle.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_sparkR.R	[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.	2016-11-22 00:05:30 -08:00
test_sparkSQL.R	[SPARK-18788][SPARKR] Add API for getNumPartitions	2017-01-26 21:06:39 -08:00
test_take.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_textFile.R	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check	2016-08-16 11:19:18 -07:00
test_utils.R	[SPARK-18810][SPARKR] SparkR install.spark does not work for RCs, snapshots	2016-12-12 14:40:41 -08:00
test_Windows.R	[SPARK-19324][SPARKR] Spark VJM stdout output is getting dropped in SparkR	2017-01-27 12:41:35 -08:00