spark-instrumented-optimizer

wm624@hotmail.com 9ac05225e8 [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k	2017-01-31 21:16:37 -08:00

## What changes were proposed in this pull request

When Kmeans is run with initMode = "random" and certain random seeds, the actual number of clusters may not equal the configured `k`. In this case, summary(model) returns an error because the number of columns of the coefficient matrix does not equal k.

Example:

    > col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
    > cols <- as.data.frame(cbind(col1, col2, col3))
    > df <- createDataFrame(cols)
    >
    > model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10, initMode = "random", seed = 22222, tol = 1E-5)
    >
    > summary(model2)
    Error in `colnames<-`(`tmp`, value = c("col1", "col2", "col3")) :
      length of 'dimnames' [2] not equal to array extent
    In addition: Warning message:
    In matrix(coefficients, ncol = k) :
      data length [9] is not a sub-multiple or multiple of the number of rows [2]

Fix: Get the actual cluster size in the summary and use it to build the coefficient matrix.

## How was this patch tested?

Add unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16666 from wangmiao1981/kmeans.
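The shape mismatch behind the error can be reproduced in base R without a Spark session. The following is a minimal sketch, not the patch itself: the `coefficients` values are made up for illustration, and `clusterSize` is derived here from the vector length, which is an assumption about how the actual cluster count might be obtained.

```r
# Hypothetical stand-in values; in SparkR these would come from the fitted model.
features <- c("col1", "col2", "col3")
k <- 5  # configured number of clusters

# Three cluster centers with three features each, flattened (illustrative values).
coefficients <- c(0, 0, 0, 1.5, 1.5, 1.5, 3.5, 3.5, 3.5)

# Building the matrix with ncol = k gives the wrong shape (2 rows, as the
# warning quoted in the commit message reports), so assigning three
# column names to its transpose fails.
bad <- suppressWarnings(matrix(coefficients, ncol = k))

# Assumption for this sketch: take the actual cluster count from the data
# instead of from the configured k.
clusterSize <- length(coefficients) / length(features)

# One row per cluster center, one column per feature.
centers <- t(matrix(coefficients, ncol = clusterSize))
colnames(centers) <- features
rownames(centers) <- seq_len(clusterSize)
print(centers)
```

With the matrix sized by the actual number of clusters, the column names match the number of features and the assignment succeeds.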
inst	[SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k	2017-01-31 21:16:37 -08:00
R	[SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k	2017-01-31 21:16:37 -08:00
src-native	[SPARK-6811] Copy SparkR lib in make-distribution.sh	2015-05-23 00:04:01 -07:00
tests	[SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.	2015-12-07 10:38:17 -08:00
vignettes	[SPARKR][DOCS] update R API doc for subset/extract	2017-01-30 18:47:14 -08:00
.lintr	[SPARK-12327][SPARKR] fix code for lintr warning for commented code	2016-01-03 20:53:35 +05:30
.Rbuildignore	[SPARK-18590][SPARKR] build R source package when making distribution	2016-12-08 11:29:31 -08:00
DESCRIPTION	[SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files	2017-01-08 01:10:36 -08:00
NAMESPACE	[SPARK-19333][SPARKR] Add Apache License headers to R files	2017-01-27 10:31:28 -08:00