spark-instrumented-optimizer/R/pkg/inst/tests/testthat
wm624@hotmail.com 9ac05225e8 [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k
## What changes were proposed in this pull request?

When k-means is run with `initMode = "random"`, some random seeds can produce a model whose actual cluster size does not equal the configured `k`.

In this case, `summary(model)` returns an error because the number of columns in the coefficient matrix does not equal `k`.

Example:
> col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> cols <- as.data.frame(cbind(col1, col2, col3))
> df <- createDataFrame(cols)
>
> model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10, initMode = "random", seed = 22222, tol = 1E-5)
>
> summary(model2)
Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3")) :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In matrix(coefficients, ncol = k) :
  data length [9] is not a sub-multiple or multiple of the number of rows [2]

Fix: Get the actual cluster size (the number of clusters the model produced) in the summary and use it, instead of the configured `k`, to build the coefficient matrix.
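
The reshaping problem can be reproduced in base R without a Spark session. The following is a minimal sketch, not the patch itself: the coefficient values are made up, and the cluster count is inferred here from the data length, whereas the patch obtains the actual cluster size in the summary.

```r
# Base R only; no Spark session is needed for this illustration.
features <- c("col1", "col2", "col3")
k <- 5  # configured number of clusters

# Hypothetical flattened cluster centers: 3 actual clusters x 3 features,
# which matches the "data length [9]" reported in the warning above.
coefficients <- c(0, 0, 0, 1.5, 1.5, 1.5, 3.5, 3.5, 3.5)

# Reshaping with the configured k yields a 5 x 2 matrix after t(),
# so assigning three column names fails.
broken <- tryCatch({
  m <- t(suppressWarnings(matrix(coefficients, ncol = k)))
  colnames(m) <- features
  m
}, error = function(e) conditionMessage(e))

# Assumption for this sketch: derive the actual cluster count from the data.
clusterSize <- length(coefficients) / length(features)

# Reshaping with the actual cluster count gives one row per cluster.
centers <- t(matrix(coefficients, ncol = clusterSize))
colnames(centers) <- features
rownames(centers) <- seq_len(clusterSize)
print(centers)
```

Here `broken` holds the error message rather than a matrix, while `centers` has one row per actual cluster and one column per feature.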

## How was this patch tested?

Added unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16666 from wangmiao1981/kmeans.
2017-01-31 21:16:37 -08:00
jarTest.R [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to SparkSubmitSuite 2016-07-19 19:28:08 -07:00
packageInAJarTest.R [SPARKR][MINOR] R examples and test updates 2016-07-13 13:33:34 -07:00
test_binary_function.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_binaryFile.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_broadcast.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_client.R [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader 2016-05-09 09:58:36 -07:00
test_context.R [SPARK-17577][FOLLOW-UP][SPARKR] SparkR spark.addFile supports adding directory recursively 2016-09-26 16:47:57 -07:00
test_includePackage.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_jvm_api.R [SPARK-16581][SPARKR] Fix JVM API tests in SparkR 2016-08-31 16:56:41 -07:00
test_mllib_classification.R [SPARK-19395][SPARKR] Convert coefficients in summary to matrix 2017-01-31 12:20:43 -08:00
test_mllib_clustering.R [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k 2017-01-31 21:16:37 -08:00
test_mllib_recommendation.R [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files 2017-01-08 01:10:36 -08:00
test_mllib_regression.R [SPARK-19395][SPARKR] Convert coefficients in summary to matrix 2017-01-31 12:20:43 -08:00
test_mllib_stat.R [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files 2017-01-08 01:10:36 -08:00
test_mllib_tree.R [SPARK-19066][SPARKR] SparkR LDA doesn't set optimizer correctly 2017-01-16 06:05:59 -08:00
test_parallelize_collect.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_rdd.R [SPARK-18788][SPARKR] Add API for getNumPartitions 2017-01-26 21:06:39 -08:00
test_Serde.R [SPARK-16027][SPARKR] Fix R tests SparkSession init/stop 2016-07-17 19:02:21 -07:00
test_shuffle.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_sparkR.R [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package. 2016-11-22 00:05:30 -08:00
test_sparkSQL.R [SPARK-18788][SPARKR] Add API for getNumPartitions 2017-01-26 21:06:39 -08:00
test_take.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_textFile.R [SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R CMD check 2016-08-16 11:19:18 -07:00
test_utils.R [SPARK-18810][SPARKR] SparkR install.spark does not work for RCs, snapshots 2016-12-12 14:40:41 -08:00
test_Windows.R [SPARK-19324][SPARKR] Spark VJM stdout output is getting dropped in SparkR 2017-01-27 12:41:35 -08:00