spark-instrumented-optimizer

Zhenhua Wang 655f6f86f8 [SPARK-22208][SQL] Improve percentile_approx by not rounding up targetError and starting from index 0

## What changes were proposed in this pull request?

Currently percentile_approx never returns the first element when the percentile is in (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the total number of elements. Ideally, percentiles in [0, 1/N] should all return the first element as the answer.

For example, given input data 1 to 10, if a user queries the 10% (or an even smaller) percentile, it should return 1, because the first value 1 already reaches 10%. Currently it returns 2.

Based on the paper, targetError is not rounded up, and the search index should start from 0 instead of 1. By following the paper, we should be able to fix the cases mentioned above.

## How was this patch tested?

Added a new test case and fixed existing test cases.

Author: Zhenhua Wang <wzh_zju@163.com>

Closes #19438 from wzhfy/improve_percentile_approx.

2017-10-11 00:16:12 -07:00
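The expected behaviour described in the commit message can be illustrated with a small exact-rank sketch. This is not Spark's implementation (which uses an approximate quantile summary); it is a hypothetical helper, `exact_percentile`, that only shows which element the commit message says should be returned for percentiles in [0, 1/N] on a small sorted input.

```python
import math


def exact_percentile(sorted_values, p):
    """Return the smallest element whose cumulative share reaches p.

    Hypothetical illustration only: an exact, rank-based lookup on an
    already sorted list, not the approximate algorithm used by Spark.
    """
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    n = len(sorted_values)
    # 1-based target rank; clamp to 1 so every p in [0, 1/N]
    # maps to the first element, as the commit message requires.
    rank = max(1, math.ceil(p * n))
    return sorted_values[rank - 1]


data = list(range(1, 11))  # input data 1 to 10, as in the commit message
print(exact_percentile(data, 0.10))  # the first value already reaches 10%
print(exact_percentile(data, 0.05))  # anything below 1/N also maps to it
```

With the input 1 to 10, both queries resolve to the first element, which is the answer the commit message argues percentile_approx should give instead of 2.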
jarTest.R	[SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN	2017-06-11 00:00:33 -07:00
packageInAJarTest.R	[SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN	2017-06-11 00:00:33 -07:00
test_binary_function.R	[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r	2017-10-01 18:42:45 +09:00
test_binaryFile.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_broadcast.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_client.R	[SPARK-19810][BUILD][CORE] Remove support for Scala 2.10	2017-07-13 17:06:24 +08:00
test_context.R	[SPARK-21149][R] Add job description API for R	2017-06-23 09:59:24 -07:00
test_includePackage.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_jvm_api.R	[SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN	2017-06-11 00:00:33 -07:00
test_mllib_classification.R	[SPARK-21381][SPARKR] SparkR: pass on setHandleInvalid for classification algorithms	2017-07-31 20:37:06 -07:00
test_mllib_clustering.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_mllib_fpm.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_mllib_recommendation.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_mllib_regression.R	[SPARK-21622][ML][SPARKR] Support offset in SparkR GLM	2017-08-06 15:14:12 -07:00
test_mllib_stat.R	[SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN	2017-06-11 00:00:33 -07:00
test_mllib_tree.R	[SPARK-21801][SPARKR][TEST] unit test randomly fail with randomforest	2017-08-29 10:09:41 -07:00
test_parallelize_collect.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_rdd.R	[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r	2017-10-01 18:42:45 +09:00
test_Serde.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_shuffle.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_sparkR.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_sparkSQL.R	[SPARK-22208][SQL] Improve percentile_approx by not rounding up targetError and starting from index 0	2017-10-11 00:16:12 -07:00
test_streaming.R	[SPARK-21224][R] Specify a schema by using a DDL-formatted string when reading in R	2017-06-28 19:36:00 -07:00
test_take.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_textFile.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_utils.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00
test_Windows.R	[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move	2017-06-11 03:00:44 -07:00