spark-instrumented-optimizer


pengbo d9b2ce0f0f [SPARK-27539][SQL] Fix inaccurate aggregate outputRows estimation with column containing null values	2019-04-22 20:30:08 -07:00

## What changes were proposed in this pull request?

This PR is follow up of https://github.com/apache/spark/pull/24286. As gatorsmile pointed out that column with null value is inaccurate as well.

```
> select key from test;
2
NULL
1
spark-sql> desc extended test key;
col_name key
data_type int
comment NULL
min 1
max 2
num_nulls 1
distinct_count 2
```

The distinct count should be distinct_count + 1 when column contains null value.

## How was this patch tested?

Existing tests & new UT added.

Closes #24436 from pengbo/aggregation_estimation.

Authored-by: pengbo <bo.peng1019@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
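The adjustment described in the commit message can be illustrated with a small standalone sketch. This is not the Spark implementation (which lives in the Scala sources under `src`); the function name `estimated_group_count` and its parameters are hypothetical, chosen only to mirror the `distinct_count` and `num_nulls` column statistics shown in the `desc extended` output.

```python
def estimated_group_count(distinct_count: int, num_nulls: int) -> int:
    """Estimate how many groups a GROUP BY on one column produces.

    Hypothetical helper, not Spark's API. It assumes distinct_count
    excludes NULL, so NULL forms one extra group when any NULLs exist.
    """
    if num_nulls > 0:
        return distinct_count + 1
    return distinct_count


# Statistics from the example: key holds 2, NULL, 1.
print(estimated_group_count(distinct_count=2, num_nulls=1))  # prints 3
```

With the example table, grouping by `key` yields the groups 1, 2, and NULL, so the estimate of 3 matches the actual number of output rows, whereas using `distinct_count` alone would give 2.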
benchmarks	[SPARK-25657][SQL][TEST] Refactor HashBenchmark to use main method	2018-10-07 09:49:37 -07:00
src	[SPARK-27539][SQL] Fix inaccurate aggregate outputRows estimation with column containing null values	2019-04-22 20:30:08 -07:00
pom.xml	[SPARK-27016][SQL][BUILD] Treat all antlr warnings as errors while generating parser from the sql grammar file.	2019-03-03 10:02:25 -06:00