spark-instrumented-optimizer

Cheng Su cfe012a431 [SPARK-32629][SQL] Track metrics of BitSet/OpenHashSet in full outer SHJ	2020-08-30 07:01:33 +09:00

### What changes were proposed in this pull request?

This is a followup to https://github.com/apache/spark/pull/29342 and does two things:

* Per https://github.com/apache/spark/pull/29342#discussion_r470153323, change from Java `HashSet` to Spark's in-house `OpenHashSet` to track matched rows for non-unique join keys. I checked the `OpenHashSet` implementation, which is built from a key index (`OpenHashSet._bitset` as `BitSet`) and a key array (`OpenHashSet._data` as `Array`). Java `HashSet` is built from `HashMap`, which stores values in a `Node` linked list and in theory should take more memory than `OpenHashSet`. Reran the same benchmark query used in https://github.com/apache/spark/pull/29342 and verified that the query has similar performance with `HashSet` and `OpenHashSet`.
* Track metrics of the extra data structure `BitSet`/`OpenHashSet` for full outer SHJ. This depends on the change above, because there seems to be no easy way to get the memory size of a Java `HashSet`.

### Why are the changes needed?

To surface the memory usage of full outer SHJ more accurately. This can help users and developers debug and improve full outer SHJ.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a unit test in `SQLMetricsSuite.scala`.

Closes #29566 from c21/add-metrics.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
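
The commit message describes a set built from a bitset key index plus a flat key array, and notes that such a layout makes memory size easy to report. The following is a minimal Java sketch of that idea only. It is not Spark's `OpenHashSet`: the class name `OpenLongSetSketch`, the method `estimatedSizeBytes`, the fixed capacity, and the linear probing scheme are all assumptions made for illustration.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of an open-addressing set of long keys.
 * NOT Spark's OpenHashSet; names and sizing rules are assumptions.
 */
public class OpenLongSetSketch {
    private final BitSet occupied; // key index: which slots hold a key
    private final long[] data;     // key array: the keys themselves
    private final int mask;        // capacity - 1, valid because capacity is a power of two
    private int size;

    public OpenLongSetSketch(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        this.data = new long[capacity];
        this.occupied = new BitSet(capacity);
        this.mask = capacity - 1;
    }

    /** Adds a key; returns true if it was not already present. */
    public boolean add(long key) {
        int pos = Long.hashCode(key) & mask;
        while (occupied.get(pos)) {
            if (data[pos] == key) {
                return false;              // already present
            }
            pos = (pos + 1) & mask;        // linear probing
        }
        // Keep one slot free so every probe loop terminates (the sketch never resizes).
        if (size == data.length - 1) {
            throw new IllegalStateException("set is full; this sketch does not resize");
        }
        occupied.set(pos);
        data[pos] = key;
        size++;
        return true;
    }

    public boolean contains(long key) {
        int pos = Long.hashCode(key) & mask;
        while (occupied.get(pos)) {
            if (data[pos] == key) {
                return true;
            }
            pos = (pos + 1) & mask;
        }
        return false;
    }

    public int size() {
        return size;
    }

    /**
     * Rough payload size: 8 bytes per key slot plus the bits backing the index.
     * Object headers and JVM alignment are ignored.
     */
    public long estimatedSizeBytes() {
        return 8L * data.length + occupied.size() / 8;
    }
}
```

Because both backing structures are flat arrays of known length, the size estimate is simple arithmetic. A node-based set would instead require walking or guessing per-entry object overhead, which matches the motivation given in the commit message.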
benchmarks	[SPARK-30648][SQL] Support filters pushdown in JSON datasource	2020-07-17 00:01:13 +09:00
src	[SPARK-32629][SQL] Track metrics of BitSet/OpenHashSet in full outer SHJ	2020-08-30 07:01:33 +09:00
v1.2/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
v2.3/src	[SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis	2020-08-25 13:47:52 +09:00
pom.xml	[SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector	2020-06-30 10:30:22 -07:00