spark-instrumented-optimizer


Davies Liu d7b58f1461 [SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation

## What changes were proposed in this pull request?

Currently, for keys that cannot fit within a long, we build a hash map for UnsafeHashedRelation, which is converted to a BytesToBytesMap after serialization and deserialization. We should build a BytesToBytesMap directly for better memory efficiency.

To do that, BytesToBytesMap must support multiple (K, V) pairs with the same K. `Location.putNewKey()` is renamed to `Location.append()`, which can append multiple values for the same key (same Location). `Location.newValue()` is added to find the next value for the same key.
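The idea of appending several values under one key and then stepping through them can be illustrated with a minimal sketch. This is not the BytesToBytesMap code: the class `MultiValueMapSketch` and its methods `append` and `lookup` are hypothetical names chosen for illustration, and it uses on-heap Java collections with `String` keys rather than raw bytes. It only shows one plausible layout, assumed here, in which each appended record links back to the previous record for the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Spark's BytesToBytesMap. */
public class MultiValueMapSketch {
    // Append-only record storage: values.get(i) is a value, and prev.get(i)
    // is the index of the previous record with the same key (-1 if none).
    private final List<String> values = new ArrayList<>();
    private final List<Integer> prev = new ArrayList<>();
    // For each key, the index of the most recently appended record.
    private final Map<String, Integer> head = new HashMap<>();

    /** Appends a value for the key; existing values for the key are kept. */
    public void append(String key, String value) {
        int index = values.size();
        values.add(value);
        prev.add(head.getOrDefault(key, -1)); // link to the previous record
        head.put(key, index);                 // the key now points at the new record
    }

    /** Returns all values for the key, most recently appended first. */
    public List<String> lookup(String key) {
        List<String> out = new ArrayList<>();
        // Follow the chain of records until there is no earlier one.
        for (int i = head.getOrDefault(key, -1); i != -1; i = prev.get(i)) {
            out.add(values.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        MultiValueMapSketch map = new MultiValueMapSketch();
        map.append("k1", "a");
        map.append("k2", "b");
        map.append("k1", "c");
        System.out.println(map.lookup("k1")); // prints [c, a]
    }
}
```

In this sketch a duplicate key costs one extra record and one link, with no per-key collection object, which is the kind of saving that matters for a join with many duplicated keys.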

## How was this patch tested?

Existing tests. Added benchmark for broadcast hash join with duplicated keys.

Author: Davies Liu <davies@databricks.com>

Closes #11870 from davies/map2.

2016-03-28 13:07:32 -07:00

src	[SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation	2016-03-28 13:07:32 -07:00
pom.xml	[SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue	2016-03-14 12:22:02 -07:00