spark-instrumented-optimizer/sql/core/src/test
Josh Rosen 8cb415a4b9 [SPARK-9451] [SQL] Support entries larger than default page size in BytesToBytesMap & integrate with ShuffleMemoryManager
This patch adds support for entries larger than the default page size in BytesToBytesMap.  These large rows are handled by allocating special overflow pages to hold individual entries.

In addition, this patch integrates BytesToBytesMap with the ShuffleMemoryManager:

- Move BytesToBytesMap from `unsafe` to `core` so that it can import `ShuffleMemoryManager`.
- Before allocating new data pages, ask the ShuffleMemoryManager to reserve the memory:
  - `putNewKey()` now returns a boolean to indicate whether the insert succeeded or failed due to a lack of memory.  The caller can use this value to respond to the memory pressure (e.g. by spilling).
- `UnsafeFixedWidthAggregationMap. getAggregationBuffer()` now returns `null` to signal failure due to a lack of memory.
- Updated all uses of these classes to handle these error conditions.
- Added new tests for allocating large records and for allocations which fail due to memory pressure.
- Extended the `afterAll()` test teardown methods to detect ShuffleMemoryManager leaks.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7762 from JoshRosen/large-rows and squashes the following commits:

ae7bc56 [Josh Rosen] Fix compilation
82fc657 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-rows
34ab943 [Josh Rosen] Remove semi
31a525a [Josh Rosen] Integrate BytesToBytesMap with ShuffleMemoryManager.
626b33c [Josh Rosen] Move code to sql/core and spark/core packages so that ShuffleMemoryManager can be integrated
ec4484c [Josh Rosen] Move BytesToBytesMap from unsafe package to core.
642ed69 [Josh Rosen] Rename size to numElements
bea1152 [Josh Rosen] Add basic test.
2cd3570 [Josh Rosen] Remove accidental duplicated code
07ff9ef [Josh Rosen] Basic support for large rows in BytesToBytesMap.
2015-07-31 19:19:27 -07:00
..
avro [SPARK-6123] [SPARK-6775] [SPARK-6776] [SQL] Refactors Parquet read path for interoperability and backwards-compatibility 2015-07-08 15:51:01 -07:00
gen-java/org/apache/spark/sql/parquet/test/avro [SPARK-8959] [SQL] [HOTFIX] Removes parquet-thrift and libthrift dependencies 2015-07-09 17:09:16 -07:00
java/test/org/apache/spark/sql [SPARK-7157][SQL] add sampleBy to DataFrame 2015-07-30 17:16:03 -07:00
resources [SPARK-8959] [SQL] [HOTFIX] Removes parquet-thrift and libthrift dependencies 2015-07-09 17:09:16 -07:00
scala/org/apache/spark/sql [SPARK-9451] [SQL] Support entries larger than default page size in BytesToBytesMap & integrate with ShuffleMemoryManager 2015-07-31 19:19:27 -07:00
scripts [SPARK-6123] [SPARK-6775] [SPARK-6776] [SQL] Refactors Parquet read path for interoperability and backwards-compatibility 2015-07-08 15:51:01 -07:00
thrift [SPARK-6123] [SPARK-6775] [SPARK-6776] [SQL] Refactors Parquet read path for interoperability and backwards-compatibility 2015-07-08 15:51:01 -07:00
README.md [SPARK-6123] [SPARK-6775] [SPARK-6776] [SQL] Refactors Parquet read path for interoperability and backwards-compatibility 2015-07-08 15:51:01 -07:00

Notes for Parquet compatibility tests

The following directories and files are used for Parquet compatibility tests:

.
├── README.md                   # This file
├── avro
│   ├── parquet-compat.avdl     # Testing Avro IDL
│   └── parquet-compat.avpr     # !! NO TOUCH !! Protocol file generated from parquet-compat.avdl
├── gen-java                    # !! NO TOUCH !! Generated Java code
├── scripts
│   └── gen-code.sh             # Script used to generate Java code for Thrift and Avro
└── thrift
    └── parquet-compat.thrift   # Testing Thrift schema

Generated Java code are used in the following test suites:

  • org.apache.spark.sql.parquet.ParquetAvroCompatibilitySuite
  • org.apache.spark.sql.parquet.ParquetThriftCompatibilitySuite

To avoid code generation during build time, Java code generated from testing Thrift schema and Avro IDL are also checked in.

When updating the testing Thrift schema and Avro IDL, please run gen-code.sh to update all the generated Java code.

Prerequisites

Please ensure avro-tools and thrift are installed. You may install these two on Mac OS X via:

$ brew install thrift avro-tools