spark-instrumented-optimizer/sql/core/src/test

Latest commit 750ed64cd9 by Liang-Chi Hsieh: [SPARK-13930] [SQL] Apply fast serialization on collect limit operator

## What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-13930

Fast serialization was recently introduced for collecting a DataFrame/Dataset (#11664). The same technique can be applied to the collect limit operator as well.
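
To make the idea concrete, the following is a minimal, self-contained sketch of the pattern rather than Spark's actual implementation: each executor packs its partition of already-encoded rows into a single compressed byte array, and the driver decodes only as many rows as the limit requires. It uses JDK GZIP streams where Spark uses its configured compression codec, and `CompactRowTransfer` is a hypothetical name.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
    import java.util.zip.{GZIPInputStream, GZIPOutputStream}

    // Illustrative only: mimics the "serialize + compress a whole partition" idea,
    // operating on rows that have already been encoded to byte arrays.
    object CompactRowTransfer {

      // Executor side: pack a batch of encoded rows into one compressed blob.
      def encode(rows: Iterator[Array[Byte]]): Array[Byte] = {
        val buffer = new ByteArrayOutputStream()
        val out = new DataOutputStream(new GZIPOutputStream(buffer))
        rows.foreach { row =>
          out.writeInt(row.length)  // length-prefix each row
          out.write(row)
        }
        out.writeInt(-1)            // end-of-stream marker
        out.close()
        buffer.toByteArray
      }

      // Driver side: unpack rows, stopping once `limit` rows have been read.
      def decode(blob: Array[Byte], limit: Int): Seq[Array[Byte]] = {
        val in = new DataInputStream(new GZIPInputStream(new ByteArrayInputStream(blob)))
        val rows = Seq.newBuilder[Array[Byte]]
        var taken = 0
        var size = in.readInt()
        while (size >= 0 && taken < limit) {
          val row = new Array[Byte](size)
          in.readFully(row)
          rows += row
          taken += 1
          size = in.readInt()
        }
        in.close()
        rows.result()
      }
    }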

## How was this patch tested?

Add a benchmark for collect limit to `BenchmarkWholeStageCodegen`.

Without this patch:

    model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
    collect limit:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    collect limit 1 million                  3413 / 3768          0.3        3255.0       1.0X
    collect limit 2 millions                9728 / 10440          0.1        9277.3       0.4X

With this patch:

    model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
    collect limit:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    collect limit 1 million                   833 / 1284          1.3         794.4       1.0X
    collect limit 2 millions                 3348 / 4005          0.3        3193.3       0.2X
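
For a rough reproduction outside of `BenchmarkWholeStageCodegen`, a standalone timing run along the following lines can be used. This is an illustrative sketch only: it uses the modern `SparkSession` API rather than the SQLContext-based harness of the patch, and the object name and row counts are made up for the example.

    import org.apache.spark.sql.SparkSession

    // Illustrative standalone timing of collect() on a limited Dataset;
    // not the actual benchmark case added in this patch.
    object CollectLimitTiming {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("collect-limit-timing")
          .getOrCreate()

        val limit = 1000000                    // roughly the "1 million" case above
        val df = spark.range(4L * limit).toDF("id")

        val start = System.nanoTime()
        val rows = df.limit(limit).collect()   // exercises the collect limit path
        val elapsedMs = (System.nanoTime() - start) / 1e6

        println(s"collected ${rows.length} rows in $elapsedMs ms")
        spark.stop()
      }
    }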

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

Closes #11759 from viirya/execute-take.
2016-03-17 23:24:44 -07:00

| Path | Last commit | Date |
|------|-------------|------|
| avro | [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array | 2015-08-20 11:00:29 -07:00 |
| gen-java/org/apache/spark/sql/execution/datasources/parquet/test/avro | [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array | 2015-08-20 11:00:29 -07:00 |
| java/test/org/apache/spark/sql | [SPARK-13894][SQL] SqlContext.range return type from DataFrame to DataSet | 2016-03-16 11:20:15 -07:00 |
| resources | [SPARK-13442][SQL] Make type inference recognize boolean types | 2016-03-07 14:32:01 -08:00 |
| scala/org/apache/spark/sql | [SPARK-13930] [SQL] Apply fast serialization on collect limit operator | 2016-03-17 23:24:44 -07:00 |
| scripts | [SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to be pushed down | 2015-08-12 20:01:34 +08:00 |
| thrift | [SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to be pushed down | 2015-08-12 20:01:34 +08:00 |
| README.md | [SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to be pushed down | 2015-08-12 20:01:34 +08:00 |

# Notes for Parquet compatibility tests

The following directories and files are used for Parquet compatibility tests:

    .
    ├── README.md                   # This file
    ├── avro
    │   ├── *.avdl                  # Testing Avro IDL(s)
    │   └── *.avpr                  # !! NO TOUCH !! Protocol files generated from Avro IDL(s)
    ├── gen-java                    # !! NO TOUCH !! Generated Java code
    ├── scripts
    │   ├── gen-avro.sh             # Script used to generate Java code for Avro
    │   └── gen-thrift.sh           # Script used to generate Java code for Thrift
    └── thrift
        └── *.thrift                # Testing Thrift schema(s)

To avoid code generation at build time, the Java code generated from the testing Thrift schemas and Avro IDLs is also checked in.

When updating the testing Thrift schemas or Avro IDLs, please run `gen-avro.sh` and `gen-thrift.sh` accordingly to regenerate the Java code.

## Prerequisites

Please ensure `avro-tools` and `thrift` are installed. On Mac OS X, you can install both via Homebrew:

    $ brew install thrift avro-tools