Notes for Parquet compatibility tests

The following directories and files are used for Parquet compatibility tests:

.
├── README.md                   # This file
├── avro
│   ├── *.avdl                  # Testing Avro IDL(s)
│   └── *.avpr                  # !! NO TOUCH !! Protocol files generated from Avro IDL(s)
├── gen-java                    # !! NO TOUCH !! Generated Java code
├── scripts
│   ├── gen-avro.sh             # Script used to generate Java code for Avro
│   └── gen-thrift.sh           # Script used to generate Java code for Thrift
└── thrift
    └── *.thrift                # Testing Thrift schema(s)

To avoid code generation at build time, the Java code generated from the testing Thrift schema and Avro IDL is also checked in.

When updating the testing Avro IDL or Thrift schema, please run gen-avro.sh or gen-thrift.sh, respectively, to regenerate the Java code.
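For orientation only, the regeneration steps can be pictured roughly as below. This is a hedged sketch, not the contents of gen-avro.sh or gen-thrift.sh: it assumes the standard `avro-tools idl` and `avro-tools compile -string protocol` subcommands and `thrift --gen java -out`, and the function name `regen_test_code` is hypothetical. It follows the layout in the tree above (`avro/*.avdl` to `avro/*.avpr` to `gen-java`). For real updates, use the checked-in scripts.

```shell
#!/bin/sh
# Hypothetical sketch of regenerating the checked-in test code.
# NOT the actual gen-avro.sh / gen-thrift.sh; tool flags are assumptions.
# $1: path to the directory containing avro/, thrift/ and gen-java/.
regen_test_code() {
  base="$1"
  mkdir -p "$base/gen-java" || return 1

  # Avro IDL (*.avdl) -> protocol files (*.avpr), written next to the IDL.
  for idl in "$base"/avro/*.avdl; do
    [ -e "$idl" ] || continue          # skip an unmatched glob
    avro-tools idl "$idl" "${idl%.avdl}.avpr" || return 1
  done

  # Protocol files -> Java sources, using java.lang.String for strings.
  for proto in "$base"/avro/*.avpr; do
    [ -e "$proto" ] || continue
    avro-tools compile -string protocol "$proto" "$base/gen-java" || return 1
  done

  # Thrift schemas -> Java sources (thrift requires the -out dir to exist).
  for schema in "$base"/thrift/*.thrift; do
    [ -e "$schema" ] || continue
    thrift --gen java -out "$base/gen-java" "$schema" || return 1
  done
}
```

Keeping the `.avpr` files next to their `.avdl` sources mirrors the directory tree shown earlier in this README.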

Prerequisites

Please ensure that avro-tools and thrift are installed. On Mac OS X, you can install both via Homebrew:

$ brew install thrift avro-tools
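Before running the scripts, a quick way to confirm that both tools are on the PATH is a check like the one below. It is a minimal POSIX shell sketch; the function name `check_parquet_compat_tools` is hypothetical and is not part of gen-avro.sh or gen-thrift.sh.

```shell
#!/bin/sh
# Hypothetical helper: report which prerequisites are missing from PATH.
# Prints "all prerequisites found" and returns 0, or prints
# "missing: <tools>" and returns 1.
check_parquet_compat_tools() {
  missing=""
  for tool in avro-tools thrift; do
    # command -v is a POSIX builtin that searches PATH for the tool.
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -z "$missing" ]; then
    echo "all prerequisites found"
    return 0
  fi
  echo "missing:$missing"
  return 1
}
```

If only thrift is absent, it prints `missing: thrift` and returns 1, so it can guard a call to the generation scripts.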