spark-instrumented-optimizer/sql/core
Gengliang Wang 78a403fab9 [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources
## What changes were proposed in this pull request?

### Background:
The data source option `pathGlobFilter` is introduced for Binary file format: https://github.com/apache/spark/pull/24354 , which can be used for filtering file names, e.g. reading `.png` files only while there is `.json` files in the same directory.

### Proposal:
Make the option `pathGlobFilter` as a general option for all file sources. The path filtering should happen in the path globbing on Driver.

### Motivation:
Filtering the file path names in file scan tasks on executors is kind of ugly.

### Impact:
1. The splitting of file partitions will be more balanced.
2. The metrics of file scan will be more accurate.
3. Users can use the option for reading other file sources.

## How was this patch tested?

Unit tests

Closes #24518 from gengliangwang/globFilter.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-05-09 08:41:43 +09:00
..
benchmarks [SPARK-27535][SQL][TEST] Date and timestamp JSON benchmarks 2019-04-23 11:09:14 +09:00
src [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources 2019-05-09 08:41:43 +09:00
v1.2.1/src [SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1 2019-03-26 22:32:03 -07:00
v2.3.4/src [SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4 2019-04-08 08:42:21 -07:00
pom.xml [SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1 2019-03-26 22:32:03 -07:00