spark-instrumented-optimizer/external
Guy Khazma 2d59ca464e [SPARK-30475][SQL] File source V2: Push data filters for file listing
### What changes were proposed in this pull request?
Follow-up to [SPARK-30428](https://github.com/apache/spark/pull/27112), which added support for partition pruning in File source V2.
This PR passes `dataFilters` to `listFiles`, which allows `FileIndex` implementations to use `dataFilters` to prune the file listing further (see the discussion [here](https://github.com/apache/spark/pull/27112#discussion_r364757217)).

### Why are the changes needed?
Data sources such as `csv` and `json` do not implement the `SupportsPushDownFilters` trait. To support data skipping uniformly across all file-based data sources, a `FileIndex` implementation can override the `listFiles` method to consult external metadata and prune the list of files.
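
The idea of pruning inside `listFiles` can be illustrated with a minimal, self-contained sketch. It does not use Spark's actual classes: `SimpleFileIndex`, `FileEntry`, `ColumnStats`, `SkippingFileIndex`, and the two filter case classes are hypothetical stand-ins chosen for illustration, and the per-file min/max statistics are assumed to come from some external metadata store. A real implementation would work with Spark's `FileIndex` and Catalyst expressions instead.

```scala
// Simplified, hypothetical stand-ins for illustration; NOT Spark's real API.
sealed trait Filter
final case class GreaterThan(column: String, value: Long) extends Filter
final case class LessThan(column: String, value: Long) extends Filter

// Per-file min/max statistics for a column, as an external metadata store might keep them.
final case class ColumnStats(min: Long, max: Long)
final case class FileEntry(path: String, stats: Map[String, ColumnStats])

trait SimpleFileIndex {
  def allFiles: Seq[FileEntry]

  // Data filters are available at listing time; the default ignores them.
  def listFiles(dataFilters: Seq[Filter]): Seq[FileEntry] = allFiles
}

class SkippingFileIndex(val allFiles: Seq[FileEntry]) extends SimpleFileIndex {
  // True when the file may contain matching rows. Files without statistics
  // for the column are always kept, so skipping never drops needed data.
  private def mayMatch(file: FileEntry, filter: Filter): Boolean = filter match {
    case GreaterThan(col, v) => file.stats.get(col).forall(_.max > v)
    case LessThan(col, v)    => file.stats.get(col).forall(_.min < v)
  }

  // Keep only files that may satisfy every data filter.
  override def listFiles(dataFilters: Seq[Filter]): Seq[FileEntry] =
    allFiles.filter(file => dataFilters.forall(mayMatch(file, _)))
}

object DataSkippingDemo {
  val index = new SkippingFileIndex(Seq(
    FileEntry("part-0.csv", Map("age" -> ColumnStats(0, 17))),
    FileEntry("part-1.csv", Map("age" -> ColumnStats(18, 65))),
    FileEntry("part-2.csv", Map.empty) // no metadata available
  ))

  def main(args: Array[String]): Unit = {
    val kept = index.listFiles(Seq(GreaterThan("age", 30)))
    println(kept.map(_.path).mkString(", ")) // prints "part-1.csv, part-2.csv"
  }
}
```

Because the pruning happens at listing time, it applies regardless of whether the underlying file format can evaluate filters itself, which is what makes the approach uniform across formats such as `csv` and `json`.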

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Modified the unit tests for the v2 file sources to verify that `dataFilters` are passed.

Closes #27157 from guykhazma/PushdataFiltersInFileListing.

Authored-by: Guy Khazma <guykhag@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
2020-01-20 20:20:37 -08:00
avro [SPARK-30475][SQL] File source V2: Push data filters for file listing 2020-01-20 20:20:37 -08:00
docker [SPARK-28683][BUILD] Upgrade Scala to 2.12.10 2019-09-18 13:30:36 -07:00
docker-integration-tests [SPARK-28152][SQL][FOLLOWUP] Add a legacy conf for old MsSqlServerDialect numeric mapping 2020-01-12 23:03:34 -08:00
kafka-0-10 [SPARK-28144][SPARK-29294][SS][FOLLOWUP] Use SystemTime defined in Kafka Time interface 2019-12-24 11:39:03 +09:00
kafka-0-10-assembly [INFRA] Reverts commit 56dcd79 and c216ef1 2019-12-16 19:57:44 -07:00
kafka-0-10-sql [SPARK-30495][SS] Consider spark.security.credentials.kafka.enabled and cluster configuration when checking latest delegation token 2020-01-15 11:46:34 -08:00
kafka-0-10-token-provider [SPARK-30495][SS] Consider spark.security.credentials.kafka.enabled and cluster configuration when checking latest delegation token 2020-01-15 11:46:34 -08:00
kinesis-asl [SPARK-30272][SQL][CORE] Remove usage of Guava that breaks in 27; replace with workalikes 2019-12-20 08:55:04 -06:00
kinesis-asl-assembly [INFRA] Reverts commit 56dcd79 and c216ef1 2019-12-16 19:57:44 -07:00
spark-ganglia-lgpl [INFRA] Reverts commit 56dcd79 and c216ef1 2019-12-16 19:57:44 -07:00