spark-instrumented-optimizer

History

Guy Khazma 2d59ca464e [SPARK-30475][SQL] File source V2: Push data filters for file listing	2020-01-20 20:20:37 -08:00

### What changes were proposed in this pull request?
Follow up on [SPARK-30428](https://github.com/apache/spark/pull/27112), which added support for partition pruning in File source V2. This PR implements the changes needed to pass the `dataFilters` to `listFiles`. This enables `FileIndex` implementations that use the `dataFilters` to further prune the file listing (see the discussion [here](https://github.com/apache/spark/pull/27112#discussion_r364757217)).

### Why are the changes needed?
Data sources such as `csv` and `json` do not implement the `SupportsPushDownFilters` trait. To support data skipping uniformly for all file-based data sources, one can override the `listFiles` method in a `FileIndex` implementation so that it consults external metadata and prunes the list of files.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Modified the unit tests for V2 file sources to verify that the `dataFilters` are passed.

Closes #27157 from guykhazma/PushdataFiltersInFileListing.

Authored-by: Guy Khazma <guykhag@gmail.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
avro	[SPARK-30475][SQL] File source V2: Push data filters for file listing	2020-01-20 20:20:37 -08:00
docker	[SPARK-28683][BUILD] Upgrade Scala to 2.12.10	2019-09-18 13:30:36 -07:00
docker-integration-tests	[SPARK-28152][SQL][FOLLOWUP] Add a legacy conf for old MsSqlServerDialect numeric mapping	2020-01-12 23:03:34 -08:00
kafka-0-10	[SPARK-28144][SPARK-29294][SS][FOLLOWUP] Use SystemTime defined in Kafka Time interface	2019-12-24 11:39:03 +09:00
kafka-0-10-assembly	[INFRA] Reverts commit `56dcd79` and `c216ef1`	2019-12-16 19:57:44 -07:00
kafka-0-10-sql	[SPARK-30495][SS] Consider spark.security.credentials.kafka.enabled and cluster configuration when checking latest delegation token	2020-01-15 11:46:34 -08:00
kafka-0-10-token-provider	[SPARK-30495][SS] Consider spark.security.credentials.kafka.enabled and cluster configuration when checking latest delegation token	2020-01-15 11:46:34 -08:00
kinesis-asl	[SPARK-30272][SQL][CORE] Remove usage of Guava that breaks in 27; replace with workalikes	2019-12-20 08:55:04 -06:00
kinesis-asl-assembly	[INFRA] Reverts commit `56dcd79` and `c216ef1`	2019-12-16 19:57:44 -07:00
spark-ganglia-lgpl	[INFRA] Reverts commit `56dcd79` and `c216ef1`	2019-12-16 19:57:44 -07:00