spark-instrumented-optimizer/sql/hive
Tathagata Das 0fd3a47484 [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning
## What changes were proposed in this pull request?

File Stream Sink writes the list of written files in a metadata log. StreamFileCatalog reads the list of the files for processing. However StreamFileCatalog does not infer partitioning like HDFSFileCatalog.

This PR enables that by refactoring HDFSFileCatalog to create an abstract class PartitioningAwareFileCatalog, that has all the functionality to infer partitions from a list of leaf files.
- HDFSFileCatalog has been renamed to ListingFileCatalog and it extends PartitioningAwareFileCatalog by providing a list of leaf files from recursive directory scanning.
- StreamFileCatalog has been renamed to MetadataLogFileCatalog and it extends PartitioningAwareFileCatalog by providing a list of leaf files from the metadata log.
- The above two classes has been moved into their own files as they are not interfaces that should be in fileSourceInterfaces.scala.

## How was this patch tested?
- FileStreamSinkSuite was update to see if partitioning gets inferred, and on reading whether the partitions get pruned correctly based on the query.
- Other unit tests are unchanged and pass as expected.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #12879 from tdas/SPARK-15103.
2016-05-04 11:02:48 -07:00
..
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-14414][SQL] Make DDL exceptions more consistent 2016-05-03 18:07:53 -07:00
src [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning 2016-05-04 11:02:48 -07:00
pom.xml Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local" 2016-04-28 19:57:41 -07:00