spark-instrumented-optimizer/sql/hive
Steve Loughran e3536406ec [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible
## What changes were proposed in this pull request?

`BasicWriteTaskStatsTracker.getFileSize()` to catch `FileNotFoundException`, log  info and then return 0 as a file size.

This ensures that if a newly created file isn't visible due to the store not always having create consistency, the metric collection doesn't cause the failure.

## How was this patch tested?

New test suite included, `BasicWriteTaskStatsTrackerSuite`. This not only checks the resilience to missing files, but verifies the existing logic as to how file statistics are gathered.

Note that in the current implementation

1. if you call `Tracker..getFinalStats()` more than once, the file size count will increase by size of the last file. This could be fixed by clearing the filename field inside `getFinalStats()` itself.

2. If you pass in an empty or null string to `Tracker.newFile(path)` then IllegalArgumentException is raised, but only in `getFinalStats()`, rather than in `newFile`.  There's a test for this behaviour in the new suite, as it verifies that only FNFEs get swallowed.

Author: Steve Loughran <stevel@hortonworks.com>

Closes #18979 from steveloughran/cloud/SPARK-21762-missing-files-in-metrics.
2017-10-13 23:08:17 -07:00
..
compatibility/src/test/scala/org/apache/spark/sql/hive/execution [SPARK-21831][TEST] Remove spark.sql.hive.convertMetastoreOrc config in HiveCompatibilitySuite 2017-08-25 19:51:13 -07:00
src [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible 2017-10-13 23:08:17 -07:00
pom.xml [SPARK-21936][SQL] backward compatibility test framework for HiveExternalCatalog 2017-09-07 23:21:49 -07:00