spark-instrumented-optimizer/core
Cheng Su 2da42ca3b4 [SPARK-33298][CORE] Introduce new API to FileCommitProtocol allow flexible file naming
### What changes were proposed in this pull request?

This PR introduces a new set of APIs, `newTaskTempFile` and `newTaskTempFileAbsPath`, in `FileCommitProtocol` to allow more flexible naming of Spark output files. The major change is to pass a `FileNameSpec` (currently carrying `prefix` and `ext`) into `FileCommitProtocol` instead of the original `ext`, so that individual `FileCommitProtocol` implementations can come up with more flexible file names (e.g. with a custom `prefix`) for Hive/Presto bucketing - https://github.com/apache/spark/pull/30003. Default implementations of the added APIs are provided, so existing implementations of `FileCommitProtocol` are NOT broken.
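
The backward-compatibility idea can be illustrated with a simplified, self-contained sketch. It is not the actual Spark source: `CommitProtocolSketch`, `LegacyProtocol`, `PrefixAwareProtocol`, the fixed `part-00000` base name, and the reduced parameter lists are all hypothetical, and the fallback behavior shown (delegate when the prefix is empty, otherwise fail) is only one plausible way to write a default.

```scala
// Simplified model of the idea in this PR; names and signatures are assumptions.
final case class FileNameSpec(prefix: String, ext: String)

abstract class CommitProtocolSketch {
  // Pre-existing style of API: only the extension is passed in.
  def newTaskTempFile(dir: Option[String], ext: String): String

  // Newly added style of API: the whole FileNameSpec is passed in.
  // The default body delegates to the ext-only method, so subclasses
  // written before this method existed still compile and still work.
  def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String = {
    if (spec.prefix.isEmpty) {
      newTaskTempFile(dir, spec.ext)
    } else {
      // Assumed behavior for this sketch: refuse a prefix we cannot honor.
      throw new UnsupportedOperationException(
        s"${getClass.getSimpleName} does not support a file name prefix: ${spec.prefix}")
    }
  }
}

// An "existing" implementation that only knows about the ext-only method.
class LegacyProtocol extends CommitProtocolSketch {
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    dir.map(_ + "/").getOrElse("") + "part-00000" + ext
}

// A newer implementation that opts in to the FileNameSpec-based method
// and places the custom prefix in front of the base file name.
class PrefixAwareProtocol extends LegacyProtocol {
  override def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String =
    dir.map(_ + "/").getOrElse("") + spec.prefix + "part-00000" + spec.ext
}
```

In this sketch, `LegacyProtocol` keeps working unchanged for specs with an empty prefix, while `PrefixAwareProtocol` opts in to custom prefixes by overriding only the new overload.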

### Why are the changes needed?

To make the commit protocol more flexible in how Spark output files are named.
This is a prerequisite of https://github.com/apache/spark/pull/30003.

### Does this PR introduce _any_ user-facing change?

Yes, for developers who implement or run a custom implementation of `FileCommitProtocol`. They can choose to implement the newly added APIs.

### How was this patch tested?

Existing unit tests, as this PR only adds an API.

Closes #33012 from c21/commit-protocol-api.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-06-24 17:10:54 -07:00
benchmarks [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2 2021-06-17 11:06:50 -07:00
src [SPARK-33298][CORE] Introduce new API to FileCommitProtocol allow flexible file naming 2021-06-24 17:10:54 -07:00
pom.xml [SPARK-34688][PYTHON] Upgrade to Py4J 0.10.9.2 2021-03-11 09:51:41 -06:00