2da42ca3b4
### What changes were proposed in this pull request? This PR introduces a new set of APIs, `newTaskTempFile` and `newTaskTempFileAbsPath`, inside `FileCommitProtocol` to allow more flexible file naming of Spark output. The major change is to pass a `FileNameSpec` (currently having `prefix` and `ext`) into `FileCommitProtocol` instead of the original `ext`, so that individual `FileCommitProtocol` implementations can come up with more flexible file names (e.g. with a custom `prefix`) for Hive/Presto bucketing - https://github.com/apache/spark/pull/30003. Default implementations of the added APIs are provided, so existing implementations of `FileCommitProtocol` are NOT broken. ### Why are the changes needed? To make the commit protocol more flexible in terms of Spark output file names. Prerequisite of https://github.com/apache/spark/pull/30003. ### Does this PR introduce _any_ user-facing change? Yes, for developers who implement or run a custom implementation of `FileCommitProtocol`. They can choose to implement the newly added APIs. ### How was this patch tested? Existing unit tests, as this just adds an API. Closes #33012 from c21/commit-protocol-api. Authored-by: Cheng Su <chengsu@fb.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> |
||
---|---|---|
benchmarks | ||
src | ||
pom.xml |
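
The commit message above describes adding `FileNameSpec`-based overloads with default implementations so that existing `FileCommitProtocol` subclasses keep working. The following is a minimal, self-contained sketch of that pattern only. `CommitProtocolSketch`, `LegacyProtocol`, `PrefixAwareProtocol`, the simplified method signatures, and the `part-00000` file name are all hypothetical stand-ins, not Spark's actual classes or signatures. In particular, the choice to delegate when the prefix is empty and to throw otherwise is an assumption about how such a default could behave.

```scala
// Hypothetical, simplified stand-in for the spec described in the commit message.
final case class FileNameSpec(prefix: String, ext: String)

// Hypothetical stand-in for a commit protocol; NOT Spark's FileCommitProtocol.
abstract class CommitProtocolSketch {
  // Original-style API: the caller supplies only a file extension.
  def newTaskTempFile(dir: Option[String], ext: String): String

  // New-style API with a default implementation, so subclasses written
  // against the original API still compile and run unchanged.
  // Assumption: delegate when no prefix is requested, fail loudly otherwise.
  def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String = {
    if (spec.prefix.isEmpty) {
      newTaskTempFile(dir, spec.ext)
    } else {
      throw new UnsupportedOperationException(
        s"${getClass.getSimpleName} does not support file name prefix: ${spec.prefix}")
    }
  }
}

// An "existing" implementation that only knows about the original API.
class LegacyProtocol extends CommitProtocolSketch {
  override def newTaskTempFile(dir: Option[String], ext: String): String = {
    val name = s"part-00000$ext"
    dir.map(d => s"$d/$name").getOrElse(name)
  }
}

// An implementation that opts in to the new API to honor a custom prefix.
class PrefixAwareProtocol extends LegacyProtocol {
  override def newTaskTempFile(dir: Option[String], spec: FileNameSpec): String = {
    val name = s"${spec.prefix}part-00000${spec.ext}"
    dir.map(d => s"$d/$name").getOrElse(name)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val legacy = new LegacyProtocol
    // Empty prefix: the default implementation falls back to the original API.
    println(legacy.newTaskTempFile(Some("out"), FileNameSpec("", ".parquet")))
    // prints "out/part-00000.parquet"

    val custom = new PrefixAwareProtocol
    println(custom.newTaskTempFile(Some("out"), FileNameSpec("bucket_", ".parquet")))
    // prints "out/bucket_part-00000.parquet"
  }
}
```

In this sketch, `LegacyProtocol` needs no changes to accept a `FileNameSpec` with an empty prefix, while a non-empty prefix surfaces an explicit error instead of being silently dropped. An implementation that wants custom prefixes overrides the new overload, as `PrefixAwareProtocol` does.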