3dd3a623f2
### What changes were proposed in this pull request?

As a follow-up to #24830, this PR adds the `recursiveFileLookup` option to the Python DataFrameReader API.

### Why are the changes needed?

This PR maintains Python feature parity with Scala.

### Does this PR introduce any user-facing change?

Yes.

Before this PR, you'd only be able to use this option as follows:

```python
spark.read.option("recursiveFileLookup", True).text("test-data").show()
```

With this PR, you can reference the option from within the format-specific method:

```python
spark.read.text("test-data", recursiveFileLookup=True).show()
```

This option now also shows up in the Python API docs.

### How was this patch tested?

I tested this manually by creating the following directories with dummy data:

```
test-data
├── 1.txt
└── nested
    └── 2.txt

test-parquet
├── nested
│   ├── _SUCCESS
│   ├── part-00000-...-.parquet
├── _SUCCESS
├── part-00000-...-.parquet
```

I then ran the following tests and confirmed the output looked good:

```python
spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
spark.read.text("test-data", recursiveFileLookup=True).show()
spark.read.csv("test-data", recursiveFileLookup=True).show()
```

`python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm happy to add my tests there, though it seems we have been deferring testing like this to the Scala side of things.

Closes #26718 from nchammas/SPARK-27990-recursiveFileLookup-python.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
- `avro`
- `tests`
- `__init__.py`
- `catalog.py`
- `cogroup.py`
- `column.py`
- `conf.py`
- `context.py`
- `dataframe.py`
- `functions.py`
- `group.py`
- `readwriter.py`
- `session.py`
- `streaming.py`
- `types.py`
- `udf.py`
- `utils.py`
- `window.py`
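
The manual test setup described in the commit message can be reproduced with a short script. The following is a minimal sketch, not part of the patch: it recreates the `test-data` layout in a temporary directory and compares a top-level file listing with a recursive one, to show which files a recursive lookup would be expected to reach. The helper names `make_test_data` and `list_files` and the file contents are hypothetical, and the pure-Python listing is only an illustration; it makes no claim about how Spark resolves files internally.

```python
import tempfile
from pathlib import Path
from typing import List


def make_test_data(root: Path) -> Path:
    """Create the test-data layout from the commit message under root."""
    data = root / "test-data"
    (data / "nested").mkdir(parents=True)
    # Dummy contents; the commit message does not say what the files held.
    (data / "1.txt").write_text("top-level line\n")
    (data / "nested" / "2.txt").write_text("nested line\n")
    return data


def list_files(path: Path, recursive: bool) -> List[str]:
    """List regular files under path, optionally descending into subdirectories."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        p.relative_to(path).as_posix() for p in path.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    data = make_test_data(Path(tmp))
    flat = list_files(data, recursive=False)
    deep = list_files(data, recursive=True)
    print("top level only:", flat)
    print("recursive:     ", deep)
    # With a live SparkSession named `spark` (assumed, not created here),
    # the call from the commit message would be:
    #   spark.read.text(str(data), recursiveFileLookup=True).show()
```

Running the commented-out Spark call against the same directory is one way to check by hand that rows from both `1.txt` and `nested/2.txt` appear in the output.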