spark-instrumented-optimizer/python/pyspark/sql
Nicholas Chammas 3dd3a623f2 [SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader
### What changes were proposed in this pull request?

As a follow-up to #24830, this PR adds the `recursiveFileLookup` option to the Python DataFrameReader API.

### Why are the changes needed?

This PR maintains Python feature parity with Scala.

### Does this PR introduce any user-facing change?

Yes.

Before this PR, you'd only be able to use this option as follows:

```python
spark.read.option("recursiveFileLookup", True).text("test-data").show()
```

With this PR, you can reference the option from within the format-specific method:

```python
spark.read.text("test-data", recursiveFileLookup=True).show()
```

This option now also shows up in the Python API docs.

### How was this patch tested?

I tested this manually by creating the following directories with dummy data:

```
test-data
├── 1.txt
└── nested
   └── 2.txt
test-parquet
├── nested
│  ├── _SUCCESS
│  ├── part-00000-...-.parquet
├── _SUCCESS
├── part-00000-...-.parquet
```

I then ran the following tests and confirmed the output looked good:

```python
spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
spark.read.text("test-data", recursiveFileLookup=True).show()
spark.read.csv("test-data", recursiveFileLookup=True).show()
```

`python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm happy to add my tests there, though it seems we have been deferring testing like this to the Scala side of things.

Closes #26718 from nchammas/SPARK-27990-recursiveFileLookup-python.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2019-12-04 10:10:30 +09:00
..
avro [SPARK-28698][SQL] Support user-specified output schema in to_avro 2019-08-13 20:52:16 +08:00
tests [SPARK-28978][ ] Support > 256 args to python udf 2019-11-08 19:19:14 -08:00
__init__.py [SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF 2019-09-30 22:25:35 +09:00
catalog.py [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
cogroup.py [SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF 2019-09-30 22:25:35 +09:00
column.py [SPARK-29664][PYTHON][SQL] Column.getItem behavior is not consistent with Scala 2019-11-01 12:25:48 +09:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [MINOR][PYSPARK][DOCS] Fix typo in example documentation 2019-11-01 11:55:29 -07:00
dataframe.py [MINOR][PYSPARK][DOCS] Fix typo in example documentation 2019-11-01 11:55:29 -07:00
functions.py [SPARK-29126][PYSPARK][DOC] Pandas Cogroup udf usage guide 2019-10-31 10:41:57 +09:00
group.py [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs 2019-09-17 17:13:50 -07:00
readwriter.py [SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader 2019-12-04 10:10:30 +09:00
session.py [MINOR][PYSPARK][DOCS] Fix typo in example documentation 2019-11-01 11:55:29 -07:00
streaming.py [SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader 2019-12-04 10:10:30 +09:00
types.py [SPARK-29798][PYTHON][SQL] Infers bytes as binary type in createDataFrame in Python 3 at PySpark 2019-11-08 12:10:39 -08:00
udf.py [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs 2019-09-17 17:13:50 -07:00
utils.py [SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1 2019-11-15 13:27:30 +09:00
window.py [SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations 2019-09-01 10:15:00 -05:00