spark-instrumented-optimizer/python/pyspark/sql
iRakson 2f92ea0df4 [SPARK-31763][PYSPARK] Add inputFiles method in PySpark DataFrame Class
### What changes were proposed in this pull request?
Adds `inputFiles()` method to PySpark `DataFrame`. Using this, PySpark users can list all files constituting a `DataFrame`.

**Before changes:**

```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/***/***/spark/python/pyspark/sql/dataframe.py", line 1388, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'inputFiles'
```

**After changes:**

```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
[u'file:///***/***/spark/examples/src/main/resources/people.json']
```

### Why are the changes needed?
This method is already supported for spark with scala and java.

### Does this PR introduce _any_ user-facing change?
Yes, Now users can list all files of a DataFrame using `inputFiles()`

### How was this patch tested?
UT added.

Closes #28652 from iRakson/SPARK-31763.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-05-28 09:52:08 +09:00
..
avro [SPARK-27506][SQL][FOLLOWUP] Use option avroSchema to specify an evolved schema in from_avro 2019-12-30 18:14:21 +09:00
pandas [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow 2020-05-27 17:27:29 -07:00
tests [SPARK-31763][PYSPARK] Add inputFiles method in PySpark DataFrame Class 2020-05-28 09:52:08 +09:00
__init__.py [SPARK-31088][SQL] Add back HiveContext and createExternalTable 2020-03-26 23:51:15 -07:00
catalog.py [SPARK-31088][SQL] Add back HiveContext and createExternalTable 2020-03-26 23:51:15 -07:00
column.py [SPARK-29664][PYTHON][SQL][FOLLOW-UP] Add deprecation warnings for getItem instead 2020-04-27 14:49:22 +09:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-31088][SQL] Add back HiveContext and createExternalTable 2020-03-26 23:51:15 -07:00
dataframe.py [SPARK-31763][PYSPARK] Add inputFiles method in PySpark DataFrame Class 2020-05-28 09:52:08 +09:00
functions.py [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound 2020-03-31 15:16:17 +09:00
group.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
readwriter.py [SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters 2020-05-18 20:25:02 +09:00
session.py [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted 2020-02-20 12:21:24 +09:00
streaming.py [SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters 2020-05-18 20:25:02 +09:00
types.py [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields 2020-03-09 11:06:45 -07:00
udf.py [SPARK-30722][PYTHON][DOCS] Update documentation for Pandas UDF with Python type hints 2020-02-12 10:49:46 +09:00
utils.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
window.py [SPARK-30188][SQL] Resolve the failed unit tests when enable AQE 2020-01-13 22:55:19 +08:00