2f92ea0df4
### What changes were proposed in this pull request?

Adds an `inputFiles()` method to the PySpark `DataFrame`. With this, PySpark users can list all the files constituting a `DataFrame`.

**Before changes:**

```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/***/***/spark/python/pyspark/sql/dataframe.py", line 1388, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'inputFiles'
```

**After changes:**

```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
[u'file:///***/***/spark/examples/src/main/resources/people.json']
```

### Why are the changes needed?

This method is already supported for Spark with Scala and Java.

### Does this PR introduce _any_ user-facing change?

Yes. Users can now list all the input files of a `DataFrame` using `inputFiles()`.

### How was this patch tested?

A unit test was added.

Closes #28652 from iRakson/SPARK-31763.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
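For context, PySpark `DataFrame` methods like this one typically delegate to the backing JVM `Dataset` via py4j and convert the returned Java collection into a Python list. Below is a minimal, self-contained sketch of that delegation pattern; the `FakeJavaDataFrame` and `FakeDataFrame` classes are hypothetical stand-ins (not the actual Spark source) so the example runs without a Spark installation.

```python
class FakeJavaDataFrame:
    """Hypothetical stand-in for the py4j JavaObject backing a DataFrame."""

    def inputFiles(self):
        # The JVM side returns a best-effort snapshot of the files
        # that compose the DataFrame.
        return ["file:///tmp/people.json"]


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, showing the
    delegation pattern a method like inputFiles() would follow."""

    def __init__(self, jdf):
        self._jdf = jdf  # the wrapped JVM-side DataFrame

    def inputFiles(self):
        # Delegate to the JVM object and materialize the result
        # as a plain Python list.
        return list(self._jdf.inputFiles())


df = FakeDataFrame(FakeJavaDataFrame())
print(df.inputFiles())  # ['file:///tmp/people.json']
```

With a real `SparkSession`, the call would simply be `spark.read.load(path, format="json").inputFiles()`, as shown in the before/after snippets above.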