spark-instrumented-optimizer/python/pyspark/sql
Maxim Gekk 8e8d1177e6 [SPARK-26108][SQL] Support custom lineSep in CSV datasource
## What changes were proposed in this pull request?

In the PR, I propose a new option for the CSV datasource, `lineSep`, similar to the one in the Text and JSON datasources. The option allows specifying a custom line separator of at most 2 characters (because of a restriction in the `uniVocity` parser). The new option can be used for both reading and writing CSV files.
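To illustrate what the option controls, here is a minimal pure-Python sketch of splitting CSV text into records on a custom separator while enforcing the 2-character limit described above. The helper names `validate_line_sep` and `split_records` are hypothetical and are not Spark's implementation. Rejecting an empty separator is also an assumption of this sketch.

```python
from typing import List


def validate_line_sep(line_sep: str) -> str:
    """Hypothetical check mirroring the documented limit of at most 2 characters.

    Assumption: an empty separator is rejected as well.
    """
    if not isinstance(line_sep, str) or not 1 <= len(line_sep) <= 2:
        raise ValueError("lineSep must be 1 or 2 characters long")
    return line_sep


def split_records(text: str, line_sep: str) -> List[str]:
    """Split raw CSV text into records on a custom line separator."""
    sep = validate_line_sep(line_sep)
    records = text.split(sep)
    # A trailing separator leaves an empty last element, so drop it.
    if records and records[-1] == "":
        records.pop()
    return records
```

In PySpark the separator is set like any other CSV option, for example `spark.read.option("lineSep", ";").csv(path)`.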

## How was this patch tested?

Added a few tests with a custom `lineSep` for reading with `multiLine` enabled and disabled, as well as tests for writing. I also added roundtrip tests.
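A roundtrip test writes rows with a custom separator and checks that reading them back with the same separator returns the original rows. The sketch below shows the idea using Python's standard `csv` module rather than Spark. `write_csv` and `read_csv` are hypothetical helpers, and the reader assumes no field contains the separator (i.e. no `multiLine`-style quoted line breaks).

```python
import csv
import io
from typing import List


def write_csv(rows: List[List[str]], line_sep: str) -> str:
    """Serialize rows, terminating each record with the custom separator."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator=line_sep)
    writer.writerows(rows)
    return buf.getvalue()


def read_csv(text: str, line_sep: str) -> List[List[str]]:
    """Parse text produced by write_csv back into rows.

    Assumes no field contains line_sep, so a plain split finds record boundaries.
    """
    records = text.split(line_sep)
    # The writer terminates every record, so the last split element is empty.
    if records and records[-1] == "":
        records.pop()
    # Let the csv module handle delimiters and quoting within each record.
    return [next(csv.reader([record])) for record in records]
```

For example, `read_csv(write_csv(rows, ";"), ";") == rows` should hold for rows whose fields contain no `;`.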

Closes #23080 from MaxGekk/csv-line-sep.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
2018-11-24 00:50:20 +09:00
tests [SPARK-26036][PYTHON] Break large tests.py files into smaller files 2018-11-15 12:30:52 +08:00
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs 2018-08-17 10:18:08 +08:00
column.py [SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark 2018-04-08 12:09:06 +08:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark behave as the same as Scala. 2018-09-27 09:51:20 +08:00
dataframe.py [SPARK-26024][SQL] Update documentation for repartitionByRange 2018-11-19 22:24:53 +08:00
functions.py [SPARK-26112][SQL] Update since versions of new built-in functions. 2018-11-19 22:18:20 +08:00
group.py [SPARK-24722][SQL] pivot() with Column type argument 2018-08-04 14:17:32 +08:00
readwriter.py [SPARK-26108][SQL] Support custom lineSep in CSV datasource 2018-11-24 00:50:20 +09:00
session.py [SPARK-25255][PYTHON] Add getActiveSession to SparkSession in PySpark 2018-10-26 09:40:13 -07:00
streaming.py [SPARK-26108][SQL] Support custom lineSep in CSV datasource 2018-11-24 00:50:20 +09:00
types.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
udf.py [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement 2018-10-04 09:36:23 +08:00
utils.py [SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy 2018-08-28 10:57:13 +08:00
window.py [SPARK-25842][SQL] Deprecate rangeBetween APIs introduced in SPARK-21608 2018-10-26 13:17:24 +08:00