spark-instrumented-optimizer/dev
hyukjinkwon 25a020be99
[SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV
## What changes were proposed in this pull request?

This PR includes the changes below:

1. Upgrade Univocity library from 2.1.1 to 2.2.1

  This includes some performance improvement and also enabling auto-extending buffer in `maxCharsPerColumn` option in CSV. Please refer the [release notes](https://github.com/uniVocity/univocity-parsers/releases).

2. Remove useless `rowSeparator` variable existing in `CSVOptions`

  We have this unused variable in [CSVOptions.scala#L127](29952ed096/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala (L127)) but it seems possibly causing confusion that it actually does not care of `\r\n`. For example, we have an issue open about this, [SPARK-17227](https://issues.apache.org/jira/browse/SPARK-17227), describing this variable.

  This variable is virtually not being used because we rely on `LineRecordReader` in Hadoop which deals with only both `\n` and `\r\n`.

3. Set the default value of `maxCharsPerColumn` to auto-expending.

  We are setting 1000000 for the length of each column. It'd be more sensible we allow auto-expending rather than fixed length by default.

  To make sure, using `-1` is being described in the release note, [2.2.0](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.2.0).

## How was this patch tested?

N/A

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15138 from HyukjinKwon/SPARK-17583.
2016-09-21 10:35:29 +01:00
..
create-release [SPARK-16967] move mesos to module 2016-08-26 12:25:22 -07:00
deps [SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV 2016-09-21 10:35:29 +01:00
sparktestsupport [SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code changed 2016-09-01 09:10:01 +01:00
tests [SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests 2015-12-30 12:47:42 -08:00
.gitignore [SPARK-6219] Reuse pep8.py 2015-04-18 16:46:28 -07:00
.rat-excludes [SPARK-17303] Added spark-warehouse to dev/.rat-excludes 2016-08-29 23:33:00 -07:00
appveyor-guide.md [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) 2016-09-08 08:26:59 -07:00
appveyor-install-dependencies.ps1 [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) 2016-09-08 08:26:59 -07:00
change-scala-version.sh [SPARK-9250] Make change-scala-version more helpful w.r.t. valid Scala versions 2015-07-24 17:09:33 +01:00
change-version-to-2.10.sh [SPARK-9304] [BUILD] Improve backwards compatibility of SPARK-8401 2015-07-25 11:05:08 +01:00
change-version-to-2.11.sh [SPARK-9304] [BUILD] Improve backwards compatibility of SPARK-8401 2015-07-25 11:05:08 +01:00
check-license [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs 2016-03-07 14:48:02 -08:00
checkstyle-suppressions.xml [MINOR] Fix Java Lint errors introduced by #13286 and #13280 2016-06-08 14:51:00 +01:00
checkstyle.xml [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors 2016-04-24 20:40:03 -07:00
github_jira_sync.py Fix install jira-python 2015-05-23 09:14:07 -07:00
lint-java [SPARK-16967] move mesos to module 2016-08-26 12:25:22 -07:00
lint-python [SPARK-13887][PYTHON][TRIVIAL][BUILD] Make lint-python script fail fast 2016-03-25 12:53:34 +00:00
lint-r [SPARK-10328] [SPARKR] Fix generic for na.omit 2015-08-28 00:37:50 -07:00
lint-r.R [SPARK-14074][SPARKR] Specify commit sha1 ID when using install_github to install intr package. 2016-03-23 07:57:03 -07:00
lint-scala [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
make-distribution.sh [SPARK-15821][DOCS] Include parallel build info 2016-06-14 13:59:01 +01:00
merge_spark_pr.py [SPARK-9383][PROJECT-INFRA] PR merge script should reset back to previous branch when possible 2016-01-13 11:56:30 -08:00
mima [SPARK-16967] move mesos to module 2016-08-26 12:25:22 -07:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools 2016-01-24 11:48:28 -08:00
run-tests [SPARK-5161] Parallelize Python test execution 2015-06-29 21:32:40 -07:00
run-tests-jenkins [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python 2015-10-18 22:45:27 -07:00
run-tests-jenkins.py [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile 2016-01-15 17:07:24 -08:00
run-tests.py [SPARK-17329][BUILD] Don't build PRs with -Pyarn unless YARN code changed 2016-09-01 09:10:01 +01:00
scalastyle [SPARK-16967] move mesos to module 2016-08-26 12:25:22 -07:00
test-dependencies.sh [SPARK-16967] move mesos to module 2016-08-26 12:25:22 -07:00
tox.ini [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs 2016-03-07 14:48:02 -08:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.