Jeff Evans 95de93b24e [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read
Updating univocity-parsers version to 2.8.3, which adds support for multiple-character delimiters

Moving the univocity-parsers version to the spark-parent pom dependencyManagement section

Adding a new utility method to build a multi-char delimiter string, which delegates to the existing one

Adding tests for multiple-character-delimited CSV

### What changes were proposed in this pull request?

Adds support for parsing CSV data using multiple-character delimiters. The existing logic for converting the input delimiter string to characters was kept and is now invoked in a loop. Project dependencies were updated to remove a redundant declaration of the `univocity-parsers` version and to bump that version to the latest release.
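
A minimal sketch of that approach (`toChar` and `toDelimiterStr` here are illustrative stand-ins, not the exact Spark internals): the pre-existing single-character unescaping logic is kept and simply applied in a loop over the input string.

```scala
object DelimiterUtils {
  // Illustrative stand-in for the pre-existing single-character converter,
  // which unescapes sequences such as "\\t" into the real character.
  def toChar(str: String): Char = str match {
    case s if s.length == 1 => s.charAt(0)
    case "\\t" => '\t'
    case "\\r" => '\r'
    case "\\n" => '\n'
    case "\\\\" => '\\'
    case s => throw new IllegalArgumentException(s"Unsupported delimiter: $s")
  }

  // Builds a multi-character delimiter by delegating to the existing
  // one-character conversion for each (possibly escaped) unit of the input.
  def toDelimiterStr(str: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < str.length) {
      if (str.charAt(i) == '\\' && i + 1 < str.length) {
        out += toChar(str.substring(i, i + 2))
        i += 2
      } else {
        out += toChar(str.substring(i, i + 1))
        i += 1
      }
    }
    out.toString
  }
}
```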

### Why are the changes needed?

It is quite common to have delimited data where the delimiter is not a single character but a sequence of characters. Such data is currently difficult to handle in Spark and typically requires pre-processing.

### Does this PR introduce any user-facing change?

Yes. Supplying more than one character in the "delimiter" option of a DataFrame read will no longer result in an exception. Instead, the value is converted as before and passed to the underlying library (Univocity), which has accepted multiple-character delimiters since version 2.8.0.
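
For example (a sketch; the input path is hypothetical), a multi-character delimiter can now be passed directly to the standard CSV reader:

```scala
import org.apache.spark.sql.SparkSession

object MultiCharDelimiterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multi-char-delimiter-example")
      .master("local[*]")
      .getOrCreate()

    // Before this change, any "delimiter" longer than one character threw an
    // exception; now the value is forwarded to Univocity.
    val df = spark.read
      .option("delimiter", "||")
      .option("header", "true")
      .csv("examples/multi_delim.csv") // hypothetical input path
    df.show()

    spark.stop()
  }
}
```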

### How was this patch tested?

The `CSVSuite` tests (including the newly added ones) were confirmed passing, and the `sbt` tests for the `sql` module were executed.
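
A sketch of the kind of test this adds (the test name and data are illustrative, not the exact `CSVSuite` additions; it assumes the suite's usual harness, i.e. a `spark` session with `testImplicits` in scope):

```scala
import org.apache.spark.sql.Row

test("SPARK-24540: read CSV using a multiple-character delimiter") {
  import testImplicits._
  // Build an in-memory CSV dataset whose fields are separated by "||".
  val input = Seq("col1||col2||col3", "1||2||3").toDS()
  val df = spark.read
    .option("delimiter", "||")
    .option("header", "true")
    .csv(input)
  assert(df.columns.toSeq === Seq("col1", "col2", "col3"))
  assert(df.collect().toSeq === Seq(Row("1", "2", "3")))
}
```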

Closes #26027 from jeff303/SPARK-24540.

Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-15 15:44:51 -05:00
create-release [SPARK-28906][BUILD] Fix incorrect information in bin/spark-submit --version 2019-09-11 08:12:44 -05:00
deps [SPARK-24540][SQL] Support for multiple character delimiter in Spark CSV read 2019-10-15 15:44:51 -05:00
sparktestsupport [SPARK-27463][PYTHON][FOLLOW-UP] Run the tests of Cogrouped pandas UDF 2019-09-22 21:39:30 +09:00
tests [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-27489][WEBUI] UI updates to show executor resource information 2019-09-04 09:45:44 +08:00
.scalafmt.conf [SPARK-26177] Config change followup to [SPARK-26177] Automated formatting for Scala code 2018-12-03 10:03:51 -06:00
appveyor-guide.md [SPARK-26918][DOCS] All .md should have ASF license header 2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1 [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G 2019-09-19 00:24:15 -07:00
change-scala-version.sh [SPARK-26132][BUILD][CORE] Remove support for Scala 2.11 in Spark 3.0.0 2019-03-25 10:46:42 -05:00
check-license [MINOR][BUILD] Upgrade apache-rat to 0.13 2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml [MINOR][BUILD] Update all checkstyle dtd to use "https://checkstyle.org" 2019-02-25 11:25:53 -08:00
checkstyle.xml [SPARK-29470][BUILD] Update plugins to latest versions 2019-10-15 11:55:52 -07:00
github_jira_sync.py [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 2019-08-09 18:55:48 +09:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-python [BUILD] refactor dev/lint-python into something readable 2018-11-20 12:38:40 -08:00
lint-r [SPARK-10328] [SPARKR] Fix generic for na.omit 2015-08-28 00:37:50 -07:00
lint-r.R [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r 2017-10-01 18:42:45 +09:00
lint-scala [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
make-distribution.sh [SPARK-29159][BUILD] Increase ReservedCodeCacheSize to 1G 2019-09-19 00:24:15 -07:00
merge_spark_pr.py [MINOR][BUILD] Decode output of commands during merge script as UTF-8 consistently 2019-10-02 11:28:55 +09:00
mima [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
pip-sanity-check.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names 2018-09-07 09:35:25 -07:00
run-pip-tests Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
run-tests [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 2017-10-22 02:22:35 +09:00
run-tests-jenkins [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
run-tests-jenkins.py [SPARK-28701][TEST-HADOOP3.2][TEST-JAVA11][K8S] adding java11 support for pull request builds 2019-08-27 00:48:01 +09:00
run-tests.py [SPARK-28701][INFRA][FOLLOWUP] Fix the key error when looking in os.environ 2019-08-26 12:40:31 -07:00
sbt-checkstyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
scalafmt [SPARK-26177] Automated formatting for Scala code 2018-11-29 08:54:31 -06:00
scalastyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
test-dependencies.sh [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2 2019-10-13 12:53:12 -05:00
tox.ini [SPARK-23367][BUILD] Include python document style checking 2018-10-27 08:20:42 -05:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.