[SPARK-34768][SQL] Respect the default input buffer size in Univocity
### What changes were proposed in this pull request?

This PR proposes to follow Univocity's default input buffer size instead of hardcoding it to 128.

### Why are the changes needed?

- Firstly, it's best to trust Univocity's judgement on default values. Also, 128 is too low.
- Default values arguably have more test coverage in Univocity.
- It also fixes https://github.com/uniVocity/univocity-parsers/issues/449.
  - That issue is a regression compared to Spark 2.4.

### Does this PR introduce _any_ user-facing change?

No, other than fixing the regression above.

### How was this patch tested?

Manually tested, and added a unit test.

Closes #31858 from HyukjinKwon/SPARK-34768.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent 1a4971d8a1
commit 385f1e8f5d

```diff
@@ -166,8 +166,6 @@ class CSVOptions(
 
   val quoteAll = getBool("quoteAll", false)
 
-  val inputBufferSize = 128
-
   /**
    * The max error content length in CSV parser/writer exception message.
    */
```

```diff
@@ -259,7 +257,6 @@ class CSVOptions(
     settings.setIgnoreLeadingWhitespaces(ignoreLeadingWhiteSpaceInRead)
     settings.setIgnoreTrailingWhitespaces(ignoreTrailingWhiteSpaceInRead)
     settings.setReadInputOnSeparateThread(false)
-    settings.setInputBufferSize(inputBufferSize)
     settings.setMaxColumns(maxColumns)
     settings.setNullValue(nullValue)
     settings.setEmptyValue(emptyValueInRead)
```

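With the explicit `setInputBufferSize(inputBufferSize)` call removed, Univocity falls back to its own default buffer size. Whatever that size is, a correct parser should return the same fields for a line regardless of how the input is split into buffer-sized chunks. The toy splitter below is a minimal sketch of that property, not Univocity's implementation: `splitFields` is a hypothetical helper, and trimming each field with a trailing-whitespace regex is an assumption made for illustration. It keeps the partially read field in a `StringBuilder` across buffer refills, so every buffer size yields the same result.

```scala
import java.io.{Reader, StringReader}
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy splitter (NOT Univocity's code): splits on `delimiter`,
// trims trailing whitespace from each field, and reads through a fixed-size buffer.
def splitFields(reader: Reader, delimiter: Char, bufSize: Int): Vector[String] = {
  val buf = new Array[Char](bufSize)
  val fields = ArrayBuffer.empty[String]
  // Holds the partially read field so it survives across buffer refills.
  val current = new StringBuilder

  def finishField(): Unit = {
    fields += current.toString.replaceAll("\\s+$", "") // drop trailing whitespace
    current.clear()
  }

  var n = reader.read(buf, 0, bufSize)
  while (n != -1) {
    for (i <- 0 until n) {
      if (buf(i) == delimiter) finishField()
      else current.append(buf(i))
    }
    n = reader.read(buf, 0, bufSize) // refill the buffer with the next chunk
  }
  finishField() // the last field has no trailing delimiter
  fields.toVector
}

// Same shape as the line used in the SPARK-34768 test.
val sample = "X" * 127 + "| |"

// Parse the sample with several buffer sizes; a correct splitter agrees on all of them.
val results = Seq(1, 127, 128, 1 << 20).map { size =>
  size -> splitFields(new StringReader(sample), '|', size)
}.toMap

results.foreach { case (size, fields) =>
  println(s"bufSize=$size -> field lengths ${fields.map(_.length)}")
}
```

The design choice that matters here is carrying parser state (the open field) across refills rather than per buffer; a chunk boundary then has no effect on the output.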
```diff
@@ -2452,6 +2452,17 @@ abstract class CSVSuite
       assert(result.sameElements(exceptResults))
     }
   }
+
+  test("SPARK-34768: counting a long record with ignoreTrailingWhiteSpace set to true") {
+    val bufSize = 128
+    val line = "X" * (bufSize - 1) + "| |"
+    withTempPath { path =>
+      Seq(line).toDF.write.text(path.getAbsolutePath)
+      assert(spark.read.format("csv")
+        .option("delimiter", "|")
+        .option("ignoreTrailingWhiteSpace", "true").load(path.getAbsolutePath).count() == 1)
+    }
+  }
 }
 
 class CSVv1Suite extends CSVSuite {
```

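The test input is sized relative to the old hardcoded buffer of 128 characters. The sketch below recomputes where its interesting characters sit, assuming (for illustration only, not taken from the commit or from Univocity's source) that input is consumed in consecutive chunks of `bufSize` characters, so the character at index `i` falls into chunk `i / bufSize`. `chunkOf` is a hypothetical helper.

```scala
// Rebuild the test's input exactly as the test does.
val bufSize = 128
val line = "X" * (bufSize - 1) + "| |"

// Hypothetical helper; assumes fixed-size, back-to-back chunks of bufSize chars.
def chunkOf(index: Int): Int = index / bufSize

val firstDelimiter = line.indexOf('|')      // 127: last slot of chunk 0
val space = line.indexOf(' ')               // 128: first slot of chunk 1
val secondDelimiter = line.lastIndexOf('|') // 129: chunk 1

println(s"line length: ${line.length}") // 130
Seq("first '|'" -> firstDelimiter, "' '" -> space, "second '|'" -> secondDelimiter).foreach {
  case (label, idx) => println(s"$label at index $idx, chunk ${chunkOf(idx)}")
}
```

Under that assumption, the first delimiter closes the first 128-character chunk and the whitespace-only field starts the next one. With any buffer larger than 130 characters, the whole line fits in a single chunk and no such boundary occurs.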