[SPARK-29978][SQL][TESTS] Check json_tuple does not truncate results

### What changes were proposed in this pull request?
I propose to add the test from commit a936522113, which was made for 2.4. I extended the test with a few more lengths of the requested field to cover more code branches in Jackson Core. In particular, [the optimization](5eb8973f87/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala (L473-L476)) calls Jackson's method 42b8b56684/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java (L742-L746), where the internal buffer size is **8000**. The lengths were chosen as follows:
- 2000 to check 2000+2000+2000 < 8000
- 2800 from the 2.4 commit. It covers the specific case: 42b8b56684/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java (L746)
- 8000-1, 8000, 8000+1 are sizes around the size of the internal buffer
- 65535 to test an exceptionally large field.
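
The arithmetic behind these lengths can be sketched in plain Scala. This is only an illustration of the 3 x length versus buffer size comparison from the list above, not Jackson's actual code. The object name `JsonTupleTestLengths` and the helper `tripledExceedsBuffer` are hypothetical, and the buffer size of 8000 is taken from the description above.

```scala
object JsonTupleTestLengths {
  // Internal buffer size quoted in the description above (taken as given).
  val BufferSize: Int = 8000

  // Field lengths exercised by the new test.
  val lengths: Seq[Int] = Seq(2000, 2800, 8000 - 1, 8000, 8000 + 1, 65535)

  // Hypothetical helper: true when len + len + len is larger than the buffer.
  def tripledExceedsBuffer(len: Int): Boolean = 3L * len > BufferSize

  def main(args: Array[String]): Unit =
    lengths.foreach { len =>
      println(s"len=$len tripled=${3L * len} exceedsBuffer=${tripledExceedsBuffer(len)}")
    }
}
```

Under this simplified model only 2000 stays below the buffer size once tripled (6000), while 2800 already exceeds it (8400), which is why both lengths are kept in the test.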

### Why are the changes needed?
To make sure that the current implementation and future versions of Spark do not have the bug that was fixed in 2.4.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running `JsonFunctionsSuite`.

Closes #26613 from MaxGekk/json_tuple-test.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Maxim Gekk 2019-11-21 09:59:31 +09:00 committed by HyukjinKwon
parent 06e203b856
commit e6b157cf70

@@ -644,4 +644,15 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {
      to_json(struct($"t"), Map("timestampFormat" -> "yyyy-MM-dd HH:mm:ss.SSSSSS")))
    checkAnswer(df, Row(s"""{"t":"$s"}"""))
  }
  test("json_tuple - do not truncate results") {
    Seq(2000, 2800, 8000 - 1, 8000, 8000 + 1, 65535).foreach { len =>
      val str = Array.tabulate(len)(_ => "a").mkString
      val json_tuple_result = Seq(s"""{"test":"$str"}""").toDF("json")
        .withColumn("result", json_tuple('json, "test"))
        .select('result)
        .as[String].head.length
      assert(json_tuple_result === len)
    }
  }
}