spark-instrumented-optimizer

Marco Gaido 7a8cc8e071 [SPARK-27607][SQL] Improve Row.toString performance	2019-05-02 07:20:33 -07:00

## What changes were proposed in this pull request?

`Row.toString` is currently causing the useless creation of an `Array` containing all the values in the row before generating the string containing it. This operation adds a considerable overhead.

The PR proposes to avoid this operation in order to get a faster implementation.

## How was this patch tested?

Run

```scala
test("Row toString perf test") {
  val n = 100000
  val rows = (1 to n).map { i =>
    Row(i, i.toDouble, i.toString, i.toShort, true, null)
  }
  // warmup
  (1 to 10).foreach { _ => rows.foreach(_.toString) }

  val times = (1 to 100).map { _ =>
    val t0 = System.nanoTime()
    rows.foreach(_.toString)
    val t1 = System.nanoTime()
    t1 - t0
  }
  // scalastyle:off println
  println(s"Avg time on ${times.length} iterations for $n toString:" +
    s" ${times.sum.toDouble / times.length / 1e6} ms")
  // scalastyle:on println
}
```

Before the PR:

```
Avg time on 100 iterations for 100000 toString: 61.08408419 ms
```

After the PR:

```
Avg time on 100 iterations for 100000 toString: 38.16539432 ms
```

This means the new implementation is about 1.60X faster than the original one.

Closes #24505 from mgaido91/SPARK-27607.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

benchmarks	[SPARK-27535][SQL][TEST] Date and timestamp JSON benchmarks	2019-04-23 11:09:14 +09:00
src	[SPARK-27607][SQL] Improve Row.toString performance	2019-05-02 07:20:33 -07:00
v1.2.1/src	[SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1	2019-03-26 22:32:03 -07:00
v2.3.4/src	[SPARK-27176][SQL] Upgrade hadoop-3's built-in Hive maven dependencies to 2.3.4	2019-04-08 08:42:21 -07:00
pom.xml	[SPARK-27182][SQL] Move the conflict source code of the sql/core module to sql/core/v1.2.1	2019-03-26 22:32:03 -07:00