spark-instrumented-optimizer/common
Wenchen Fan 4eb41879ce [SPARK-17528][SQL] data should be copied properly before saving into InternalRow
## What changes were proposed in this pull request?

For performance reasons, `UnsafeRow.getString`, `getStruct`, etc. return a "pointer" that points to a memory region of this unsafe row. This makes the unsafe projection a little dangerous, because all of its output rows share one instance.

When we implement SQL operators, we should be careful not to cache the input rows, because they may be produced by an unsafe projection in a child operator, and thus their content may change over time.

However, when we update values of an InternalRow (e.g. in mutable projection and safe projection), we only copy UTF8String; we should also copy InternalRow, ArrayData and MapData. This PR fixes this, and also fixes the copy of various InternalRow, ArrayData and MapData implementations.
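
The hazard described above can be illustrated without Spark. The following is only a minimal sketch: `View`, `ReusingProjection`, and `SharedRowDemo` are hypothetical stand-ins invented for this example (they are not Spark classes), loosely modelling a value that points into a shared buffer and a projection that reuses one output instance. It shows why caching such a value without copying it yields stale data, and how copying before storing avoids that.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedRowDemo {

    /** Hypothetical view over a byte region; it does not own its bytes. */
    static final class View {
        final byte[] base;
        final int offset;
        final int length;

        View(byte[] base, int offset, int length) {
            this.base = base;
            this.offset = offset;
            this.length = length;
        }

        /** Returns a view over a private copy of the referenced bytes. */
        View copy() {
            return new View(Arrays.copyOfRange(base, offset, offset + length), 0, length);
        }

        @Override
        public String toString() {
            return new String(base, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical projection that writes every output into one reused buffer. */
    static final class ReusingProjection {
        private final byte[] buffer = new byte[64];

        View apply(String value) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            if (bytes.length > buffer.length) {
                throw new IllegalArgumentException("value does not fit in the demo buffer");
            }
            // Overwrites whatever the previous call left in the shared buffer.
            System.arraycopy(bytes, 0, buffer, 0, bytes.length);
            return new View(buffer, 0, bytes.length);
        }
    }

    /** Caches the views as returned: every cached view aliases the same buffer. */
    static List<String> cacheWithoutCopy(String... inputs) {
        ReusingProjection projection = new ReusingProjection();
        List<View> cached = new ArrayList<>();
        for (String input : inputs) {
            cached.add(projection.apply(input));
        }
        return render(cached);
    }

    /** Copies each view before caching it, so later outputs cannot overwrite it. */
    static List<String> cacheWithCopy(String... inputs) {
        ReusingProjection projection = new ReusingProjection();
        List<View> cached = new ArrayList<>();
        for (String input : inputs) {
            cached.add(projection.apply(input).copy());
        }
        return render(cached);
    }

    private static List<String> render(List<View> views) {
        List<String> out = new ArrayList<>();
        for (View view : views) {
            out.add(view.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cacheWithoutCopy("aa", "bb")); // prints [bb, bb]
        System.out.println(cacheWithCopy("aa", "bb"));    // prints [aa, bb]
    }
}
```

In this sketch the copy is made by the consumer at the moment it stores the value, which mirrors the idea of copying data before saving it into a row rather than relying on the producer to hand out fresh instances.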

## How was this patch tested?

new regression tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes #18483 from cloud-fan/fix-copy.
2017-07-01 09:25:29 +08:00
..
kvstore [MINOR][BUILD] Fix Java linter errors 2017-06-19 20:17:54 +01:00
network-common [SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build 2017-06-29 20:56:37 -07:00
network-shuffle [SPARK-20640][CORE] Make rpc timeout and retry for shuffle registration configurable. 2017-06-21 21:54:29 +08:00
network-yarn [SPARK-20756][YARN] yarn-shuffle jar references unshaded guava 2017-05-22 10:10:41 -07:00
sketch [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT 2017-04-24 21:48:04 -07:00
tags [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT 2017-04-24 21:48:04 -07:00
unsafe [SPARK-17528][SQL] data should be copied properly before saving into InternalRow 2017-07-01 09:25:29 +08:00