spark-instrumented-optimizer

History

Sean Zhong a425a37a5d [SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid OOM when converting unknown fields to JSON ## What changes were proposed in this pull request? This PR is a follow up of SPARK-17356. Current implementation of `TreeNode.toJSON` recursively converts all fields of TreeNode to JSON, even if the field is of type `Seq` or type Map. This may trigger out of memory exception in cases like: 1. the Seq or Map can be very big. Converting them to JSON may take huge memory, which may trigger out of memory error. 2. Some user space input may also be propagated to the Plan. The user space input can be of arbitrary type, and may also be self-referencing. Trying to print user space input to JSON may trigger out of memory error or stack overflow error. For a code example, please check the Jira description of SPARK-17426. In this PR, we refactor the `TreeNode.toJSON` so that we only convert a field to JSON string if the field is a safe type. ## How was this patch tested? Unit test. Author: Sean Zhong <seanzhong@databricks.com> Closes #14990 from clockfly/json_oom2.	2016-09-16 19:37:30 +08:00
..
main	[SPARK-17536][SQL] Minor performance improvement to JDBC batch inserts	2016-09-15 10:00:28 +01:00
test	[SPARK-17426][SQL] Refactor `TreeNode.toJSON` to avoid OOM when converting unknown fields to JSON	2016-09-16 19:37:30 +08:00

Sean Zhong a425a37a5d [SPARK-17426][SQL] Refactor TreeNode.toJSON to avoid OOM when converting unknown fields to JSON

## What changes were proposed in this pull request?

This PR is a follow up of SPARK-17356. Current implementation of `TreeNode.toJSON` recursively converts all fields of TreeNode to JSON, even if the field is of type `Seq` or type Map. This may trigger out of memory exception in cases like:

1. the Seq or Map can be very big. Converting them to JSON may take huge memory, which may trigger out of memory error.
2. Some user space input may also be propagated to the Plan. The user space input can be of arbitrary type, and may also be self-referencing. Trying to print user space input to JSON may trigger out of memory error or stack overflow error.

For a code example, please check the Jira description of SPARK-17426.

In this PR, we refactor the `TreeNode.toJSON` so that we only convert a field to JSON string if the field is a safe type.

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #14990 from clockfly/json_oom2.

2016-09-16 19:37:30 +08:00

main [SPARK-17536][SQL] Minor performance improvement to JDBC batch inserts 2016-09-15 10:00:28 +01:00

test [SPARK-17426][SQL] Refactor TreeNode.toJSON to avoid OOM when converting unknown fields to JSON 2016-09-16 19:37:30 +08:00