spark-instrumented-optimizer

History

main	[SPARK-14554][SQL] disable whole stage codegen if there are too many input columns	2016-04-11 22:58:35 -07:00
test	[SPARK-14554][SQL] disable whole stage codegen if there are too many input columns	2016-04-11 22:58:35 -07:00

Wenchen Fan 52a801124f [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/12047/files#diff-94a1f59bcc9b6758c4ca874652437634R529, we may split the field expression code in `CreateExternalRow` to support wide tables. However, the whole stage codegen framework doesn't support this, because the input to an expression is not always the input row; it can also be `CodeGenContext.currentVars`, which doesn't work well with `CodeGenContext.splitExpressions`.

We do have a check to guard against this case, but it is incomplete: it only checks output fields.

This PR improves the whole stage codegen support check so that whole stage codegen is disabled when there are too many input fields. This avoids splitting the field expression code in `CreateExternalRow` under whole stage codegen.
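
The shape of the extended check can be sketched as follows. This is a toy model in Python, not Spark's implementation: `PlanNode`, `supports_whole_stage_codegen`, and the `MAX_FIELDS` threshold of 200 are all hypothetical names and values chosen for illustration. It only shows the idea of rejecting a plan node when either its own output or any of its inputs (its children's outputs) has too many fields.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold, chosen only for this illustration.
MAX_FIELDS = 200


@dataclass
class PlanNode:
    """Toy stand-in for a physical plan operator (not a Spark class)."""
    name: str
    output_fields: int
    children: List["PlanNode"] = field(default_factory=list)


def supports_whole_stage_codegen(node: PlanNode, max_fields: int = MAX_FIELDS) -> bool:
    """Return False if the node's output OR any of its inputs is too wide."""
    # The earlier, incomplete check: only the node's own output is inspected.
    too_many_output = node.output_fields > max_fields
    # The added check: each child's output is an input to this node.
    too_many_input = any(c.output_fields > max_fields for c in node.children)
    return not (too_many_output or too_many_input)


# A narrow projection over a wide scan: the output is small, the input is not.
wide_scan = PlanNode("scan", output_fields=500)
narrow_project = PlanNode("project", output_fields=3, children=[wide_scan])

# The same projection over a narrow scan.
small_scan = PlanNode("scan", output_fields=10)
small_project = PlanNode("project", output_fields=3, children=[small_scan])
```

In this model, `narrow_project` passes an output-only check but is rejected once inputs are considered, which is the gap the description above refers to.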

TODO: Would it be a better solution to make `CodeGenContext.currentVars` work well with `CodeGenContext.splitExpressions`?

## How was this patch tested?

New test in DatasetSuite.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12322 from cloud-fan/codegen.

2016-04-11 22:58:35 -07:00
