spark-instrumented-optimizer


Dilip Biswal e4499932da [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

### What changes were proposed in this pull request?
Improve the EXPLAIN FORMATTED output of DSV2 Scan nodes (file based ones).

Before
```
== Physical Plan ==
* Project (4)
+- * Filter (3)
   +- * ColumnarToRow (2)
      +- BatchScan (1)

(1) BatchScan
Output [2]: [value#7, id#8]
Arguments: [value#7, id#8], ParquetScan(org.apache.spark.sql.test.TestSparkSession17477bbb,Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml,org.apache.spark.sql.execution.datasources.InMemoryFileIndexa6c363ce,StructType(StructField(value,IntegerType,true)),StructType(StructField(value,IntegerType,true)),StructType(StructField(id,IntegerType,true)),[Lorg.apache.spark.sql.sources.Filter;40fee459,org.apache.spark.sql.util.CaseInsensitiveStringMapfeca1ec6,Vector(isnotnull(id#8), (id#8 > 1)),List(isnotnull(value#7), (value#7 > 2)))

(2) ...
(3) ...
(4) ...
```

After
```
== Physical Plan ==
* Project (4)
+- * Filter (3)
   +- * ColumnarToRow (2)
      +- BatchScan (1)

(1) BatchScan
Output [2]: [value#7, id#8]
DataFilters: [isnotnull(value#7), (value#7 > 2)]
Format: parquet
Location: InMemoryFileIndex[....]
PartitionFilters: [isnotnull(id#8), (id#8 > 1)]
PushedFilers: [IsNotNull(id), IsNotNull(value), GreaterThan(id,1), GreaterThan(value,2)]
ReadSchema: struct<value:int>

(2) ...
(3) ...
(4) ...
```

### Why are the changes needed?
The old format is not very readable. This improves the readability of the plan.

### Does this PR introduce any user-facing change?
Yes. the explain output will be different.

### How was this patch tested?
Added a test case in ExplainSuite.

Closes #28425 from dilipbiswal/dkb_dsv2_explain.

Lead-authored-by: Dilip Biswal <dkbiswal@gmail.com>
Co-authored-by: Dilip Biswal <dkbiswal@apache.org>
Signed-off-by: Dilip Biswal <dkbiswal@apache.org>

2020-07-15 01:28:39 -07:00
avro	[SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node	2020-07-15 01:28:39 -07:00
docker	[SPARK-28683][BUILD] Upgrade Scala to 2.12.10	2019-09-18 13:30:36 -07:00
docker-integration-tests	[SPARK-32211][SQL] Pin mariadb-plugin-gssapi-server version to fix MariaDBKrbIntegrationSuite	2020-07-07 09:38:08 -07:00
kafka-0-10	[SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13	2020-07-14 02:06:50 -07:00
kafka-0-10-assembly	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00
kafka-0-10-sql	[SPARK-32033][SS][DSTEAMS] Use new poll API in Kafka connector executor side to avoid infinite wait	2020-06-19 14:46:26 -07:00
kafka-0-10-token-provider	[SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector	2020-03-12 19:04:35 -07:00
kinesis-asl	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
kinesis-asl-assembly	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00
spark-ganglia-lgpl	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00