spark-instrumented-optimizer/sql
Cheng Su 6d188cbb08 [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size
### What changes were proposed in this pull request?

This is a follow up of https://github.com/apache/spark/pull/33447, where the unit test is disabled, due to failure after memory setting changed. I found the root cause is after https://github.com/apache/spark/pull/33447, in unit test, Spark memory page byte size is changed from `67108864` to `33554432` [1]. So the shuffled hash join build size is also changed accordingly due to [memory page byte size change](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L457). Previously the unit test is checking the exact value of build size, so it no longer works. Here we change the unit test to verify the relative value of build size, and it should work.

[1]: I printed out the memory page byte size explicitly in unit test - `org.apache.spark.SparkException: chengsu pageSizeBytes: 33554432!` in https://github.com/c21/spark/runs/3186680616?check_suite_focus=true .

### Why are the changes needed?

Make previously disabled unit test work.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Changed unit test itself.

Closes #33494 from c21/test.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 6a8dd3229a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-29 11:14:48 +09:00
..
catalyst [SPARK-36286][SQL] Block some invalid datetime string 2021-07-29 09:17:01 +08:00
core [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size 2021-07-29 11:14:48 +09:00
hive [SPARK-36312][SQL] ParquetWriterSupport.setSchema should check inner field 2021-07-28 13:52:40 +08:00
hive-thriftserver [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation 2021-07-20 09:49:25 +09:00
create-docs.sh [SPARK-34010][SQL][DODCS] Use python3 instead of python in SQL documentation build 2021-01-05 19:48:10 +09:00
gen-sql-api-docs.py [SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function document 2021-03-19 10:19:26 +09:00
gen-sql-config-docs.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
gen-sql-functions-docs.py
mkdocs.yml
README.md

Spark SQL

This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.

Spark SQL is broken up into four subprojects:

  • Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  • Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  • Hive Support (sql/hive) - Includes extensions that allow users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
  • HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.

Running ./sql/create-docs.sh generates SQL documentation for built-in functions under sql/site, and SQL configuration documentation that gets included as part of configuration.md in the main docs directory.