spark-instrumented-optimizer/sql
Niranjan Artal 0d2ef3ae2b [SPARK-30300][SQL][WEB-UI] Fix updating the UI max value string when driver updates the same metric id as the tasks
### What changes were proposed in this pull request?

In this PR, For a given metrics id we are checking if the driver side accumulator's value is greater than max of all stages value. If it's true, then we are removing that entry from the Hashmap. By doing this, for this metrics, "driver" would be displayed on the UI(As the driver would have the maximum value)

### Why are the changes needed?

This PR fixes https://issues.apache.org/jira/browse/SPARK-30300. Currently driver's metric value is not compared while caluculating the max.

### Does this PR introduce any user-facing change?

For the metrics where driver's value is greater than max of all stages, this is the change.
Previous : (min, median, max (stageId 0( attemptId 1): taskId 2))
Now:   (min, median, max (driver))

### How was this patch tested?

Ran unit tests.

Closes #26941 from nartal1/SPARK-30300.

Authored-by: Niranjan Artal <nartal@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
2019-12-20 07:29:28 -06:00
..
catalyst [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types 2019-12-20 19:21:43 +08:00
core [SPARK-30300][SQL][WEB-UI] Fix updating the UI max value string when driver updates the same metric id as the tasks 2019-12-20 07:29:28 -06:00
hive [SPARK-30262][SQL] Avoid NumberFormatException when totalSize is empty 2019-12-18 15:12:32 -08:00
hive-thriftserver [INFRA] Reverts commit 56dcd79 and c216ef1 2019-12-16 19:57:44 -07:00
create-docs.sh [MINOR][DOCS] Minor doc fixes related with doc build and uses script dir in SQL doc gen script 2017-08-26 13:56:24 +09:00
gen-sql-markdown.py [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc 2019-04-09 13:49:42 +08:00
mkdocs.yml [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions 2017-07-26 09:38:51 -07:00
README.md [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00

Spark SQL

This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.

Spark SQL is broken up into four subprojects:

  • Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  • Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  • Hive Support (sql/hive) - Includes extensions that allow users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allow users to run queries that include Hive UDFs, UDAFs, and UDTFs.
  • HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.

Running ./sql/create-docs.sh generates SQL documentation for built-in functions under sql/site.