spark-instrumented-optimizer/sql/catalyst
Sameer Agarwal 138c300f97 [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL
Based on the semantics of a query, we can derive a number of data constraints on output of each (logical or physical) operator. For instance, if a filter defines `‘a > 10`, we know that the output data of this filter satisfies 2 constraints:

1. `‘a > 10`
2. `isNotNull(‘a)`

This PR proposes a possible way of keeping track of these constraints and propagating them in the logical plan, which can then help us build more advanced optimizations (such as pruning redundant filters, optimizing joins, among others). We define constraints as a set of (implicitly conjunctive) expressions. For e.g., if a filter operator has constraints = `Set(‘a > 10, ‘b < 100)`, it’s implied that the outputs satisfy both individual constraints (i.e., `‘a > 10` AND `‘b < 100`).

Design Document: https://docs.google.com/a/databricks.com/document/d/1WQRgDurUBV9Y6CWOBS75PQIqJwT-6WftVa18xzm7nCo/edit?usp=sharing

Author: Sameer Agarwal <sameer@databricks.com>

Closes #10844 from sameeragarwal/constraints.
2016-02-02 22:22:50 -08:00
..
src [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL 2016-02-02 22:22:50 -08:00
pom.xml [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version 2016-01-30 00:20:28 -08:00