```scala
df.selectExpr("abs(colA)", "colB")
df.filter("age > 21")
```
Author: Reynold Xin <rxin@databricks.com>
Closes#4348 from rxin/SPARK-5579 and squashes the following commits:
2baeef2 [Reynold Xin] Fix Python.
b416372 [Reynold Xin] [SPARK-5579][SQL][DataFrame] Support for project/filter using SQL expressions.
The previous #3732 is reverted due to some test failure.
Have fixed that.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4325 from adrian-wang/datenative and squashes the following commits:
096e20d [Daoyuan Wang] fix for mixed timezone
0ed0fdc [Daoyuan Wang] fix test data
a2fdd4e [Daoyuan Wang] getDate
c37832b [Daoyuan Wang] row to catalyst
f0005b1 [Daoyuan Wang] add date in sql parser and java type conversion
024c9a6 [Daoyuan Wang] clean some import order
d6715fc [Daoyuan Wang] refactoring Date as Primitive Int internally
374abd5 [Daoyuan Wang] spark native date type support
Add support for alias of udtfs, such as
```
select stack(2, key, value, key, value) as (a, b) from src limit 5;
select a, b from (select stack(2, key, value, key, value) as (a, b) from src) t limit 5
```
Author: wangfei <wangfei1@huawei.com>
Author: scwf <wangfei1@huawei.com>
Author: Fei Wang <wangfei1@huawei.com>
Closes#4186 from scwf/multi-alias-names and squashes the following commits:
c35e922 [wangfei] fix conflicts
adc8311 [wangfei] minor format fix
2783aed [wangfei] convert it to a Generate instead of leaving it inside of a Project clause
a87668a [wangfei] minor improvement
b25d9b3 [wangfei] resolve conflicts
d38f041 [wangfei] style fix
8cfcebf [wangfei] minor improvement
12a239e [wangfei] fix test case
050177f [wangfei] added extendedCheckRules
3d69329 [wangfei] added CheckMultiAlias to analyzer
324150d [wangfei] added multi alias node
74f5a81 [Fei Wang] imports order fix
5bc3f59 [scwf] style fix
3daec28 [scwf] support alias for udfs with multi output columns
SQL in HiveContext, should be case insensitive, however, the following query will fail.
```scala
udf.register("random0", () => { Math.random()})
assert(sql("SELECT RANDOM0() FROM src LIMIT 1").head().getDouble(0) >= 0.0)
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4326 from chenghao-intel/udf_case_sensitive and squashes the following commits:
485cf66 [Cheng Hao] Support the case insensitive for UDF
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
1. Added Java-friendly version of the expression operators (i.e. gt, geq)
2. Added JavaDoc for most operators
3. Simplified expression operators by having only one version of the function (that accepts Any). Previously we had two methods for each expression operator, one accepting Any and another accepting Column.
4. agg function now accepts varargs of (String, String).
Author: Reynold Xin <rxin@databricks.com>
Closes#4332 from rxin/df-update and squashes the following commits:
ab0aa69 [Reynold Xin] Added Java friendly expression methods. Added JavaDoc. For each expression operator, have only one version of the function (that accepts Any). Previously we had two methods for each expression operator, one accepting Any and another accepting Column.
576d07a [Reynold Xin] random commit.
Author: Reynold Xin <rxin@databricks.com>
Closes#4327 from rxin/schemarddTypeAlias and squashes the following commits:
e5a8ff3 [Reynold Xin] [SPARK-5551][SQL] Create type alias for SchemaRDD for source backward compatibility
They were there mostly for code review and easier check of the API. I don't think they need to be there anymore.
Author: Reynold Xin <rxin@databricks.com>
Closes#4328 from rxin/remove-df-api and squashes the following commits:
723d600 [Reynold Xin] [SQL][DataFrame] Remove DataFrameApi and ColumnApi.
This PR aims to support `INSERT INTO/OVERWRITE TABLE tableName` and `CREATE TABLE tableName AS SELECT` for the data source API (partitioned tables are not supported).
In this PR, I am also adding the support of `IF NOT EXISTS` for our ddl parser. The current semantic of `IF NOT EXISTS` is explained as follows.
* For a `CREATE TEMPORARY TABLE` statement, it does not `IF NOT EXISTS` for now.
* For a `CREATE TABLE` statement (we are creating a metastore table), if there is an existing table having the same name ...
* when `IF NOT EXISTS` clause is used, we will do nothing.
* when `IF NOT EXISTS` clause is not used, the user will see an exception saying the table already exists.
TODOs:
- [x] CTAS support
- [x] Programmatic APIs
- [ ] Python API (another PR)
- [x] More unit tests
- [ ] Documents (another PR)
marmbrus liancheng rxin
Author: Yin Huai <yhuai@databricks.com>
Closes#4294 from yhuai/writeSupport and squashes the following commits:
3db1539 [Yin Huai] save does not take overwrite.
1c98881 [Yin Huai] Fix test.
142372a [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupport
34e1bfb [Yin Huai] Address comments.
1682ca6 [Yin Huai] Better support for CTAS statements.
e789d64 [Yin Huai] For the Scala API, let users to use tuples to provide options.
0128065 [Yin Huai] Short hand versions of save and load.
66ebd74 [Yin Huai] Formatting.
9203ec2 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupport
e5d29f2 [Yin Huai] Programmatic APIs.
1a719a5 [Yin Huai] CREATE TEMPORARY TABLE with IF NOT EXISTS is not allowed for now.
909924f [Yin Huai] Add saveAsTable for the data source API to DataFrame.
95a7c71 [Yin Huai] Fix bug when handling IF NOT EXISTS clause in a CREATE TEMPORARY TABLE statement.
d37b19c [Yin Huai] Cheng's comments.
fd6758c [Yin Huai] Use BeforeAndAfterAll.
7880891 [Yin Huai] Support CREATE TABLE AS SELECT STATEMENT and the IF NOT EXISTS clause.
cb85b05 [Yin Huai] Initial write support.
2f91354 [Yin Huai] Make INSERT OVERWRITE/INTO statements consistent between HiveQL and SqlParser.
This pull request contains a Spark SQL data source that can pull data from, and can put data into, a JDBC database.
I have tested both read and write support with H2, MySQL, and Postgres. It would surprise me if both read and write support worked flawlessly out-of-the-box for any other database; different databases have different names for different JDBC data types and different meanings for SQL types with the same name. However, this code is designed (see `DriverQuirks.scala`) to make it *relatively* painless to add support for another database by augmenting the type mapping contained in this PR.
Author: Tor Myklebust <tmyklebu@gmail.com>
Closes#4261 from tmyklebu/master and squashes the following commits:
cf167ce [Tor Myklebust] Work around other Java tests ruining TestSQLContext.
67893bf [Tor Myklebust] Move the jdbcRDD methods into SQLContext itself.
585f95b [Tor Myklebust] Dependencies go into the project's pom.xml.
829d5ba [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
41647ef [Tor Myklebust] Hide a couple things that don't need to be public.
7318aea [Tor Myklebust] Fix scalastyle warnings.
a09eeac [Tor Myklebust] JDBC data source for Spark SQL.
176bb98 [Tor Myklebust] Add test deps for JDBC support.
1. Throw UnsupportedOperationException if a Column is not computable.
2. Perform eager analysis on DataFrame so we can catch errors when they happen (not when an action is run).
Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>
Closes#4296 from rxin/col-computability and squashes the following commits:
6527b86 [Reynold Xin] Merge pull request #8 from davies/col-computability
fd92bc7 [Reynold Xin] Merge branch 'master' into col-computability
f79034c [Davies Liu] fix python tests
5afe1ff [Reynold Xin] Fix scala test.
17f6bae [Reynold Xin] Various fixes.
b932e86 [Reynold Xin] Added eager analysis for error reporting.
e6f00b8 [Reynold Xin] [SQL][API] ComputableColumn vs IncomputableColumn
Author: Reynold Xin <rxin@databricks.com>
Closes#4313 from rxin/SPARK-5514 and squashes the following commits:
e34e91b [Reynold Xin] [SPARK-5514] DataFrame.collect should call executeCollect
override the MetastoreRelation's sameresult method only compare databasename and table name
because in previous :
cache table t1;
select count(*) from t1;
it will read data from memory but the sql below will not,instead it read from hdfs:
select count(*) from t1 t;
because cache data is keyed by logical plan and compare with sameResult ,so when table with alias the same table 's logicalplan is not the same logical plan with out alias so modify the sameresult method only compare databasename and table name
Author: seayi <405078363@qq.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#3898 from seayi/branch-1.2 and squashes the following commits:
8f0c7d2 [seayi] Update CachedTableSuite.scala
a277120 [seayi] Update HiveMetastoreCatalog.scala
8d910aa [seayi] Update HiveMetastoreCatalog.scala
Store daysSinceEpoch as an Int value(4 bytes) to represent DateType, instead of using java.sql.Date(8 bytes as Long) in catalyst row. This ensures the same comparison behavior of Hive and Catalyst.
Subsumes #3381
I thinks there are already some tests in JavaSQLSuite, and for python it will not affect python's datetime class.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#3732 from adrian-wang/datenative and squashes the following commits:
0ed0fdc [Daoyuan Wang] fix test data
a2fdd4e [Daoyuan Wang] getDate
c37832b [Daoyuan Wang] row to catalyst
f0005b1 [Daoyuan Wang] add date in sql parser and java type conversion
024c9a6 [Daoyuan Wang] clean some import order
d6715fc [Daoyuan Wang] refactoring Date as Primitive Int internally
374abd5 [Daoyuan Wang] spark native date type support
This pr adds the support of schema-less syntax, custom field delimiter and SerDe for HiveQL's transform.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4014 from viirya/schema_less_trans and squashes the following commits:
ac2d1fe [Liang-Chi Hsieh] Refactor codes for comments.
a137933 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into schema_less_trans
aa10fbd [Liang-Chi Hsieh] Add Hive golden answer files again.
575f695 [Liang-Chi Hsieh] Add Hive golden answer files for new unit tests.
a422562 [Liang-Chi Hsieh] Use createQueryTest for unit tests and remove unnecessary imports.
ccb71e3 [Liang-Chi Hsieh] Refactor codes for comments.
37bd391 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into schema_less_trans
6000889 [Liang-Chi Hsieh] Wrap input and output schema into ScriptInputOutputSchema.
21727f7 [Liang-Chi Hsieh] Move schema-less output to proper place. Use multilines instead of a long line SQL.
9a6dc04 [Liang-Chi Hsieh] setRecordReaderID is introduced in 0.13.1, use reflection API to call it.
7a14f31 [Liang-Chi Hsieh] Fix bug.
799b5e1 [Liang-Chi Hsieh] Call getSerializedClass instead of using Text.
be2c3fc [Liang-Chi Hsieh] Fix style.
32d3046 [Liang-Chi Hsieh] Add SerDe support.
ab22f7b [Liang-Chi Hsieh] Fix style.
7a48e42 [Liang-Chi Hsieh] Add support of custom field delimiter.
b1729d9 [Liang-Chi Hsieh] Fix style.
ccee49e [Liang-Chi Hsieh] Add unit test.
f561c37 [Liang-Chi Hsieh] Add support of schema-less script transformation.
Not all Catalyst filter expressions can be converted to Parquet filter predicates. We should try to convert each individual predicate and then collect those convertible ones.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4255)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4255 from liancheng/spark-5465 and squashes the following commits:
14ccd37 [Cheng Lian] Fixes filter push-down for Parquet data source
I'll add test case in #4040
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4057 from adrian-wang/coal and squashes the following commits:
4d0111a [Daoyuan Wang] address Yin's comments
c393e18 [Daoyuan Wang] fix rebase conflicts
e47c03a [Daoyuan Wang] add coalesce in parser
c74828d [Daoyuan Wang] cast types for coalesce
Support `comment` in create a table field.
__CREATE TEMPORARY TABLE people(name string `comment` "the name of a person")__
Author: OopsOutOfMemory <victorshengli@126.com>
Closes#3999 from OopsOutOfMemory/meta_comment and squashes the following commits:
39150d4 [OopsOutOfMemory] add comment and refine test suite
Simplify some codes related to DataFrame.
* Calling `toAttributes` instead of a `map`.
* Original `createDataFrame` creates the `StructType` and its attributes in a redundant way. Refactored it to create `StructType` and call `toAttributes` on it directly.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4298 from viirya/refactor_df and squashes the following commits:
1d61c64 [Liang-Chi Hsieh] Revert it.
f36efb5 [Liang-Chi Hsieh] Relax the constraint of toDataFrame.
2c9f370 [Liang-Chi Hsieh] Just refactor DataFrame codes.
Author: kai <kaizeng@eecs.berkeley.edu>
Closes#4291 from kai-zeng/aggregate-fix and squashes the following commits:
78658ef [kai] remove redundant field "childOutput"
After the recent refactoring, convertToCatalyst in ScalaReflection does not recurse on Arrays. It should.
The test suite modification made the test fail before the fix in ScalaReflection. The fix makes the test suite succeed.
CC: marmbrus
Author: Joseph K. Bradley <joseph@databricks.com>
Closes#4295 from jkbradley/SPARK-5504 and squashes the following commits:
6b7276d [Joseph K. Bradley] Fixed issue in ScalaReflection.convertToCatalyst with Arrays with non-primitive types. Modified test suite so it failed before the fix and works after the fix.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes#4250 from ueshin/issues/SPARK-5457 and squashes the following commits:
3c05e59 [Takuya UESHIN] Remove parameter to use default value of ApproxCountDistinct.
faea19d [Takuya UESHIN] Use overload instead of default value for Java support.
d1cca38 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-5457
663d43d [Takuya UESHIN] Add missing DSL for ApproxCountDistinct.
This PR makes Star a trait, and provides two implementations: UnresolvedStar (used for *, tblName.*) and ResolvedStar (used for df("*")).
Author: Reynold Xin <rxin@databricks.com>
Closes#4283 from rxin/df-star and squashes the following commits:
c9cba3e [Reynold Xin] Removed mapFunction in UnresolvedStar.
1a3a1d7 [Reynold Xin] [SQL] Support df("*") to select all columns in a data frame.
This patch changes DataFrame's `apply()` method to use an analyzed query plan when resolving column names. This fixes a bug where `apply` would throw "invalid call to qualifiers on unresolved object" errors when called on DataFrames constructed via `SQLContext.sql()`.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#4282 from JoshRosen/SPARK-5462 and squashes the following commits:
b9e6da2 [Josh Rosen] [SPARK-5462] Use analyzed query plan in DataFrame.apply().
1. Added Dsl.column in case Dsl.col is shadowed.
2. Allow using String to specify the target data type in cast.
3. Support sorting on multiple columns using column names.
4. Added Java API test file.
Author: Reynold Xin <rxin@databricks.com>
Closes#4280 from rxin/dsl1 and squashes the following commits:
33ecb7a [Reynold Xin] Add the Java test.
d06540a [Reynold Xin] [SQL] DataFrame API improvements.
I believe that SPARK-4296 has been fixed by 3684fd21e1. I am adding tests based #3910 (change the udf to HiveUDF instead).
Author: Yin Huai <yhuai@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes#4010 from yhuai/SPARK-4296-yin and squashes the following commits:
6343800 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-4296-yin
6cfadd2 [Yin Huai] Actually, this issue has been fixed by 3684fd21e1.
d42b707 [Yin Huai] Update comment.
8b3a274 [Yin Huai] Since expressions in grouping expressions can have aliases, which can be used by the outer query block, revert this change.
443538d [Cheng Lian] Trims aliases when resolving and checking aggregate expressions
`select key, count( * ) from src group by key, 1` will get the wrong answer.
e.g. for this table
```
val testData2 =
TestSQLContext.sparkContext.parallelize(
TestData2(1, 1) ::
TestData2(1, 2) ::
TestData2(2, 1) ::
TestData2(2, 2) ::
TestData2(3, 1) ::
TestData2(3, 2) :: Nil, 2).toSchemaRDD
testData2.registerTempTable("testData2")
```
result of `SELECT a, count(1) FROM testData2 GROUP BY a, 1` is
```
[1,1]
[2,2]
[3,1]
```
Author: wangfei <wangfei1@huawei.com>
Closes#4169 from scwf/agg-bug and squashes the following commits:
05751db [wangfei] fix bugs when literal in agg grouping expressioons
now spark sql does not support star expression in udf, run the following sql by spark-sql will get error
```
select concat(*) from src
```
Author: wangfei <wangfei1@huawei.com>
Author: scwf <wangfei1@huawei.com>
Closes#4163 from scwf/udf-star and squashes the following commits:
9db7b39 [wangfei] addressed comments
da1da09 [scwf] minor fix
f87b5f9 [scwf] added test case
587bf7e [wangfei] compile fix
eb93c16 [wangfei] fix star resolve issue in udf
Enable parquet filter pushdown of castable types like short, byte that can be cast to integer
Author: Yash Datta <Yash.Datta@guavus.com>
Closes#4156 from saucam/filter_short and squashes the following commits:
a403979 [Yash Datta] SPARK-4786: Fix styling issues
d029866 [Yash Datta] SPARK-4786: Add test case
cb2e0d9 [Yash Datta] SPARK-4786: Parquet filter pushdown for castable types
...gs.
Parquet Converters allow developers to take advantage of dictionary encoding of column data to reduce Column Binary decoding.
The Spark PrimitiveConverter was not using that API and consequently for String columns that used dictionary compression repeated Binary to String conversions for the same String.
In measurements this could account for over 25% of entire query time.
For example a 500M row table split across 16 blocks was aggregated and summed in a litte under 30s before this change and a little under 20s after the change.
Author: Michael Davies <Michael.BellDavies@gmail.com>
Closes#4187 from MickDavies/SPARK-5309-2 and squashes the following commits:
327287e [Michael Davies] SPARK-5309: Add support for dictionaries in PrimitiveConverter for Strings.
33c002c [Michael Davies] SPARK-5309: Add support for dictionaries in PrimitiveConverter for Strings.
I found that running `HiveComparisonTest.createQueryTest` to generate Hive golden answer files on Hive 0.13.1 would throw KryoException. I am not sure if this can be reproduced by others. Since Hive 0.13.0, Kryo plan serialization is introduced to replace javaXML as default plan serialization format. This is a quick fix to set hive configuration to use javaXML serialization.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4223 from viirya/fix_hivetest and squashes the following commits:
97a8760 [Liang-Chi Hsieh] Use javaXML plan serialization.
Turns out Scala does generate static methods for ones defined in a companion object. Finally no need to separate api.java.dsl and api.scala.dsl.
Author: Reynold Xin <rxin@databricks.com>
Closes#4276 from rxin/dsl and squashes the following commits:
30aa611 [Reynold Xin] Add all files.
1a9d215 [Reynold Xin] [SPARK-5445][SQL] Consolidate Java and Scala DSL static methods.
1. Added foreach, foreachPartition, flatMap to DataFrame.
2. Added col() in dsl.
3. Support renaming columns in toDataFrame.
4. Support type inference on arrays (in addition to Seq).
5. Updated mllib to use the new DSL.
Author: Reynold Xin <rxin@databricks.com>
Closes#4260 from rxin/sql-dsl-update and squashes the following commits:
73466c1 [Reynold Xin] Fixed LogisticRegression. Also added better error message for resolve.
fab3ccc [Reynold Xin] Bug fix.
d31fcd2 [Reynold Xin] Style fix.
62608c4 [Reynold Xin] [SQL] Various DataFrame DSL update.
Also removed the literal implicit transformation since it is pretty scary for API design. Instead, created a new lit method for creating literals. This doesn't break anything from a compatibility perspective because Literal was added two days ago.
Author: Reynold Xin <rxin@databricks.com>
Closes#4241 from rxin/df-docupdate and squashes the following commits:
c0f4810 [Reynold Xin] Fix Python merge conflict.
094c7d7 [Reynold Xin] Minor style fix. Reset Python tests.
3c89f4a [Reynold Xin] Package.
dfe6962 [Reynold Xin] Updated Python aggregate.
5dd4265 [Reynold Xin] Made dsl Java callable.
14b3c27 [Reynold Xin] Fix literal expression for symbols.
68b31cb [Reynold Xin] Literal.
4cfeb78 [Reynold Xin] [SPARK-5097][SQL] Address DataFrame code review feedback.
and
[SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext
Author: Reynold Xin <rxin@databricks.com>
Closes#4242 from rxin/sqlCleanup and squashes the following commits:
e351cb2 [Reynold Xin] Fixed toDataFrame.
6545c42 [Reynold Xin] More changes.
728c017 [Reynold Xin] [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.
Author: Reynold Xin <rxin@databricks.com>
Closes#4235 from rxin/df-tests1 and squashes the following commits:
f341db6 [Reynold Xin] [SPARK-5097][SQL] Test cases for DataFrame expressions.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
This is a block issue for the CLI user, it impacts the existed hql scripts from Hive.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4003 from chenghao-intel/substitution and squashes the following commits:
bb41fd6 [Cheng Hao] revert the removed the implicit conversion
af7c31a [Cheng Hao] add hql variable substitution support
This PR removes the deprecated `ParquetQuerySuite`, renamed `ParquetQuerySuite2` to `ParquetQuerySuite`, and refactored changes introduced in #4115 to `ParquetFilterSuite` . It is a follow-up of #3644.
Notice that test cases in the old `ParquetQuerySuite` have already been well covered by other test suites introduced in #3644.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4116)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4116 from liancheng/remove-deprecated-parquet-tests and squashes the following commits:
f73b8f9 [Cheng Lian] Removes deprecated Parquet test suite
* The `SqlLexical.allCaseVersions` will cause `StackOverflowException` if the key word is too long, the patch will fix that by normalizing all of the keywords in `SqlLexical`.
* And make a unified SparkSQLParser for sharing the common code.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#3926 from chenghao-intel/long_keyword and squashes the following commits:
686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4040 from adrian-wang/coalesce and squashes the following commits:
0ac8e8f [Daoyuan Wang] add coalesce() in sql parser
JIRA: https://issues.apache.org/jira/browse/SPARK-5287
This PR only add `defaultSizeOf` to data types and make those internal type classes `protected[sql]`. I will use another PR to cleanup the type hierarchy of data types.
Author: Yin Huai <yhuai@databricks.com>
Closes#4081 from yhuai/SPARK-5287 and squashes the following commits:
90cec75 [Yin Huai] Update unit test.
e1c600c [Yin Huai] Make internal classes protected[sql].
7eaba68 [Yin Huai] Add `defaultSize` method to data types.
fd425e0 [Yin Huai] Add all native types to NativeType.defaultSizeOf.
This is a follow-up of #4090. The original deeply nested `reduceOption` code is hard to grasp.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4091)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4091 from liancheng/refactor-boolean-simplification and squashes the following commits:
cd8860b [Cheng Lian] Improves `compareConditions` to handle more subtle cases
1bf3258 [Cheng Lian] Avoids converting predicate sets to lists
e833ca4 [Cheng Lian] Refactors deeply nested FP style code
Author: Reynold Xin <rxin@databricks.com>
Closes#4117 from rxin/catalyst-test-log4j and squashes the following commits:
8ad610b [Reynold Xin] [SQL][minor] Add a log4j file for catalyst test.
JIRA: https://issues.apache.org/jira/browse/SPARK-5286
Author: Yin Huai <yhuai@databricks.com>
Closes#4076 from yhuai/SPARK-5286 and squashes the following commits:
6b69ed1 [Yin Huai] Catch all exception when we try to uncache a query.