1406 commits

Author SHA1 Message Date
Patrick Wendell 48d6830144 [BUILD] Preparing Spark release 1.4.1 2015-06-22 22:18:52 -07:00
Cheng Hao d73900a903 [SPARK-7859] [SQL] Collect_set() behavior differences which fails the unit test under jdk8
To reproduce that:
```
JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 build/sbt -Phadoop-2.3 -Phive 'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite'
```

A simple workaround is to update the original query so that it returns the output size instead of the exact elements of the array produced by collect_set().
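
The message does not say why the results differ between JDKs. One plausible reading, assumed here, is that the element order of the collected set is not guaranteed. The sketch below uses hypothetical data and plain Scala collections (no Spark) to show why comparing sizes is more stable than comparing exact elements:

```scala
object CollectSetComparison {
  // Hypothetical collect_set() outputs: same elements, different iteration order.
  val onOneJdk: Seq[String] = Seq("a", "b", "c")
  val onAnotherJdk: Seq[String] = Seq("c", "a", "b")

  // Order-sensitive check: brittle when the element order is not guaranteed.
  def exactMatch(a: Seq[String], b: Seq[String]): Boolean = a == b

  // Check in the spirit of the workaround: compare only the sizes.
  def sizeMatch(a: Seq[String], b: Seq[String]): Boolean = a.size == b.size

  def main(args: Array[String]): Unit = {
    println(exactMatch(onOneJdk, onAnotherJdk)) // false
    println(sizeMatch(onOneJdk, onAnotherJdk))  // true
  }
}
```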

Author: Cheng Hao <hao.cheng@intel.com>

Closes #6402 from chenghao-intel/windowing and squashes the following commits:

99312ad [Cheng Hao] add order by for the select clause
edf8ce3 [Cheng Hao] update the code as suggested
7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK

(cherry picked from commit 13321e6555)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-22 20:05:00 -07:00
Michael Armbrust 65981619b2 [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings (branch-1.4)
This is branch 1.4 backport of https://github.com/apache/spark/pull/6888.

Below is the original description.

In earlier versions of Spark SQL we cast `TimestampType` and `DateType` to `StringType` when they were involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we tried to cast the string to a timestamp instead.  However, since a partial date is not a valid complete timestamp, the cast yields `null`, which results in the tuple being filtered out.

This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.
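
A minimal sketch of the difference, assuming `java.sql.Timestamp` as a stand-in for Spark SQL's own cast logic (which is not reproduced here). Parsing a partial date as a timestamp fails, while comparing the timestamp's string form against the partial date gives the result a user would expect:

```scala
import java.sql.Timestamp
import scala.util.Try

object TimestampStringComparison {
  // Parsing succeeds only for a complete "yyyy-mm-dd hh:mm:ss[.f...]" value;
  // a failure here plays the role of the `null` described above.
  def parse(s: String): Option[Timestamp] = Try(Timestamp.valueOf(s)).toOption

  // Compare by rendering the timestamp as a string (lexicographic comparison).
  def greaterThanAsString(ts: Timestamp, s: String): Boolean = ts.toString > s

  def main(args: Array[String]): Unit = {
    val time = Timestamp.valueOf("2014-06-15 12:30:00")
    println(parse("2014-06-10"))                     // None
    println(parse("2014"))                           // None
    println(greaterThanAsString(time, "2014-06-10")) // true
    println(greaterThanAsString(time, "2014"))       // true
  }
}
```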

Author: Michael Armbrust <michael@databricks.com>

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparison of timestamps/dates with strings

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust <michael@databricks.com>

Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparison of timestamps/dates with strings
2015-06-22 10:45:33 -07:00
Cheng Lian 451c8722af [SPARK-8406] [SQL] Backports SPARK-8406 and PR #6864 to branch-1.4
Author: Cheng Lian <lian@databricks.com>

Closes #6932 from liancheng/spark-8406-for-1.4 and squashes the following commits:

a0168fe [Cheng Lian] Backports SPARK-8406 and PR #6864 to branch-1.4
2015-06-22 10:04:29 -07:00
jeanlyn f0e4040202 [SPARK-8379] [SQL] avoid speculative tasks write to the same file
The issue link [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```
This PR writes the data to a temporary directory when using dynamic partitioning, so that speculative tasks avoid writing to the same file.

Author: jeanlyn <jeanlyn92@gmail.com>

Closes #6833 from jeanlyn/speculation and squashes the following commits:

64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
8860af0 [jeanlyn] remove the never using code
e19a3bd [jeanlyn] avoid speculative tasks write same file

(cherry picked from commit a1e3649c87)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-21 00:13:55 -07:00
Andrew Or 9b16508d2c [HOTFIX] [SPARK-8489] Correct JIRA number in previous commit
It should be SPARK-8489, not SPARK-8498.
2015-06-19 17:40:21 -07:00
Andrew Or 2248ad8b70 [SPARK-8498] [SQL] Add regression test for SPARK-8470
**Summary of the problem in SPARK-8470.** When using `HiveContext` to create a data frame of a user case class, Spark throws `scala.reflect.internal.MissingRequirementError` when it tries to infer the schema using reflection. This is caused by `HiveContext` silently overwriting the context class loader containing the user classes.

**What this issue is about.** This issue adds regression tests for SPARK-8470, which is already fixed in #6891. We closed SPARK-8470 as a duplicate because it is a different manifestation of the same problem in SPARK-8368. Due to the complexity of the reproduction, this requires us to pre-package a special test jar and include it in the Spark project itself.

I tested this with and without the fix in #6891 and verified that it passes only if the fix is present.
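
For illustration only, the general pattern of not leaking a changed context class loader can be sketched as below. This is not the fix from #6891; `withContextClassLoader` is a hypothetical helper that installs a loader for the duration of a block and then restores the previous one:

```scala
import java.net.{URL, URLClassLoader}

object ContextClassLoaderGuard {
  // Runs `body` with `loader` as the thread's context class loader,
  // then restores whatever loader was set before (even if `body` throws).
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val original = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(original)
  }

  def main(args: Array[String]): Unit = {
    val userLoader = Thread.currentThread().getContextClassLoader
    val temporary = new URLClassLoader(Array.empty[URL], userLoader)
    val seenInside = withContextClassLoader(temporary) {
      Thread.currentThread().getContextClassLoader
    }
    println(seenInside eq temporary)                                    // true
    println(Thread.currentThread().getContextClassLoader eq userLoader) // true
  }
}
```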

Author: Andrew Or <andrew@databricks.com>

Closes #6909 from andrewor14/SPARK-8498 and squashes the following commits:

5e9d688 [Andrew Or] Add regression test for SPARK-8470

(cherry picked from commit 093c34838d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-19 17:34:36 -07:00
Yin Huai 2510365faa [HOT-FIX] Fix compilation (caused by 0131142d98)
Author: Yin Huai <yhuai@databricks.com>

Closes #6913 from yhuai/branch-1.4-hotfix and squashes the following commits:

7f91fa0 [Yin Huai] [HOT-FIX] Fix compilation (caused by 0131142d98).
2015-06-19 17:29:51 -07:00
Nathan Howell 0131142d98 [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents
Author: Nathan Howell <nhowell@godaddy.com>

Closes #6799 from NathanHowell/spark-8093 and squashes the following commits:

76ac3e8 [Nathan Howell] [SPARK-8093] [SQL] Remove empty structs inferred from JSON documents

(cherry picked from commit 9814b971f0)
Signed-off-by: Yin Huai <yhuai@databricks.com>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala
2015-06-19 16:23:11 -07:00
Yin Huai 9ac8393663 [SPARK-8368] [SPARK-8058] [SQL] HiveContext may override the context class loader of the current thread (branch 1.4)
This is for 1.4 branch (based on https://github.com/apache/spark/pull/6891).

Author: Yin Huai <yhuai@databricks.com>

Closes #6895 from yhuai/SPARK-8368-1.4 and squashes the following commits:

adbbbc9 [Yin Huai] Minor update.
3cca0e9 [Yin Huai] Correctly set the class loader in the conf of the state in client wrapper.
b1e14a9 [Yin Huai] Failed tests.
2015-06-19 11:15:28 -07:00
Cheng Lian f48f3a2e2f [SPARK-8458] [SQL] Don't strip scheme part of output path when writing ORC files
`Path.toUri.getPath` strips the scheme part of the output path (from `file:///foo` to `/foo`), which causes the ORC data source to write only to the file system configured in the Hadoop configuration. We should use `Path.toString` instead.
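
The effect can be reproduced without Hadoop. The sketch below uses `java.net.URI` directly as a stand-in for the URI behind Hadoop's `Path` (an assumption made to keep the example dependency-free); the host and paths are made up:

```scala
import java.net.URI

object SchemePreservation {
  // Keeps only the path component, dropping scheme and authority.
  def pathOnly(location: String): String = new URI(location).getPath

  // Keeps the full location, including scheme and authority.
  def full(location: String): String = new URI(location).toString

  def main(args: Array[String]): Unit = {
    println(pathOnly("file:///foo"))                   // /foo
    println(full("file:///foo"))                       // file:///foo
    println(pathOnly("hdfs://namenode:8020/data/out")) // /data/out
  }
}
```

Once the scheme is gone, `/foo` can only be resolved against whatever default file system is configured, which matches the symptom described in the message.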

Author: Cheng Lian <lian@databricks.com>

Closes #6892 from liancheng/spark-8458 and squashes the following commits:

87f8199 [Cheng Lian] Don't strip scheme of output path when writing ORC files

(cherry picked from commit a71cbbdea5)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-18 22:02:13 -07:00
Josh Rosen 152f4465d3 [SPARK-8446] [SQL] Add helper functions for testing SparkPlan physical operators
This patch introduces `SparkPlanTest`, a base class for unit tests of SparkPlan physical operators.  This is analogous to Spark SQL's existing `QueryTest`, which does something similar for end-to-end tests with actual queries.

These helper methods provide nicer error output when tests fail and help developers to avoid writing lots of boilerplate in order to execute manually constructed physical plans.

Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <rosenville@gmail.com>
Author: Michael Armbrust <michael@databricks.com>

Closes #6885 from JoshRosen/spark-plan-test and squashes the following commits:

f8ce275 [Josh Rosen] Fix some IntelliJ inspections and delete some dead code
84214be [Josh Rosen] Add an extra column which isn't part of the sort
ae1896b [Josh Rosen] Provide implicits automatically
a80f9b0 [Josh Rosen] Merge pull request #4 from marmbrus/pr/6885
d9ab1e4 [Michael Armbrust] Add simple resolver
c60a44d [Josh Rosen] Manually bind references
996332a [Josh Rosen] Add types so that tests compile
a46144a [Josh Rosen] WIP

(cherry picked from commit 207a98ca59)
Signed-off-by: Michael Armbrust <michael@databricks.com>
2015-06-18 16:45:27 -07:00
Yin Huai 73cf5def06 [SPARK-8306] [SQL] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.
https://issues.apache.org/jira/browse/SPARK-8306

I will try to add a test later.

marmbrus aarondav

Author: Yin Huai <yhuai@databricks.com>

Closes #6758 from yhuai/SPARK-8306 and squashes the following commits:

1292346 [Yin Huai] [SPARK-8306] AddJar command needs to set the new class loader to the HiveConf inside executionHive.state.

(cherry picked from commit 302556ff99)
Signed-off-by: Michael Armbrust <michael@databricks.com>

Conflicts:
	sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala
2015-06-17 15:14:42 -07:00
Punya Biswal 877deb0468 Fix break introduced by backport
rxin this is the fix you requested for the break introduced by backporting #6793

Author: Punya Biswal <pbiswal@palantir.com>

Closes #6850 from punya/feature/fix-backport-break and squashes the following commits:

fdc3693 [Punya Biswal] Fix break introduced by backport
2015-06-16 22:31:49 -07:00
Radek Ostrowski 4da0686508 [SQL] [DOC] improved a comment
[SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs

Author: Radek Ostrowski <dest.hawaii@gmail.com>
Author: radek <radek@radeks-MacBook-Pro-2.local>

Closes #6332 from radek1st/master and squashes the following commits:

dae3347 [Radek Ostrowski] fixed typo
c76bb3a [radek] improved a comment

(cherry picked from commit 4bd10fd509)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-16 21:04:45 +01:00
tedyu fff8d7ee6c SPARK-8336 Fix NullPointerException with functions.rand()
This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'

Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()

Author: tedyu <yuzhihong@gmail.com>

Closes #6793 from tedyu/master and squashes the following commits:

62fd97b [tedyu] Create RandomSuite
750f92c [tedyu] Add test for Rand() with seed
a1d66c5 [tedyu] Fix NullPointerException with functions.rand()

(cherry picked from commit 1a62d61696)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-15 17:00:43 -07:00
Michael Armbrust 2805d145e3 [SPARK-8358] [SQL] Wait for child resolution when resolving generators
Author: Michael Armbrust <michael@databricks.com>

Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits:

fbd2065 [Michael Armbrust] more style
806a373 [Michael Armbrust] fix style
7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generators

(cherry picked from commit 9073a426e4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
2015-06-14 11:21:55 -07:00
Josh Rosen 4634be5a7d [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap
UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size.  This means that we end up allocating 8x too much conversion space.

This patch fixes this by allocating a `byte[]` array instead.  This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
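
A minimal sketch of the sizing mistake, using hypothetical helper names rather than the actual `UnsafeFixedWidthAggregationMap` code. A size measured in bytes is used as the element count of a `long` array, so the allocation is eight times larger than needed:

```scala
object ScratchSpaceSizing {
  val BytesPerLong: Int = 8

  // Buggy shape: a byte count is used as the length of a long array.
  def allocateAsLongs(sizeInBytes: Int): Array[Long] = new Array[Long](sizeInBytes)

  // Fixed shape: one array element per requested byte.
  def allocateAsBytes(sizeInBytes: Int): Array[Byte] = new Array[Byte](sizeInBytes)

  def main(args: Array[String]): Unit = {
    val required = 64
    val longs = allocateAsLongs(required)
    val bytes = allocateAsBytes(required)
    println(s"long[] payload: ${longs.length * BytesPerLong} bytes") // 512 bytes
    println(s"byte[] payload: ${bytes.length} bytes")                // 64 bytes
  }
}
```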

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:

6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size

(cherry picked from commit ea7fd2ff64)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2015-06-14 09:41:01 -07:00
Michael Armbrust 1ca431e83f [SPARK-8329][SQL] Allow _ in DataSource options
Author: Michael Armbrust <michael@databricks.com>

Closes #6786 from marmbrus/optionsParser and squashes the following commits:

e7d18ef [Michael Armbrust] add dots
99a3452 [Michael Armbrust] [SPARK-8329][SQL] Allow _ in DataSource options

(cherry picked from commit 4aed66f299)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-12 23:11:25 -07:00
navis.ryu 5c05b5c0d2 [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first
case cs @ CombineSum(expr) =>
        val calcType = expr.dataType
          expr.dataType match {
            case DecimalType.Fixed(_, _) =>
              DecimalType.Unlimited
            case _ =>
              expr.dataType
          }
calcType is always expr.dataType. All credit goes to IntelliJ.
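
The snippet above can be read as follows: the `match` sits on its own line after the `val` has already been assigned, so it is a separate statement whose result is discarded. The sketch below reproduces that shape with hypothetical stand-in types (not Spark's `DataType` hierarchy) next to a version where the `match` result is actually bound:

```scala
object DiscardedMatch {
  // Hypothetical stand-ins for the data types involved.
  sealed trait DataType
  case object FixedDecimal extends DataType
  case object UnlimitedDecimal extends DataType
  case object IntegerType extends DataType

  // Same shape as the quoted code: the match result is thrown away.
  def calcTypeBuggy(dataType: DataType): DataType = {
    val calcType = dataType
    dataType match {
      case FixedDecimal => UnlimitedDecimal
      case _            => dataType
    }
    calcType
  }

  // The match result is assigned to calcType.
  def calcTypeFixed(dataType: DataType): DataType = {
    val calcType = dataType match {
      case FixedDecimal => UnlimitedDecimal
      case _            => dataType
    }
    calcType
  }

  def main(args: Array[String]): Unit = {
    println(calcTypeBuggy(FixedDecimal)) // FixedDecimal
    println(calcTypeFixed(FixedDecimal)) // UnlimitedDecimal
  }
}
```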

Author: navis.ryu <navis@apache.org>

Closes #6736 from navis/SPARK-8285 and squashes the following commits:

20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first

(cherry picked from commit 6a47114bc2)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-10 18:19:24 -07:00
Cheng Lian 69197c3e38 [SPARK-8121] [SQL] Fixes InsertIntoHadoopFsRelation job initialization for Hadoop 1.x (branch 1.4 backport based on https://github.com/apache/spark/pull/6669) 2015-06-08 11:36:42 -07:00
Reynold Xin b9c046f6d7 [SPARK-8004][SQL] Quote identifier in JDBC data source.
This is a follow-up patch to #6577 that replaces columnEnclosing with quoteIdentifier.

I also did some minor cleanup to the JdbcDialect file.

Author: Reynold Xin <rxin@databricks.com>

Closes #6689 from rxin/jdbc-quote and squashes the following commits:

bad365f [Reynold Xin] Fixed test compilation...
e39e14e [Reynold Xin] Fixed compilation.
db9a8e0 [Reynold Xin] [SPARK-8004][SQL] Quote identifier in JDBC data source.

(cherry picked from commit d6d601a07b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-07 10:52:18 -07:00
Liang-Chi Hsieh b4d54417e5 [SPARK-8141] [SQL] Precompute datatypes for partition columns and reuse it
JIRA: https://issues.apache.org/jira/browse/SPARK-8141

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6687 from viirya/reuse_partition_column_types and squashes the following commits:

dab0688 [Liang-Chi Hsieh] Reuse partitionColumnTypes.

(cherry picked from commit 26d07f1ece)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-07 15:35:43 +08:00
Liang-Chi Hsieh b6fdc6cf11 [SPARK-8004][SQL] Enclose column names by JDBC Dialect
JIRA: https://issues.apache.org/jira/browse/SPARK-8004

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6577 from viirya/enclose_jdbc_columns and squashes the following commits:

614606a [Liang-Chi Hsieh] For comment.
bc50182 [Liang-Chi Hsieh] Enclose column names by JDBC Dialect.

(cherry picked from commit 901a552c5e)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-06 23:00:18 -07:00
Cheng Lian d8a53fb806 [SPARK-8079] [SQL] Makes InsertIntoHadoopFsRelation job/task abortion more robust
As described in SPARK-8079, when writing a DataFrame to a `HadoopFsRelation`, if `HadoopFsRelation.prepareForWriteJob` throws an exception, an unexpected NPE is thrown during job abortion. (This issue doesn't cause much harm since the job is failing anyway.)

This PR makes the job/task abortion logic in `InsertIntoHadoopFsRelation` more robust to avoid such confusing exceptions.

Author: Cheng Lian <lian@databricks.com>

Closes #6612 from liancheng/spark-8079 and squashes the following commits:

87cd81e [Cheng Lian] Addresses @rxin's comment
1864c75 [Cheng Lian] Addresses review comments
9e6dbb3 [Cheng Lian] Makes InsertIntoHadoopFsRelation job/task abortion more robust

(cherry picked from commit 16fc49617e)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-06 17:23:46 +08:00
Shivaram Venkataraman 3e3151e755 [SPARK-8085] [SPARKR] Support user-specified schema in read.df
cc davies sun-rui

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:

16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df

(cherry picked from commit 12f5eaeee1)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-05 10:19:15 -07:00
Mike Dusenberry 81ff7a9012 [SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference.
Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests.  Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.
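
A rough sketch of the idea, with a hypothetical `Attribute` case class standing in for a resolved column (this is not the DataFrame API). Matching on the column's identity rather than its name is what lets the right column be dropped when a join produces two columns with the same name:

```scala
object DropByColumnReference {
  // Hypothetical stand-in for a resolved output attribute: a name plus a unique id.
  final case class Attribute(name: String, exprId: Long)

  // Keep every output attribute that is not equivalent to the referenced column.
  def drop(output: Seq[Attribute], column: Attribute): Seq[Attribute] =
    output.filterNot(_ == column)

  def main(args: Array[String]): Unit = {
    val leftId = Attribute("id", 1L)
    val rightId = Attribute("id", 2L)
    val joined = Seq(leftId, Attribute("name", 3L), rightId)
    println(drop(joined, rightId).map(a => s"${a.name}#${a.exprId}").mkString(", "))
    // prints: id#1, name#3
  }
}
```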

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:

514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests.  Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.

(cherry picked from commit df7da07a86)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-04 11:30:25 -07:00
Andrew Or bfe74b34a6 [SPARK-7558] Demarcate tests in unit-tests.log (1.4)
This includes the following commits:

original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492

---
Original patch #6441
Branch-1.3 patch #6602

Author: Andrew Or <andrew@databricks.com>

Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:

4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
2015-06-03 20:46:44 -07:00
Reynold Xin 1f90a06bda [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
Author: Reynold Xin <rxin@databricks.com>

Closes #6608 from rxin/parquet-analysis and squashes the following commits:

b5dc8e2 [Reynold Xin] Code review feedback.
5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.

(cherry picked from commit 939e4f3d8d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 13:58:15 -07:00
animesh 0a1dad6cd4 [SPARK-7980] [SQL] Support SQLContext.range(end)
1. range() overloaded in SQLContext.scala
2. range() modified in python sql context.py
3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py

Author: animesh <animesh@apache.spark>

Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:

935899c [animesh] SPARK-7980:python+scala changes

(cherry picked from commit d053a31be9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 11:28:38 -07:00
Yin Huai 54a4ea4078 [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests.
https://issues.apache.org/jira/browse/SPARK-7973

Author: Yin Huai <yhuai@databricks.com>

Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:

763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
e598a08 [Yin Huai] Increase the timeout to 3 minutes.

(cherry picked from commit f1646e1023)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-03 09:26:30 -07:00
Patrick Wendell ab713af564 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 18:06:41 -07:00
Patrick Wendell 22596c534a Preparing Spark release v1.4.0-rc4 2015-06-02 18:06:35 -07:00
Cheng Lian 0d83720990 [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness
This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7 specific APIs and breaks the Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.

cc andrewor14 pwendell, this should also be back ported to branch-1.4.

Author: Cheng Lian <lian@databricks.com>

Closes #6547 from liancheng/override-log4j and squashes the following commits:

c900cfd [Cheng Lian] Addresses Shixiong's comment
72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness

(cherry picked from commit 5cd6a63d96)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-02 17:07:20 -07:00
Cheng Lian daeaa0c5ac [SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior.

This PR writes a temporary `log4j.properties` and prepends it to the driver classpath before starting the test Thrift server process to ensure proper logging behavior.
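
The approach can be sketched roughly as follows. The helper names, the temporary directory prefix, and the log4j 1.x settings written to the file are assumptions for illustration; they are not taken from `HiveThriftServer2Test`:

```scala
import java.io.{File, PrintWriter}
import java.nio.file.Files

object TempLog4jConfig {
  // Writes a minimal log4j 1.x configuration into `dir` and returns the file.
  def writeLog4jProperties(dir: File): File = {
    val file = new File(dir, "log4j.properties")
    val writer = new PrintWriter(file, "UTF-8")
    try {
      writer.println("log4j.rootCategory=INFO, console")
      writer.println("log4j.appender.console=org.apache.log4j.ConsoleAppender")
      writer.println("log4j.appender.console.layout=org.apache.log4j.PatternLayout")
      writer.println("log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n")
    } finally writer.close()
    file
  }

  // Puts `dir` first so its log4j.properties is found before any other on the classpath.
  def prepend(dir: File, classpath: String): String =
    if (classpath.isEmpty) dir.getCanonicalPath
    else dir.getCanonicalPath + File.pathSeparator + classpath

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("log4j-override").toFile
    writeLog4jProperties(dir)
    println(prepend(dir, "lib/a.jar"))
  }
}
```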

cc andrewor14 yhuai

Author: Cheng Lian <lian@databricks.com>

Closes #6493 from liancheng/override-log4j and squashes the following commits:

c489e0e [Cheng Lian] Fixes minor Scala styling issue
b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
2015-06-02 17:06:24 -07:00
Patrick Wendell e3c35b217c Preparing development version 1.4.0-SNAPSHOT 2015-06-02 17:01:15 -07:00
Patrick Wendell a14fad11ef Preparing Spark release v1.4.0-rc4 2015-06-02 17:01:10 -07:00
Patrick Wendell 92ccc5ba39 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 14:02:19 -07:00
Patrick Wendell d630f4d697 Preparing Spark release v1.4.0-rc4 2015-06-02 14:02:14 -07:00
Cheng Lian cbaf595447 [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
The current code references the schema of the DataFrame to be written before checking the save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later.

This PR fixes this issue by deferring metadata discovery until after the save mode check.
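
One way to picture the deferral is with a `lazy val`, as in the sketch below. The `SaveMode` objects, `discoverSchema`, and `write` are hypothetical stand-ins, not the `InsertIntoHadoopFsRelation` code; the counter only exists to show when the simulated discovery runs:

```scala
object DeferredDiscovery {
  sealed trait SaveMode
  case object Append extends SaveMode
  case object Overwrite extends SaveMode
  case object ErrorIfExists extends SaveMode
  case object Ignore extends SaveMode

  // Counts how many times the simulated expensive discovery has run.
  var discoveryRuns: Int = 0

  private def discoverSchema(): Seq[String] = {
    discoveryRuns += 1
    Seq("id", "name")
  }

  def write(mode: SaveMode, pathExists: Boolean): String = {
    // lazy: the discovery only runs if a branch below actually reads `schema`.
    lazy val schema = discoverSchema()
    (mode, pathExists) match {
      case (ErrorIfExists, true) => "failed: path already exists"
      case (Ignore, true)        => "skipped: path already exists"
      case (Overwrite, true)     => "deleted existing files, then wrote"
      case (Append, true)        => s"appended using schema ${schema.mkString(", ")}"
      case (_, false)            => "wrote to a new path"
    }
  }

  def main(args: Array[String]): Unit = {
    println(write(Overwrite, pathExists = true))
    println(discoveryRuns) // 0
    println(write(Append, pathExists = true))
    println(discoveryRuns) // 1
  }
}
```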

Author: Cheng Lian <lian@databricks.com>

Closes #6583 from liancheng/spark-8014 and squashes the following commits:

1aafabd [Cheng Lian] Updates comments
088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
8fbd93f [Cheng Lian] Fixes SPARK-8014

(cherry picked from commit 686a45f0b9)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-02 13:32:34 -07:00
Cheng Lian f71a09de6e [SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation
Author: Cheng Lian <lian@databricks.com>

Closes #6581 from liancheng/spark-8037 and squashes the following commits:

d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation

(cherry picked from commit 1bb5d716c0)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-03 01:09:19 +08:00
Yin Huai 8c3fc3a6cd [HOT-FIX] Add EvaluatedType back to RDG
87941ff8c4 accidentally removed the EvaluatedType.

Author: Yin Huai <yhuai@databricks.com>

Closes #6589 from yhuai/getBackEvaluatedType and squashes the following commits:

618c2eb [Yin Huai] Add EvaluatedType back.
2015-06-02 09:59:19 -07:00
Patrick Wendell 92a677891c Preparing development version 1.4.0-SNAPSHOT 2015-06-02 08:41:15 -07:00
Patrick Wendell 48c506724a Preparing Spark release v1.4.0-rc4 2015-06-02 08:41:10 -07:00
Yin Huai 87941ff8c4 [SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects.
This closes #6570.

Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #6573 from rxin/deterministic and squashes the following commits:

356cd22 [Reynold Xin] Added unit test for the optimizer.
da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
da56200 [Yin Huai] Comments.
e38f264 [Yin Huai] Comment.
f9d6a73 [Yin Huai] Add a deterministic method to Expression.

(cherry picked from commit 0f80990bfa)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala
2015-06-02 00:21:27 -07:00
Yin Huai 4940630f56 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early
https://issues.apache.org/jira/browse/SPARK-8020

Author: Yin Huai <yhuai@databricks.com>

Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:

0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.

(cherry picked from commit 7b7f7b6c6f)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-02 00:17:09 -07:00
Davies Liu 9d6475b93d [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists
cc yhuai

Author: Davies Liu <davies@databricks.com>

Closes #6558 from davies/decimalType and squashes the following commits:

c877ca8 [Davies Liu] Update ParquetConverter.scala
48cc57c [Davies Liu] Update ParquetConverter.scala
b43845c [Davies Liu] add test
3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists

(cherry picked from commit bcb47ad771)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 23:12:37 -07:00
Reynold Xin 575f3b3aa6 Fixed typo in the previous commit.
(cherry picked from commit b53a011647)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:42:22 -07:00
Yin Huai e6d58955c3 [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing multiple window expressions and make parser match window frames in case insensitive way
JIRAs:
https://issues.apache.org/jira/browse/SPARK-7965
https://issues.apache.org/jira/browse/SPARK-7972

Author: Yin Huai <yhuai@databricks.com>

Closes #6524 from yhuai/7965-7972 and squashes the following commits:

c12c79c [Yin Huai] Add doc for returned value.
de64328 [Yin Huai] Address rxin's comments.
fc9b1ad [Yin Huai] wip
2996da4 [Yin Huai] scala style
20b65b7 [Yin Huai] Handle expressions containing multiple window expressions.
9568b21 [Yin Huai] case insensitive matches
41f633d [Yin Huai] Failed test case.

(cherry picked from commit e797dba58e)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:40:35 -07:00
Reynold Xin 3af4c0b4e8 [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API
Author: Reynold Xin <rxin@databricks.com>

Closes #6569 from rxin/freqItemsWarning and squashes the following commits:

7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.

(cherry picked from commit 4c868b9943)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-01 21:29:46 -07:00