Commit graph

11413 commits

Author SHA1 Message Date
Yin Huai cdfa388dd0 [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite
Hopefully, this suite will not be flaky anymore.

Author: Yin Huai <yhuai@databricks.com>

Closes #7027 from yhuai/SPARK-8567 and squashes the following commits:

c0167e2 [Yin Huai] Add sc.stop().

(cherry picked from commit fbf75738fe)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-29 17:20:14 -07:00
zsxwing f84f24769a [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite "receiver info reporting"
As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/

```
15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers
15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22
...
15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping
15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping
```

it needs at least 4 seconds to receive all receiver events in this slow machine, but `timeout` for `eventually` is only 2 seconds.
This PR increases `timeout` to make this test stable.

Author: zsxwing <zsxwing@gmail.com>

Closes #7017 from zsxwing/SPARK-8634 and squashes the following commits:

719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting"

(cherry picked from commit cec98525fd)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-29 17:19:24 -07:00
Yin Huai 6a45d86dbd [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def.
jira: https://issues.apache.org/jira/browse/SPARK-8710

Author: Yin Huai <yhuai@databricks.com>

Closes #7094 from yhuai/SPARK-8710 and squashes the following commits:

c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def.

(cherry picked from commit 4b497a724a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-29 16:26:13 -07:00
Michael Armbrust 0c3f7fc88a [HOTFIX] Fix whitespace style error
Author: Michael Armbrust <michael@databricks.com>

Closes #7102 from marmbrus/fixStyle and squashes the following commits:

8c08124 [Michael Armbrust] [HOTFIX] Fix whitespace style error
2015-06-29 15:57:36 -07:00
Yin Huai 0de1737a8a [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite (1.4 branch)
Cherry-pick f9b397f54d to branch 1.4.

Author: Yin Huai <yhuai@databricks.com>

Closes #7092 from yhuai/SPARK-8567-1.4 and squashes the following commits:

0ae2e14 [Yin Huai] [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite.
2015-06-29 15:20:35 -07:00
Ai He 187015f671 [SPARK-7810] [PYSPARK] solve python rdd socket connection problem
Method "_load_from_socket" in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New modification should work around both two protocols.

Author: Ai He <ai.he@ussuning.com>
Author: AiHe <ai.he@ussuning.com>

Closes #6338 from AiHe/pyspark-networking-issue and squashes the following commits:

d4fc9c4 [Ai He] handle code review 2
e75c5c8 [Ai He] handle code review
5644953 [AiHe] solve python rdd socket connection problem to jvm

(cherry picked from commit ecd3aacf28)
Signed-off-by: Davies Liu <davies@databricks.com>
2015-06-29 14:36:48 -07:00
scwf 457d07eaa0 [SQL] [MINOR] Skip unresolved expression for InConversion
Author: scwf <wangfei1@huawei.com>

Closes #6145 from scwf/InConversion and squashes the following commits:

5c8ac6b [scwf] minir fix for InConversion

(cherry picked from commit edf09ea1bd)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-29 13:30:47 -07:00
Burak Yavuz 6b9f3831a8 [SPARK-8681] fixed wrong ordering of columns in crosstab
I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that we hit a corner case or something.

cc rxin marmbrus

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #7060 from brkyvz/crosstab-fixes and squashes the following commits:

0a65234 [Burak Yavuz] addressed comments v1
d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab

(cherry picked from commit be7ef06762)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-29 13:15:12 -07:00
Kousuke Saruta da51cf58fc [SQL][DOCS] Remove wrong example from DataFrame.scala
In DataFrame.scala, there are examples like as follows.

```
 * // The following are equivalent:
 * peopleDf.filter($"age" > 15)
 * peopleDf.where($"age" > 15)
 * peopleDf($"age" > 15)
```

But, I think the last example doesn't work.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #6977 from sarutak/fix-dataframe-example and squashes the following commits:

46efbd7 [Kousuke Saruta] Removed wrong example

(cherry picked from commit 94e040d059)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-29 12:16:44 -07:00
Andrew Or f7c200e6ac Revert "[SPARK-8372] History server shows incorrect information for application not started"
This reverts commit f0513733d4.
2015-06-29 10:52:23 -07:00
Cheolsoo Park 9d9c4b476d [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala
I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all match.

I added `Option` to reader and writer and updated the `pyspark-sql` test.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits:

c63d419 [Cheolsoo Park] Fix version
524e0aa [Cheolsoo Park] Add option function to df reader and writer

(cherry picked from commit ac2e17b01c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-29 00:21:31 -07:00
Josh Rosen e1bbf1a080 [SPARK-8606] Prevent exceptions in RDD.getPreferredLocations() from crashing DAGScheduler
If `RDD.getPreferredLocations()` throws an exception it may crash the DAGScheduler and SparkContext. This patch addresses this by adding a try-catch block.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7023 from JoshRosen/SPARK-8606 and squashes the following commits:

770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block.
44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method
19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash

(cherry picked from commit 0b5abbf5f9)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2015-06-27 14:41:03 -07:00
Neelesh Srinivas Salian a2dbb48071 [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document
As per the description in the JIRA, I moved the contents of the page and added a few additional content.

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #6924 from nssalian/SPARK-3629 and squashes the following commits:

944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document

(cherry picked from commit d48e78934a)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-27 09:08:26 +03:00
Rosstin 68907d2720 [SPARK-8639] [DOCS] Fixed Minor Typos in Documentation
Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639)

fixed minor typos in docs/README.md and docs/api.md

Author: Rosstin <asterazul@gmail.com>

Closes #7046 from Rosstin/SPARK-8639 and squashes the following commits:

6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md

(cherry picked from commit b5a6663da2)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-27 08:49:13 +03:00
cafreeman 2579948bf5 [SPARK-8607] SparkR -- jars not being added to application classpath correctly
Add `getStaticClass` method in SparkR's `RBackendHandler`

This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185).

cc shivaram

Author: cafreeman <cfreeman@alteryx.com>

Closes #7001 from cafreeman/branch-1.4 and squashes the following commits:

8f81194 [cafreeman] Add missing license
31aedcf [cafreeman] Refactor test to call an external R script
2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
0bea809 [cafreeman] Fixed relative path issue and added smaller JAR
ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
9a5c362 [cafreeman] test for including JAR when launching sparkContext
9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4
5a80844 [cafreeman] Fix style nits
7c6bd0c [cafreeman] [SPARK-8607] SparkR
2015-06-26 17:06:02 -07:00
cafreeman 78b31a2a63 [SPARK-8662] SparkR Update SparkSQL Test
Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior has changed in R 3.2, the test became unpassable.

JIRA here:
https://issues.apache.org/jira/browse/SPARK-8662

Author: cafreeman <cfreeman@alteryx.com>

Closes #7045 from cafreeman/R32_Test and squashes the following commits:

b97cc52 [cafreeman] Add `checkStructField` utility
3381e5c [cafreeman] Update SparkSQL Test
2015-06-26 10:07:35 -07:00
Shivaram Venkataraman 6abb4fc8a4 [SPARK-8637] [SPARKR] [HOTFIX] Fix packages argument, sparkSubmitBinName
cc cafreeman

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #7022 from shivaram/sparkr-init-hotfix and squashes the following commits:

9178d15 [Shivaram Venkataraman] Fix packages argument, sparkSubmitBinName

(cherry picked from commit c392a9efab)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-25 10:56:08 -07:00
Tom Graves 13802163de [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/ta…
…rget versions.

I basically copied the compatibility rules from the top level pom.xml into here.  Someone more familiar with all the options in the top level pom may want to make sure nothing else should be copied on down.

With this is allows me to build with jdk8 and run with lower versions.  Source shows compiled for jdk6 as its supposed to.

Author: Tom Graves <tgraves@yahoo-inc.com>
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>

Closes #6989 from tgravescs/SPARK-8574 and squashes the following commits:

e1ea2d4 [Thomas Graves] Change to use combine.children="append"
150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions

(cherry picked from commit e988adb58f)
Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
2015-06-25 08:27:56 -05:00
Joshi 74001db04f [SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI
Fix for incorrect memory in Spark UI as per SPARK-5768

Author: Joshi <rekhajoshm@gmail.com>
Author: Rekha Joshi <rekhajoshm@gmail.com>

Closes #6972 from rekhajoshm/SPARK-5768 and squashes the following commits:

b678a91 [Joshi] Fix for incorrect memory in Spark UI
2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master

(cherry picked from commit 085a7216bf)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
2015-06-25 20:22:04 +09:00
Cheng Lian 0605e08434 [SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class
`HadoopFsRelation` subclasses, especially `ParquetRelation2` should set its own output format class, so that the default output committer can be setup correctly when doing appending (where we ignore user defined output committers).

Author: Cheng Lian <lian@databricks.com>

Closes #6998 from liancheng/spark-8604 and squashes the following commits:

9be51d1 [Cheng Lian] Adds more comments
6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class

(cherry picked from commit c337844ed7)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-25 00:07:01 -07:00
Yin Huai 792ed7a4ba [SPARK-8567] [SQL] Increase the timeout of HiveSparkSubmitSuite
https://issues.apache.org/jira/browse/SPARK-8567

Author: Yin Huai <yhuai@databricks.com>

Closes #6957 from yhuai/SPARK-8567 and squashes the following commits:

62dff5b [Yin Huai] Increase the timeout.

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
2015-06-24 15:56:43 -07:00
BenFradet 93793237ee [SPARK-8399] [STREAMING] [WEB UI] Overlap between histograms and axis' name in Spark Streaming UI
Moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui so the histograms and the axis' name do not overlap.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #6845 from BenFradet/SPARK-8399 and squashes the following commits:

b63695f [BenFradet] adjusted inner histograms
eb610ee [BenFradet] readjusted #batches on the x axis
dd46f98 [BenFradet] aligned all unit labels and ticks
0564b62 [BenFradet] readjusted #batches placement
edd0936 [BenFradet] moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui

(cherry picked from commit 1173483f3f)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-24 12:03:23 -07:00
Holden Karau f6682dd6e8 [SPARK-8506] Add pakages to R context created through init.
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:

b60dd63 [Holden Karau] Add an example with the spark-csv package
fa8bc92 [Holden Karau] typo: sparm -> spark
865a90c [Holden Karau] strip spaces for comparision
c7a4471 [Holden Karau] Add some documentation
c1a9233 [Holden Karau] refactor for testing
c818556 [Holden Karau] Add pakages to R

(cherry picked from commit 43e66192f4)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-24 11:55:29 -07:00
Yin Huai 7e53ff2581 [SPARK-8578] [SQL] Should ignore user defined output committer when appending data (branch 1.4)
This is https://github.com/apache/spark/pull/6964 for branch 1.4.

Author: Yin Huai <yhuai@databricks.com>

Closes #6966 from yhuai/SPARK-8578-branch-1.4 and squashes the following commits:

9c3947b [Yin Huai] Do not use a custom output commiter when appendiing data.
2015-06-24 09:51:18 -07:00
Patrick Wendell eafbe13459 Preparing development version 1.4.2-SNAPSHOT 2015-06-23 19:48:44 -07:00
Patrick Wendell 60e08e5075 Preparing Spark release v1.4.1-rc1 2015-06-23 19:48:39 -07:00
Davies Liu 13f7b0a910 [SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in booelan expression
It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `|` instead.

Author: Davies Liu <davies@databricks.com>

Closes #6961 from davies/column_bool and squashes the following commits:

9f19beb [Davies Liu] update message
af74bd6 [Davies Liu] fix tests
07dff84 [Davies Liu] address comments, fix tests
f70c08e [Davies Liu] raise Exception if column is used in booelan expression

(cherry picked from commit 7fb5ae5024)
Signed-off-by: Davies Liu <davies@databricks.com>
2015-06-23 15:51:35 -07:00
Oleksiy Dyagilev 8d6e3636e9 [SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace between label and features vector
fix LabeledPoint parser when there is a whitespace between label and features vector, e.g.
(y, [x1, x2, x3])

Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com>

Closes #6954 from fe2s/SPARK-8525 and squashes the following commits:

0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang
c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position

(cherry picked from commit a8031183af)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-23 13:15:27 -07:00
lockwobr 27693e1757 [SQL] [DOCS] updated the documentation for explode
the syntax was incorrect in the example in explode

Author: lockwobr <lockwobr@gmail.com>

Closes #6943 from lockwobr/master and squashes the following commits:

3d864d1 [lockwobr] updated the documentation for explode

(cherry picked from commit 4f7fbefb8d)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
2015-06-24 02:51:36 +09:00
Josh Rosen 77cb1d5ed1 Revert "[SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer"
This reverts commit 3348245055.

Reverting because `catch (Exception e) ... throw e` doesn't compile under
Java 6 unless the method declares that it throws Exception.
2015-06-23 09:19:11 -07:00
Holden Karau 3348245055 [SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits:

f807832 [Holden Karau] Log error if we can't throw it
855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates.
039d620 [Holden Karau] Add missing closeandwriteoutput
30e558d [Holden Karau] go back to try/finally
e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception
ae0b7a7 [Holden Karau] Fix the test
2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions

(cherry picked from commit 0f92be5b5f)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2015-06-23 09:08:49 -07:00
Hari Shreedharan 9294796750 [SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink
Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits:

9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0
ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
2015-06-22 23:41:35 -07:00
Scott Taylor d0943afbcf [SPARK-8541] [PYSPARK] test the absolute error in approx doctests
A minor change but one which is (presumably) visible on the public api docs webpage.

Author: Scott Taylor <github@megatron.me.uk>

Closes #6942 from megatron-me-uk/patch-3 and squashes the following commits:

fbed000 [Scott Taylor] test the absolute error in approx doctests

(cherry picked from commit f0dcbe8a7c)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
2015-06-22 23:38:21 -07:00
Holden Karau 22cc1ab66e [SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing max bins
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits:

2894695 [Holden Karau] remove extra blank line
2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too
3a09170 [Holden Karau] add maxBins to to the train method as well
af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100

(cherry picked from commit 164fe2aa44)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-22 22:40:31 -07:00
Patrick Wendell 1cfa7302ee Preparing development version 1.4.2-SNAPSHOT 2015-06-22 22:21:31 -07:00
Patrick Wendell d0a5560ce4 Preparing Spark release v1.4.1-rc1 2015-06-22 22:21:26 -07:00
Patrick Wendell 48d6830144 [BUILD] Preparing Spark release 1.4.1 2015-06-22 22:18:52 -07:00
Yu ISHIKAWA 250179485b [SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR files
[[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548)

- This is the result of `lint-r`
    https://gist.github.com/yu-iskw/0019b37a2c1167f33986

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #6945 from yu-iskw/SPARK-8548 and squashes the following commits:

0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files

(cherry picked from commit 44fa7df64d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-22 20:55:55 -07:00
Cheng Hao d73900a903 [SPARK-7859] [SQL] Collect_set() behavior differences which fails the unit test under jdk8
To reproduce that:
```
JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 | build/sbt -Phadoop-2.3 -Phive  'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite'
```

A simple workaround to fix that is update the original query, for getting the output size instead of the exact elements of the array (output by collect_set())

Author: Cheng Hao <hao.cheng@intel.com>

Closes #6402 from chenghao-intel/windowing and squashes the following commits:

99312ad [Cheng Hao] add order by for the select clause
edf8ce3 [Cheng Hao] update the code as suggested
7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK

(cherry picked from commit 13321e6555)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-22 20:05:00 -07:00
Yin Huai 994abbaeb3 [SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode
https://issues.apache.org/jira/browse/SPARK-8532

This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`.

Author: Yin Huai <yhuai@databricks.com>

Closes #6937 from yhuai/SPARK-8532 and squashes the following commits:

f972d5d [Yin Huai] davies's comment.
d37abd2 [Yin Huai] style.
d21290a [Yin Huai] Python doc.
889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet.
7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value.
d696dff [Yin Huai] Python style.
88eb6c4 [Yin Huai] If mode is "error", do not call mode method.
c40c461 [Yin Huai] Regression test.

(cherry picked from commit 5ab9fcfb01)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-22 13:51:34 -07:00
Yu ISHIKAWA 507381d393 [SPARK-8511] [PYSPARK] Modify a test to remove a saved model in regression.py
[[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511)

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #6926 from yu-iskw/SPARK-8511 and squashes the following commits:

7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()`
4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py`

(cherry picked from commit 5d89d9f00b)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

Conflicts:
	python/pyspark/mllib/tests.py
2015-06-22 11:59:53 -07:00
Michael Armbrust 65981619b2 [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings (branch-1.4)
This is branch 1.4 backport of https://github.com/apache/spark/pull/6888.

Below is the original description.

In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we tried to cast the String instead into a Timestamp.  However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered.

This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.

Author: Michael Armbrust <michaeldatabricks.com>

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust <michael@databricks.com>

Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings
2015-06-22 10:45:33 -07:00
Cheng Lian 451c8722af [SPARK-8406] [SQL] Backports SPARK-8406 and PR #6864 to branch-1.4
Author: Cheng Lian <lian@databricks.com>

Closes #6932 from liancheng/spark-8406-for-1.4 and squashes the following commits:

a0168fe [Cheng Lian] Backports SPARK-8406 and PR #6864 to branch-1.4
2015-06-22 10:04:29 -07:00
Liang-Chi Hsieh b836bac3fe [HOTFIX] Hotfix branch-1.4 building by removing avgMetrics in CrossValidatorSuite
Ref. #6905
ping yhuai

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6929 from viirya/hot_fix_cv_test and squashes the following commits:

b1aec53 [Liang-Chi Hsieh] Hotfix branch-1.4 by removing avgMetrics in CrossValidatorSuite.
2015-06-21 22:25:08 -07:00
Joseph K. Bradley 2a7ea31a9e [SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4
Reorganized docs a bit.  Added migration guides.

**Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```?  It would be a lot.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:

4bf26d6 [Joseph K. Bradley] tiny fix
8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4

(cherry picked from commit a1894422ad)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-21 16:27:14 -07:00
jeanlyn f0e4040202 [SPARK-8379] [SQL] avoid speculative tasks write to the same file
The issue link [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
Currently,when we insert data to the dynamic partition with speculative tasks we will get the Exception
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```
This pr try to write the data to temporary dir when using dynamic parition  avoid the speculative tasks writing the same file

Author: jeanlyn <jeanlyn92@gmail.com>

Closes #6833 from jeanlyn/speculation and squashes the following commits:

64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
8860af0 [jeanlyn] remove the never using code
e19a3bd [jeanlyn] avoid speculative tasks write same file

(cherry picked from commit a1e3649c87)
Signed-off-by: Cheng Lian <lian@databricks.com>
2015-06-21 00:13:55 -07:00
Liang-Chi Hsieh fe59a4a5f5 [SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation
JIRA: https://issues.apache.org/jira/browse/SPARK-8468

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6905 from viirya/cv_min and squashes the following commits:

930d3db [Liang-Chi Hsieh] Fix python unit test and add document.
d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min
16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal.
c3dd8d9 [Liang-Chi Hsieh] For comments.
b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value.

(cherry picked from commit 0b8995168f)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-20 13:02:14 -07:00
Andrew Or 9b16508d2c [HOTFIX] [SPARK-8489] Correct JIRA number in previous commit
It should be SPARK-8489, not SPARK-8498.
2015-06-19 17:40:21 -07:00
cody koeninger a7b773a8b5 [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges
Author: cody koeninger <cody@koeninger.org>

Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:

26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
2015-06-19 17:36:59 -07:00
cody koeninger 78d0ceea82 [SPARK-8389] [STREAMING] [KAFKA] Example of getting offset ranges out o…
…f the existing java direct stream api

Author: cody koeninger <cody@koeninger.org>

Closes #6846 from koeninger/SPARK-8389 and squashes the following commits:

3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
2015-06-19 17:36:54 -07:00