Commit graph

Luca Martinetti 94f65bccee [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort
Add documentation for spark.sql.planner.externalSort
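A minimal PySpark sketch of toggling the documented setting (the app setup and the chosen value are illustrative, not part of the commit):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "externalSortDemo")  # illustrative setup
sqlContext = SQLContext(sc)
# spark.sql.planner.externalSort controls whether SQL sort operators may
# spill to disk instead of sorting entirely in memory; values are strings.
sqlContext.setConf("spark.sql.planner.externalSort", "true")
```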

Author: Luca Martinetti <luca@luca.io>

Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:

985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort

(cherry picked from commit 4060526cd3)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-05 13:41:52 -07:00
zsxwing 200c980a13 [SPARK-8112] [STREAMING] Fix the negative event count issue
Author: zsxwing <zsxwing@gmail.com>

Closes #6659 from zsxwing/SPARK-8112 and squashes the following commits:

a5d7da6 [zsxwing] Address comments
d255b6e [zsxwing] Fix the negative event count issue

(cherry picked from commit 4f16d3fe2e)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-05 12:46:15 -07:00
Andrew Or 429c658519 Revert "[MINOR] [BUILD] Use custom temp directory during build."
This reverts commit 9b3e4c1871.
2015-06-05 10:54:06 -07:00
Shivaram Venkataraman 3e3151e755 [SPARK-8085] [SPARKR] Support user-specified schema in read.df
cc davies sun-rui

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:

16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df

(cherry picked from commit 12f5eaeee1)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-05 10:19:15 -07:00
Akhil Das 0ef2e9d351 [STREAMING] Update streaming-kafka-integration.md
Fixed the broken links (Examples) in the documentation.

Author: Akhil Das <akhld@darktech.ca>

Closes #6666 from akhld/patch-2 and squashes the following commits:

2228b83 [Akhil Das] Update streaming-kafka-integration.md

(cherry picked from commit 019dc9f558)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-05 14:24:06 +02:00
Marcelo Vanzin 9b3e4c1871 [MINOR] [BUILD] Use custom temp directory during build.
Even with all the efforts to cleanup the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.

After an sbt full unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.

Also make a slight change to a unit test to make it not pollute the
source directory with test data.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:

31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.

(cherry picked from commit b16b5434ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-05 14:12:05 +02:00
Sean Owen 90cf686386 [MINOR] remove unused interpolation var in log message
Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.

Author: Sean Owen <sowen@cloudera.com>

Closes #6650 from srowen/Interpolation and squashes the following commits:

518687a [Sean Owen] Actually interpolate log string
7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message

(cherry picked from commit 3a5c4da473)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-05 00:32:52 -07:00
Ted Blackman f02af7c8f7 [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.
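For illustration, a minimal sketch of the new single-argument form (the printed result is the expected output, mirroring Python's built-in range()):

```python
# With this change, sc.range(5) behaves like sc.range(0, 5);
# `sc` is assumed to be an existing SparkContext.
rdd = sc.range(5)
print(rdd.collect())  # expected: [0, 1, 2, 3, 4]
```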
Author: Ted Blackman <ted.blackman@gmail.com>

Closes #6656 from belisarius222/branch-1.4 and squashes the following commits:

747cbc2 [Ted Blackman] [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.
2015-06-04 22:21:11 -07:00
Carson Wang 3ba6fc515d [SPARK-8098] [WEBUI] Show correct length of bytes on log page
The log page should only show the desired number of bytes. Currently it shows bytes from the startIndex to the end of the file, and the "Next" button on the page is always disabled.

Author: Carson Wang <carson.wang@intel.com>

Closes #6640 from carsonwang/logpage and squashes the following commits:

58cb3fd [Carson Wang] Show correct length of bytes on log page

(cherry picked from commit 63bc0c4430)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-04 16:25:05 -07:00
Shivaram Venkataraman 0b71b851de [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh
This also helps us get rid of the sparkr-docs maven profile, as docs are now built by just using -Psparkr when the roxygen2 package is available.

Related to discussion in #6567

cc pwendell srowen -- Let me know if this looks better

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:

b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available

(cherry picked from commit 3dc005282a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-04 12:52:45 -07:00
Mike Dusenberry 81ff7a9012 [SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference.
Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests.  Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.
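A minimal sketch of the overload in PySpark (schemas and data are illustrative); passing a `Column` disambiguates where a plain string cannot:

```python
df1 = sqlContext.createDataFrame([(1, "a")], ["id", "val"])
df2 = sqlContext.createDataFrame([(1, "x")], ["id", "other"])
joined = df1.join(df2, df1["id"] == df2["id"])
# The string "id" would be ambiguous after the join; the Column
# reference pins down exactly which "id" to remove.
result = joined.drop(df2["id"])
```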

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:

514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests.  Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.

(cherry picked from commit df7da07a86)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-04 11:30:25 -07:00
Daniel Darabos daf9451a4d Fix maxTaskFailures comment
If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code supports this reading, I think it's just this comment that was off. It's easy to make this mistake — can you please double-check if I'm correct? Thanks!
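For context, a hedged sketch of the setting behind this comment (the value shown is illustrative):

```python
from pyspark import SparkConf

# spark.task.maxFailures counts total allowed attempts per task, not
# retries: with the value 1, the first failure aborts the task set.
conf = SparkConf().set("spark.task.maxFailures", "1")
```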

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #6621 from darabos/patch-2 and squashes the following commits:

dfebdec [Daniel Darabos] Fix comment.

(cherry picked from commit 10ba188087)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-04 13:48:56 +02:00
Andrew Or 84da653192 [BUILD] Fix Maven build for Kinesis
A necessary dependency that is transitively referenced is not
provided, causing compilation failures in builds that provide
the kinesis-asl profile.
2015-06-03 20:47:53 -07:00
Andrew Or bfe74b34a6 [SPARK-7558] Demarcate tests in unit-tests.log (1.4)
This includes the following commits:

original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492

---
Original patch #6441
Branch-1.3 patch #6602

Author: Andrew Or <andrew@databricks.com>

Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:

4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
2015-06-03 20:46:44 -07:00
Andrew Or 584a2ba21c [BUILD] Use right branch when checking against Hive (1.4)
For branch-1.4.

This is identical to #6629 and is strictly not necessary. I'm opening this as a PR since it changes Jenkins test behavior and I want to test it out here.

Author: Andrew Or <andrew@databricks.com>

Closes #6630 from andrewor14/build-check-hive-1.4 and squashes the following commits:

186ec65 [Andrew Or] [BUILD] Use right branch when checking against Hive
2015-06-03 18:09:14 -07:00
Andrew Or 96f71b105a [BUILD] Increase Jenkins test timeout
Currently the Hive tests alone take 40 minutes. The right thing to do is
to reduce the test time. However, that is a bigger project, and in the
meantime we have PRs blocked by test timeouts.
2015-06-03 17:45:07 -07:00
Shivaram Venkataraman c2c129073f [SPARK-8084] [SPARKR] Make SparkR scripts fail on error
cc shaneknapp pwendell JoshRosen

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6623 from shivaram/SPARK-8084 and squashes the following commits:

0ec5b26 [Shivaram Venkataraman] Make SparkR scripts fail on error

(cherry picked from commit 0576c3c4ff)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-03 17:02:29 -07:00
Ryan Williams 16748694b8 [SPARK-8088] don't attempt to lower number of executors by 0
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #6624 from ryan-williams/execs and squashes the following commits:

b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0

(cherry picked from commit 51898b5158)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 16:54:52 -07:00
Andrew Or 0bc9a3ec42 [HOTFIX] [TYPO] Fix typo in #6546 2015-06-03 16:04:35 -07:00
Andrew Or d0be9508f5 [HOTFIX] Unbreak build from backporting #6546
This is caused by 7e46ea0228.
2015-06-03 15:25:35 -07:00
Xiangrui Meng b2a22a651f [SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist
This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:

b6a36b9 [Xiangrui Meng] add doc
f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
8ee7c7e [Xiangrui Meng] use SparkFunSuite
e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist

(cherry picked from commit 26c9d7a0f9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-03 15:16:36 -07:00
Shivaram Venkataraman ca21fff7da [SPARK-3674] [EC2] Clear SPARK_WORKER_INSTANCES when using YARN
cc andrewor14

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6424 from shivaram/spark-worker-instances-yarn-ec2 and squashes the following commits:

db244ae [Shivaram Venkataraman] Make Python Lint happy
0593d1b [Shivaram Venkataraman] Clear SPARK_WORKER_INSTANCES when using YARN

(cherry picked from commit d3e026f879)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 15:14:44 -07:00
zsxwing 7e46ea0228 [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.

This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.

Author: zsxwing <zsxwing@gmail.com>

Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:

5560e09 [zsxwing] Fix a typo
3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite

(cherry picked from commit f27134782e)
Signed-off-by: Andrew Or <andrew@databricks.com>

Conflicts:
	core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala
	core/src/test/scala/org/apache/spark/scheduler/SparkListenerWithClusterSuite.scala
2015-06-03 15:05:49 -07:00
zsxwing 306837e4e3 [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.

Author: zsxwing <zsxwing@gmail.com>

Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:

607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout

(cherry picked from commit 1d8669f15c)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 15:03:15 -07:00
Timothy Chen 59399a8f0c [SPARK-8083] [MESOS] Use the correct base path in mesos driver page.
Author: Timothy Chen <tnachen@gmail.com>

Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:

4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.

(cherry picked from commit bfbf12b349)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 14:58:33 -07:00
Andrew Or 31e0ae9e1d [MINOR] [UI] Improve confusing message on log page
It's good practice to check whether the input path is in the directory
we expect, to avoid potentially confusing error messages.
2015-06-03 14:48:15 -07:00
Joseph K. Bradley bfab61f39c [SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests
Java-friendly APIs added:
* GaussianMixture.run()
* GaussianMixtureModel.predict()
* DistributedLDAModel.javaTopicDistributions()
* StreamingKMeans: trainOn, predictOn, predictOnValues
* Statistics.corr
* params
  * added doc to w() since Java docs do not inherit doc
  * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
  * made DoubleArrayParam Java-friendly w() actually Java-friendly

I generated the doc and verified all changes.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:

c16821b [Joseph K. Bradley] Small fixes based on code review.
d955581 [Joseph K. Bradley] unit test fixes
29b6b0d [Joseph K. Bradley] small fixes
fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params

(cherry picked from commit 20a26b595c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-03 14:34:31 -07:00
Reynold Xin 1f90a06bda [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
Author: Reynold Xin <rxin@databricks.com>

Closes #6608 from rxin/parquet-analysis and squashes the following commits:

b5dc8e2 [Reynold Xin] Code review feedback.
5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.

(cherry picked from commit 939e4f3d8d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 13:58:15 -07:00
Sun Rui f67a27d026 [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
Author: Sun Rui <rui.sun@intel.com>

Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:

51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.

(cherry picked from commit 708c63bbbe)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-03 11:57:00 -07:00
animesh 0a1dad6cd4 [SPARK-7980] [SQL] Support SQLContext.range(end)
1. range() overloaded in SQLContext.scala
2. range() modified in python sql context.py
3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py
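A minimal sketch of item 1's overload from the Python side (assuming a 1.4-era `sqlContext`):

```python
# sqlContext.range(5) is the new shorthand for sqlContext.range(0, 5)
df = sqlContext.range(5)
df.show()  # a single "id" column holding 0 through 4
```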

Author: animesh <animesh@apache.spark>

Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:

935899c [animesh] SPARK-7980:python+scala changes

(cherry picked from commit d053a31be9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 11:28:38 -07:00
Yin Huai 54a4ea4078 [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests.
https://issues.apache.org/jira/browse/SPARK-7973

Author: Yin Huai <yhuai@databricks.com>

Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:

763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
e598a08 [Yin Huai] Increase the timeout to 3 minutes.

(cherry picked from commit f1646e1023)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-03 09:26:30 -07:00
Reynold Xin ee7f365bd0 [SPARK-8060] Improve DataFrame Python test coverage and documentation.
Author: Reynold Xin <rxin@databricks.com>

Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:

baa8ad5 [Reynold Xin] Code review feedback.
f081d47 [Reynold Xin] More documentation updates.
c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.

(cherry picked from commit ce320cb2db)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 00:23:42 -07:00
MechCoder bd57af3879 [SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust
The current check compares version strings lexically, so it reports `1.x` < `1.4` whenever the string sorts lower. This fails when x has more than one digit: numerically 10 > 4, yet as strings `1.10` < `1.4`.

It fails on my system since I have version `1.10` :P
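To make the failure concrete, a small sketch contrasting string and component-wise comparison (`LooseVersion` is one robust alternative; the commit's exact implementation is not asserted here):

```python
from distutils.version import LooseVersion

print("1.10" < "1.4")                              # True: lexicographic, wrong
print(LooseVersion("1.10") < LooseVersion("1.4"))  # False: numeric components
```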

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6579 from MechCoder/np_ver and squashes the following commits:

15430f8 [MechCoder] fix syntax error
893fb7e [MechCoder] remove equal to
e35f0d4 [MechCoder] minor
e89376c [MechCoder] Better checking
22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust

(cherry picked from commit 452eb82dd7)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 23:24:57 -07:00
Yuhao Yang 33edb2b79e [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc
jira: https://issues.apache.org/jira/browse/SPARK-8043

I found some issues while testing the save/load examples in the markdown documentation, as part of the 1.4 QA plan.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:

a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc

(cherry picked from commit 43adbd5611)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 23:16:06 -07:00
Joseph K. Bradley 88399c34b2 [SPARK-8053] [MLLIB] renamed scalingVector to scalingVec
I searched the Spark codebase for all occurrences of "scalingVector"

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits:

d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec

(cherry picked from commit 07c16cb5ba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 22:57:12 -07:00
DB Tsai 6391be872d [SPARK-7547] [ML] Scala Example code for ElasticNet
This is Scala example code for both linear and logistic regression. Python and Java versions are to be added.
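Since the Python version is noted as still to come, a hedged sketch of what the linear-regression half might look like in PySpark (parameter values are illustrative):

```python
from pyspark.ml.regression import LinearRegression

# elasticNetParam mixes the penalty: 0.0 is pure L2 (ridge), 1.0 is
# pure L1 (lasso); regParam scales the overall strength.
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
# model = lr.fit(training)  # `training` is an assumed DataFrame of
#                           # (label, features) rows
```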

Author: DB Tsai <dbt@netflix.com>

Closes #6576 from dbtsai/elasticNetExample and squashes the following commits:

e7ca406 [DB Tsai] fix test
6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter
136e0dd [DB Tsai] address feedback
1ec29d4 [DB Tsai] fix style
9462f5f [DB Tsai] add example

(cherry picked from commit a86b3e9b9b)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 19:12:19 -07:00
Ram Sriharsha 6a3e32ad1e [SPARK-7387] [ML] [DOC] CrossValidator example code in Python
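A condensed, hedged sketch of the kind of Python example this PR adds (estimator choice and grid values are illustrative):

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(maxIter=10)
grid = ParamGridBuilder().addGrid(lr.regParam, [0.1, 0.01]).build()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=3)
# cvModel = cv.fit(training)  # `training` is an assumed labeled DataFrame
```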
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits:

63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly
aeb6bb6 [Ram Sriharsha] Python Style Fix
54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387
615e91c [Ram Sriharsha] cleanup
204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387
7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python

(cherry picked from commit c3f4c32571)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 18:53:24 -07:00
Patrick Wendell ab713af564 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 18:06:41 -07:00
Patrick Wendell 22596c534a Preparing Spark release v1.4.0-rc4 2015-06-02 18:06:35 -07:00
Cheng Lian 0d83720990 [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness
This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7-specific APIs and breaks the Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.

cc andrewor14 pwendell, this should also be back ported to branch-1.4.

Author: Cheng Lian <lian@databricks.com>

Closes #6547 from liancheng/override-log4j and squashes the following commits:

c900cfd [Cheng Lian] Addresses Shixiong's comment
72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness

(cherry picked from commit 5cd6a63d96)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-02 17:07:20 -07:00
Cheng Lian daeaa0c5ac [SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior.

This PR writes a temporary `log4j.properties` and prepends it to the driver classpath before starting the test Thrift server process, to ensure proper logging behavior.

cc andrewor14 yhuai

Author: Cheng Lian <lian@databricks.com>

Closes #6493 from liancheng/override-log4j and squashes the following commits:

c489e0e [Cheng Lian] Fixes minor Scala styling issue
b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
2015-06-02 17:06:24 -07:00
Patrick Wendell e3c35b217c Preparing development version 1.4.0-SNAPSHOT 2015-06-02 17:01:15 -07:00
Patrick Wendell a14fad11ef Preparing Spark release v1.4.0-rc4 2015-06-02 17:01:10 -07:00
Xiangrui Meng 97d4cd0740 [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output
The temporary column should be dropped after we get the prediction column. harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6592 from mengxr/SPARK-8049 and squashes the following commits:

1d89107 [Xiangrui Meng] use SparkFunSuite
6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output

(cherry picked from commit 89f21f66b5)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 16:53:26 -07:00
Patrick Wendell 92ccc5ba39 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 14:02:19 -07:00
Patrick Wendell d630f4d697 Preparing Spark release v1.4.0-rc4 2015-06-02 14:02:14 -07:00
Davies Liu 6b0f61563d [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
Thanks ogirardot, closes #6580

cc rxin JoshRosen
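A minimal sketch of the chaining this fix restores (data is illustrative):

```python
from pyspark.sql import functions as F

df = sqlContext.createDataFrame([(2,), (5,)], ["age"])
# Column.when()/otherwise() chained on the result of F.when():
df.select(F.when(df["age"] > 3, 1).otherwise(0).alias("flag")).show()
```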

Author: Davies Liu <davies@databricks.com>

Closes #6590 from davies/when and squashes the following commits:

c0f2069 [Davies Liu] fix Column.when() and otherwise()

(cherry picked from commit 605ddbb27c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-02 13:38:14 -07:00
Cheng Lian cbaf595447 [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
The current code references the schema of the DataFrame to be written before checking the save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless, since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later.

This PR fixes this issue by deferring metadata discovery after save mode checking.
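For orientation, a hedged PySpark sketch of the write path in question (`df` is an assumed DataFrame; the path is illustrative):

```python
# With mode "overwrite", "ignore", or "error", the existing data's schema
# is never needed, so metadata discovery can wait until after the save
# mode check -- the reordering this PR performs.
df.write.mode("overwrite").parquet("/tmp/illustrative_output")
```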

Author: Cheng Lian <lian@databricks.com>

Closes #6583 from liancheng/spark-8014 and squashes the following commits:

1aafabd [Cheng Lian] Updates comments
088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
8fbd93f [Cheng Lian] Fixes SPARK-8014

(cherry picked from commit 686a45f0b9)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-02 13:32:34 -07:00
Mike Dusenberry 815e056542 [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples.
Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.

mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:

6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.

(cherry picked from commit ad06727fe9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 12:38:33 -07:00
Josh Rosen 139c8240fd [MINOR] Enable PySpark SQL readerwriter and window tests
PySpark SQL's `readerwriter` and `window` doctests weren't being run by our test runner script; this patch re-enables them.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6542 from JoshRosen/enable-more-pyspark-sql-tests and squashes the following commits:

9f46ce4 [Josh Rosen] Enable PySpark SQL readerwriter and window tests.
2015-06-02 12:02:07 -07:00