Commit graph

Luca Martinetti 94f65bccee [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort
Add documentation for spark.sql.planner.externalSort
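A minimal PySpark sketch of toggling the documented setting (the app setup and the chosen value are illustrative, not part of the commit):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "externalSortDemo")  # illustrative setup
sqlContext = SQLContext(sc)
# spark.sql.planner.externalSort controls whether SQL sort operators may
# spill to disk instead of sorting entirely in memory; values are strings.
sqlContext.setConf("spark.sql.planner.externalSort", "true")
```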

Author: Luca Martinetti <luca@luca.io>

Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:

985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort

(cherry picked from commit 4060526cd3)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-05 13:41:52 -07:00
zsxwing 200c980a13 [SPARK-8112] [STREAMING] Fix the negative event count issue
Author: zsxwing <zsxwing@gmail.com>

Closes #6659 from zsxwing/SPARK-8112 and squashes the following commits:

a5d7da6 [zsxwing] Address comments
d255b6e [zsxwing] Fix the negative event count issue

(cherry picked from commit 4f16d3fe2e)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-05 12:46:15 -07:00
Andrew Or 429c658519 Revert "[MINOR] [BUILD] Use custom temp directory during build."
This reverts commit 9b3e4c1871.
2015-06-05 10:54:06 -07:00
Shivaram Venkataraman 3e3151e755 [SPARK-8085] [SPARKR] Support user-specified schema in read.df
cc davies sun-rui

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:

16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df

(cherry picked from commit 12f5eaeee1)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-05 10:19:15 -07:00
Akhil Das 0ef2e9d351 [STREAMING] Update streaming-kafka-integration.md
Fixed the broken links (Examples) in the documentation.

Author: Akhil Das <akhld@darktech.ca>

Closes #6666 from akhld/patch-2 and squashes the following commits:

2228b83 [Akhil Das] Update streaming-kafka-integration.md

(cherry picked from commit 019dc9f558)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-05 14:24:06 +02:00
Marcelo Vanzin 9b3e4c1871 [MINOR] [BUILD] Use custom temp directory during build.
Even with all the efforts to cleanup the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.

After an sbt full unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.

Also make a slight change to a unit test to make it not pollute the
source directory with test data.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:

31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.

(cherry picked from commit b16b5434ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-05 14:12:05 +02:00
Sean Owen 90cf686386 [MINOR] remove unused interpolation var in log message
Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.

Author: Sean Owen <sowen@cloudera.com>

Closes #6650 from srowen/Interpolation and squashes the following commits:

518687a [Sean Owen] Actually interpolate log string
7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message

(cherry picked from commit 3a5c4da473)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-05 00:32:52 -07:00
Ted Blackman f02af7c8f7 [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.
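For illustration, a minimal sketch of the new single-argument form (the printed result is the expected output, mirroring Python's built-in range()):

```python
# With this change, sc.range(5) behaves like sc.range(0, 5);
# `sc` is assumed to be an existing SparkContext.
rdd = sc.range(5)
print(rdd.collect())  # expected: [0, 1, 2, 3, 4]
```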
Author: Ted Blackman <ted.blackman@gmail.com>

Closes #6656 from belisarius222/branch-1.4 and squashes the following commits:

747cbc2 [Ted Blackman] [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.
2015-06-04 22:21:11 -07:00
Carson Wang 3ba6fc515d [SPARK-8098] [WEBUI] Show correct length of bytes on log page
The log page should only show the desired number of bytes. Currently it shows bytes from the startIndex to the end of the file, and the "Next" button on the page is always disabled.

Author: Carson Wang <carson.wang@intel.com>

Closes #6640 from carsonwang/logpage and squashes the following commits:

58cb3fd [Carson Wang] Show correct length of bytes on log page

(cherry picked from commit 63bc0c4430)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-06-04 16:25:05 -07:00
Shivaram Venkataraman 0b71b851de [SPARK-8027] [SPARKR] Move man pages creation to install-dev.sh
This also helps us get rid of the sparkr-docs maven profile, as docs are now built by just using -Psparkr when the roxygen2 package is available.

Related to discussion in #6567

cc pwendell srowen -- Let me know if this looks better

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:

b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available

(cherry picked from commit 3dc005282a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-04 12:52:45 -07:00
Mike Dusenberry 81ff7a9012 [SPARK-7969] [SQL] Added a DataFrame.drop function that accepts a Column reference.
Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests.  Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.
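A minimal sketch of the overload in PySpark (schemas and data are illustrative); passing a `Column` disambiguates where a plain string cannot:

```python
df1 = sqlContext.createDataFrame([(1, "a")], ["id", "val"])
df2 = sqlContext.createDataFrame([(1, "x")], ["id", "other"])
joined = df1.join(df2, df1["id"] == df2["id"])
# The string "id" would be ambiguous after the join; the Column
# reference pins down exactly which "id" to remove.
result = joined.drop(df2["id"])
```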

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:

514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests.  Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.

(cherry picked from commit df7da07a86)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-04 11:30:25 -07:00
Daniel Darabos daf9451a4d Fix maxTaskFailures comment
If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code supports this reading, I think it's just this comment that was off. It's easy to make this mistake — can you please double-check if I'm correct? Thanks!
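For context, a hedged sketch of the setting behind this comment (the value shown is illustrative):

```python
from pyspark import SparkConf

# spark.task.maxFailures counts total allowed attempts per task, not
# retries: with the value 1, the first failure aborts the task set.
conf = SparkConf().set("spark.task.maxFailures", "1")
```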

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #6621 from darabos/patch-2 and squashes the following commits:

dfebdec [Daniel Darabos] Fix comment.

(cherry picked from commit 10ba188087)
Signed-off-by: Sean Owen <sowen@cloudera.com>
2015-06-04 13:48:56 +02:00
Andrew Or 84da653192 [BUILD] Fix Maven build for Kinesis
A necessary dependency that is transitively referenced is not
provided, causing compilation failures in builds that provide
the kinesis-asl profile.
2015-06-03 20:47:53 -07:00
Andrew Or bfe74b34a6 [SPARK-7558] Demarcate tests in unit-tests.log (1.4)
This includes the following commits:

original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492

---
Original patch #6441
Branch-1.3 patch #6602

Author: Andrew Or <andrew@databricks.com>

Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:

4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
2015-06-03 20:46:44 -07:00
Andrew Or 584a2ba21c [BUILD] Use right branch when checking against Hive (1.4)
For branch-1.4.

This is identical to #6629 and is strictly not necessary. I'm opening this as a PR since it changes Jenkins test behavior and I want to test it out here.

Author: Andrew Or <andrew@databricks.com>

Closes #6630 from andrewor14/build-check-hive-1.4 and squashes the following commits:

186ec65 [Andrew Or] [BUILD] Use right branch when checking against Hive
2015-06-03 18:09:14 -07:00
Andrew Or 96f71b105a [BUILD] Increase Jenkins test timeout
Currently the Hive tests alone take 40 minutes. The right thing to do is
to reduce the test time. However, that is a bigger project, and in the
meantime we have PRs blocked by test timeouts.
2015-06-03 17:45:07 -07:00
Shivaram Venkataraman c2c129073f [SPARK-8084] [SPARKR] Make SparkR scripts fail on error
cc shaneknapp pwendell JoshRosen

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6623 from shivaram/SPARK-8084 and squashes the following commits:

0ec5b26 [Shivaram Venkataraman] Make SparkR scripts fail on error

(cherry picked from commit 0576c3c4ff)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-03 17:02:29 -07:00
Ryan Williams 16748694b8 [SPARK-8088] don't attempt to lower number of executors by 0
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #6624 from ryan-williams/execs and squashes the following commits:

b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0

(cherry picked from commit 51898b5158)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 16:54:52 -07:00
Andrew Or 0bc9a3ec42 [HOTFIX] [TYPO] Fix typo in #6546 2015-06-03 16:04:35 -07:00
Andrew Or d0be9508f5 [HOTFIX] Unbreak build from backporting #6546
This is caused by 7e46ea0228.
2015-06-03 15:25:35 -07:00
Xiangrui Meng b2a22a651f [SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist
This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:

b6a36b9 [Xiangrui Meng] add doc
f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
8ee7c7e [Xiangrui Meng] use SparkFunSuite
e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist

(cherry picked from commit 26c9d7a0f9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-03 15:16:36 -07:00
Shivaram Venkataraman ca21fff7da [SPARK-3674] [EC2] Clear SPARK_WORKER_INSTANCES when using YARN
cc andrewor14

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6424 from shivaram/spark-worker-instances-yarn-ec2 and squashes the following commits:

db244ae [Shivaram Venkataraman] Make Python Lint happy
0593d1b [Shivaram Venkataraman] Clear SPARK_WORKER_INSTANCES when using YARN

(cherry picked from commit d3e026f879)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 15:14:44 -07:00
zsxwing 7e46ea0228 [SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.

This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.

Author: zsxwing <zsxwing@gmail.com>

Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:

5560e09 [zsxwing] Fix a typo
3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite

(cherry picked from commit f27134782e)
Signed-off-by: Andrew Or <andrew@databricks.com>

Conflicts:
	core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala
	core/src/test/scala/org/apache/spark/scheduler/SparkListenerWithClusterSuite.scala
2015-06-03 15:05:49 -07:00
zsxwing 306837e4e3 [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.

Author: zsxwing <zsxwing@gmail.com>

Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:

607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout

(cherry picked from commit 1d8669f15c)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 15:03:15 -07:00
Timothy Chen 59399a8f0c [SPARK-8083] [MESOS] Use the correct base path in mesos driver page.
Author: Timothy Chen <tnachen@gmail.com>

Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:

4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.

(cherry picked from commit bfbf12b349)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-03 14:58:33 -07:00
Andrew Or 31e0ae9e1d [MINOR] [UI] Improve confusing message on log page
It's good practice to check whether the input path is in the directory
we expect, to avoid potentially confusing error messages.
2015-06-03 14:48:15 -07:00
Joseph K. Bradley bfab61f39c [SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests
Java-friendly APIs added:
* GaussianMixture.run()
* GaussianMixtureModel.predict()
* DistributedLDAModel.javaTopicDistributions()
* StreamingKMeans: trainOn, predictOn, predictOnValues
* Statistics.corr
* params
  * added doc to w() since Java docs do not inherit doc
  * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
  * made DoubleArrayParam Java-friendly w() actually Java-friendly

I generated the doc and verified all changes.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:

c16821b [Joseph K. Bradley] Small fixes based on code review.
d955581 [Joseph K. Bradley] unit test fixes
29b6b0d [Joseph K. Bradley] small fixes
fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params

(cherry picked from commit 20a26b595c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-03 14:34:31 -07:00
Reynold Xin 1f90a06bda [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
Author: Reynold Xin <rxin@databricks.com>

Closes #6608 from rxin/parquet-analysis and squashes the following commits:

b5dc8e2 [Reynold Xin] Code review feedback.
5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.

(cherry picked from commit 939e4f3d8d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 13:58:15 -07:00
Sun Rui f67a27d026 [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
Author: Sun Rui <rui.sun@intel.com>

Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:

51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.

(cherry picked from commit 708c63bbbe)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
2015-06-03 11:57:00 -07:00
animesh 0a1dad6cd4 [SPARK-7980] [SQL] Support SQLContext.range(end)
1. range() overloaded in SQLContext.scala
2. range() modified in python sql context.py
3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py
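A minimal sketch of item 1's overload from the Python side (assuming a 1.4-era `sqlContext`):

```python
# sqlContext.range(5) is the new shorthand for sqlContext.range(0, 5)
df = sqlContext.range(5)
df.show()  # a single "id" column holding 0 through 4
```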

Author: animesh <animesh@apache.spark>

Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:

935899c [animesh] SPARK-7980:python+scala changes

(cherry picked from commit d053a31be9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 11:28:38 -07:00
Yin Huai 54a4ea4078 [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests.
https://issues.apache.org/jira/browse/SPARK-7973

Author: Yin Huai <yhuai@databricks.com>

Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:

763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
e598a08 [Yin Huai] Increase the timeout to 3 minutes.

(cherry picked from commit f1646e1023)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-03 09:26:30 -07:00
Reynold Xin ee7f365bd0 [SPARK-8060] Improve DataFrame Python test coverage and documentation.
Author: Reynold Xin <rxin@databricks.com>

Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:

baa8ad5 [Reynold Xin] Code review feedback.
f081d47 [Reynold Xin] More documentation updates.
c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.

(cherry picked from commit ce320cb2db)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-03 00:23:42 -07:00
MechCoder bd57af3879 [SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust
The current check compares version strings lexically, so it reports `1.x` < `1.4` whenever the string sorts lower. This fails when x has more than one digit: numerically 10 > 4, yet as strings `1.10` < `1.4`.

It fails on my system since I have version `1.10` :P
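To make the failure concrete, a small sketch contrasting string and component-wise comparison (`LooseVersion` is one robust alternative; the commit's exact implementation is not asserted here):

```python
from distutils.version import LooseVersion

print("1.10" < "1.4")                              # True: lexicographic, wrong
print(LooseVersion("1.10") < LooseVersion("1.4"))  # False: numeric components
```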

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6579 from MechCoder/np_ver and squashes the following commits:

15430f8 [MechCoder] fix syntax error
893fb7e [MechCoder] remove equal to
e35f0d4 [MechCoder] minor
e89376c [MechCoder] Better checking
22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust

(cherry picked from commit 452eb82dd7)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 23:24:57 -07:00
Yuhao Yang 33edb2b79e [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc
jira: https://issues.apache.org/jira/browse/SPARK-8043

I found some issues while testing the save/load examples in the markdown documentation, as part of the 1.4 QA plan.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:

a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc

(cherry picked from commit 43adbd5611)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 23:16:06 -07:00
Joseph K. Bradley 88399c34b2 [SPARK-8053] [MLLIB] renamed scalingVector to scalingVec
I searched the Spark codebase for all occurrences of "scalingVector"

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits:

d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec

(cherry picked from commit 07c16cb5ba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 22:57:12 -07:00
DB Tsai 6391be872d [SPARK-7547] [ML] Scala Example code for ElasticNet
This is Scala example code for both linear and logistic regression. Python and Java versions are to be added.
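Since the Python version is noted as still to come, a hedged sketch of what the linear-regression half might look like in PySpark (parameter values are illustrative):

```python
from pyspark.ml.regression import LinearRegression

# elasticNetParam mixes the penalty: 0.0 is pure L2 (ridge), 1.0 is
# pure L1 (lasso); regParam scales the overall strength.
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
# model = lr.fit(training)  # `training` is an assumed DataFrame of
#                           # (label, features) rows
```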

Author: DB Tsai <dbt@netflix.com>

Closes #6576 from dbtsai/elasticNetExample and squashes the following commits:

e7ca406 [DB Tsai] fix test
6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter
136e0dd [DB Tsai] address feedback
1ec29d4 [DB Tsai] fix style
9462f5f [DB Tsai] add example

(cherry picked from commit a86b3e9b9b)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 19:12:19 -07:00
Ram Sriharsha 6a3e32ad1e [SPARK-7387] [ML] [DOC] CrossValidator example code in Python
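A condensed, hedged sketch of the kind of Python example this PR adds (estimator choice and grid values are illustrative):

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(maxIter=10)
grid = ParamGridBuilder().addGrid(lr.regParam, [0.1, 0.01]).build()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=3)
# cvModel = cv.fit(training)  # `training` is an assumed labeled DataFrame
```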
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits:

63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly
aeb6bb6 [Ram Sriharsha] Python Style Fix
54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387
615e91c [Ram Sriharsha] cleanup
204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387
7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python

(cherry picked from commit c3f4c32571)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 18:53:24 -07:00
Patrick Wendell ab713af564 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 18:06:41 -07:00
Patrick Wendell 22596c534a Preparing Spark release v1.4.0-rc4 2015-06-02 18:06:35 -07:00
Cheng Lian 0d83720990 [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness
This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7-specific APIs and breaks the Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.

cc andrewor14 pwendell, this should also be back ported to branch-1.4.

Author: Cheng Lian <lian@databricks.com>

Closes #6547 from liancheng/override-log4j and squashes the following commits:

c900cfd [Cheng Lian] Addresses Shixiong's comment
72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness

(cherry picked from commit 5cd6a63d96)
Signed-off-by: Andrew Or <andrew@databricks.com>
2015-06-02 17:07:20 -07:00
Cheng Lian daeaa0c5ac [SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior.

This PR writes a temporary `log4j.properties` and prepends it to the driver classpath before starting the test Thrift server process, to ensure proper logging behavior.

cc andrewor14 yhuai

Author: Cheng Lian <lian@databricks.com>

Closes #6493 from liancheng/override-log4j and squashes the following commits:

c489e0e [Cheng Lian] Fixes minor Scala styling issue
b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
2015-06-02 17:06:24 -07:00
Patrick Wendell e3c35b217c Preparing development version 1.4.0-SNAPSHOT 2015-06-02 17:01:15 -07:00
Patrick Wendell a14fad11ef Preparing Spark release v1.4.0-rc4 2015-06-02 17:01:10 -07:00
Xiangrui Meng 97d4cd0740 [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output
The temporary column should be dropped after we get the prediction column. harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6592 from mengxr/SPARK-8049 and squashes the following commits:

1d89107 [Xiangrui Meng] use SparkFunSuite
6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output

(cherry picked from commit 89f21f66b5)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-02 16:53:26 -07:00
Patrick Wendell 92ccc5ba39 Preparing development version 1.4.0-SNAPSHOT 2015-06-02 14:02:19 -07:00
Patrick Wendell d630f4d697 Preparing Spark release v1.4.0-rc4 2015-06-02 14:02:14 -07:00
Davies Liu 6b0f61563d [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
Thanks ogirardot, closes #6580

cc rxin JoshRosen
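A minimal sketch of the chaining this fix restores (data is illustrative):

```python
from pyspark.sql import functions as F

df = sqlContext.createDataFrame([(2,), (5,)], ["age"])
# Column.when()/otherwise() chained on the result of F.when():
df.select(F.when(df["age"] > 3, 1).otherwise(0).alias("flag")).show()
```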

Author: Davies Liu <davies@databricks.com>

Closes #6590 from davies/when and squashes the following commits:

c0f2069 [Davies Liu] fix Column.when() and otherwise()

(cherry picked from commit 605ddbb27c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-06-02 13:38:14 -07:00
Cheng Lian cbaf595447 [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
The current code references the schema of the DataFrame to be written before checking the save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless, since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later.

This PR fixes this issue by deferring metadata discovery after save mode checking.
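For orientation, a hedged PySpark sketch of the write path in question (`df` is an assumed DataFrame; the path is illustrative):

```python
# With mode "overwrite", "ignore", or "error", the existing data's schema
# is never needed, so metadata discovery can wait until after the save
# mode check -- the reordering this PR performs.
df.write.mode("overwrite").parquet("/tmp/illustrative_output")
```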

Author: Cheng Lian <lian@databricks.com>

Closes #6583 from liancheng/spark-8014 and squashes the following commits:

1aafabd [Cheng Lian] Updates comments
088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
8fbd93f [Cheng Lian] Fixes SPARK-8014

(cherry picked from commit 686a45f0b9)
Signed-off-by: Yin Huai <yhuai@databricks.com>
2015-06-02 13:32:34 -07:00
Mike Dusenberry 815e056542 [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples.
Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.

mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:

6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.

(cherry picked from commit ad06727fe9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-06-02 12:38:33 -07:00
Josh Rosen 139c8240fd [MINOR] Enable PySpark SQL readerwriter and window tests
PySpark SQL's `readerwriter` and `window` doctests weren't being run by our test runner script; this patch re-enables them.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6542 from JoshRosen/enable-more-pyspark-sql-tests and squashes the following commits:

9f46ce4 [Josh Rosen] Enable PySpark SQL readerwriter and window tests.
2015-06-02 12:02:07 -07:00