ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Daniel Darabos	daf9451a4d	Fix maxTaskFailures comment If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code supports this reading, I think it's just this comment that was off. It's easy to make this mistake — can you please double-check if I'm correct? Thanks! Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #6621 from darabos/patch-2 and squashes the following commits: dfebdec [Daniel Darabos] Fix comment. (cherry picked from commit `10ba188087`) Signed-off-by: Sean Owen <sowen@cloudera.com>	2015-06-04 13:48:56 +02:00
Andrew Or	84da653192	[BUILD] Fix Maven build for Kinesis A necessary dependency that is transitively referenced is not provided, causing compilation failures in builds that provide the kinesis-asl profile.	2015-06-03 20:47:53 -07:00
Andrew Or	bfe74b34a6	[SPARK-7558] Demarcate tests in unit-tests.log (1.4) This includes the following commits: original: `9eb222c` hotfix1: `8c99793` hotfix2: `a4f2412` scalastyle check: `609c492` --- Original patch #6441 Branch-1.3 patch #6602 Author: Andrew Or <andrew@databricks.com> Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits: 4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4 e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike 46d4361 [Andrew Or] Various whitespace changes (minor) 3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite eaa520e [Andrew Or] Fix tests? b4d93de [Andrew Or] Fix tests 634a777 [Andrew Or] Fix log message a932e8d [Andrew Or] Fix manual things that cannot be covered through automation 8bc355d [Andrew Or] Add core tests as dependencies in all modules 75d361f [Andrew Or] Introduce base abstract class for all test suites	2015-06-03 20:46:44 -07:00
Andrew Or	584a2ba21c	[BUILD] Use right branch when checking against Hive (1.4) For branch-1.4. This is identical to #6629 and is strictly not necessary. I'm opening this as a PR since it changes Jenkins test behavior and I want to test it out here. Author: Andrew Or <andrew@databricks.com> Closes #6630 from andrewor14/build-check-hive-1.4 and squashes the following commits: 186ec65 [Andrew Or] [BUILD] Use right branch when checking against Hive	2015-06-03 18:09:14 -07:00
Andrew Or	96f71b105a	[BUILD] Increase Jenkins test timeout Currently hive tests alone take 40m. The right thing to do is to reduce the test time. However, that is a bigger project and we currently have PRs blocking on tests not timing out.	2015-06-03 17:45:07 -07:00
Shivaram Venkataraman	c2c129073f	[SPARK-8084] [SPARKR] Make SparkR scripts fail on error cc shaneknapp pwendell JoshRosen Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6623 from shivaram/SPARK-8084 and squashes the following commits: 0ec5b26 [Shivaram Venkataraman] Make SparkR scripts fail on error (cherry picked from commit `0576c3c4ff`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-03 17:02:29 -07:00
Ryan Williams	16748694b8	[SPARK-8088] don't attempt to lower number of executors by 0 Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #6624 from ryan-williams/execs and squashes the following commits: b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0 (cherry picked from commit `51898b5158`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-03 16:54:52 -07:00
Andrew Or	0bc9a3ec42	[HOTFIX] [TYPO] Fix typo in #6546	2015-06-03 16:04:35 -07:00
Andrew Or	d0be9508f5	[HOTFIX] Unbreak build from backporting #6546 This is caused by `7e46ea0228`.	2015-06-03 15:25:35 -07:00
Xiangrui Meng	b2a22a651f	[SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6595 from mengxr/SPARK-8051 and squashes the following commits: b6a36b9 [Xiangrui Meng] add doc f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051 8ee7c7e [Xiangrui Meng] use SparkFunSuite e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist (cherry picked from commit `26c9d7a0f9`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-03 15:16:36 -07:00
Shivaram Venkataraman	ca21fff7da	[SPARK-3674] [EC2] Clear SPARK_WORKER_INSTANCES when using YARN cc andrewor14 Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6424 from shivaram/spark-worker-instances-yarn-ec2 and squashes the following commits: db244ae [Shivaram Venkataraman] Make Python Lint happy 0593d1b [Shivaram Venkataraman] Clear SPARK_WORKER_INSTANCES when using YARN (cherry picked from commit `d3e026f879`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-03 15:14:44 -07:00
zsxwing	7e46ea0228	[SPARK-7989] [CORE] [TESTS] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs. This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up. Author: zsxwing <zsxwing@gmail.com> Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits: 5560e09 [zsxwing] Fix a typo 3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite (cherry picked from commit `f27134782e`) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala core/src/test/scala/org/apache/spark/scheduler/SparkListenerWithClusterSuite.scala	2015-06-03 15:05:49 -07:00
zsxwing	306837e4e3	[SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`. Author: zsxwing <zsxwing@gmail.com> Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits: 607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout (cherry picked from commit `1d8669f15c`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-03 15:03:15 -07:00
Timothy Chen	59399a8f0c	[SPARK-8083] [MESOS] Use the correct base path in mesos driver page. Author: Timothy Chen <tnachen@gmail.com> Closes #6615 from tnachen/mesos_driver_path and squashes the following commits: 4f47b7c [Timothy Chen] Use the correct base path in mesos driver page. (cherry picked from commit `bfbf12b349`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-03 14:58:33 -07:00
Andrew Or	31e0ae9e1d	[MINOR] [UI] Improve confusing message on log page It's good practice to check if the input path is in the directory we expect to avoid potentially confusing error messages.	2015-06-03 14:48:15 -07:00
Joseph K. Bradley	bfab61f39c	[SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests Java-friendly APIs added: * GaussianMixture.run() * GaussianMixtureModel.predict() * DistributedLDAModel.javaTopicDistributions() * StreamingKMeans: trainOn, predictOn, predictOnValues * Statistics.corr * params * added doc to w() since Java docs do not inherit doc * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam * made DoubleArrayParam Java-friendly w() actually Java-friendly I generated the doc and verified all changes. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits: c16821b [Joseph K. Bradley] Small fixes based on code review. d955581 [Joseph K. Bradley] unit test fixes 29b6b0d [Joseph K. Bradley] small fixes fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params (cherry picked from commit `20a26b595c`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-03 14:34:31 -07:00
Reynold Xin	1f90a06bda	[SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures. Author: Reynold Xin <rxin@databricks.com> Closes #6608 from rxin/parquet-analysis and squashes the following commits: b5dc8e2 [Reynold Xin] Code review feedback. 5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures. (cherry picked from commit `939e4f3d8d`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-03 13:58:15 -07:00
Sun Rui	f67a27d026	[SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option. Author: Sun Rui <rui.sun@intel.com> Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits: 51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option. (cherry picked from commit `708c63bbbe`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-03 11:57:00 -07:00
animesh	0a1dad6cd4	[SPARK-7980] [SQL] Support SQLContext.range(end) 1. range() overloaded in SQLContext.scala 2. range() modified in python sql context.py 3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py Author: animesh <animesh@apache.spark> Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits: 935899c [animesh] SPARK-7980:python+scala changes (cherry picked from commit `d053a31be9`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-03 11:28:38 -07:00
Yin Huai	54a4ea4078	[SPARK-7973] [SQL] Increase the timeout of two CliSuite tests. https://issues.apache.org/jira/browse/SPARK-7973 Author: Yin Huai <yhuai@databricks.com> Closes #6525 from yhuai/SPARK-7973 and squashes the following commits: 763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes. e598a08 [Yin Huai] Increase the timeout to 3 minutes. (cherry picked from commit `f1646e1023`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-03 09:26:30 -07:00
Reynold Xin	ee7f365bd0	[SPARK-8060] Improve DataFrame Python test coverage and documentation. Author: Reynold Xin <rxin@databricks.com> Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits: baa8ad5 [Reynold Xin] Code review feedback. f081d47 [Reynold Xin] More documentation updates. c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing. (cherry picked from commit `ce320cb2db`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-03 00:23:42 -07:00
MechCoder	bd57af3879	[SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust The current check tests whether version `1.x` is less than `1.4`. This fails if x has more than one digit: x > 4, yet `1.x` < `1.4`. It fails on my system since I have version `1.10` :P Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #6579 from MechCoder/np_ver and squashes the following commits: 15430f8 [MechCoder] fix syntax error 893fb7e [MechCoder] remove equal to e35f0d4 [MechCoder] minor e89376c [MechCoder] Better checking 22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust (cherry picked from commit `452eb82dd7`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-02 23:24:57 -07:00
Yuhao Yang	33edb2b79e	[SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc jira: https://issues.apache.org/jira/browse/SPARK-8043 I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan Author: Yuhao Yang <hhbyyh@gmail.com> Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits: a01a206 [Yuhao Yang] fix for Gaussian mixture 2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc (cherry picked from commit `43adbd5611`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-02 23:16:06 -07:00
Joseph K. Bradley	88399c34b2	[SPARK-8053] [MLLIB] renamed scalingVector to scalingVec I searched the Spark codebase for all occurrences of "scalingVector" CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits: d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec (cherry picked from commit `07c16cb5ba`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-02 22:57:12 -07:00
DB Tsai	6391be872d	[SPARK-7547] [ML] Scala Example code for ElasticNet This is scala example code for both linear and logistic regression. Python and Java versions are to be added. Author: DB Tsai <dbt@netflix.com> Closes #6576 from dbtsai/elasticNetExample and squashes the following commits: e7ca406 [DB Tsai] fix test 6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter 136e0dd [DB Tsai] address feedback 1ec29d4 [DB Tsai] fix style 9462f5f [DB Tsai] add example (cherry picked from commit `a86b3e9b9b`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-02 19:12:19 -07:00
Ram Sriharsha	6a3e32ad1e	[SPARK-7387] [ML] [DOC] CrossValidator example code in Python Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits: 63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly aeb6bb6 [Ram Sriharsha] Python Style Fix 54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387 615e91c [Ram Sriharsha] cleanup 204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387 7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python (cherry picked from commit `c3f4c32571`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-02 18:53:24 -07:00
Patrick Wendell	ab713af564	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 18:06:41 -07:00
Patrick Wendell	22596c534a	Preparing Spark release v1.4.0-rc4	2015-06-02 18:06:35 -07:00
Cheng Lian	0d83720990	[SQL] [TEST] [MINOR] Follow-up of PR #6493 , use Guava API to ensure Java 6 friendliness This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7 specific APIs and breaks Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness. cc andrewor14 pwendell, this should also be back ported to branch-1.4. Author: Cheng Lian <lian@databricks.com> Closes #6547 from liancheng/override-log4j and squashes the following commits: c900cfd [Cheng Lian] Addresses Shixiong's comment 72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness (cherry picked from commit `5cd6a63d96`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-02 17:07:20 -07:00
Cheng Lian	daeaa0c5ac	[SQL] [TEST] [MINOR] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior. This PR writes a temporary `log4j.properties` and prepends it to the driver classpath before starting the testing Thrift server process to ensure proper logging behavior. cc andrewor14 yhuai Author: Cheng Lian <lian@databricks.com> Closes #6493 from liancheng/override-log4j and squashes the following commits: c489e0e [Cheng Lian] Fixes minor Scala styling issue b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior	2015-06-02 17:06:24 -07:00
Patrick Wendell	e3c35b217c	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 17:01:15 -07:00
Patrick Wendell	a14fad11ef	Preparing Spark release v1.4.0-rc4	2015-06-02 17:01:10 -07:00
Xiangrui Meng	97d4cd0740	[SPARK-8049] [MLLIB] drop tmp col from OneVsRest output The temporary column should be dropped after we get the prediction column. harsha2010 Author: Xiangrui Meng <meng@databricks.com> Closes #6592 from mengxr/SPARK-8049 and squashes the following commits: 1d89107 [Xiangrui Meng] use SparkFunSuite 6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output (cherry picked from commit `89f21f66b5`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-02 16:53:26 -07:00
Patrick Wendell	92ccc5ba39	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 14:02:19 -07:00
Patrick Wendell	d630f4d697	Preparing Spark release v1.4.0-rc4	2015-06-02 14:02:14 -07:00
Davies Liu	6b0f61563d	[SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise() Thanks ogirardot, closes #6580 cc rxin JoshRosen Author: Davies Liu <davies@databricks.com> Closes #6590 from davies/when and squashes the following commits: c0f2069 [Davies Liu] fix Column.when() and otherwise() (cherry picked from commit `605ddbb27c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-02 13:38:14 -07:00
Cheng Lian	cbaf595447	[SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append The current code references the schema of the DataFrame to be written before checking save mode. This triggers expensive metadata discovery prematurely. For save mode other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later. This PR fixes this issue by deferring metadata discovery after save mode checking. Author: Cheng Lian <lian@databricks.com> Closes #6583 from liancheng/spark-8014 and squashes the following commits: 1aafabd [Cheng Lian] Updates comments 088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined 8fbd93f [Cheng Lian] Fixes SPARK-8014 (cherry picked from commit `686a45f0b9`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-02 13:32:34 -07:00
Mike Dusenberry	815e056542	[SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples. Updating ML Doc's "Estimator, Transformer, and Param" example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists. mengxr, I believe this addresses (part of) the update documentation TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820). Author: Mike Dusenberry <dusenberrymw@gmail.com> Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits: 6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model. d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed. 0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists. 7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists. (cherry picked from commit `ad06727fe9`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-02 12:38:33 -07:00
Josh Rosen	139c8240fd	[MINOR] Enable PySpark SQL readerwriter and window tests PySpark SQL's `readerwriter` and `window` doctests weren't being run by our test runner script; this patch re-enables them. Author: Josh Rosen <joshrosen@databricks.com> Closes #6542 from JoshRosen/enable-more-pyspark-sql-tests and squashes the following commits: 9f46ce4 [Josh Rosen] Enable PySpark SQL readerwriter and window tests.	2015-06-02 12:02:07 -07:00
Marcelo Vanzin	fa292dc3db	[SPARK-8015] [FLUME] Remove Guava dependency from flume-sink. The minimal change would be to disable shading of Guava in the module, and rely on the transitive dependency from other libraries instead. But since Guava's use is so localized, I think it's better to just not use it instead, so I replaced that code and removed all traces of Guava from the module's build. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #6555 from vanzin/SPARK-8015 and squashes the following commits: c0ceea8 [Marcelo Vanzin] Add comments about dependency management. c38228d [Marcelo Vanzin] Add guava dep in test scope. b7a0349 [Marcelo Vanzin] Add libthrift exclusion. 6e0942d [Marcelo Vanzin] Add comment in pom. 2d79260 [Marcelo Vanzin] [SPARK-8015] [flume] Remove Guava dependency from flume-sink. (cherry picked from commit `0071bd8d31`) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>	2015-06-02 11:20:45 -07:00
Cheng Lian	f71a09de6e	[SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation Author: Cheng Lian <lian@databricks.com> Closes #6581 from liancheng/spark-8037 and squashes the following commits: d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation (cherry picked from commit `1bb5d716c0`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-03 01:09:19 +08:00
Yin Huai	8c3fc3a6cd	[HOT-FIX] Add EvaluatedType back to RDG `87941ff8c4` accidentally removed the EvaluatedType. Author: Yin Huai <yhuai@databricks.com> Closes #6589 from yhuai/getBackEvaluatedType and squashes the following commits: 618c2eb [Yin Huai] Add EvaluatedType back.	2015-06-02 09:59:19 -07:00
Xiangrui Meng	97fedf1a02	[SPARK-7432] [MLLIB] fix flaky CrossValidator doctest The new test uses CV to compare `maxIter=0` and `maxIter=1`, and validate on the evaluation result. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6572 from mengxr/SPARK-7432 and squashes the following commits: c236bb8 [Xiangrui Meng] fix flaky cv doctest (cherry picked from commit `bd97840d5c`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-02 08:51:07 -07:00
Patrick Wendell	92a677891c	Preparing development version 1.4.0-SNAPSHOT	2015-06-02 08:41:15 -07:00
Patrick Wendell	48c506724a	Preparing Spark release v1.4.0-rc4	2015-06-02 08:41:10 -07:00
Davies Liu	292ee1a994	[SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala add schema()/format()/options() for reader, add mode()/format()/options()/partitionBy() for writer cc rxin yhuai pwendell Author: Davies Liu <davies@databricks.com> Closes #6578 from davies/readwrite and squashes the following commits: 720d293 [Davies Liu] address comments b65dfa2 [Davies Liu] Update readwriter.py 1299ab6 [Davies Liu] make Python API consistent with Scala (cherry picked from commit `445647a1a3`) Signed-off-by: Patrick Wendell <patrick@databricks.com>	2015-06-02 08:37:26 -07:00
Yin Huai	87941ff8c4	[SPARK-8023][SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects. This closes #6570. Author: Yin Huai <yhuai@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #6573 from rxin/deterministic and squashes the following commits: 356cd22 [Reynold Xin] Added unit test for the optimizer. da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023 da56200 [Yin Huai] Comments. e38f264 [Yin Huai] Comment. f9d6a73 [Yin Huai] Add a deterministic method to Expression. (cherry picked from commit `0f80990bfa`) Signed-off-by: Reynold Xin <rxin@databricks.com> Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala	2015-06-02 00:21:27 -07:00
Yin Huai	4940630f56	[SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early https://issues.apache.org/jira/browse/SPARK-8020 Author: Yin Huai <yhuai@databricks.com> Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits: 0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive. (cherry picked from commit `7b7f7b6c6f`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-02 00:17:09 -07:00
Davies Liu	9d6475b93d	[SPARK-6917] [SQL] DecimalType is not read back when non-native type exists cc yhuai Author: Davies Liu <davies@databricks.com> Closes #6558 from davies/decimalType and squashes the following commits: c877ca8 [Davies Liu] Update ParquetConverter.scala 48cc57c [Davies Liu] Update ParquetConverter.scala b43845c [Davies Liu] add test 3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists (cherry picked from commit `bcb47ad771`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-01 23:12:37 -07:00
Xiangrui Meng	011b07e238	[SPARK-7582] [MLLIB] user guide for StringIndexer This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6561 from mengxr/SPARK-7582 and squashes the following commits: 4bba4f1 [Xiangrui Meng] fix example ba1cd1b [Xiangrui Meng] fix style 7fa18d1 [Xiangrui Meng] add user guide for StringIndexer 136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer (cherry picked from commit `0221c7f0ef`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-01 22:03:37 -07:00
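
The NumPy version pitfall described in the SPARK-8032 entry above can be sketched as follows. This is only an illustration of the comparison problem, not the patch itself; the helper name `version_tuple` is hypothetical, and the sketch assumes the leading version components are purely numeric.

```python
def version_tuple(version, parts=2):
    """Hypothetical helper: parse the leading numeric components of a dotted version."""
    return tuple(int(p) for p in version.split(".")[:parts])


# Comparing version strings directly is lexicographic, so "1.10" sorts
# before "1.4" (the character '1' is less than '4') and looks too old.
naive_too_old = "1.10" < "1.4"

# Comparing tuples of integers is numeric: (1, 10) is not less than (1, 4).
robust_too_old = version_tuple("1.10") < version_tuple("1.4")

print(naive_too_old, robust_too_old)  # prints: True False
```

Comparing integer tuples keeps the check correct once a minor version reaches two digits, which is the case the commit message reports for `1.10`.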

11333 commits