ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Yin Huai	cdfa388dd0	[SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite Hopefully, this suite will not be flaky anymore. Author: Yin Huai <yhuai@databricks.com> Closes #7027 from yhuai/SPARK-8567 and squashes the following commits: c0167e2 [Yin Huai] Add sc.stop(). (cherry picked from commit `fbf75738fe`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-29 17:20:14 -07:00
zsxwing	f84f24769a	[SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite "receiver info reporting" As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/ ``` 15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers 15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22 ... 15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping 15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping ``` it needs at least 4 seconds to receive all receiver events in this slow machine, but `timeout` for `eventually` is only 2 seconds. This PR increases `timeout` to make this test stable. Author: zsxwing <zsxwing@gmail.com> Closes #7017 from zsxwing/SPARK-8634 and squashes the following commits: 719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting" (cherry picked from commit `cec98525fd`) Signed-off-by: Andrew Or <andrew@databricks.com>	2015-06-29 17:19:24 -07:00
Yin Huai	6a45d86dbd	[SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def. jira: https://issues.apache.org/jira/browse/SPARK-8710 Author: Yin Huai <yhuai@databricks.com> Closes #7094 from yhuai/SPARK-8710 and squashes the following commits: c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def. (cherry picked from commit `4b497a724a`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-29 16:26:13 -07:00
Michael Armbrust	0c3f7fc88a	[HOTFIX] Fix whitespace style error Author: Michael Armbrust <michael@databricks.com> Closes #7102 from marmbrus/fixStyle and squashes the following commits: 8c08124 [Michael Armbrust] [HOTFIX] Fix whitespace style error	2015-06-29 15:57:36 -07:00
Yin Huai	0de1737a8a	[SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite (1.4 branch) Cherry-pick `f9b397f54d` to branch 1.4. Author: Yin Huai <yhuai@databricks.com> Closes #7092 from yhuai/SPARK-8567-1.4 and squashes the following commits: 0ae2e14 [Yin Huai] [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite.	2015-06-29 15:20:35 -07:00
Ai He	187015f671	[SPARK-7810] [PYSPARK] solve python rdd socket connection problem Method "_load_from_socket" in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New modification should work around both two protocols. Author: Ai He <ai.he@ussuning.com> Author: AiHe <ai.he@ussuning.com> Closes #6338 from AiHe/pyspark-networking-issue and squashes the following commits: d4fc9c4 [Ai He] handle code review 2 e75c5c8 [Ai He] handle code review 5644953 [AiHe] solve python rdd socket connection problem to jvm (cherry picked from commit `ecd3aacf28`) Signed-off-by: Davies Liu <davies@databricks.com>	2015-06-29 14:36:48 -07:00
scwf	457d07eaa0	[SQL] [MINOR] Skip unresolved expression for InConversion Author: scwf <wangfei1@huawei.com> Closes #6145 from scwf/InConversion and squashes the following commits: 5c8ac6b [scwf] minir fix for InConversion (cherry picked from commit `edf09ea1bd`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-29 13:30:47 -07:00
Burak Yavuz	6b9f3831a8	[SPARK-8681] fixed wrong ordering of columns in crosstab I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that we hit a corner case or something. cc rxin marmbrus Author: Burak Yavuz <brkyvz@gmail.com> Closes #7060 from brkyvz/crosstab-fixes and squashes the following commits: 0a65234 [Burak Yavuz] addressed comments v1 d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab (cherry picked from commit `be7ef06762`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-29 13:15:12 -07:00
Kousuke Saruta	da51cf58fc	[SQL][DOCS] Remove wrong example from DataFrame.scala In DataFrame.scala, there are examples like as follows. ``` * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) * peopleDf($"age" > 15) ``` But, I think the last example doesn't work. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #6977 from sarutak/fix-dataframe-example and squashes the following commits: 46efbd7 [Kousuke Saruta] Removed wrong example (cherry picked from commit `94e040d059`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-29 12:16:44 -07:00
Andrew Or	f7c200e6ac	Revert "[SPARK-8372] History server shows incorrect information for application not started" This reverts commit `f0513733d4`.	2015-06-29 10:52:23 -07:00
Cheolsoo Park	9d9c4b476d	[SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all match. I added `Option` to reader and writer and updated the `pyspark-sql` test. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits: c63d419 [Cheolsoo Park] Fix version 524e0aa [Cheolsoo Park] Add option function to df reader and writer (cherry picked from commit `ac2e17b01c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-06-29 00:21:31 -07:00
Josh Rosen	e1bbf1a080	[SPARK-8606] Prevent exceptions in RDD.getPreferredLocations() from crashing DAGScheduler If `RDD.getPreferredLocations()` throws an exception it may crash the DAGScheduler and SparkContext. This patch addresses this by adding a try-catch block. Author: Josh Rosen <joshrosen@databricks.com> Closes #7023 from JoshRosen/SPARK-8606 and squashes the following commits: 770b169 [Josh Rosen] Fix getPreferredLocations() DAGScheduler crash with try block. 44a9b55 [Josh Rosen] Add test of a buggy getPartitions() method 19aa9f7 [Josh Rosen] Add (failing) regression test for getPreferredLocations() DAGScheduler crash (cherry picked from commit `0b5abbf5f9`) Signed-off-by: Josh Rosen <joshrosen@databricks.com>	2015-06-27 14:41:03 -07:00
Neelesh Srinivas Salian	a2dbb48071	[SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document As per the description in the JIRA, I moved the contents of the page and added a few additional content. Author: Neelesh Srinivas Salian <nsalian@cloudera.com> Closes #6924 from nssalian/SPARK-3629 and squashes the following commits: 944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters 40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line 9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section 8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update 151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document (cherry picked from commit `d48e78934a`) Signed-off-by: Sean Owen <sowen@cloudera.com>	2015-06-27 09:08:26 +03:00
Rosstin	68907d2720	[SPARK-8639] [DOCS] Fixed Minor Typos in Documentation Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639) fixed minor typos in docs/README.md and docs/api.md Author: Rosstin <asterazul@gmail.com> Closes #7046 from Rosstin/SPARK-8639 and squashes the following commits: 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md (cherry picked from commit `b5a6663da2`) Signed-off-by: Sean Owen <sowen@cloudera.com>	2015-06-27 08:49:13 +03:00
cafreeman	2579948bf5	[SPARK-8607] SparkR -- jars not being added to application classpath correctly Add `getStaticClass` method in SparkR's `RBackendHandler` This is a fix for the problem referenced in [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185). cc shivaram Author: cafreeman <cfreeman@alteryx.com> Closes #7001 from cafreeman/branch-1.4 and squashes the following commits: 8f81194 [cafreeman] Add missing license 31aedcf [cafreeman] Refactor test to call an external R script 2c22073 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 0bea809 [cafreeman] Fixed relative path issue and added smaller JAR ee25e60 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 9a5c362 [cafreeman] test for including JAR when launching sparkContext 9101223 [cafreeman] Merge branch 'branch-1.4' of github.com:apache/spark into branch-1.4 5a80844 [cafreeman] Fix style nits 7c6bd0c [cafreeman] [SPARK-8607] SparkR	2015-06-26 17:06:02 -07:00
cafreeman	78b31a2a63	[SPARK-8662] SparkR Update SparkSQL Test Test `infer_type` using a more fine-grained approach rather than comparing environments. Since `all.equal`'s behavior has changed in R 3.2, the test became unpassable. JIRA here: https://issues.apache.org/jira/browse/SPARK-8662 Author: cafreeman <cfreeman@alteryx.com> Closes #7045 from cafreeman/R32_Test and squashes the following commits: b97cc52 [cafreeman] Add `checkStructField` utility 3381e5c [cafreeman] Update SparkSQL Test	2015-06-26 10:07:35 -07:00
Shivaram Venkataraman	6abb4fc8a4	[SPARK-8637] [SPARKR] [HOTFIX] Fix packages argument, sparkSubmitBinName cc cafreeman Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #7022 from shivaram/sparkr-init-hotfix and squashes the following commits: 9178d15 [Shivaram Venkataraman] Fix packages argument, sparkSubmitBinName (cherry picked from commit `c392a9efab`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-25 10:56:08 -07:00
Tom Graves	13802163de	[SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/ta… …rget versions. I basically copied the compatibility rules from the top level pom.xml into here. Someone more familiar with all the options in the top level pom may want to make sure nothing else should be copied on down. With this is allows me to build with jdk8 and run with lower versions. Source shows compiled for jdk6 as its supposed to. Author: Tom Graves <tgraves@yahoo-inc.com> Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #6989 from tgravescs/SPARK-8574 and squashes the following commits: e1ea2d4 [Thomas Graves] Change to use combine.children="append" 150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions (cherry picked from commit `e988adb58f`) Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>	2015-06-25 08:27:56 -05:00
Joshi	74001db04f	[SPARK-5768] [WEB UI] Fix for incorrect memory in Spark UI Fix for incorrect memory in Spark UI as per SPARK-5768 Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes #6972 from rekhajoshm/SPARK-5768 and squashes the following commits: b678a91 [Joshi] Fix for incorrect memory in Spark UI 2fe53d9 [Joshi] Fix for incorrect memory in Spark UI eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI 0be142d [Rekha Joshi] Merge pull request #3 from apache/master 106fd8e [Rekha Joshi] Merge pull request #2 from apache/master e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master (cherry picked from commit `085a7216bf`) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>	2015-06-25 20:22:04 +09:00
Cheng Lian	0605e08434	[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class `HadoopFsRelation` subclasses, especially `ParquetRelation2` should set its own output format class, so that the default output committer can be setup correctly when doing appending (where we ignore user defined output committers). Author: Cheng Lian <lian@databricks.com> Closes #6998 from liancheng/spark-8604 and squashes the following commits: 9be51d1 [Cheng Lian] Adds more comments 6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class (cherry picked from commit `c337844ed7`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-25 00:07:01 -07:00
Yin Huai	792ed7a4ba	[SPARK-8567] [SQL] Increase the timeout of HiveSparkSubmitSuite https://issues.apache.org/jira/browse/SPARK-8567 Author: Yin Huai <yhuai@databricks.com> Closes #6957 from yhuai/SPARK-8567 and squashes the following commits: 62dff5b [Yin Huai] Increase the timeout. Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala	2015-06-24 15:56:43 -07:00
BenFradet	93793237ee	[SPARK-8399] [STREAMING] [WEB UI] Overlap between histograms and axis' name in Spark Streaming UI Moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui so the histograms and the axis' name do not overlap. Author: BenFradet <benjamin.fradet@gmail.com> Closes #6845 from BenFradet/SPARK-8399 and squashes the following commits: b63695f [BenFradet] adjusted inner histograms eb610ee [BenFradet] readjusted #batches on the x axis dd46f98 [BenFradet] aligned all unit labels and ticks 0564b62 [BenFradet] readjusted #batches placement edd0936 [BenFradet] moved where the X axis' name (#batches) is written in histograms in the spark streaming web ui (cherry picked from commit `1173483f3f`) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>	2015-06-24 12:03:23 -07:00
Holden Karau	f6682dd6e8	[SPARK-8506] Add pakages to R context created through init. Author: Holden Karau <holden@pigscanfly.ca> Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits: b60dd63 [Holden Karau] Add an example with the spark-csv package fa8bc92 [Holden Karau] typo: sparm -> spark 865a90c [Holden Karau] strip spaces for comparision c7a4471 [Holden Karau] Add some documentation c1a9233 [Holden Karau] refactor for testing c818556 [Holden Karau] Add pakages to R (cherry picked from commit `43e66192f4`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-24 11:55:29 -07:00
Yin Huai	7e53ff2581	[SPARK-8578] [SQL] Should ignore user defined output committer when appending data (branch 1.4) This is https://github.com/apache/spark/pull/6964 for branch 1.4. Author: Yin Huai <yhuai@databricks.com> Closes #6966 from yhuai/SPARK-8578-branch-1.4 and squashes the following commits: 9c3947b [Yin Huai] Do not use a custom output commiter when appendiing data.	2015-06-24 09:51:18 -07:00
Patrick Wendell	eafbe13459	Preparing development version 1.4.2-SNAPSHOT	2015-06-23 19:48:44 -07:00
Patrick Wendell	60e08e5075	Preparing Spark release v1.4.1-rc1	2015-06-23 19:48:39 -07:00
Davies Liu	13f7b0a910	[SPARK-8573] [SPARK-8568] [SQL] [PYSPARK] raise Exception if column is used in booelan expression It's a common mistake that user will put Column in a boolean expression (together with `and` , `or`), which does not work as expected, we should raise a exception in that case, and suggest user to use `&`, `\|` instead. Author: Davies Liu <davies@databricks.com> Closes #6961 from davies/column_bool and squashes the following commits: 9f19beb [Davies Liu] update message af74bd6 [Davies Liu] fix tests 07dff84 [Davies Liu] address comments, fix tests f70c08e [Davies Liu] raise Exception if column is used in booelan expression (cherry picked from commit `7fb5ae5024`) Signed-off-by: Davies Liu <davies@databricks.com>	2015-06-23 15:51:35 -07:00
Oleksiy Dyagilev	8d6e3636e9	[SPARK-8525] [MLLIB] fix LabeledPoint parser when there is a whitespace between label and features vector fix LabeledPoint parser when there is a whitespace between label and features vector, e.g. (y, [x1, x2, x3]) Author: Oleksiy Dyagilev <oleksiy_dyagilev@epam.com> Closes #6954 from fe2s/SPARK-8525 and squashes the following commits: 0755b9d [Oleksiy Dyagilev] [SPARK-8525][MLLIB] addressing comment, removing dep on commons-lang c1abc2b [Oleksiy Dyagilev] [SPARK-8525][MLLIB] fix LabeledPoint parser when there is a whitespace on specific position (cherry picked from commit `a8031183af`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-23 13:15:27 -07:00
lockwobr	27693e1757	[SQL] [DOCS] updated the documentation for explode the syntax was incorrect in the example in explode Author: lockwobr <lockwobr@gmail.com> Closes #6943 from lockwobr/master and squashes the following commits: 3d864d1 [lockwobr] updated the documentation for explode (cherry picked from commit `4f7fbefb8d`) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>	2015-06-24 02:51:36 +09:00
Josh Rosen	77cb1d5ed1	Revert "[SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer" This reverts commit `3348245055`. Reverting because `catch (Exception e) ... throw e` doesn't compile under Java 6 unless the method declares that it throws Exception.	2015-06-23 09:19:11 -07:00
Holden Karau	3348245055	[SPARK-8498] [TUNGSTEN] fix npe in errorhandling path in unsafeshuffle writer Author: Holden Karau <holden@pigscanfly.ca> Closes #6918 from holdenk/SPARK-8498-fix-npe-in-errorhandling-path-in-unsafeshuffle-writer and squashes the following commits: f807832 [Holden Karau] Log error if we can't throw it 855f9aa [Holden Karau] Spelling - not my strongest suite. Fix Propegates to Propagates. 039d620 [Holden Karau] Add missing closeandwriteoutput 30e558d [Holden Karau] go back to try/finally e503b8c [Holden Karau] Improve the test to ensure we aren't masking the underlying exception ae0b7a7 [Holden Karau] Fix the test 2e6abf7 [Holden Karau] Be more cautious when cleaning up during failed write and re-throw user exceptions (cherry picked from commit `0f92be5b5f`) Signed-off-by: Josh Rosen <joshrosen@databricks.com>	2015-06-23 09:08:49 -07:00
Hari Shreedharan	9294796750	[SPARK-8483] [STREAMING] Remove commons-lang3 dependency from Flume Sink Author: Hari Shreedharan <hshreedharan@apache.org> Closes #6910 from harishreedharan/remove-commons-lang3 and squashes the following commits: 9875f7d [Hari Shreedharan] Revert back to Flume 1.4.0 ca35eb0 [Hari Shreedharan] [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0	2015-06-22 23:41:35 -07:00
Scott Taylor	d0943afbcf	[SPARK-8541] [PYSPARK] test the absolute error in approx doctests A minor change but one which is (presumably) visible on the public api docs webpage. Author: Scott Taylor <github@megatron.me.uk> Closes #6942 from megatron-me-uk/patch-3 and squashes the following commits: fbed000 [Scott Taylor] test the absolute error in approx doctests (cherry picked from commit `f0dcbe8a7c`) Signed-off-by: Josh Rosen <joshrosen@databricks.com>	2015-06-22 23:38:21 -07:00
Holden Karau	22cc1ab66e	[SPARK-7781] [MLLIB] gradient boosted trees.train regressor missing max bins Author: Holden Karau <holden@pigscanfly.ca> Closes #6331 from holdenk/SPARK-7781-GradientBoostedTrees.trainRegressor-missing-max-bins and squashes the following commits: 2894695 [Holden Karau] remove extra blank line 2573e8d [Holden Karau] Update the scala side of the pythonmllibapi and make the test a bit nicer too 3a09170 [Holden Karau] add maxBins to to the train method as well af7f274 [Holden Karau] Add maxBins to GradientBoostedTrees.trainRegressor and correctly mention the default of 32 in other places where it mentioned 100 (cherry picked from commit `164fe2aa44`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-22 22:40:31 -07:00
Patrick Wendell	1cfa7302ee	Preparing development version 1.4.2-SNAPSHOT	2015-06-22 22:21:31 -07:00
Patrick Wendell	d0a5560ce4	Preparing Spark release v1.4.1-rc1	2015-06-22 22:21:26 -07:00
Patrick Wendell	48d6830144	[BUILD] Preparing Spark release 1.4.1	2015-06-22 22:18:52 -07:00
Yu ISHIKAWA	250179485b	[SPARK-8548] [SPARKR] Remove the trailing whitespaces from the SparkR files [[SPARK-8548] Remove the trailing whitespaces from the SparkR files - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8548) - This is the result of `lint-r` https://gist.github.com/yu-iskw/0019b37a2c1167f33986 Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #6945 from yu-iskw/SPARK-8548 and squashes the following commits: 0bd567a [Yu ISHIKAWA] [SPARK-8548][SparkR] Remove the trailing whitespaces from the SparkR files (cherry picked from commit `44fa7df64d`) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>	2015-06-22 20:55:55 -07:00
Cheng Hao	d73900a903	[SPARK-7859] [SQL] Collect_set() behavior differences which fails the unit test under jdk8 To reproduce that: ``` JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 \| build/sbt -Phadoop-2.3 -Phive 'test-only org.apache.spark.sql.hive.execution.HiveWindowFunctionQueryWithoutCodeGenSuite' ``` A simple workaround to fix that is update the original query, for getting the output size instead of the exact elements of the array (output by collect_set()) Author: Cheng Hao <hao.cheng@intel.com> Closes #6402 from chenghao-intel/windowing and squashes the following commits: 99312ad [Cheng Hao] add order by for the select clause edf8ce3 [Cheng Hao] update the code as suggested 7062da7 [Cheng Hao] fix the collect_set() behaviour differences under different versions of JDK (cherry picked from commit `13321e6555`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-22 20:05:00 -07:00
Yin Huai	994abbaeb3	[SPARK-8532] [SQL] In Python's DataFrameWriter, save/saveAsTable/json/parquet/jdbc always override mode https://issues.apache.org/jira/browse/SPARK-8532 This PR has two changes. First, it fixes the bug that save actions (i.e. `save/saveAsTable/json/parquet/jdbc`) always override mode. Second, it adds input argument `partitionBy` to `save/saveAsTable/parquet`. Author: Yin Huai <yhuai@databricks.com> Closes #6937 from yhuai/SPARK-8532 and squashes the following commits: f972d5d [Yin Huai] davies's comment. d37abd2 [Yin Huai] style. d21290a [Yin Huai] Python doc. 889eb25 [Yin Huai] Minor refactoring and add partitionBy to save, saveAsTable, and parquet. 7fbc24b [Yin Huai] Use None instead of "error" as the default value of mode since JVM-side already uses "error" as the default value. d696dff [Yin Huai] Python style. 88eb6c4 [Yin Huai] If mode is "error", do not call mode method. c40c461 [Yin Huai] Regression test. (cherry picked from commit `5ab9fcfb01`) Signed-off-by: Yin Huai <yhuai@databricks.com>	2015-06-22 13:51:34 -07:00
Yu ISHIKAWA	507381d393	[SPARK-8511] [PYSPARK] Modify a test to remove a saved model in `regression.py` [[SPARK-8511] Modify a test to remove a saved model in `regression.py` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8511) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #6926 from yu-iskw/SPARK-8511 and squashes the following commits: 7cd0948 [Yu ISHIKAWA] Use `shutil.rmtree()` to temporary directories for saving model testings, instead of `os.removedirs()` 4a01c9e [Yu ISHIKAWA] [SPARK-8511][pyspark] Modify a test to remove a saved model in `regression.py` (cherry picked from commit `5d89d9f00b`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com> Conflicts: python/pyspark/mllib/tests.py	2015-06-22 11:59:53 -07:00
Michael Armbrust	65981619b2	[SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings (branch-1.4) This is branch 1.4 backport of https://github.com/apache/spark/pull/6888. Below is the original description. In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`. This allowed comparing a timestamp with a partial date as a user would expect. - `time > "2014-06-10"` - `time > "2014"` In 1.4.0 we tried to cast the String instead into a Timestamp. However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered. This PR restores the earlier behavior. Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision. Author: Michael Armbrust <michaeldatabricks.com> Closes #6888 from marmbrus/timeCompareString and squashes the following commits: bdef29c [Michael Armbrust] test partial date 1f09adf [Michael Armbrust] special handling of equality 1172c60 [Michael Armbrust] more test fixing 4dfc412 [Michael Armbrust] fix tests aaa9508 [Michael Armbrust] newline 04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala Author: Michael Armbrust <michael@databricks.com> Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits: 9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings	2015-06-22 10:45:33 -07:00
Cheng Lian	451c8722af	[SPARK-8406] [SQL] Backports SPARK-8406 and PR #6864 to branch-1.4 Author: Cheng Lian <lian@databricks.com> Closes #6932 from liancheng/spark-8406-for-1.4 and squashes the following commits: a0168fe [Cheng Lian] Backports SPARK-8406 and PR #6864 to branch-1.4	2015-06-22 10:04:29 -07:00
Liang-Chi Hsieh	b836bac3fe	[HOTFIX] Hotfix branch-1.4 building by removing avgMetrics in CrossValidatorSuite Ref. #6905 ping yhuai Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6929 from viirya/hot_fix_cv_test and squashes the following commits: `b1aec53` [Liang-Chi Hsieh] Hotfix branch-1.4 by removing avgMetrics in CrossValidatorSuite.	2015-06-21 22:25:08 -07:00
Joseph K. Bradley	2a7ea31a9e	[SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4 Reorganized docs a bit. Added migration guides. Q: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```? It would be a lot. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits: 4bf26d6 [Joseph K. Bradley] tiny fix 8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR 6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4 (cherry picked from commit `a1894422ad`) Signed-off-by: Xiangrui Meng <meng@databricks.com>	2015-06-21 16:27:14 -07:00
jeanlyn	f0e4040202	[SPARK-8379] [SQL] avoid speculative tasks write to the same file The issue link [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379) Currently,when we insert data to the dynamic partition with speculative tasks we will get the Exception ``` org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53 but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57 ``` This pr try to write the data to temporary dir when using dynamic parition avoid the speculative tasks writing the same file Author: jeanlyn <jeanlyn92@gmail.com> Closes #6833 from jeanlyn/speculation and squashes the following commits: 64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path 8860af0 [jeanlyn] remove the never using code e19a3bd [jeanlyn] avoid speculative tasks write same file (cherry picked from commit `a1e3649c87`) Signed-off-by: Cheng Lian <lian@databricks.com>	2015-06-21 00:13:55 -07:00
Liang-Chi Hsieh	fe59a4a5f5	[SPARK-8468] [ML] Take the negative of some metrics in RegressionEvaluator to get correct cross validation JIRA: https://issues.apache.org/jira/browse/SPARK-8468 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #6905 from viirya/cv_min and squashes the following commits: 930d3db [Liang-Chi Hsieh] Fix python unit test and add document. d632135 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cv_min 16e3b2c [Liang-Chi Hsieh] Take the negative instead of reciprocal. c3dd8d9 [Liang-Chi Hsieh] For comments. b5f52c1 [Liang-Chi Hsieh] Add param to CrossValidator for choosing whether to maximize evaulation value. (cherry picked from commit `0b8995168f`) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>	2015-06-20 13:02:14 -07:00
Andrew Or	9b16508d2c	[HOTFIX] [SPARK-8489] Correct JIRA number in previous commit It should be SPARK-8489, not SPARK-8498.	2015-06-19 17:40:21 -07:00
cody koeninger	a7b773a8b5	[SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges Author: cody koeninger <cody@koeninger.org> Closes #6863 from koeninger/SPARK-8390 and squashes the following commits: 26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390 3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup 3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api	2015-06-19 17:36:59 -07:00
cody koeninger	78d0ceea82	[SPARK-8389] [STREAMING] [KAFKA] Example of getting offset ranges out o… …f the existing java direct stream api Author: cody koeninger <cody@koeninger.org> Closes #6846 from koeninger/SPARK-8389 and squashes the following commits: 3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api	2015-06-19 17:36:54 -07:00

1 2 3 4 5 ...

11413 commits