ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Josh Rosen	6c5a6db4d5	[SPARK-5161] [HOTFIX] Fix bug in Python test failure reporting This patch fixes a bug introduced in #7031 which can cause Jenkins to incorrectly report a build with failed Python tests as passing if an error occurred while printing the test failure message. Author: Josh Rosen <joshrosen@databricks.com> Closes #7112 from JoshRosen/python-tests-hotfix and squashes the following commits: c3f2961 [Josh Rosen] Hotfix for bug in Python test failure reporting	2015-06-29 23:08:51 -07:00
Yadong Qi	e6c3f7462b	[SPARK-8650] [SQL] Use the user-specified app name priority in SparkSQLCLIDriver or HiveThriftServer2 When run `./bin/spark-sql --name query1.sql` [Before] ![before](https://cloud.githubusercontent.com/assets/1400819/8370336/fa20b75a-1bf8-11e5-9171-040049a53240.png) [After] ![after](https://cloud.githubusercontent.com/assets/1400819/8370189/dcc35cb4-1bf6-11e5-8796-a0694140bffb.png) Author: Yadong Qi <qiyadong2010@gmail.com> Closes #7030 from watermen/SPARK-8650 and squashes the following commits: 51b5134 [Yadong Qi] Improve code and add comment. e3d7647 [Yadong Qi] use spark.app.name priority.	2015-06-29 22:34:38 -07:00
Reynold Xin	f79410c49b	[SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes. Author: Reynold Xin <rxin@databricks.com> Closes #7109 from rxin/auto-cast and squashes the following commits: a914cc3 [Reynold Xin] [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.	2015-06-29 22:32:43 -07:00
Patrick Wendell	ea775b0662	MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #1767 (close requested by 'andrewor14') Closes #6952 (close requested by 'andrewor14') Closes #7051 (close requested by 'andrewor14') Closes #5357 (close requested by 'marmbrus') Closes #5233 (close requested by 'andrewor14') Closes #6930 (close requested by 'JoshRosen') Closes #5502 (close requested by 'andrewor14') Closes #6778 (close requested by 'andrewor14') Closes #7006 (close requested by 'andrewor14')	2015-06-29 21:41:59 -07:00
Josh Rosen	7bbbe380c5	[SPARK-5161] Parallelize Python test execution This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times. Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism). To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests. Author: Josh Rosen <joshrosen@databricks.com> Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits: feb3763 [Josh Rosen] Re-enable other tests f87ea81 [Josh Rosen] Only log output from failed tests d4ded73 [Josh Rosen] Logging improvements a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests 1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests 110cd9d [Josh Rosen] Fix universal_newlines for Python 3 cd13db8 [Josh Rosen] Also log python_implementation 9e31127 [Josh Rosen] Log Python --version output for each executable. a2b9094 [Josh Rosen] Bump up parallelism. 5552380 [Josh Rosen] Python 3 fix 866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks 87cb988 [Josh Rosen] Skip MLLib tests for PyPy 8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure 9129027 [Josh Rosen] Disable Spark UI in Python tests 037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins. af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution	2015-06-29 21:32:40 -07:00
Yanbo Liang	f9b6bf2f83	[SPARK-7667] [MLLIB] MLlib Python API consistency check MLlib Python API consistency check Author: Yanbo Liang <ybliang8@gmail.com> Closes #6856 from yanboliang/spark-7667 and squashes the following commits: 21bae35 [Yanbo Liang] remove duplicate code eb12f95 [Yanbo Liang] fix doc inherit problem 9e7ec3c [Yanbo Liang] address comments e763d32 [Yanbo Liang] MLlib Python API consistency check	2015-06-29 18:50:23 -07:00
Steven She	4915e9e3bf	[SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7 Patch to fix crash with BINARY fields with ENUM original types. Author: Steven She <steven@canopylabs.com> Closes #7048 from stevencanopy/SPARK-8669 and squashes the following commits: 2e72979 [Steven She] [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7	2015-06-29 18:50:09 -07:00
Burak Yavuz	ecacb1e88a	[SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab cc yhuai Author: Burak Yavuz <brkyvz@gmail.com> Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits: abc299a [Burak Yavuz] change 'to' to until 7e96d7c [Burak Yavuz] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab	2015-06-29 18:48:28 -07:00
Feynman Liang	620605a4a1	[SPARK-8456] [ML] Ngram featurizer python Python API for N-gram feature transformer Author: Feynman Liang <fliang@databricks.com> Closes #6960 from feynmanliang/ngram-featurizer-python and squashes the following commits: f9e37c9 [Feynman Liang] Remove debugging code 4dd81f4 [Feynman Liang] Fix typo and doctest 06c79ac [Feynman Liang] Style guide 26c1175 [Feynman Liang] Add python NGram API	2015-06-29 18:40:30 -07:00
Andrew Or	4c1808be4d	Revert "[SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles" This reverts commit `5d30eae560`.	2015-06-29 18:32:31 -07:00
Michael Sannella x268	4a9e03fa85	[SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with a command other than Rscript This is a simple change to add a new environment variable "spark.sparkr.r.command" that specifies the command that SparkR will use when creating an R engine process. If this is not specified, "Rscript" will be used by default. I did not add any documentation, since I couldn't find any place where environment variables (such as "spark.sparkr.use.daemon") are documented. I also did not add a unit test. The only test that would work generally would be one starting SparkR with sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just using the default value. I think that this is a low-risk change. Likely committers: shivaram Author: Michael Sannella x268 <msannell@tibco.com> Closes #6557 from msannell/altR and squashes the following commits: 7eac142 [Michael Sannella x268] add spark.sparkr.r.command config parameter	2015-06-29 17:28:28 -07:00
Burak Yavuz	d7f796da45	[SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit This PR also includes re-ordering the order that repositories are used when resolving packages. User provided repositories will be prioritized. cc andrewor14 Author: Burak Yavuz <brkyvz@gmail.com> Closes #7089 from brkyvz/delete-prev-ivy-resolution and squashes the following commits: a21f95a [Burak Yavuz] remove previous ivy resolution when using spark-submit	2015-06-29 17:27:02 -07:00
Sean Owen	5d30eae560	[SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' Author: Sean Owen <sowen@cloudera.com> Closes #7036 from srowen/SPARK-8437 and squashes the following commits: 0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/'	2015-06-29 17:21:35 -07:00
Yin Huai	fbf75738fe	[SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite Hopefully, this suite will not be flaky anymore. Author: Yin Huai <yhuai@databricks.com> Closes #7027 from yhuai/SPARK-8567 and squashes the following commits: c0167e2 [Yin Huai] Add sc.stop().	2015-06-29 17:20:05 -07:00
zsxwing	cec98525fd	[SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite "receiver info reporting" As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/ ``` 15/06/24 23:09:10.210 Thread-3495 INFO ReceiverTracker: Starting 1 receivers 15/06/24 23:09:10.270 Thread-3495 INFO SparkContext: Starting job: apply at Transformer.scala:22 ... 15/06/24 23:09:14.259 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Started receiver and sleeping 15/06/24 23:09:14.270 ForkJoinPool-4-worker-29 INFO StreamingListenerSuiteReceiver: Reporting error and sleeping ``` it needs at least 4 seconds to receive all receiver events in this slow machine, but `timeout` for `eventually` is only 2 seconds. This PR increases `timeout` to make this test stable. Author: zsxwing <zsxwing@gmail.com> Closes #7017 from zsxwing/SPARK-8634 and squashes the following commits: 719cae4 [zsxwing] Fix flaky test StreamingListenerSuite "receiver info reporting"	2015-06-29 17:19:05 -07:00
Wenchen Fan	881662e9c9	[SPARK-8589] [SQL] cleanup DateTimeUtils move date time related operations into `DateTimeUtils` and rename some methods to make it more clear. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #6980 from cloud-fan/datetime and squashes the following commits: 9373a9d [Wenchen Fan] cleanup DateTimeUtil	2015-06-29 16:34:50 -07:00
Yin Huai	4b497a724a	[SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def. jira: https://issues.apache.org/jira/browse/SPARK-8710 Author: Yin Huai <yhuai@databricks.com> Closes #7094 from yhuai/SPARK-8710 and squashes the following commits: c854baa [Yin Huai] Change ScalaReflection.mirror from a val to a def.	2015-06-29 16:26:05 -07:00
Rosstin	4e880cf596	[SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple for mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple Author: Rosstin <asterazul@gmail.com> Closes #7098 from Rosstin/SPARK-8661 and squashes the following commits: 5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code. bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660 242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala 2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md	2015-06-29 16:09:29 -07:00
Davies Liu	ed359de595	[SPARK-8579] [SQL] support arbitrary object in UnsafeRow This PR brings arbitrary object support in UnsafeRow (both in grouping key and aggregation buffer). Two object pools will be created to hold those non-primitive objects, and put the index of them into UnsafeRow. In order to compare the grouping key as bytes, the objects in key will be stored in a unique object pool, to make sure same objects will have same index (used as hashCode). For StringType and BinaryType, we still put them as var-length in UnsafeRow when initializing for better performance. But for update, they will be an object inside object pools (there will be some garbages left in the buffer). BTW: Will create a JIRA once issue.apache.org is available. cc JoshRosen rxin Author: Davies Liu <davies@databricks.com> Closes #6959 from davies/unsafe_obj and squashes the following commits: 5ce39da [Davies Liu] fix comment 5e797bf [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 5803d64 [Davies Liu] fix conflict 461d304 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 2f41c90 [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj b04d69c [Davies Liu] address comments 4859b80 [Davies Liu] fix comments f38011c [Davies Liu] add a test for grouping by decimal d2cf7ab [Davies Liu] add more tests for null checking 71983c5 [Davies Liu] add test for timestamp e8a1649 [Davies Liu] reuse buffer for string 39f09ca [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_obj 035501e [Davies Liu] fix style 236d6de [Davies Liu] support arbitrary object in UnsafeRow	2015-06-29 15:59:20 -07:00
BenFradet	931da5c8ab	[SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf Follow-up of #6902 for being coherent between ```Udf``` and ```UDF``` Author: BenFradet <benjamin.fradet@gmail.com> Closes #6920 from BenFradet/SPARK-8478 and squashes the following commits: c500f29 [BenFradet] renamed a few variables in functions to use UDF 8ab0f2d [BenFradet] renamed idUdf to idUDF in SQLQuerySuite 98696c2 [BenFradet] renamed originalUdfs in TestHive to originalUDFs 7738f74 [BenFradet] modified HiveUDFSuite to use only UDF c52608d [BenFradet] renamed HiveUdfSuite to HiveUDFSuite e51b9ac [BenFradet] renamed ExtractPythonUdfs to ExtractPythonUDFs 8c756f1 [BenFradet] renamed Hive UDF related code 2a1ca76 [BenFradet] renamed pythonUdfs to pythonUDFs 261e6fb [BenFradet] renamed ScalaUdf to ScalaUDF	2015-06-29 15:27:13 -07:00
Rosstin	c8ae887ef0	[SPARK-8660][ML] Convert JavaDoc style comments in LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier Converted JavaDoc style comments in mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala to regular multiline comments, to make copy-pasting R commands easier. Author: Rosstin <asterazul@gmail.com> Closes #7096 from Rosstin/SPARK-8660 and squashes the following commits: 242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala 2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639 6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md	2015-06-29 14:45:08 -07:00
Ai He	ecd3aacf28	[SPARK-7810] [PYSPARK] solve python rdd socket connection problem The method "_load_from_socket" in rdd.py cannot load data from the JVM socket when IPv6 is used; the current method only works with IPv4. The new modification should work with both protocols. Author: Ai He <ai.he@ussuning.com> Author: AiHe <ai.he@ussuning.com> Closes #6338 from AiHe/pyspark-networking-issue and squashes the following commits: d4fc9c4 [Ai He] handle code review 2 e75c5c8 [Ai He] handle code review 5644953 [AiHe] solve python rdd socket connection problem to jvm	2015-06-29 14:36:26 -07:00
Ilya Ganelin	f6fc254ec4	[SPARK-8056][SQL] Design an easier way to construct schema for both Scala and Python I've added functionality to create new StructType similar to how we add parameters to a new SparkContext. I've also added tests for this type of creation. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #6686 from ilganeli/SPARK-8056B and squashes the following commits: 27c1de1 [Ilya Ganelin] Rename 467d836 [Ilya Ganelin] Removed from_string in favor of _parse_Datatype_json_value 5fef5a4 [Ilya Ganelin] Updates for type parsing 4085489 [Ilya Ganelin] Style errors 3670cf5 [Ilya Ganelin] added string to DataType conversion 8109e00 [Ilya Ganelin] Fixed error in tests 41ab686 [Ilya Ganelin] Fixed style errors e7ba7e0 [Ilya Ganelin] Moved some python tests to tests.py. Added cleaner handling of null data type and added test for correctness of input format 15868fa [Ilya Ganelin] Fixed python errors b79b992 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-8056B a3369fc [Ilya Ganelin] Fixing space errors e240040 [Ilya Ganelin] Style bab7823 [Ilya Ganelin] Constructor error 73d4677 [Ilya Ganelin] Style 4ed00d9 [Ilya Ganelin] Fixed default arg 67df57a [Ilya Ganelin] Removed Foo 04cbf0c [Ilya Ganelin] Added comments for single object 0484d7a [Ilya Ganelin] Restored second method 6aeb740 [Ilya Ganelin] Style 689e54d [Ilya Ganelin] Style f497e9e [Ilya Ganelin] Got rid of old code e3c7a88 [Ilya Ganelin] Fixed doctest failure a62ccde [Ilya Ganelin] Style 966ac06 [Ilya Ganelin] style checks dabb7e6 [Ilya Ganelin] Added Python tests a3f4152 [Ilya Ganelin] added python bindings and better comments e6e536c [Ilya Ganelin] Added extra space 7529a2e [Ilya Ganelin] Fixed formatting d388f86 [Ilya Ganelin] Fixed small bug c4e3bf5 [Ilya Ganelin] Reverted to using parse. Updated parse to support long d7634b6 [Ilya Ganelin] Reverted to fromString to properly support types 22c39d5 [Ilya Ganelin] replaced FromString with DataTypeParser.parse. Replaced empty constructor initializing a null to have it instead create a new array to allow appends to it. faca398 [Ilya Ganelin] [SPARK-8056] Replaced default argument usage. Updated usage and code for DataType.fromString 1acf76e [Ilya Ganelin] Scala style e31c674 [Ilya Ganelin] Fixed bug in test 8dc0795 [Ilya Ganelin] Added tests for creation of StructType object with new methods fdf7e9f [Ilya Ganelin] [SPARK-8056] Created add methods to facilitate building new StructType objects.	2015-06-29 14:15:15 -07:00
Josh Rosen	27ef85451c	[SPARK-8709] Exclude hadoop-client's mockito-all dependency This patch excludes `hadoop-client`'s dependency on `mockito-all`. As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds. Author: Josh Rosen <joshrosen@databricks.com> Closes #7090 from JoshRosen/SPARK-8709 and squashes the following commits: e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.	2015-06-29 14:07:55 -07:00
Davies Liu	afae9766f2	[SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame Avoid unnecessary jobs when inferring the schema from a list. cc yhuai mengxr Author: Davies Liu <davies@databricks.com> Closes #6606 from davies/improve_create and squashes the following commits: a5928bf [Davies Liu] Update MimaExcludes.scala 62da911 [Davies Liu] fix mima bab4d7d [Davies Liu] Merge branch 'improve_create' of github.com:davies/spark into improve_create eee44a8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create 8d9292d [Davies Liu] Update context.py eb24531 [Davies Liu] Update context.py c969997 [Davies Liu] bug fix d5a8ab0 [Davies Liu] fix tests 8c3f10d [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_create 6ea5925 [Davies Liu] address comments 6ceaeff [Davies Liu] avoid spark jobs in createDataFrame	2015-06-29 13:20:55 -07:00
Burak Yavuz	be7ef06762	[SPARK-8681] fixed wrong ordering of columns in crosstab I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that we hit a corner case or something. cc rxin marmbrus Author: Burak Yavuz <brkyvz@gmail.com> Closes #7060 from brkyvz/crosstab-fixes and squashes the following commits: 0a65234 [Burak Yavuz] addressed comments v1 d96da7e [Burak Yavuz] fixed wrong ordering of columns in crosstab	2015-06-29 13:15:04 -07:00
Cheng Hao	c6ba2ea341	[SPARK-7862] [SQL] Disable the error message redirect to stderr This is a follow-up of #6404. ScriptTransformation prints error messages directly to stderr, which could be a disaster for the application log. Author: Cheng Hao <hao.cheng@intel.com> Closes #6882 from chenghao-intel/verbose and squashes the following commits: bfedd77 [Cheng Hao] revert the write 76ff46b [Cheng Hao] update the CircularBuffer 692b19e [Cheng Hao] check the process exitValue for ScriptTransform 47e0970 [Cheng Hao] Use the RedirectThread instead 1de771d [Cheng Hao] naming the threads in ScriptTransformation 8536e81 [Cheng Hao] disable the error message redirection for stderr	2015-06-29 12:46:33 -07:00
zhichao.li	637b4eedad	[SPARK-8214] [SQL] Add function hex cc chenghao-intel adrian-wang Author: zhichao.li <zhichao.li@intel.com> Closes #6976 from zhichao-li/hex and squashes the following commits: e218d1b [zhichao.li] turn off scalastyle for non-ascii de3f5ea [zhichao.li] non-ascii char cf9c936 [zhichao.li] give separated buffer for each hex method 967ec90 [zhichao.li] Make 'value' as a feild of Hex 3b2fa13 [zhichao.li] tiny fix a647641 [zhichao.li] remove duplicate null check 7cab020 [zhichao.li] tiny refactoring 35ecfe5 [zhichao.li] add function hex	2015-06-29 12:25:16 -07:00
Kousuke Saruta	94e040d059	[SQL][DOCS] Remove wrong example from DataFrame.scala DataFrame.scala contains the following examples: ``` * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) * peopleDf($"age" > 15) ``` However, I think the last example doesn't work. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #6977 from sarutak/fix-dataframe-example and squashes the following commits: 46efbd7 [Kousuke Saruta] Removed wrong example	2015-06-29 12:16:12 -07:00
Vladimir Vladimirov	492dca3a73	[SPARK-8528] Expose SparkContext.applicationId in PySpark Use case: we want to log the applicationId (YARN in our case) so that we can request troubleshooting help from DevOps. Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com> Closes #6936 from smartkiwi/master and squashes the following commits: 870338b [Vladimir Vladimirov] this would make doctest to run in python3 0eae619 [Vladimir Vladimirov] Scala doesn't use u'...' for unicode literals 14d77a8 [Vladimir Vladimirov] stop using ELLIPSIS b4ebfc5 [Vladimir Vladimirov] addressed PR feedback - updated docstring 223a32f [Vladimir Vladimirov] fixed test - applicationId is property that returns the string 3221f5a [Vladimir Vladimirov] [SPARK-8528] added documentation for Scala 2cff090 [Vladimir Vladimirov] [SPARK-8528] add applicationId property for SparkContext object in pyspark	2015-06-29 12:03:41 -07:00
Tarek Auel	a5c2961caa	[SPARK-8235] [SQL] misc function sha / sha1 Jira: https://issues.apache.org/jira/browse/SPARK-8235 I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they? Please take a close look on the Python part. This is adopted from #6934 Author: Tarek Auel <tarek.auel@gmail.com> Author: Tarek Auel <tarek.auel@googlemail.com> Closes #6963 from tarekauel/SPARK-8235 and squashes the following commits: f064563 [Tarek Auel] change to shaHex 7ce3cdc [Tarek Auel] rely on automatic cast a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235 68eb043 [Tarek Auel] added docstring be5aff1 [Tarek Auel] improved error message 7336c96 [Tarek Auel] added type check cf23a80 [Tarek Auel] simplified example ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala 6d6ff0d [Tarek Auel] [SPARK-8233] added docstring ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__ e5dad4e [Tarek Auel] SPARK[8235] sha / sha1	2015-06-29 11:57:19 -07:00
Marcelo Vanzin	3664ee25f0	[SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2. Allow HiveContext to connect to metastores of those versions; some new shims had to be added to account for changing internal APIs. A new test was added to exercise the "reset()" path which now also requires a shim; and the test code was changed to use a directory under the build's target to store ivy dependencies. Without that, at least I consistently run into issues with Ivy messing up (or being confused) by my existing caches. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7026 from vanzin/SPARK-8067 and squashes the following commits: 3e2e67b [Marcelo Vanzin] [SPARK-8066, SPARK-8067] [hive] Add support for Hive 1.0, 1.1 and 1.2.	2015-06-29 11:53:17 -07:00
Wenchen Fan	ed413bcc78	[SPARK-8692] [SQL] re-order the case statements that handle catalyst data types Use the same order: boolean, byte, short, int, date, long, timestamp, float, double, string, binary, decimal. This lets us check at a glance whether any data types are missing, and makes sure we handle date/timestamp just like int/long. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7073 from cloud-fan/fix-date and squashes the following commits: 463044d [Wenchen Fan] fix style 51cd347 [Wenchen Fan] refactor handling of date and timestmap	2015-06-29 11:41:26 -07:00
Andrew Or	ea88b1a507	Revert "[SPARK-8372] History server shows incorrect information for application not started" This reverts commit `2837e06709`.	2015-06-29 10:52:05 -07:00
Yu ISHIKAWA	715f084ca0	[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` [[SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license` - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8554) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #6947 from yu-iskw/SPARK-8554 and squashes the following commits: 5ca240c [Yu ISHIKAWA] [SPARK-8554] Add the SparkR document files to `.rat-excludes` for `./dev/check-license`	2015-06-29 09:22:55 -07:00
Brennon York	5c796d576e	[SPARK-8693] [PROJECT INFRA] profiles and goals are not printed in a nice way Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like: ``` -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: streaming-kafka-assembly/assembly ``` Author: Brennon York <brennon.york@capitalone.com> Closes #7085 from brennonyork/SPARK-8693 and squashes the following commits: c5575f1 [Brennon York] added commas to end of print statements for proper printing	2015-06-29 08:55:06 -07:00
zsxwing	630bd5fd80	[SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript When there is a very large number of tasks, such as `sc.parallelize(1 to 100000, 10000).count()`, the generated JS code contains a lot of string concatenations in the stage page, nearly 40 per task. We can generate the whole string for a task instead of executing string concatenations in the browser. Before this patch, the page load time was about 21 seconds. ![screen shot 2015-06-29 at 6 44 04 pm](https://cloud.githubusercontent.com/assets/1000778/8406644/eb55ed18-1e90-11e5-9ad5-50d27ad1dff1.png) After this patch, it drops to about 17 seconds. ![screen shot 2015-06-29 at 6 47 34 pm](https://cloud.githubusercontent.com/assets/1000778/8406665/087003ca-1e91-11e5-80a8-3485aa9adafa.png) One disadvantage is that the generated JS code becomes hard to read. Author: zsxwing <zsxwing@gmail.com> Closes #7082 from zsxwing/js-string and squashes the following commits: b29231d [zsxwing] Avoid massive concating strings in Javascript	2015-06-30 00:46:55 +09:00
Reynold Xin	660c6cec75	[SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple. Author: Reynold Xin <rxin@databricks.com> Closes #7079 from rxin/SPARK-8698 and squashes the following commits: 8513e1c [Reynold Xin] [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.	2015-06-29 00:22:44 -07:00
Cheolsoo Park	ac2e17b01c	[SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all match. I added `Option` to reader and writer and updated the `pyspark-sql` test. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #7078 from piaozhexiu/SPARK-8355 and squashes the following commits: c63d419 [Cheolsoo Park] Fix version 524e0aa [Cheolsoo Park] Add option function to df reader and writer	2015-06-29 00:13:39 -07:00
BenFradet	0b10662fef	[SPARK-8575] [SQL] Deprecate callUDF in favor of udf Follow up of [SPARK-8356](https://issues.apache.org/jira/browse/SPARK-8356) and #6902. Removes the unit test for the now deprecated ```callUdf``` Unit test in SQLQuerySuite now uses ```udf``` instead of ```callUDF``` Replaced ```callUDF``` by ```udf``` where possible in mllib Author: BenFradet <benjamin.fradet@gmail.com> Closes #6993 from BenFradet/SPARK-8575 and squashes the following commits: 26f5a7a [BenFradet] 2 spaces instead of 1 1ddb452 [BenFradet] renamed initUDF in order to be consistent in OneVsRest 48ca15e [BenFradet] used vector type tag for udf call in VectorIndexer 0ebd0da [BenFradet] replace the now deprecated callUDF by udf in VectorIndexer 8013409 [BenFradet] replaced the now deprecated callUDF by udf in Predictor 94345b5 [BenFradet] unifomized udf calls in ProbabilisticClassifier 1305492 [BenFradet] uniformized udf calls in Classifier a672228 [BenFradet] uniformized udf calls in OneVsRest 49e4904 [BenFradet] Revert "removal of the unit test for the now deprecated callUdf" bbdeaf3 [BenFradet] fixed syntax for init udf in OneVsRest fe2a10b [BenFradet] callUDF => udf in ProbabilisticClassifier 0ea30b3 [BenFradet] callUDF => udf in Classifier where possible 197ec82 [BenFradet] callUDF => udf in OneVsRest 84d6780 [BenFradet] modified unit test in SQLQuerySuite to use udf instead of callUDF 477709f [BenFradet] removal of the unit test for the now deprecated callUdf	2015-06-28 22:43:47 -07:00
Yanbo Liang	dfde31da5c	[SPARK-5962] [MLLIB] Python support for Power Iteration Clustering Python support for Power Iteration Clustering https://issues.apache.org/jira/browse/SPARK-5962 Author: Yanbo Liang <ybliang8@gmail.com> Closes #6992 from yanboliang/pyspark-pic and squashes the following commits: 6b03d82 [Yanbo Liang] address comments 4be4423 [Yanbo Liang] Python support for Power Iteration Clustering	2015-06-28 22:38:04 -07:00
Feynman Liang	25f574eb9a	[SPARK-7212] [MLLIB] Add sequence learning flag Support mining of ordered frequent item sequences. Author: Feynman Liang <fliang@databricks.com> Closes #6997 from feynmanliang/fp-sequence and squashes the following commits: 7c14e15 [Feynman Liang] Improve scalatests with R code and Seq 0d3e4b6 [Feynman Liang] Fix python test ce987cb [Feynman Liang] Backwards compatibility aux constructor 34ef8f2 [Feynman Liang] Fix failing test due to reverse orderering f04bd50 [Feynman Liang] Naming, add ordered to FreqItemsets, test ordering using Seq 648d4d4 [Feynman Liang] Test case for frequent item sequences 252a36a [Feynman Liang] Add sequence learning flag	2015-06-28 22:26:07 -07:00
Cheng Lian	00a9d22bd6	[SPARK-7845] [BUILD] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1 PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump Hadoop 1 version defined in POM. Author: Cheng Lian <lian@databricks.com> Closes #7062 from liancheng/spark-7845 and squashes the following commits: c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1	2015-06-28 19:34:59 -07:00
Liang-Chi Hsieh	24fda73811	[SPARK-8677] [SQL] Fix non-terminating decimal expansion for decimal divide operation JIRA: https://issues.apache.org/jira/browse/SPARK-8677 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #7056 from viirya/fix_decimal3 and squashes the following commits: 34d7419 [Liang-Chi Hsieh] Fix Non-terminating decimal expansion for decimal divide operation.	2015-06-28 14:48:44 -07:00
Vincent D. Warmerdam	9ce78b4343	[SPARK-8596] [EC2] Added port for Rstudio This would otherwise need to be set manually by R users in AWS. https://issues.apache.org/jira/browse/SPARK-8596 Author: Vincent D. Warmerdam <vincentwarmerdam@gmail.com> Author: vincent <vincentwarmerdam@gmail.com> Closes #7068 from koaning/rstudio-port-number and squashes the following commits: ac8100d [vincent] Update spark_ec2.py ce6ad88 [Vincent D. Warmerdam] added port number for rstudio	2015-06-28 13:33:33 -07:00
Kousuke Saruta	ec78438196	[SPARK-8686] [SQL] DataFrame should support `where` with expression represented by String DataFrame supports `filter` function with two types of argument, `Column` and `String`. But `where` doesn't. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #7063 from sarutak/SPARK-8686 and squashes the following commits: 180f9a4 [Kousuke Saruta] Added test d61aec4 [Kousuke Saruta] Add "where" method with String argument to DataFrame	2015-06-28 08:29:07 -07:00
Davies Liu	77da5be6f1	[SPARK-8610] [SQL] Separate Row and InternalRow (part 2) Currently, we use GenericRow for both Row and InternalRow, which is confusing because it could contain either Scala types or Catalyst types. This PR changes to use GenericInternalRow for InternalRow (contains Catalyst types) and GenericRow for Row (contains Scala types). It also fixes some incorrect uses of InternalRow or Row. Author: Davies Liu <davies@databricks.com> Closes #7003 from davies/internalrow and squashes the following commits: d05866c [Davies Liu] fix test: rollback changes for pyspark 72878dd [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow efd0b25 [Davies Liu] fix copy of MutableRow 87b13cf [Davies Liu] fix test d2ebd72 [Davies Liu] fix style eb4b473 [Davies Liu] mark expensive API as final bd4e99c [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow bdfb78f [Davies Liu] remove BaseMutableRow 6f99a97 [Davies Liu] fix catalyst test defe931 [Davies Liu] remove BaseRow 288b31f [Davies Liu] Merge branch 'master' of github.com:apache/spark into internalrow 9d24350 [Davies Liu] separate Row and InternalRow (part 2)	2015-06-28 08:03:58 -07:00
Thomas Szymanski	52d1281801	[SPARK-8649] [BUILD] Mapr repository is not defined properly The previous committer on this part was pwendell. The previous URL returns 404; the new one seems to be OK. This patch is added under the Apache License 2.0. The JIRA link: https://issues.apache.org/jira/browse/SPARK-8649 Author: Thomas Szymanski <develop@tszymanski.com> Closes #7054 from tszym/SPARK-8649 and squashes the following commits: bfda9c4 [Thomas Szymanski] [SPARK-8649] [BUILD] Mapr repository is not defined properly	2015-06-28 01:06:49 -07:00
Josh Rosen	f51004519c	[SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers. See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits: 70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.	2015-06-27 23:27:52 -07:00
Josh Rosen	42db3a1c2f	[HOTFIX] Fix pull request builder bug in #6967	2015-06-27 23:07:20 -07:00

11730 commits