SELECT max(1/0) FROM src
would return a very large number, which is obviously wrong.
hive-0.12 returns `Infinity` for 1/0, while hive-0.13.1 returns `NULL` for 1/0.
I think it is better to align our behavior with the newer Hive version.
This PR ensures that when the divisor is 0, the result of the expression is NULL, the same as in hive-0.13.1.
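A minimal sketch of the intended semantics (illustrative, not Spark's actual `Divide` implementation):
```Scala
// Return null when either operand is null or the divisor is zero,
// matching hive-0.13.1 (SELECT 1/0 => NULL).
def nullSafeDivide(left: java.lang.Double, right: java.lang.Double): java.lang.Double = {
  if (left == null || right == null || right.doubleValue() == 0.0) null
  else left.doubleValue() / right.doubleValue()
}
```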
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3443 from adrian-wang/div and squashes the following commits:
2e98677 [Daoyuan Wang] fix code gen for divide 0
85c28ba [Daoyuan Wang] temp
36236a5 [Daoyuan Wang] add test cases
6f5716f [Daoyuan Wang] fix comments
cee92bd [Daoyuan Wang] avoid evaluation 2 times
22ecd9a [Daoyuan Wang] fix style
cf28c58 [Daoyuan Wang] divide fix
2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
```Scala
val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
val nrdd = jhc.hql("select null from spark_test.for_test")
println(nrdd.schema)
```
Then the error is thrown as follows:
```
scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
    at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
```
Author: YanTangZhai <hakeemzhai@tencent.com>
Author: yantangzhai <tyz0303@163.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits:
e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
Author: baishuo <vc_java@hotmail.com>
Closes #3526 from baishuo/master-trycatch and squashes the following commits:
d446e14 [baishuo] correct the code style
b36bf96 [baishuo] correct the code style
ae0e447 [baishuo] add finally to avoid resource leak
Spark SQL has embedded `sqrt` and `abs`, but the DSL doesn't support those functions.
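Hypothetical usage, assuming the Spark SQL DSL implicits are in scope (as in DslQuerySuite) and a SchemaRDD named `testData`:
```Scala
// The new DSL functions applied to a column:
testData.select(sqrt('key)).collect()
testData.select(abs('key)).collect()
```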
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
A very small nit update.
Author: Reynold Xin <rxin@databricks.com>
Closes #3552 from rxin/license-header and squashes the following commits:
df8d1a4 [Reynold Xin] Indent license header properly for interfaces.scala.
The link points to the old scala programming guide; it should point to the submitting applications page.
This should be backported to 1.1.2 (it's been broken since 1.0).
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits:
a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken
This PR cleans up `import SparkContext._` in core for SPARK-4397 (#3262) to prove it really works well.
Author: zsxwing <zsxwing@gmail.com>
Closes #3530 from zsxwing/SPARK-4397-cleanup and squashes the following commits:
04e2273 [zsxwing] Cleanup 'import SparkContext._' in core
The vector norm in Breeze is implemented via `activeIterator`, which is known to be very slow.
In this PR, an efficient vector norm is implemented, and with this API, `Normalizer` and
`k-means` get a big performance improvement.
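A rough sketch of the idea (illustrative, not MLlib's exact code): iterate directly over the stored values instead of going through `activeIterator`:
```Scala
// Fast p-norm over a vector's stored values (for a sparse vector, the
// zeros contribute nothing to the sum, so iterating stored values suffices):
def fastNorm(values: Array[Double], p: Double): Double = {
  var sum = 0.0
  var i = 0
  if (p == 2.0) {
    while (i < values.length) { sum += values(i) * values(i); i += 1 }
    math.sqrt(sum)
  } else {
    while (i < values.length) { sum += math.pow(math.abs(values(i)), p); i += 1 }
    math.pow(sum, 1.0 / p)
  }
}
```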
Here is the benchmark against the mnist8m dataset:
a) `Normalizer`
   Before:       DenseVector: 68.25 secs, SparseVector: 17.01 secs
   With this PR: DenseVector: 12.71 secs, SparseVector: 2.73 secs
b) `k-means`
   Before:       DenseVector: 83.46 secs, SparseVector: 61.60 secs
   With this PR: DenseVector: 70.04 secs, SparseVector: 59.05 secs
Author: DB Tsai <dbtsai@alpinenow.com>
Closes #3462 from dbtsai/norm and squashes the following commits:
63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit
This commit exists to close the following pull requests on Github:
Closes #1612 (close requested by 'marmbrus')
Closes #2723 (close requested by 'marmbrus')
Closes #1737 (close requested by 'marmbrus')
Closes #2252 (close requested by 'marmbrus')
Closes #2029 (close requested by 'marmbrus')
Closes #2386 (close requested by 'marmbrus')
Closes #2997 (close requested by 'marmbrus')
In addition, `s.isEmpty` is used to eliminate the string comparison.
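For illustration, the change amounts to:
```Scala
val s = ""
// Prefer the direct emptiness check over a string comparison:
if (s.isEmpty) println("empty")  // instead of: if (s == "") ...
```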
Author: zsxwing <zsxwing@gmail.com>
Closes #3132 from zsxwing/SPARK-4268 and squashes the following commits:
358e235 [zsxwing] Improvement of allCaseVersions
Support view definitions like:
```
CREATE VIEW view3(valoo)
TBLPROPERTIES ("fear" = "factor")
AS SELECT upper(value) FROM src WHERE key=86;
```
(`valoo` is the alias of `upper(value)`.) This is the missing part of SPARK-4239, needed for full view support.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3396 from adrian-wang/viewcolumn and squashes the following commits:
4d001d0 [Daoyuan Wang] support view with column alias
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3535 from adrian-wang/datedoc and squashes the following commits:
18ff1ed [Daoyuan Wang] [DOC] Date type
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #3516 from ravipesala/ddl_doc and squashes the following commits:
d101fdf [ravipesala] Style issues fixed
d2238cd [ravipesala] Corrected documentation
Support multiple columns in the countDistinct function, i.e. count(distinct c1, c2, ...), in Spark SQL.
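A hedged example of the newly supported syntax (the table and column names are illustrative):
```Scala
// Count distinct (c1, c2) pairs in a hypothetical table t:
sqlContext.sql("SELECT COUNT(DISTINCT c1, c2) FROM t").collect()
```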
Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3511 from ravipesala/countdistinct and squashes the following commits:
cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
Remove the hardcoded max and min values for types. Let BigDecimal do the type-compatibility checking.
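As a small illustration of the approach (standard Scala library calls, not the patch itself), `BigDecimal` can itself report which narrower numeric types a literal fits into:
```Scala
val v = BigDecimal("12345678901")
println(v.isValidInt)   // false: exceeds the Int range
println(v.isValidLong)  // true: fits in a Long
```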
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3208 from viirya/more_numericLit and squashes the following commits:
e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
The @group tab is missing in the scaladoc.
Author: Jacky Li <jacky.likun@gmail.com>
Closes #3458 from jackylk/patch-7 and squashes the following commits:
0121a70 [Jacky Li] add @group tab in limit() and count()
Documents `spark.sql.parquet.filterPushdown`, explaining why it is turned off by default and when it is safe to turn on.
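For reference, a minimal way to enable it per session (assuming a `SQLContext` named `sqlContext`):
```Scala
// Off by default; enable Parquet filter pushdown explicitly:
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
```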
Author: Cheng Lian <lian@databricks.com>
Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:
2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown
Author: zsxwing <zsxwing@gmail.com>
Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits:
03cbe3f [zsxwing] Minor code and docs cleanup
If `spark.akka.frameSize` > 2047, it will overflow and become negative. There should be an assertion in `maxFrameSizeBytes` to warn people.
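A minimal sketch of the overflow (the variable names are illustrative):
```Scala
// The frame size is configured in MB and converted to bytes with Int arithmetic:
val frameSizeMB = 2048
val maxFrameSizeBytes = frameSizeMB * 1024 * 1024  // overflows Int to -2147483648
// 2047 * 1024 * 1024 = 2146435072 still fits in an Int, hence the 2047 limit.
```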
Author: zsxwing <zsxwing@gmail.com>
Closes #3527 from zsxwing/SPARK-4664 and squashes the following commits:
0089c7a [zsxwing] Throw an exception when spark.akka.frameSize > 2047
Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.
Author: Sean Owen <sowen@cloudera.com>
Closes #3480 from srowen/SPARK-2192 and squashes the following commits:
47688f1 [Sean Owen] Add data/ to distributions
In .gitignore, there is an entry for spark-*-bin.tar.gz, but considering make-distribution.sh, the name pattern should be spark-*-bin-*.tgz.
This change is really small, so I didn't open a JIRA issue. If one is needed, please let me know.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3529 from sarutak/fix-wrong-tgz-pattern and squashes the following commits:
de3c70a [Kousuke Saruta] Fixed wrong file name pattern in .gitignore
This commit exists to close the following pull requests on Github:
Closes #2915 (close requested by 'JoshRosen')
Closes #3140 (close requested by 'JoshRosen')
Closes #3366 (close requested by 'JoshRosen')
Author: Cheng Lian <lian@databricks.com>
Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:
865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide
Fix a grammatical error in the Programming Guide document.
Author: lewuathe <lewuathe@me.com>
Closes #3412 from Lewuathe/typo-programming-guide and squashes the following commits:
a3e2f00 [lewuathe] Typo in Programming Guide markdown
If using spark-sql in yarn-cluster mode, print an error message, just as the Spark shell does in yarn-cluster mode.
Author: carlmartin <carlmartinmax@gmail.com>
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #3479 from SaintBacchus/sparkSqlShell and squashes the following commits:
35829a9 [carlmartin] improve the description of comment
e6c1eb7 [carlmartin] add a comment in bin/spark-sql to remind user who wants to change the class
f1c5c8d [carlmartin] Merge branch 'master' into sparkSqlShell
8e112c5 [huangzhaowei] singular form
ec957bc [carlmartin] Add the some error infomation if using spark-sql in yarn-cluster mode
7bcecc2 [carlmartin] Merge branch 'master' of https://github.com/apache/spark into codereview
4fad75a [carlmartin] Add the Error infomation using spark-sql in yarn-cluster mode
To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before running Maven; otherwise inter-module dependencies are broken.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:
1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.
This will fix SPARK-4507.
For pull requests that reference multiple JIRAs in their titles, it would be helpful if the PR merge script offered to close all of them.
Author: Takayuki Hasegawa <takayuki.hasegawa0311@gmail.com>
Closes #3428 from hase1031/SPARK-4507 and squashes the following commits:
bf6d64b [Takayuki Hasegawa] SPARK-4507: try to resolve issue when no JIRAs in title
401224c [Takayuki Hasegawa] SPARK-4507: moved codes as before
ce89021 [Takayuki Hasegawa] SPARK-4507: PR merge script should support closing multiple JIRA tickets
Added a ClassTag parameter to CompactBuffer, so CompactBuffer[T] can create primitive arrays for primitive types. This reduces the memory usage for primitive types significantly, at only a minor performance cost.
Here is my test code:
```Scala
// Call org.apache.spark.util.SizeEstimator.estimate reflectively,
// since the object is private to Spark:
def estimateSize(obj: AnyRef): Long = {
  val c = Class.forName("org.apache.spark.util.SizeEstimator$")
  val f = c.getField("MODULE$")
  val o = f.get(c)
  val m = c.getMethod("estimate", classOf[Object])
  m.setAccessible(true)
  m.invoke(o, obj).asInstanceOf[Long]
}

sc.parallelize(1 to 10000).groupBy(_ => 1).foreach {
  case (k, v) =>
    println(v.getClass() + " size: " + estimateSize(v))
}
```
Using the previous CompactBuffer, this outputs:
```
class org.apache.spark.util.collection.CompactBuffer size: 313358
```
Using the new CompactBuffer, it outputs:
```
class org.apache.spark.util.collection.CompactBuffer size: 65712
```
In this case, the new `CompactBuffer` used only 20% of the memory of the previous one. It's really helpful for `groupByKey` when the values have a primitive type.
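A minimal sketch of why the `ClassTag` helps (illustrative, not CompactBuffer itself):
```Scala
import scala.reflect.ClassTag

// With a ClassTag in scope, `new Array[T](n)` allocates a primitive array
// (e.g. int[]) for primitive T, instead of an Object[] of boxed values.
def makeBuffer[T: ClassTag](n: Int): Array[T] = new Array[T](n)

val ints = makeBuffer[Int](8)
println(ints.getClass.getSimpleName)  // prints "int[]"
```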
Author: zsxwing <zsxwing@gmail.com>
Closes #3378 from zsxwing/SPARK-4505 and squashes the following commits:
4abdbba [zsxwing] Add a ClassTag parameter to reduce the memory usage of CompactBuffer[T] when T is a primitive type
In sbt/sbt-launch-lib.bash, the -Xdebug option is used for debugging. We should use the -agentlib option for Java 6+.
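For reference, on Java 6+ the single option `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005` (the address and settings here are illustrative) replaces the older `-Xdebug -Xrunjdwp:...` pair.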
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2904 from sarutak/SPARK-4057 and squashes the following commits:
39b5320 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4057
26b4af8 [Kousuke Saruta] Improved java option for debugging
Admittedly a really small tweak.
Author: Stephen Haberman <stephen@exigencecorp.com>
Closes #3514 from stephenh/include-key-name-in-npe and squashes the following commits:
937740a [Stephen Haberman] Include the key name when failing on an invalid value.
This PR re-introduces [0e648bc](0e648bc2be) from PR #2339, which somehow never made it into the codebase.
Additionally, it removes a now-unnecessary linear backoff on the SSH checks since we are blocking on EC2 status checks before testing SSH.
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes #3195 from nchammas/remove-ec2-ssh-backoff and squashes the following commits:
efb29e1 [Nicholas Chammas] Revert "Remove linear backoff."
ef3ca99 [Nicholas Chammas] reuse conn
adb4eaa [Nicholas Chammas] Remove linear backoff.
55caa24 [Nicholas Chammas] Check EC2 status checks before SSH.
This commit exists to close the following pull requests on Github:
Closes #3451 (close requested by 'pwendell')
Closes #1310 (close requested by 'pwendell')
Closes #3207 (close requested by 'JoshRosen')
`File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Also, when an exception is thrown, `dir` should be reset.
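A minimal sketch of the retry loop with the reset (names are illustrative, not Spark's exact `Utils.createTempDir`):
```Scala
import java.io.{File, IOException}
import java.util.UUID

def createTempDir(root: String): File = {
  val maxAttempts = 10
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(
        s"Failed to create a temp directory under $root after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      if (dir.exists() || !dir.mkdirs()) {
        dir = null  // reset so the loop retries with a fresh name
      }
    } catch {
      case _: SecurityException => dir = null  // reset here too, per the fix
    }
  }
  dir
}
```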
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3449 from viirya/fix_createtempdir and squashes the following commits:
36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.
This looks like a one-liner, so I took a shot at it. There can be no fixed default availability zone since the names are different per region. But the default behavior can be documented:
```
if opts.zone == "":
opts.zone = random.choice(conn.get_all_zones()).name
```
Author: Sean Owen <sowen@cloudera.com>
Closes #3454 from srowen/SPARK-1450 and squashes the following commits:
9193cf3 [Sean Owen] Document that --zone defaults to a single random zone
The security manager adds a lot of overhead to the runtime of the
app, and causes a severe performance regression. Even stubbing out
all unneeded methods (all except checkExit()) does not help.
So, instead, penalize users who do an explicit System.exit() by leaving
them in "undefined behavior" territory: if they do that, the Yarn
backend won't be able to report the final app status to the RM.
The result is that the final status of the application might not match
the user's expectations.
One side-effect of the change is that users who do an explicit
System.exit() will lose the AM retry functionality. Since there is
no way to know if the exit was because of success or failure, the
AM right now errs on the side of it being a successful exit.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #3484 from vanzin/SPARK-4584 and squashes the following commits:
21f2502 [Marcelo Vanzin] Do not retry apps that use System.exit().
4198b3b [Marcelo Vanzin] [SPARK-4584] [yarn] Remove security manager from Yarn AM.
The old location will return a 404.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3504 from adrian-wang/repo and squashes the following commits:
f604e05 [Daoyuan Wang] already in maven central, remove at all
f494fac [Daoyuan Wang] spark staging repo outdated
When building Spark with sbt, the function `runAlternateBoot` in sbt/sbt-launch-lib.bash is never used, and no Spark code calls it, so I think this function is unnecessary. The `sbt.boot.properties` option can be configured on the command line when building Spark, e.g.:
sbt/sbt assembly -Dsbt.boot.properties=$bootpropsfile
The corresponding file from https://github.com/sbt/sbt-launcher-package has changed, and `runAlternateBoot` has been deleted in the upstream project, so I think the Spark project should delete this function from sbt/sbt-launch-lib.bash as well. Thanks.
Author: KaiXinXiaoLei <huleilei1@huawei.com>
Closes #3224 from KaiXinXiaoLei/deleteFunction and squashes the following commits:
e8eac49 [KaiXinXiaoLei] Delete blank lines.
efe36d4 [KaiXinXiaoLei] Delete unnecessary function
This PR disables HiveThriftServer2 asynchronous execution by setting the `runInBackground` argument in `ExecuteStatementOperation` to `false`, and reverting `SparkExecuteStatementOperation.run` in the Hive 13 shim to the Hive 12 version. This change makes the Simba ODBC driver v1.0.0.1000 work.
Author: Cheng Lian <lian@databricks.com>
Closes #3506 from liancheng/disable-async-exec and squashes the following commits:
593804d [Cheng Lian] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
The output of Utils.getUsedTimeMs(startTime) already includes a time suffix, so there is no need to append another one; delete the redundant suffix.
Author: maji2014 <maji3@asiainfo.com>
Closes #3475 from maji2014/SPARK-4619 and squashes the following commits:
df0da4e [maji2014] delete redundant time suffix
This PR introduces a set of Java APIs for using `JdbcRDD`:
1. Trait (interface) `JdbcRDD.ConnectionFactory`: equivalent to the `getConnection: () => Connection` parameter in the `JdbcRDD` constructor.
2. Two overloaded versions of `JdbcRDD.create`: used to create a `JavaRDD` that wraps a `JdbcRDD`; see the sketch after this list.
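A hedged sketch of using these APIs (shown from Scala for brevity; the JDBC URL, query, and bounds are illustrative, and the exact signatures may differ):
```Scala
import java.sql.{Connection, DriverManager, ResultSet}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.{Function => JFunction}
import org.apache.spark.rdd.JdbcRDD

val jsc = new JavaSparkContext(sc)
val connectionFactory = new JdbcRDD.ConnectionFactory {
  override def getConnection: Connection =
    DriverManager.getConnection("jdbc:derby:memory:testdb")
}
// The mapRow overload converts each ResultSet row; the other overload
// returns a JavaRDD of Object arrays.
val rows = JdbcRDD.create(jsc, connectionFactory,
  "SELECT ID, DATA FROM FOO WHERE ? <= ID AND ID <= ?",
  1L, 100L, 2,
  new JFunction[ResultSet, String] {
    override def call(rs: ResultSet): String = rs.getString(2)
  })
```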
Author: Cheng Lian <lian@databricks.com>
Closes #3478 from liancheng/japi-jdbc-rdd and squashes the following commits:
9a54625 [Cheng Lian] Only shutdowns a single DB rather than the whole Derby driver
d4cedc5 [Cheng Lian] Moves Java JdbcRDD test case to a separate test suite
ffcdf2e [Cheng Lian] Java API for JdbcRDD
Author: roxchkplusony <roxchkplusony@gmail.com>
Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits:
aba9184 [roxchkplusony] replace warning message per review
5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
Warn against subclassing scala.App, and remove one instance of this in examples
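A minimal sketch of the preferred pattern (the object name is illustrative): subclassing scala.App relies on delayed initialization, so fields may not be initialized yet when closures referencing them run on executors; an explicit main method avoids that.
```Scala
// Prefer an explicit main method over `object MyApp extends App`:
object MyApp {
  def main(args: Array[String]): Unit = {
    // build the SparkConf/SparkContext and run the job here
  }
}
```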
Author: Sean Owen <sowen@cloudera.com>
Closes #3497 from srowen/SPARK-4170 and squashes the following commits:
4a6131f [Sean Owen] Restore multiline string formatting
a8ca895 [Sean Owen] Warn against subclassing scala.App, and remove one instance of this in examples
This commit provides a script that computes the contributors list
by linking the GitHub commits with JIRA issues. Automatically
translating GitHub usernames remains a TODO at this point.
https://issues.apache.org/jira/browse/SPARK-3628
In the current implementation, the accumulator is updated for every successfully finished task, even when the task is from a resubmitted stage, which makes accumulators counter-intuitive.
This patch changes the way the DAGScheduler updates accumulators:
the DAGScheduler maintains a hash table mapping each stage id to the received <accumulator_id, value> pairs. Only when a stage becomes independent (no job needs it any more) do we accumulate the values of those pairs. When a task finishes, we check whether the hash table already contains that stage id, and save the (accumulator_id, value) pair only when the task is the first finished task of a new stage or the stage is running its first attempt...
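A minimal sketch of the de-duplication idea (names and structure are illustrative; the real DAGScheduler logic is more involved):
```Scala
import scala.collection.mutable

// Track which (stageId, partitionId) pairs have already contributed, so a
// task re-run by a resubmitted stage does not double-count its updates.
val applied = mutable.HashSet.empty[(Int, Int)]

def maybeApply(stageId: Int, partitionId: Int, updates: Map[Long, Long]): Unit = {
  if (applied.add((stageId, partitionId))) {
    updates.foreach { case (accumId, delta) =>
      println(s"accumulator $accumId += $delta")  // stand-in for the real update
    }
  }
}
```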
Author: CodingCat <zhunansjtu@gmail.com>
Closes #2524 from CodingCat/SPARK-732-1 and squashes the following commits:
701a1e8 [CodingCat] roll back change on Accumulator.scala
1433e6f [CodingCat] make MIMA happy
b233737 [CodingCat] address Matei's comments
02261b8 [CodingCat] rollback some changes
6b0aff9 [CodingCat] update document
2b2e8cf [CodingCat] updateAccumulator
83b75f8 [CodingCat] style fix
84570d2 [CodingCat] re-enable the bad accumulator guard
1e9e14d [CodingCat] add NPE guard
21b6840 [CodingCat] simplify the patch
88d1f03 [CodingCat] fix rebase error
f74266b [CodingCat] add test case for resubmitted result stage
5cf586f [CodingCat] de-duplicate on task level
138f9b3 [CodingCat] make MIMA happy
67593d2 [CodingCat] make if allowing duplicate update as an option of accumulator
Before we have a full picture of the operators we want to add, it might be safer to hide `Matrix.transposeMultiply` in 1.2.0. Another change is to `Matrix.randn` and `Matrix.rand`, both of which should take a `Random` implementation; otherwise they are very likely to produce inconsistent RDDs. I also added some unit tests for the matrix factory methods. All of these APIs are new in 1.2, so there are no incompatible changes.
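A hedged sketch of the updated factory methods (assuming the post-PR signatures): passing an explicit `Random` makes generation reproducible and consistent across re-evaluations.
```Scala
import java.util.Random
import org.apache.spark.mllib.linalg.Matrices

val rng = new Random(42L)
val a = Matrices.rand(3, 2, rng)   // uniform entries in [0, 1)
val b = Matrices.randn(3, 2, rng)  // standard Gaussian entries
```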
brkyvz
Author: Xiangrui Meng <meng@databricks.com>
Closes #3468 from mengxr/SPARK-4614 and squashes the following commits:
3b0e4e2 [Xiangrui Meng] add mima excludes
6bfd8a4 [Xiangrui Meng] hide transposeMultiply; add rng to rand and randn; add unit tests