This is a partial solution to SPARK-1701, addressing only the
documentation confusion.
Additional work could actually rename the numSlices parameter
across languages, with care required in Scala and Python to maintain
backward compatibility for named parameters.
Author: Matthew Farrellee <matt@redhat.com>
Closes#2305 from mattf/SPARK-1701 and squashes the following commits:
c0af05d [Matthew Farrellee] Further tweak
06f80fc [Matthew Farrellee] Wording tweak from Josh Rosen's review
7b045e0 [Matthew Farrellee] [SPARK-1701] Clarify slice vs partition in the programming guide
This commit exists to close the following pull requests on Github:
Closes#726 (close requested by 'pwendell')
Closes#151 (close requested by 'pwendell')
VertexRDD.apply had a bug where it ignored the merge function for
duplicate vertices and instead used whichever vertex attribute occurred
first. This commit fixes the bug by passing the merge function through
to ShippableVertexPartition.apply, which merges any duplicates using the
merge function and then fills in missing vertices using the specified
default vertex attribute. This commit also adds a unit test for
VertexRDD.apply.
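For illustration, here is a minimal sketch of the behavior the fix guarantees, written against the public GraphX API (the vertex and edge data and the summing merge function are hypothetical, and exact signatures may vary across Spark versions):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexRDD}

def example(sc: SparkContext): Unit = {
  // Vertex 1L appears twice; before the fix, one of its attributes won arbitrarily.
  val vertices = sc.parallelize(Seq((1L, 10), (1L, 20), (2L, 30)))
  val edges = EdgeRDD.fromEdges[Int, Int](sc.parallelize(Seq(Edge(1L, 2L, 0))))
  // With the fix, duplicates are combined by the merge function (1L -> 10 + 20 = 30),
  // and vertices present in `edges` but missing from `vertices` get the default (0).
  val vertexRDD = VertexRDD(vertices, edges, 0, (a: Int, b: Int) => a + b)
  vertexRDD.collect().foreach(println)
}
```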
Author: Larry Xiao <xiaodi@sjtu.edu.cn>
Author: Blie Arkansol <xiaodi@sjtu.edu.cn>
Author: Ankur Dave <ankurdave@gmail.com>
Closes#1903 from larryxiao/2062 and squashes the following commits:
625aa9d [Blie Arkansol] Merge pull request #1 from ankurdave/SPARK-2062
476770b [Ankur Dave] ShippableVertexPartition.initFrom: Don't run mergeFunc on default values
614059f [Larry Xiao] doc update: note about the default null value vertices construction
dfdb3c9 [Larry Xiao] minor fix
1c70366 [Larry Xiao] scalastyle check: wrap line, parameter list indent 4 spaces
e4ca697 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
6a35ea8 [Larry Xiao] [TEST] VertexRDD.apply mergeFunc
4fbc29c [Blie Arkansol] undo unnecessary change
efae765 [Larry Xiao] fix mistakes: should be able to call with or without mergeFunc
b2422f9 [Larry Xiao] Merge branch '2062' of github.com:larryxiao/spark into 2062
52dc7f7 [Larry Xiao] pass mergeFunc to VertexPartitionBase, where merge is handled
581e9ee [Larry Xiao] TODO: VertexRDDSuite
20d80a3 [Larry Xiao] [SPARK-2062][GraphX] VertexRDD.apply does not use the mergeFunc
Local `SparseMatrix` support has been added in Compressed Column Storage (CCS) format, along with Level-2 and Level-3 BLAS operations such as dgemv and dgemm, respectively.
BLAS doesn't support sparse matrix operations, so implementations of `SparseMatrix`-`DenseMatrix` multiplication and `SparseMatrix`-`DenseVector` multiplication have been added. I will post performance comparisons in the comments momentarily.
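As a rough illustration of the CCS layout and the new multiply paths (a sketch against MLlib's local linear algebra API; the matrix values are made up):

```scala
import org.apache.spark.mllib.linalg.{DenseVector, Matrices}

// A 3x2 matrix in compressed column storage (CCS):
//   1.0  0.0
//   0.0  3.0
//   2.0  0.0
// colPtrs(j) marks where column j's entries start in values/rowIndices.
val sm = Matrices.sparse(3, 2,
  Array(0, 2, 3),        // colPtrs
  Array(0, 2, 1),        // rowIndices
  Array(1.0, 2.0, 3.0))  // values

// SparseMatrix-DenseVector multiplication:
val result = sm.multiply(new DenseVector(Array(1.0, 1.0)))  // [1.0, 3.0, 2.0]
```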
Author: Burak <brkyvz@gmail.com>
Closes#2294 from brkyvz/SPARK-3418 and squashes the following commits:
88814ed [Burak] Hopefully fixed MiMa this time
47e49d5 [Burak] really fixed MiMa issue
f0bae57 [Burak] [SPARK-3418] Fixed MiMa compatibility issues (excluded from check)
4b7dbec [Burak] 9/17 comments addressed
7af2f83 [Burak] sealed traits Vector and Matrix
d3a8a16 [Burak] [SPARK-3418] Squashed missing alpha bug.
421045f [Burak] [SPARK-3418] New code review comments addressed
f35a161 [Burak] [SPARK-3418] Code review comments addressed and multiplication further optimized
2508577 [Burak] [SPARK-3418] Fixed one more style issue
d16e8a0 [Burak] [SPARK-3418] Fixed style issues and added documentation for methods
204a3f7 [Burak] [SPARK-3418] Fixed failing Matrix unit test
6025297 [Burak] [SPARK-3418] Fixed Scala-style errors
dc7be71 [Burak] [SPARK-3418][MLlib] Matrix unit tests expanded with indexing and updating
d2d5851 [Burak] [SPARK-3418][MLlib] Sparse Matrix support and additional native BLAS operations added
Py4J cannot handle large strings efficiently, so we should automatically use a broadcast for large closures. (Broadcast uses the local filesystem to pass the data through.)
Author: Davies Liu <davies.liu@gmail.com>
Closes#2417 from davies/command and squashes the following commits:
fbf4e97 [Davies Liu] bugfix
aefd508 [Davies Liu] use broadcast automatically for large closure
This was introduced in #2449
Author: Andrew Or <andrewor14@gmail.com>
Closes#2452 from andrewor14/standalone-hot-fix and squashes the following commits:
d5190ca [Andrew Or] Put that line in the right place
Author: Victsm <victor.nju@gmail.com>
Author: Min Shen <mshen@linkedin.com>
Closes#2449 from Victsm/SPARK-3560 and squashes the following commits:
918405a [Victsm] Removed the additional space
4502a2a [Min Shen] [SPARK-3560] Fixed setting spark.jars system property in yarn-cluster mode.
(cherry picked from commit 832dff64dd)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
https://issues.apache.org/jira/browse/SPARK-3589
"export CLASSPATH" in spark-class is redundant since same variable is exported before.
We could reuse defined value "isYarnCluster" in SparkSubmit.scala.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes#2445 from WangTaoTheTonic/removeRedundant and squashes the following commits:
6fb6872 [WangTaoTheTonic] remove redundant code
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2426 from sarutak/emacs-metafiles-ignore and squashes the following commits:
a306020 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
6a0a5eb [Kousuke Saruta] Added cmd file entry to .rat-excludes and .gitignore
897da63 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into emacs-metafiles-ignore
8cade06 [Kousuke Saruta] Modified .gitignore to ignore emacs lock file and backup file
This patch makes some small changes to fix this problem:
1. We document specific versions of Jekyll/Kramdown to use that match
those used when building the upstream docs.
2. We add a configuration for a property that for some reason varies across
packages of Jekyll/Kramdown even with the same version.
Author: Patrick Wendell <pwendell@gmail.com>
Closes#2443 from pwendell/jekyll and squashes the following commits:
54ee2ab [Patrick Wendell] SPARK-3579 Jekyll doc generation is different across environments.
As an improvement on https://github.com/apache/spark/pull/1944, we should use a more distinctive exit code to represent ClassNotFoundException.
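A sketch of the resulting behavior (the exit code 101 follows the commit list below; the constant name and surrounding code are illustrative):

```scala
// Exit with a distinctive status when the user's main class is missing,
// so callers can tell this apart from a generic failure (exit code 1).
val CLASS_NOT_FOUND_EXIT_STATUS = 101

def loadMainClass(name: String): Class[_] =
  try {
    Class.forName(name)
  } catch {
    case _: ClassNotFoundException =>
      System.err.println(s"Cannot load main class: $name")
      sys.exit(CLASS_NOT_FOUND_EXIT_STATUS)
  }
```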
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes#2421 from WangTaoTheTonic/classnotfoundExitCode and squashes the following commits:
645a22a [WangTaoTheTonic] Several typos to trigger Jenkins
d6ae559 [WangTaoTheTonic] use 101 instead
a2d6465 [WangTaoTheTonic] use 127 instead
fbb232f [WangTaoTheTonic] Using a special exit code instead of 1 to represent ClassNotFoundException
Author: GuoQiang Li <witgo@qq.com>
Closes#2326 from witgo/rat-excludes and squashes the following commits:
860904e [GuoQiang Li] rat exclude dependency-reduced-pom.xml
Addresses the problem pointed out in [this comment](https://github.com/apache/spark/pull/2441#issuecomment-55990116).
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#2442 from nchammas/patch-1 and squashes the following commits:
7e68b60 [Nicholas Chammas] [SPARK-3534] Add hive-thriftserver to SQL tests
https://issues.apache.org/jira/browse/SPARK-3565
"spark.ports.maxRetries" should be "spark.port.maxRetries". Make the configuration keys in document and code consistent.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Closes#2427 from WangTaoTheTonic/fixPortRetries and squashes the following commits:
c178813 [WangTaoTheTonic] Use blank lines trigger Jenkins
646f3fe [WangTaoTheTonic] also in SparkBuild.scala
3700dba [WangTaoTheTonic] Fix configuration item not consistent with document
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2428 from sarutak/appid-volatile-modification and squashes the following commits:
c7d890d [Kousuke Saruta] Added volatile modifier to appId field in SparkDeploySchedulerBackend
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2424 from sarutak/display-appid-on-webui and squashes the following commits:
417fe90 [Kousuke Saruta] Added "App ID column" to HistoryPage
I think this issue was caused by #1106.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2436 from sarutak/SPARK-3571 and squashes the following commits:
7a4deea [Kousuke Saruta] Modified Master.scala to use numWorkersVisited and numWorkersAlive instead of stopPos
4e51e35 [Kousuke Saruta] Modified Master to prevent from 0 divide
4817ecd [Kousuke Saruta] Brushed up previous change
71e84b6 [Kousuke Saruta] Modified Master to enable schedule normally
Testing arguments to `sbt` need to be passed as an array, not as a single long string.
Fixes a bug introduced in #2420.
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#2437 from nchammas/selective-testing and squashes the following commits:
a9f9c1c [Nicholas Chammas] fix printing of sbt test arguments
cf57cbf [Nicholas Chammas] fix sbt test arguments
e33b978 [Nicholas Chammas] Merge pull request #2 from apache/master
0b47ca4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
8051486 [Nicholas Chammas] Merge pull request #1 from apache/master
03180a4 [Nicholas Chammas] Merge branch 'master' of github.com:nchammas/spark
d4c5f43 [Nicholas Chammas] Merge pull request #6 from apache/master
Makes the table of contents read better
Author: Andrew Ash <andrew@andrewash.com>
Closes#2402 from ash211/docs/better-indentation and squashes the following commits:
ea0e130 [Andrew Ash] Move HA subsections to a deeper indentation level
If the only files changed are related to SQL, then only run the SQL tests.
This patch includes some cosmetic/maintainability refactoring. I would be more than happy to undo some of these changes if they are inappropriate.
We can accept this patch mostly as-is and address the immediate need documented in [SPARK-3534](https://issues.apache.org/jira/browse/SPARK-3534), or we can keep it open until a satisfactory solution along the lines [discussed here](https://issues.apache.org/jira/browse/SPARK-1455?focusedCommentId=14136424&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14136424) is reached.
Note: I had to hack this patch up to test it locally, so what I'm submitting here and what I tested are technically different.
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#2420 from nchammas/selective-testing and squashes the following commits:
db3fa2d [Nicholas Chammas] diff against master!
f9e23f6 [Nicholas Chammas] when possible, run SQL tests only
Author: Michael Armbrust <michael@databricks.com>
Closes#2434 from marmbrus/patch-1 and squashes the following commits:
67215be [Michael Armbrust] [SQL][DOCS] Improve table caching section
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#2430 from nchammas/patch-2 and squashes the following commits:
d476bfb [Nicholas Chammas] [Docs] minor grammar fix
The JIRA and PR were originally created for branch-1.1, and have now been moved to the master branch.
Chester
The issue is that yarn-alpha and yarn have different APIs for certain class fields. In this particular case, ClientBase uses reflection to address the difference, so we need a different way to test ClientBase's methods. The original ClientBaseSuite used the getFieldValue() method for this, but it doesn't work for yarn-alpha, where the API returns an array of Strings instead of a single String (as the yarn-stable API does).
To fix the test, I added a new method:

```scala
import scala.reflect.ClassTag
import scala.util.Try

// Reflectively reads `field` from `clazz`. The ClassTag context bounds allow
// the runtime pattern match to distinguish a field of type A from one of type
// A1, so the corresponding mapTo function can be applied; if the field is
// absent or of an unexpected type, `defaults` is returned.
def getFieldValue2[A: ClassTag, A1: ClassTag, B](
    clazz: Class[_], field: String, defaults: => B)
    (mapTo: A => B)(mapTo1: A1 => B): B =
  Try(clazz.getField(field)).map(_.get(null)).map {
    case v: A => mapTo(v)
    case v1: A1 => mapTo1(v1)
    case _ => defaults
  }.toOption.getOrElse(defaults)
```

to handle the cases where the field type can be either A or A1. The type is pattern matched, and the corresponding mapTo function (mapTo or mapTo1) is applied.
Author: chesterxgchen <chester@alpinenow.com>
Closes#2204 from chesterxgchen/SPARK-3177-master and squashes the following commits:
e72a6ea [chesterxgchen] The issue is that yarn-alpha and yarn have different APIs for certain class fields. In this particular case, ClientBase uses reflection to address the difference, so we need a different way to test ClientBase's methods. The original ClientBaseSuite used the getFieldValue() method for this, but it doesn't work for yarn-alpha, where the API returns an array of Strings instead of a single String (as the yarn-stable API does).
change the value of spark.files.fetchTimeout
Author: viper-kun <xukun.xu@huawei.com>
Closes#2406 from viper-kun/master and squashes the following commits:
ecb0d46 [viper-kun] [Docs] Correct spark.files.fetchTimeout default value
7cf4c7a [viper-kun] Update configuration.md
Some config files in ```conf``` should be ignored, such as
conf/fairscheduler.xml
conf/hive-log4j.properties
conf/metrics.properties
...
So ignore all ```sh```/```properties```/```conf```/```xml``` files
Author: wangfei <wangfei1@huawei.com>
Closes#2395 from scwf/patch-2 and squashes the following commits:
3dc53f2 [wangfei] duplicate ```conf/*.conf```
3c2986f [wangfei] ignore all config files
The test "jetty selects different port under contention" is flaky.
If another process binds to 4040 before the test starts, then the first server we start there will fail, and subsequent servers may successfully bind to 4040 if it is released in the meantime. Instead, we should just let Java find a random free port for us and hold onto it for the duration of the test.
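A minimal sketch of the technique using plain java.net (independent of the actual test code):

```scala
import java.net.ServerSocket

// Bind to port 0 so the OS assigns a free ephemeral port, then hold the
// socket for the duration of the test so no other process can take the port.
val socket = new ServerSocket(0)
val reservedPort = socket.getLocalPort
try {
  // ... exercise the port-contention logic against `reservedPort` ...
} finally {
  socket.close()
}
```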
Author: Andrew Or <andrewor14@gmail.com>
Closes#2418 from andrewor14/fix-port-contention and squashes the following commits:
0cd4974 [Andrew Or] Stop them servers
a7071fe [Andrew Or] Pick random port instead of 4040
This adds a new page to the docs listing community projects -- those created outside of Apache Spark that are of interest to the community of Spark users. Anybody can add to it just by submitting a PR.
There was a discussion thread about alternatives:
* Creating a Github organization for Spark projects - we could not find any sponsors for this, and it would be difficult to organize since many folks just create repos in their company organization or personal accounts
* Apache has some place for storing community projects, but it was deemed difficult to work with, and again there would be permissions issues -- not everyone could update it.
Author: Evan Chan <velvia@gmail.com>
Closes#2219 from velvia/community-projects-page and squashes the following commits:
7316822 [Evan Chan] Point to Spark wiki: supplemental projects page
613b021 [Evan Chan] Add a few more projects
a85eaaf [Evan Chan] Add a Community Projects page
When deploying to AWS, additional configuration is required to read S3 files. EMR creates it automatically; there is no reason the Spark EC2 script shouldn't do the same.
This PR requires a corresponding PR to the mesos/spark-ec2 to be merged, as it gets cloned in the process of setting up machines: https://github.com/mesos/spark-ec2/pull/58
Author: Dan Osipov <daniil.osipov@shazam.com>
Closes#1120 from danosipov/s3_credentials and squashes the following commits:
758da8b [Dan Osipov] Modify documentation to include the new parameter
71fab14 [Dan Osipov] Use a parameter --copy-aws-credentials to enable S3 credential deployment
7e0da26 [Dan Osipov] Get AWS credentials out of boto connection instance
39bdf30 [Dan Osipov] Add S3 configuration parameters to the EC2 deploy scripts
Using Sphinx to generate API docs for PySpark.
Requirement: Sphinx
```
$ cd python/docs/
$ make html
```
The generated API docs will be located at python/docs/_build/html/index.html
They can co-exist with the docs generated by Epydoc.
This is a first working version; after it is merged in, we can continue to improve it and eventually replace Epydoc.
Author: Davies Liu <davies.liu@gmail.com>
Closes#2292 from davies/sphinx and squashes the following commits:
425a3b1 [Davies Liu] cleanup
1573298 [Davies Liu] move docs to python/docs/
5fe3903 [Davies Liu] Merge branch 'master' into sphinx
9468ab0 [Davies Liu] fix makefile
b408f38 [Davies Liu] address all comments
e2ccb1b [Davies Liu] update name and version
9081ead [Davies Liu] generate PySpark API docs using Sphinx
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2408 from sarutak/resolve-resource-leak-issue and squashes the following commits:
074781d [Kousuke Saruta] Modified ShuffleBlockFetcherIterator
5f63f67 [Kousuke Saruta] Move metrics increment logic and debug logging outside try block
b37231a [Kousuke Saruta] Modified FileSegmentManagedBuffer#nioByteBuffer to check null or not before invoking channel.close
bf29d4a [Kousuke Saruta] Modified FileSegment to close channel
Taken from liancheng's updates. Resolved merge conflicts with #2316.
Author: Michael Armbrust <michael@databricks.com>
Closes#2384 from marmbrus/sqlDocUpdate and squashes the following commits:
2db6319 [Michael Armbrust] @liancheng's updates
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes#2414 from nchammas/patch-1 and squashes the following commits:
14664bf [Nicholas Chammas] [Docs] minor punctuation fix
SchemaRDD overrides RDD functions, including collect, count, and take, with optimized versions making use of the query optimizer. The java and python interface classes wrapping SchemaRDD need to ensure the optimized versions are called as well. This patch overrides relevant calls in the python and java interfaces with optimized versions.
Adds a new Row serialization pathway between python and java, based on JList[Array[Byte]] versus the existing RDD[Array[Byte]]. I wasn’t overjoyed about doing this, but I noticed that some QueryPlans implement optimizations in executeCollect(), which outputs an Array[Row] rather than the typical RDD[Row] that can be shipped to python using the existing serialization code. To me it made sense to ship the Array[Row] over to python directly instead of converting it back to an RDD[Row] just for the purpose of sending the Rows to python using the existing serialization code.
Author: Aaron Staple <aaron.staple@gmail.com>
Closes#1592 from staple/SPARK-2314 and squashes the following commits:
89ff550 [Aaron Staple] Merge with master.
6bb7b6c [Aaron Staple] Fix typo.
b56d0ac [Aaron Staple] [SPARK-2314][SQL] Override count in JavaSchemaRDD, forwarding to SchemaRDD's count.
0fc9d40 [Aaron Staple] Fix comment typos.
f03cdfa [Aaron Staple] [SPARK-2314][SQL] Override collect and take in sql.py, forwarding to SchemaRDD's collect.
Throwing an error in the constructor makes it impossible to run queries, even when there is no actual ambiguity. Remove this check in favor of throwing an error in analysis when the query actually is ambiguous.
Also took the opportunity to add test cases that would have caught a subtle bug in my first attempt at fixing this and refactor some other test code.
Author: Michael Armbrust <michael@databricks.com>
Closes#2209 from marmbrus/sameNameStruct and squashes the following commits:
729cca4 [Michael Armbrust] Better tests.
a003aeb [Michael Armbrust] Remove error (it'll be caught in analysis).
This PR aims to support reading top-level JSON arrays, taking every element in such an array as a row (an empty array will not generate a row).
JIRA: https://issues.apache.org/jira/browse/SPARK-3308
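For example (a sketch using the SQLContext API of this era; the file name and contents are hypothetical):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

def example(sc: SparkContext): Unit = {
  val sqlContext = new SQLContext(sc)
  // people.json contains a single top-level array:
  //   [{"name": "a"}, {"name": "b"}]
  // Each element becomes a row; an empty array ([]) yields no rows.
  val people = sqlContext.jsonFile("people.json")
  people.printSchema()
}
```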
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes#2400 from yhuai/SPARK-3308 and squashes the following commits:
990077a [Yin Huai] Handle top level JSON arrays.
Added missing rdd.distinct(numPartitions) and associated tests
Author: Matthew Farrellee <matt@redhat.com>
Closes#2383 from mattf/SPARK-3519 and squashes the following commits:
30b837a [Matthew Farrellee] Combine test cases to save on JVM startups
6bc4a2c [Matthew Farrellee] [SPARK-3519] add distinct(n) to SchemaRDD in PySpark
7a17f2b [Matthew Farrellee] [SPARK-3519] add distinct(n) to PySpark
Author: Cheng Hao <hao.cheng@intel.com>
Closes#2392 from chenghao-intel/trim and squashes the following commits:
e52024f [Cheng Hao] trim the string message
Here's my crack at Bertrand's suggestion. The Github `README.md` contains build info that's outdated. It should just point to the current online docs, and reflect that Maven is the primary build now.
(Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)
Author: Sean Owen <sowen@cloudera.com>
Closes#2014 from srowen/SPARK-3069 and squashes the following commits:
501507e [Sean Owen] Note that Zinc is for Maven builds too
db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc
be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark
91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md
999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT
c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md
8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page
b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md
Short version: NetworkInterface.getNetworkInterfaces returns interfaces in reverse order compared to ifconfig output. It may pick up an IP address associated with tun0 or another virtual network interface.
See [SPARK-3040](https://issues.apache.org/jira/browse/SPARK-3040) for more detail
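A minimal sketch of the workaround using the standard java.net API (the filtering conditions are illustrative):

```scala
import java.net.NetworkInterface
import scala.collection.JavaConverters._

// Reverse the interface order so physical interfaces are preferred over
// virtual ones such as tun0, which getNetworkInterfaces may list first.
val candidates =
  for {
    ni <- NetworkInterface.getNetworkInterfaces.asScala.toSeq.reverse
    addr <- ni.getInetAddresses.asScala
    if !addr.isLoopbackAddress && !addr.isLinkLocalAddress
  } yield addr

candidates.headOption.foreach(addr => println(s"local address: $addr"))
```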
Author: Ye Xianjin <advancedxy@gmail.com>
Closes#1946 from advancedxy/SPARK-3040 and squashes the following commits:
f33f6b2 [Ye Xianjin] add windows support
087a785 [Ye Xianjin] reverse the Networkinterface.getNetworkInterfaces output order to get a more proper local ip address.
The reported false positives were actually due to the MiMa generator not picking up the new jars in the presence of old jars (theoretically this should not have happened). As a workaround, we run them both separately and just append the results together.
Author: Prashant Sharma <prashant@apache.org>
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes#2285 from ScrapCodes/mima-fix and squashes the following commits:
093c76f [Prashant Sharma] Update mima
59012a8 [Prashant Sharma] Update mima
35b6c71 [Prashant Sharma] SPARK-3433 Fix for Mima false-positives with @DeveloperAPI and @Experimental annotations.
Tested on a real cluster.
Author: Reynold Xin <rxin@apache.org>
Closes#2404 from rxin/ec2-reboot-slaves and squashes the following commits:
00a2dbd [Reynold Xin] Allow rebooting slaves.
Also made some cosmetic cleanups.
Author: Aaron Staple <aaron.staple@gmail.com>
Closes#2385 from staple/SPARK-1087 and squashes the following commits:
7b3bb13 [Aaron Staple] Address review comments, cosmetic cleanups.
10ba6e1 [Aaron Staple] [SPARK-1087] Move python traceback utilities into new traceback_utils.py file.
Pyrolite cannot unpickle an array.array that was pickled by Python 2.6; this patch fixes that by extending Pyrolite.
There is also a bug in Pyrolite when unpickling arrays of float/double; this patch works around it by reversing the endianness for float/double. The workaround should be removed once Pyrolite publishes a new release fixing the issue.
I have sent a PR to Pyrolite to fix it: https://github.com/irmen/Pyrolite/pull/11
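To make the endianness workaround concrete, here is a hypothetical sketch of reinterpreting a double's bytes with the byte order reversed (not the actual patch code):

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Write the double big-endian, then read the same 8 bytes little-endian:
// equivalent to reversing the value's byte order.
def reverseEndianness(d: Double): Double = {
  val buf = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN)
  buf.putDouble(d)
  buf.order(ByteOrder.LITTLE_ENDIAN).getDouble(0)
}
```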
Author: Davies Liu <davies.liu@gmail.com>
Closes#2365 from davies/pickle and squashes the following commits:
f44f771 [Davies Liu] enable tests about array
3908f5c [Davies Liu] Merge branch 'master' into pickle
c77c87b [Davies Liu] cleanup debugging code
60e4e2f [Davies Liu] support unpickle array.array for Python 2.6
Added minInstancesPerNode, minInfoGain params to:
* DecisionTreeRunner.scala example
* Python API (tree.py)
Also:
* Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements"
* small style fixes
CC: mengxr
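A sketch of the new parameters from the Scala side (the parameter names follow this commit; the other values are illustrative, and the Python API in tree.py mirrors them):

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

val strategy = new Strategy(
  algo = Algo.Classification,
  impurity = Gini,
  maxDepth = 5,
  numClassesForClassification = 2,
  minInstancesPerNode = 2,  // don't create splits that leave fewer instances in a child
  minInfoGain = 0.01)       // don't split unless the information gain exceeds this
```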
Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
Author: chouqin <liqiping1991@gmail.com>
Closes#2349 from jkbradley/chouqin-dt-preprune and squashes the following commits:
61b2e72 [Joseph K. Bradley] Added max of 10GB for maxMemoryInMB in Strategy.
a95e7c8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
95c479d [Joseph K. Bradley] * Fixed typo in tree suite test "do not choose split that does not satisfy min instance per node requirements" * small style fixes
e2628b6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into chouqin-dt-preprune
19b01af [Joseph K. Bradley] Merge remote-tracking branch 'chouqin/dt-preprune' into chouqin-dt-preprune
f1d11d1 [chouqin] fix typo
c7ebaf1 [chouqin] fix typo
39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
c6e2dfc [Joseph K. Bradley] Added minInstancesPerNode and minInfoGain parameters to DecisionTreeRunner.scala and to Python API in tree.py
0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
efcc736 [qiping.lqp] fix bug
10b8012 [qiping.lqp] fix style
6728fad [qiping.lqp] minor fix: remove empty lines
bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark into dt-preprune
cadd569 [qiping.lqp] add api docs
46b891f [qiping.lqp] fix bug
e72c7e4 [qiping.lqp] add comments
845c6fa [qiping.lqp] fix style
f195e83 [qiping.lqp] fix style
987cbf4 [qiping.lqp] fix bug
ff34845 [qiping.lqp] separate calculation of predict of node from calculation of info gain
ac42378 [qiping.lqp] add min info gain and min instances per node parameters in decision tree
Updating this to reflect the newest SVD via ARPACK
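For context, the documented entry point looks like this (a sketch; the data and k are illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

def example(sc: SparkContext): Unit = {
  val rows = sc.parallelize(Seq(
    IndexedRow(0L, Vectors.dense(1.0, 0.0)),
    IndexedRow(1L, Vectors.dense(0.0, 1.0))))
  val mat = new IndexedRowMatrix(rows)
  // Truncated SVD keeping k = 2 singular values; per the updated docs,
  // ARPACK is used under the hood for large problems.
  val svd = mat.computeSVD(2, computeU = true)
  println(svd.s)  // the singular values
}
```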
Author: Reza Zadeh <rizlar@gmail.com>
Closes#2389 from rezazadeh/irmdocs and squashes the following commits:
7fa1313 [Reza Zadeh] Update svd docs
715da25 [Reza Zadeh] Updated computeSVD documentation IndexedRowMatrix
SimpleUpdater ignores the regularizer, which leads to unregularized
logistic regression. To enable the common L2 regularizer (and the corresponding
regularization parameter) for logistic regression, the SquaredL2Updater
has to be used in SGD (see, e.g., [SVMWithSGD]).
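The effect of the change is equivalent to configuring the optimizer with the L2 updater explicitly, as in this minimal sketch (configuration values are illustrative):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.SquaredL2Updater

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(100)
  .setRegParam(0.1)                 // the regularization parameter now takes effect
  .setUpdater(new SquaredL2Updater) // L2 regularization, as in SVMWithSGD
```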
Author: Christoph Sawade <christoph@sawade.me>
Closes#2398 from BigCrunsh/fix-regparam-logreg and squashes the following commits:
0820c04 [Christoph Sawade] Use SquaredL2Updater in LogisticRegressionWithSGD
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes#2380 from sarutak/SPARK-3518 and squashes the following commits:
8a1464e [Kousuke Saruta] Replaced a variable with simple field reference
c660fbc [Kousuke Saruta] Removed useless statement in JsonProtocol.scala
Closes#2387
Author: Matthew Farrellee <matt@redhat.com>
Closes#2301 from mattf/SPARK-3425 and squashes the following commits:
20f3c09 [Matthew Farrellee] [SPARK-3425] do not set MaxPermSize for OpenJDK 1.8