ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	9e01dcc644	[SPARK-13529][BUILD] Move network/* modules into common/network-* ## What changes were proposed in this pull request? As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder. ## How was this patch tested? Compilation and existing tests. We should run both SBT and Maven. Author: Reynold Xin <rxin@databricks.com> Closes #11409 from rxin/SPARK-13529.	2016-02-28 17:25:07 -08:00
Reynold Xin	59e3e10be2	[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts ## What changes were proposed in this pull request? We provide a very limited set of cluster management script in Spark for Tachyon, although Tachyon itself provides a much better version of it. Given now Spark users can simply use Tachyon as a normal file system and does not require extensive configurations, we can remove this management capabilities to simplify Spark bash scripts. Note that this also reduces coupling between a 3rd party external system and Spark's release scripts, and would eliminate possibility for failures such as Tachyon being renamed or the tar balls being relocated. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #11400 from rxin/release-script.	2016-02-26 22:35:12 -08:00
Dongjoon Hyun	7af0de076f	[SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example ## What changes were proposed in this pull request? This PR replaces example codes in `mllib-linear-methods.md` using `include_example` by doing the followings: * Extracts the example codes(Scala,Java,Python) as files in `example` module. * Merges some dialog-style examples into a single file. * Hide redundant codes in HTML for the consistency with other docs. ## How was the this patch tested? manual test. This PR can be tested by document generations, `SKIP_API=1 jekyll build`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11320 from dongjoon-hyun/SPARK-11381.	2016-02-26 08:31:55 -08:00
Bryan Cutler	b33261f913	[SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the tree module. closes #10601 Author: Bryan Cutler <cutlerb@gmail.com> Author: vijaykiran <mail@vijaykiran.com> Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.	2016-02-26 08:30:32 -08:00
Michael Gummelt	c98a93ded3	[SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11311 from mgummelt/document_csv.	2016-02-25 13:32:09 +00:00
JeremyNixon	230bbeaa61	[SPARK-10759][ML] update cross validator with include_example This pull request uses {%include_example%} to add an example for the python cross validator to ml-guide. Author: JeremyNixon <jnixon2@gmail.com> Closes #11240 from JeremyNixon/pipeline_include_example.	2016-02-23 15:57:29 -08:00
Lianhui Wang	9f4263392e	[SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab andrewor14 squito Dead Executors should also be displayed on Executor Tab. as following: ![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png) Author: Lianhui Wang <lianhuiwang09@gmail.com> This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com> Closes #10058 from lianhuiwang/SPARK-7729.	2016-02-23 11:08:39 -08:00
jerryshao	e99d017098	[SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode Author: jerryshao <sshao@hortonworks.com> Closes #11229 from jerryshao/SPARK-13220.	2016-02-23 12:30:57 +00:00
Devaraj K	02b1fefffb	[SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using include_example Replaced example code in ml-guide.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11053 from devaraj-kavali/SPARK-13012.	2016-02-22 17:21:37 -08:00
Devaraj K	9f410871ca	[SPARK-13016][DOCUMENTATION] Replace example code in mllib-dimensionality-reduction.md using include_example Replaced example example code in mllib-dimensionality-reduction.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11132 from devaraj-kavali/SPARK-13016.	2016-02-22 17:16:56 -08:00
Bryan Cutler	e298ac91e3	[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the fpm and recommendation modules. Closes #10602 Closes #10897 Author: Bryan Cutler <cutlerb@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632.	2016-02-22 12:48:37 +02:00
Dongjoon Hyun	024482bf51	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.	2016-02-22 09:52:07 +00:00
Dongjoon Hyun	03e62aa3f6	[MINOR][DOCS] Fix typos in `configuration.md` and `hardware-provisioning.md` ## What changes were proposed in this pull request? This PR fixes some typos in the following documentation files. * `NOTICE`, `configuration.md`, and `hardware-provisioning.md`. ## How was the this patch tested? manual tests Author: Dongjoon Hyun <dongjoonapache.org> Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11289 from dongjoon-hyun/minor_fix_typos_notice_and_confdoc.	2016-02-21 15:27:07 -08:00
Iulian Dragos	6915cc23b3	[MINOR][DOCS][MESOS] Clarify that Mesos version is a lower bound. ## What changes were proposed in this pull request? Clarify that 0.21 is only a minimum requirement. ## How was the this patch tested? It's a doc change, so no tests. Author: Iulian Dragos <jaguarul@gmail.com> Closes #11271 from dragos/patch-1.	2016-02-19 11:47:36 -08:00
Sean Owen	fb7e21797e	[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold Clarify that reduce functions need to be commutative, and fold functions do not See https://github.com/apache/spark/pull/11091 Author: Sean Owen <sowen@cloudera.com> Closes #11217 from srowen/SPARK-13339.	2016-02-19 10:26:38 +00:00
Sean Owen	b84404865b	[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.	2016-02-17 19:03:29 -08:00
Christopher C. Aycock	a7c74d7563	[SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7" Author: Christopher C. Aycock <chris@chrisaycock.com> Closes #11239 from chrisaycock/master.	2016-02-17 11:24:18 -08:00
junhao	7218c0eba9	[SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism. https://issues.apache.org/jira/browse/SPARK-11627 Spark Streaming backpressure mechanism has no initial input rate limit, it might cause OOM exception. In the firest batch task ,receivers receive data at the maximum speed they can reach,it might exhaust executors memory resources. Add a initial input rate limit value can make sure the Streaming job execute success in the first batch,then the backpressure mechanism can adjust receiving rate adaptively. Author: junhao <junhao@mogujie.com> Closes #9593 from junhaoMg/junhao-dev.	2016-02-16 19:43:17 -08:00
BenFradet	00c72d27bf	[SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.	2016-02-16 13:03:28 +00:00
Xin Ren	e4675c2402	[SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.	2016-02-15 20:17:21 -08:00
JeremyNixon	adb5483650	[SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312. This contribution is my original work and I license the work to this project. Author: JeremyNixon <jnixon2@gmail.com> Closes #11199 from JeremyNixon/update_train_val_split_example.	2016-02-15 09:25:13 +00:00
Amit Dev	331293c302	[SPARK-13300][DOCUMENTATION] Added pygments.rb dependancy Looks like pygments.rb gem is also required for jekyll build to work. At least on Ubuntu/RHEL I could not do build without this dependency. So added this to steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master.	2016-02-14 11:41:27 +00:00
Sanket	894921d813	[SPARK-6166] Limit number of in flight outbound requests This JIRA is related to https://github.com/apache/spark/pull/5852 Had to do some minor rework and test to make sure it works with current version of spark. Author: Sanket <schintap@untilservice-lm> Closes #10838 from redsanket/limit-outbound-connections.	2016-02-11 22:40:00 -08:00
Steve Loughran	a2c7dcf61f	[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger filesize. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Author: Steve Loughran <stevel@hortonworks.com> Author: Imran Rashid <irashid@cloudera.com> Closes #11118 from squito/SPARK-7889-alternate.	2016-02-11 21:37:53 -06:00
Sasaki Toru	c2f21d8898	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template In spark-env.sh.template, there are multi-byte characters, this PR will remove it. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.	2016-02-11 09:30:36 +00:00
Sean Owen	29c547303f	[SPARK-12414][CORE] Remove closure serializer Remove spark.closure.serializer option and use JavaSerializer always CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be. Author: Sean Owen <sowen@cloudera.com> Closes #11150 from srowen/SPARK-12414.	2016-02-10 13:34:53 -08:00
Michael Gummelt	80cb963ad9	[SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode. This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027 In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone. This PR implements that resolution. This PR implements two high-level features. These two features are co-dependent, so they're implemented both here: - Mesos support for spark.executor.cores - Multiple executors per slave We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR. The contribution is my original work and I license the work to the project under the project's open source license. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #10993 from mgummelt/executor_sizing.	2016-02-10 10:53:33 -08:00
Luciano Resende	2dbb916440	[SPARK-13189] Cleanup build references to Scala 2.10 Author: Luciano Resende <lresende@apache.org> Closes #11092 from lresende/SPARK-13189.	2016-02-09 11:56:25 -08:00
Sebastián Ramírez	c882ec57de	[SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated. Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone. This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040 Author: Sebastián Ramírez <tiangolo@gmail.com> Closes #10948 from tiangolo/patch-docs-jdbc.	2016-02-09 08:49:34 +00:00
Luc Bourlier	0bb5b73387	[SPARK-13002][MESOS] Send initial request of executors for dyn allocation Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos. Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, only no the initial value. The `None` and `Some(0)` are internal details on the computation of resources to reserved, in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly, otherwise the Mesos backend scheduler will, either, create to many executors at launch, or not create any executors and not be able to recover from this state. Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore. This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode. Somebody knowing this part of the yarn support should review this change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #11047 from skyluc/issue/initial-dyn-alloc-2.	2016-02-05 14:37:42 -08:00
Bill Chambers	66e1383de2	[SPARK-13214][DOCS] update dynamicAllocation documentation Author: Bill Chambers <bill@databricks.com> Closes #11094 from anabranch/dynamic-docs.	2016-02-05 14:35:39 -08:00
Yuhao Yang	c2c956bcd1	[ML][DOC] fix wrong api link in ml onevsrest minor fix for api link in ml onevsrest Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11068 from hhbyyh/onevsrestDoc.	2016-02-03 21:19:44 -08:00
Timothy Chen	51b03b71ff	[SPARK-12463][SPARK-12464][SPARK-12465][SPARK-10647][MESOS] Fix zookeeper dir with mesos conf and add docs. Fix zookeeper dir configuration used in cluster mode, and also add documentation around these settings. Author: Timothy Chen <tnachen@gmail.com> Closes #10057 from tnachen/fix_mesos_dir.	2016-02-01 12:45:02 -08:00
Lewuathe	711ce048a2	[ML][MINOR] Invalid MulticlassClassification reference in ml-guide In [ml-guide](https://spark.apache.org/docs/latest/ml-guide.html#example-model-selection-via-cross-validation), there is invalid reference to `MulticlassClassificationEvaluator` apidoc. https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.MultiClassClassificationEvaluator Author: Lewuathe <lewuathe@me.com> Closes #10996 from Lewuathe/fix-typo-in-ml-guide.	2016-02-01 12:21:21 -08:00
Takeshi YAMAMURO	da9146c91a	[DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10901 from maropu/DocFix.	2016-02-01 12:02:06 -08:00
Josh Rosen	289373b28c	[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.	2016-01-30 00:20:28 -08:00
James Lohse	c2204436a1	Provide same info as in spark-submit --help this is stated for --packages and --repositories. Without stating it for --jars, people expect a standard java classpath to work, with expansion and using a different delimiter than a comma. Currently this is only state in the --help for spark-submit "Comma-separated list of local jars to include on the driver and executor classpaths." Author: James Lohse <jimlohse@users.noreply.github.com> Closes #10890 from jimlohse/patch-1.	2016-01-28 10:50:50 +00:00
Andrew	093291cf9b	[SPARK-1680][DOCS] Explain environment variables for running on YARN in cluster mode JIRA 1680 added a property called spark.yarn.appMasterEnv. This PR draws users' attention to this special case by adding an explanation in configuration.html#environment-variables Author: Andrew <weiner.andrew.j@gmail.com> Closes #10869 from weineran/branch-yarn-docs.	2016-01-27 09:31:44 +00:00
Shixiong Zhu	cbd507d69c	[SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project Since `actorStream` is an external project, we should add the linking and deploying instructions for it. A follow up PR of #10744 Author: Shixiong Zhu <shixiong@databricks.com> Closes #10856 from zsxwing/akka-link-instruction.	2016-01-26 11:31:54 -08:00
Sean Owen	649e9d0f5b	[SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable. CC rxin pwendell for API change; tdas since it also touches streaming. Author: Sean Owen <sowen@cloudera.com> Closes #10413 from srowen/SPARK-3369.	2016-01-26 11:55:28 +00:00
Yanbo Liang	dd2325d9a7	[SPARK-11965][ML][DOC] Update user guide for RFormula feature interactions Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10222 from yanboliang/spark-11965.	2016-01-25 11:52:26 -08:00
Sean Owen	aca2a01654	[SPARK-12760][DOCS] inaccurate description for difference between local vs cluster mode in closure handling Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode Author: Sean Owen <sowen@cloudera.com> Closes #10866 from srowen/SPARK-12760.	2016-01-23 11:45:12 +00:00
Mortada Mehyar	56f57f894e	[SPARK-12760][DOCS] invalid lambda expression in python example for … …local vs cluster srowen thanks for the PR at https://github.com/apache/spark/pull/10866! sorry it took me a while. This is related to https://github.com/apache/spark/pull/10866, basically the assignment in the lambda expression in the python example is actually invalid ``` In [1]: data = [1, 2, 3, 4, 5] In [2]: counter = 0 In [3]: rdd = sc.parallelize(data) In [4]: rdd.foreach(lambda x: counter += x) File "<ipython-input-4-fcb86c182bad>", line 1 rdd.foreach(lambda x: counter += x) ^ SyntaxError: invalid syntax ``` Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #10867 from mortada/doc_python_fix.	2016-01-23 11:36:33 +00:00
Shixiong Zhu	bc1babd63d	[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka. - Remove HttpFileServer - Remove Akka configs from SparkConf and SSLOptions - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it. - Update comments and docs Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.	2016-01-22 21:20:04 -08:00
felixcheung	85200c09ad	[SPARK-12534][DOC] update documentation to list command line equivalent to properties Several Spark properties equivalent to Spark submit command line options are missing. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10491 from felixcheung/sparksubmitdoc.	2016-01-21 16:30:20 +01:00
Sun Rui	1b2a918e59	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.	2016-01-20 21:08:15 -08:00
Shixiong Zhu	b7d74a602f	[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.	2016-01-20 13:55:41 -08:00
felixcheung	488bbb216c	[SPARK-12232][SPARKR] New R API for read.table to avoid name conflict shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.	2016-01-19 18:31:03 -08:00
scwf	43f1d59e17	[SPARK-2750][WEB UI] Add https support to the Web UI Author: scwf <wangfei1@huawei.com> Author: Marcelo Vanzin <vanzin@cloudera.com> Author: WangTaoTheTonic <wangtao111@huawei.com> Author: w00228970 <wangfei1@huawei.com> Closes #10238 from vanzin/SPARK-2750.	2016-01-19 14:49:55 -08:00
Shixiong Zhu	721845c1b6	[SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10822 from zsxwing/kinesis-doc.	2016-01-18 16:50:05 -08:00

1 2 3 4 5 ...

1466 commits