ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Zheng RuiFeng	9bfb35da1e	[SPARK-14515][DOC] Add python example for ChiSqSelector ## What changes were proposed in this pull request? Add the missing python example for ChiSqSelector ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12283 from zhengruifeng/chi2_pe.	2016-04-18 17:14:22 -07:00
Mark Grover	ff9ae61a3b	[SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previously undocumented) usage of spark-submit to run examples. ## How was this patch tested? Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit. Author: Mark Grover <mark@apache.org> Closes #12365 from markgrover/spark-14601.	2016-04-14 18:51:43 -07:00
Dhruve Ashar	f83ba454a5	[SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions ## What changes were proposed in this pull request? The configuration docs are updated to reflect the changes introduced with [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This allows the user to specify initial heap memory settings through the extraJavaOptions for executor, driver and am. ## How was this patch tested? The changes are tested in [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This is just documenting the changes made. Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #12333 from dhruve/doc/SPARK-14572.	2016-04-14 10:29:14 -05:00
Yuhao Yang	781df49983	[SPARK-13089][ML] [Doc] spark.ml Naive Bayes user guide and examples jira: https://issues.apache.org/jira/browse/SPARK-13089 Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files). Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11015 from hhbyyh/naiveBayesDoc.	2016-04-13 13:58:35 -07:00
Zheng RuiFeng	fcdd69260e	[SPARK-14509][DOC] Add python CountVectorizerExample ## What changes were proposed in this pull request? Add python CountVectorizerExample ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11917 from zhengruifeng/cv_pe.	2016-04-13 13:56:23 -07:00
Dongjoon Hyun	1a0cca1fc8	[MINOR][DOCS] Fix wrong data types in JSON Datasets example. ## What changes were proposed in this pull request? This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.	2016-04-11 09:03:11 +01:00
Zheng RuiFeng	adb9d73cd6	[SPARK-14339][DOC] Add python examples for DCT,MinMaxScaler,MaxAbsScaler ## What changes were proposed in this pull request? add three python examples ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12063 from zhengruifeng/dct_pe.	2016-04-09 11:25:39 -07:00
Michael Gummelt	30e980ad8e	[DOCS][MINOR] Remove sentence about Mesos not supporting cluster mode. Docs change to remove the sentence about Mesos not supporting cluster mode. It was not. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #12249 from mgummelt/fix-mesos-cluster-docs.	2016-04-07 17:41:55 -07:00
Malte	db75ccb552	Better host description for multi-master mesos ## What changes were proposed in this pull request? Since not having the correct zk url causes job failure, the documentation should include all parameters ## How was this patch tested? no tests necessary Author: Malte <elmalto@users.noreply.github.com> Closes #12218 from elmalto/patch-1.	2016-04-07 09:16:07 +01:00
Reynold Xin	9ca0760d67	[SPARK-10063][SQL] Remove DirectParquetOutputCommitter ## What changes were proposed in this pull request? This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue. ## How was this patch tested? Removed the related tests also. Author: Reynold Xin <rxin@databricks.com> Closes #12229 from rxin/SPARK-10063.	2016-04-07 00:51:45 -07:00
Holden Karau	457e58befe	[SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no… ## What changes were proposed in this pull request? Change our build docs & shell scripts so that developers are aware of the change from "assembly" to "package" ## How was this patch tested? Manually ran ./bin/spark-shell after ./build/sbt assembly and verified error message printed, ran new suggested build target and verified ./bin/spark-shell runs after this. Author: Holden Karau <holden@pigscanfly.ca> Author: Holden Karau <holden@us.ibm.com> Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.	2016-04-06 16:00:29 -07:00
Devaraj K	bc36df127d	[SPARK-13063][YARN] Make the SPARK YARN STAGING DIR as configurable ## What changes were proposed in this pull request? Made the Spark YARN staging directory configurable through the 'spark.yarn.staging-dir' setting. ## How was this patch tested? Verified manually by running applications on YARN: if 'spark.yarn.staging-dir' is configured, its value is used as the staging directory; otherwise the default value is used, i.e. the file system’s home directory for the user. Author: Devaraj K <devaraj@apache.org> Closes #12082 from devaraj-kavali/SPARK-13063.	2016-04-05 14:12:00 -05:00
Marcelo Vanzin	24d7d2e453	[SPARK-13579][BUILD] Stop building the main Spark assembly. This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.	2016-04-04 16:52:22 -07:00
Liwei Lin	03d130f973	[SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon ## What changes were proposed in this pull request? Straggler references to Tachyon were removed: - for docs, `tachyon` has been generalized as `off-heap memory`; - for Mesos test suites, the key-value `tachyon:true`/`tachyon:false` has been changed to `os:centos`/`os:ubuntu`, since `os` is an example constraint used by the [Mesos official docs](http://mesos.apache.org/documentation/attributes-resources/). ## How was this patch tested? Existing test suites. Author: Liwei Lin <lwlin7@gmail.com> Closes #12129 from lw-lin/tachyon-cleanup.	2016-04-02 17:55:46 -07:00
jerryshao	8ba2b7f28f	[SPARK-12343][YARN] Simplify Yarn client and client argument ## What changes were proposed in this pull request? Currently in Spark on YARN, configurations can be passed through SparkConf, env and command arguments, and some parts are duplicated, like client argument and SparkConf. This PR proposes to simplify the command arguments. ## How was this patch tested? This patch is tested manually with unit test. CC vanzin tgravescs, please help review this proposal. The original purpose of this JIRA is to remove `ClientArguments`, but some arguments like `--class` and `--arg` are not so easy to replace through refactoring, so here I remove most of the command line arguments and keep only the minimal set. Author: jerryshao <sshao@hortonworks.com> Closes #11603 from jerryshao/SPARK-12343.	2016-04-01 10:52:13 -07:00
Josh Rosen	a7af6cd2ea	[SPARK-14281][TESTS] Fix java8-tests and simplify their build This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #12073 from JoshRosen/fix-java8-tests.	2016-03-31 13:52:59 -07:00
Michael Gummelt	4d93b653f7	[Docs] Update monitoring.md to accurately describe the history server It looks like the docs were recently updated to reflect the History Server's support for incomplete applications, but they still had wording that suggested only completed applications were viewable. This fixes that. My editor also introduced several whitespace removal changes, that I hope are OK, as text files shouldn't have trailing whitespace. To verify they're purely whitespace changes, add `&w=1` to your browser address. If this isn't acceptable, let me know and I'll update the PR. I also didn't think this required a JIRA. Let me know if I should create one. Not tested Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #12045 from mgummelt/update-history-docs.	2016-03-31 12:06:21 -07:00
Shixiong Zhu	d23ad7c1c9	[SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter ## What changes were proposed in this pull request? This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages Also remove mqtt_wordcount.py that I forgot to remove previously. ## How was this patch tested? Jenkins PR Build. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11824 from zsxwing/remove-doc.	2016-03-26 01:47:27 -07:00
Xin Ren	d283223a5a	[SPARK-13017][DOCS] Replace example code in mllib-feature-extraction.md using include_example Replace example code in mllib-feature-extraction.md using include_example https://issues.apache.org/jira/browse/SPARK-13017 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/TFIDFExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11142 from keypointt/SPARK-13017.	2016-03-24 14:25:10 -07:00
Xin Ren	dd9ca7b960	[SPARK-13019][DOCS] fix for scala-2.10 build: Replace example code in mllib-statistics.md using include_example ## What changes were proposed in this pull request? This PR for ticket SPARK-13019 is based on previous PR(https://github.com/apache/spark/pull/11108). Since PR(https://github.com/apache/spark/pull/11108) is breaking scala-2.10 build, more work is needed to fix build errors. What is new in this PR is the keyword argument for 'fractions': ` val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)` ` val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)` I reopened the ticket on JIRA, but I don't know how to reopen a GitHub pull request, so I am submitting a new pull request. ## How was this patch tested? Manual build testing on local machine, build based on scala-2.10. Author: Xin Ren <iamshrek@126.com> Closes #11901 from keypointt/SPARK-13019.	2016-03-24 09:34:54 +00:00
Xiangrui Meng	43ef1e52bf	Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example" This reverts commit `1af8de200c`.	2016-03-21 17:42:30 -07:00
Xin Ren	1af8de200c	[SPARK-13019][DOCS] Replace example code in mllib-statistics.md using include_example https://issues.apache.org/jira/browse/SPARK-13019 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11108 from keypointt/SPARK-13019.	2016-03-21 16:09:34 -07:00
Dongjoon Hyun	c11ea2e413	[MINOR][DOCS] Update build descriptions and commands ## What changes were proposed in this pull request? This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark` documents. ## How was this patch tested? N/A Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11838 from dongjoon-hyun/fix_doc_building_spark.	2016-03-18 21:32:48 -07:00
Zheng RuiFeng	204c9dec2c	[MINOR][DOC] Add JavaStreamingTestExample ## What changes were proposed in this pull request? Add the java example of StreamingTest ## How was this patch tested? manual tests in CLI: bin/run-example mllib.JavaStreamingTestExample dataDir 5 100 Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11776 from zhengruifeng/streaming_je.	2016-03-17 11:09:02 +02:00
Daoyuan Wang	d1c193a2f1	[SPARK-12855][MINOR][SQL][DOC][TEST] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since developer API of plug-able parser has been removed in #10801 , docs should be updated accordingly. ## How was this patch tested? This patch will not affect the real code path. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #11758 from adrian-wang/spark12855.	2016-03-16 22:52:10 -07:00
Dongjoon Hyun	4ce2d24e2a	[SPARK-13942][CORE][DOCS] Remove Shark-related docs for 2.x ## What changes were proposed in this pull request? `Shark` was merged into `Spark SQL` in [July 2014](https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html). The following seems to be the only legacy left. For Spark 2.x, we should clean up those docs. Migration Guide ``` - ## Migration Guide for Shark Users - ... - ### Scheduling - ... - ### Reducer number - ... - ### Caching ``` ## How was this patch tested? Pass the Jenkins test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11770 from dongjoon-hyun/SPARK-13942.	2016-03-16 15:50:24 -07:00
Shixiong Zhu	43304b1758	[SPARK-13888][DOC] Remove Akka Receiver doc and refer to the DStream Akka project ## What changes were proposed in this pull request? I have copied the docs of Streaming Akka to https://github.com/spark-packages/dstream-akka/blob/master/README.md So we can remove them from Spark now. ## How was this patch tested? Only document changes. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11711 from zsxwing/remove-akka-doc.	2016-03-14 23:21:30 -07:00
Daniel Santana	9f13f0fc17	[MINOR][DOCS] Added Missing back slashes ## What changes were proposed in this pull request? When studying Spark, many users copy examples from the documentation and paste them into their terminals, so the missing backslashes lead to shell errors. The added backslashes avoid that problem. ## How was this patch tested? I generated the documentation locally using jekyll and checked the generated pages Author: Daniel Santana <mestresan@gmail.com> Closes #11699 from danielsan/master.	2016-03-14 12:26:08 -07:00
Sean Owen	1840852841	[SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit `1deecd8d9c` ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.	2016-03-13 21:03:49 -07:00
Marcelo Vanzin	07f1c54477	[SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive. In preparation for the demise of assemblies, this change allows the YARN backend to use multiple jars and globs as the "Spark jar". The config option has been renamed to "spark.yarn.jars" to reflect that. A second option "spark.yarn.archive" was also added; if set, this takes precedence and uploads an archive expected to contain the jar files with the Spark code and its dependencies. Existing deployments should keep working, mostly. This change drops support for the "SPARK_JAR" environment variable, and also does not fall back to using "jarOfClass" if no configuration is set, falling back to finding files under SPARK_HOME instead. This should be fine since "jarOfClass" probably wouldn't work unless you were using spark-submit anyway. Tested with the unit tests, and trying the different config options on a YARN cluster. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11500 from vanzin/SPARK-13577.	2016-03-11 07:54:57 -06:00
Yuhao Yang	0b713e0455	[SPARK-13512][ML] add example and doc for MaxAbsScaler ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-13512 Add example and doc for ml.feature.MaxAbsScaler. ## How was this patch tested? unit tests Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11392 from hhbyyh/maxabsdoc.	2016-03-11 09:31:35 +02:00
Zheng RuiFeng	d18276cb1d	[SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB JIRA: https://issues.apache.org/jira/browse/SPARK-13672 ## What changes were proposed in this pull request? add two python examples of BisectingKMeans for ml and mllib ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11515 from zhengruifeng/mllib_bkm_pe.	2016-03-11 09:21:12 +02:00
Dongjoon Hyun	88fa866620	[MINOR][DOC] Fix supported hive version in doc ## What changes were proposed in this pull request? Today, Spark 1.6.1 and the updated docs were released. Unfortunately, there is obsolete Hive version information in the docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines. ``` -By default Spark will build with Hive 0.13.1 bindings. +By default Spark will build with Hive 1.2.1 bindings. -# Apache Hadoop 2.4.X with Hive 13 support +# Apache Hadoop 2.4.X with Hive 1.2.1 support ``` `sql/README.md` file also describe ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11639 from dongjoon-hyun/fix_doc_hive_version.	2016-03-10 17:07:18 -08:00
JeremyNixon	3e3c3d58d8	[SPARK-13706][ML] Add Python Example for Train Validation Split ## What changes were proposed in this pull request? This pull request adds a python example for train validation split. ## How was this patch tested? This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve. This contribution is my original work and I license it to Spark under its open source license. Author: JeremyNixon <jnixon2@gmail.com> Closes #11547 from JeremyNixon/tvs_example.	2016-03-10 09:18:15 +02:00
Sergiusz Urbaniak	a4a0addccf	[SPARK-13492][MESOS] Configurable Mesos framework webui URL. ## What changes were proposed in this pull request? Previously the Mesos framework webui URL was being derived only from the Spark UI address leaving no possibility to configure it. This commit makes it configurable. If unset it falls back to the previous behavior. Motivation: This change is necessary in order to be able to install Spark on DCOS and to be able to give it a custom service link. The configured `webui_url` is configured to point to a reverse proxy in the DCOS environment. ## How was this patch tested? Locally, using unit tests and on DCOS testing and stable revision. Author: Sergiusz Urbaniak <sur@mesosphere.io> Closes #11369 from s-urbaniak/sur-webui-url.	2016-03-09 18:10:01 -08:00
Sean Owen	256704c771	[SPARK-13595][BUILD] Move docker, extras modules into external ## What changes were proposed in this pull request? Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/` ## How was this patch tested? This is tested with Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #11523 from srowen/SPARK-13595.	2016-03-09 18:27:44 +00:00
Dongjoon Hyun	c3689bc24e	[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing code like the following by using the `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports the diamond operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code uses it inconsistently. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.	2016-03-09 10:31:26 +00:00
Sean Owen	54040f8d35	[SPARK-13715][MLLIB] Remove last usages of jblas in tests ## What changes were proposed in this pull request? Remove last usage of jblas, in tests ## How was this patch tested? Jenkins tests -- the same ones that are being modified. Author: Sean Owen <sowen@cloudera.com> Closes #11560 from srowen/SPARK-13715.	2016-03-08 17:47:55 +00:00
Sean Owen	0eea12a3d9	[SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs ## What changes were proposed in this pull request? Move many top-level files in dev/ or other appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`. I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope. ## How was this patch tested? `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest. Author: Sean Owen <sowen@cloudera.com> Closes #11522 from srowen/SPARK-13596.	2016-03-07 14:48:02 -08:00
CodingCat	a3ec50a4bc	[MINOR][DOC] improve the doc for "spark.memory.offHeap.size" The description of "spark.memory.offHeap.size" in the current document does not clearly state that memory is counted in bytes. This PR contains a small fix for this tiny issue. Document fix. Author: CodingCat <zhunansjtu@gmail.com> Closes #11561 from CodingCat/master.	2016-03-07 12:08:26 -08:00
rmishra	4b13896ebf	[SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount ## What changes were proposed in this pull request? The reference to StatefulNetworkWordCount.scala from the updateStateByKey documentation should be removed until there is an example for updateStateByKey. ## How was this patch tested? Have tested the new documentation with jekyll build. Author: rmishra <rmishra@pivotal.io> Closes #11545 from rishitesh/SPARK-13705.	2016-03-07 09:55:49 +00:00
Xin Ren	70f6f9649b	[SPARK-13013][DOCS] Replace example code in mllib-clustering.md using include_example Replace example code in mllib-clustering.md using include_example https://issues.apache.org/jira/browse/SPARK-13013 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11116 from keypointt/SPARK-13013.	2016-03-03 09:32:47 -08:00
Reynold Xin	9e01dcc644	[SPARK-13529][BUILD] Move network/* modules into common/network-* ## What changes were proposed in this pull request? As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder. ## How was this patch tested? Compilation and existing tests. We should run both SBT and Maven. Author: Reynold Xin <rxin@databricks.com> Closes #11409 from rxin/SPARK-13529.	2016-02-28 17:25:07 -08:00
Reynold Xin	59e3e10be2	[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts ## What changes were proposed in this pull request? We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system without extensive configuration, we can remove these management capabilities to simplify Spark bash scripts. Note that this also reduces coupling between a 3rd party external system and Spark's release scripts, and would eliminate the possibility of failures such as Tachyon being renamed or the tarballs being relocated. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #11400 from rxin/release-script.	2016-02-26 22:35:12 -08:00
Dongjoon Hyun	7af0de076f	[SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using include_example ## What changes were proposed in this pull request? This PR replaces example code in `mllib-linear-methods.md` using `include_example` by doing the following: * Extracts the example code (Scala, Java, Python) as files in the `example` module. * Merges some dialog-style examples into a single file. * Hides redundant code in HTML for consistency with other docs. ## How was this patch tested? manual test. This PR can be tested by document generation, `SKIP_API=1 jekyll build`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11320 from dongjoon-hyun/SPARK-11381.	2016-02-26 08:31:55 -08:00
Bryan Cutler	b33261f913	[SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the tree module. closes #10601 Author: Bryan Cutler <cutlerb@gmail.com> Author: vijaykiran <mail@vijaykiran.com> Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.	2016-02-26 08:30:32 -08:00
Michael Gummelt	c98a93ded3	[SPARK-13439][MESOS] Document that spark.mesos.uris is comma-separated Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11311 from mgummelt/document_csv.	2016-02-25 13:32:09 +00:00
JeremyNixon	230bbeaa61	[SPARK-10759][ML] update cross validator with include_example This pull request uses {%include_example%} to add an example for the python cross validator to ml-guide. Author: JeremyNixon <jnixon2@gmail.com> Closes #11240 from JeremyNixon/pipeline_include_example.	2016-02-23 15:57:29 -08:00
Lianhui Wang	9f4263392e	[SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab andrewor14 squito Dead Executors should also be displayed on Executor Tab. as following: ![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png) Author: Lianhui Wang <lianhuiwang09@gmail.com> This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com> Closes #10058 from lianhuiwang/SPARK-7729.	2016-02-23 11:08:39 -08:00
jerryshao	e99d017098	[SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode Author: jerryshao <sshao@hortonworks.com> Closes #11229 from jerryshao/SPARK-13220.	2016-02-23 12:30:57 +00:00
Devaraj K	02b1fefffb	[SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using include_example Replaced example code in ml-guide.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11053 from devaraj-kavali/SPARK-13012.	2016-02-22 17:21:37 -08:00
Devaraj K	9f410871ca	[SPARK-13016][DOCUMENTATION] Replace example code in mllib-dimensionality-reduction.md using include_example Replaced example code in mllib-dimensionality-reduction.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11132 from devaraj-kavali/SPARK-13016.	2016-02-22 17:16:56 -08:00
Bryan Cutler	e298ac91e3	[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the fpm and recommendation modules. Closes #10602 Closes #10897 Author: Bryan Cutler <cutlerb@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632.	2016-02-22 12:48:37 +02:00
Dongjoon Hyun	024482bf51	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.	2016-02-22 09:52:07 +00:00
Dongjoon Hyun	03e62aa3f6	[MINOR][DOCS] Fix typos in `configuration.md` and `hardware-provisioning.md` ## What changes were proposed in this pull request? This PR fixes some typos in the following documentation files. * `NOTICE`, `configuration.md`, and `hardware-provisioning.md`. ## How was this patch tested? manual tests Author: Dongjoon Hyun <dongjoonapache.org> Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11289 from dongjoon-hyun/minor_fix_typos_notice_and_confdoc.	2016-02-21 15:27:07 -08:00
Iulian Dragos	6915cc23b3	[MINOR][DOCS][MESOS] Clarify that Mesos version is a lower bound. ## What changes were proposed in this pull request? Clarify that 0.21 is only a minimum requirement. ## How was this patch tested? It's a doc change, so no tests. Author: Iulian Dragos <jaguarul@gmail.com> Closes #11271 from dragos/patch-1.	2016-02-19 11:47:36 -08:00
Sean Owen	fb7e21797e	[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold Clarify that reduce functions need to be commutative, and fold functions do not See https://github.com/apache/spark/pull/11091 Author: Sean Owen <sowen@cloudera.com> Closes #11217 from srowen/SPARK-13339.	2016-02-19 10:26:38 +00:00
Sean Owen	b84404865b	[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.	2016-02-17 19:03:29 -08:00
Christopher C. Aycock	a7c74d7563	[SPARK-13350][DOCS] Config doc updated to state that PYSPARK_PYTHON's default is "python2.7" Author: Christopher C. Aycock <chris@chrisaycock.com> Closes #11239 from chrisaycock/master.	2016-02-17 11:24:18 -08:00
junhao	7218c0eba9	[SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism. https://issues.apache.org/jira/browse/SPARK-11627 The Spark Streaming backpressure mechanism has no initial input rate limit, which might cause an OOM exception. In the first batch, receivers receive data at the maximum speed they can reach, which might exhaust the executors' memory resources. Adding an initial input rate limit makes sure the Streaming job executes successfully in the first batch; the backpressure mechanism can then adjust the receiving rate adaptively. Author: junhao <junhao@mogujie.com> Closes #9593 from junhaoMg/junhao-dev.	2016-02-16 19:43:17 -08:00
BenFradet	00c72d27bf	[SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.	2016-02-16 13:03:28 +00:00
Xin Ren	e4675c2402	[SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.	2016-02-15 20:17:21 -08:00
JeremyNixon	adb5483650	[SPARK-13312][MLLIB] Update java train-validation-split example in ml-guide Response to JIRA https://issues.apache.org/jira/browse/SPARK-13312. This contribution is my original work and I license the work to this project. Author: JeremyNixon <jnixon2@gmail.com> Closes #11199 from JeremyNixon/update_train_val_split_example.	2016-02-15 09:25:13 +00:00
Amit Dev	331293c302	[SPARK-13300][DOCUMENTATION] Added pygments.rb dependency Looks like the pygments.rb gem is also required for the jekyll build to work. At least on Ubuntu/RHEL I could not build without this dependency. So added this to the steps. Author: Amit Dev <amitdev@gmail.com> Closes #11180 from amitdev/master.	2016-02-14 11:41:27 +00:00
Sanket	894921d813	[SPARK-6166] Limit number of in flight outbound requests This JIRA is related to https://github.com/apache/spark/pull/5852 Had to do some minor rework and test to make sure it works with current version of spark. Author: Sanket <schintap@untilservice-lm> Closes #10838 from redsanket/limit-outbound-connections.	2016-02-11 22:40:00 -08:00
Steve Loughran	a2c7dcf61f	[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger filesize. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Author: Steve Loughran <stevel@hortonworks.com> Author: Imran Rashid <irashid@cloudera.com> Closes #11118 from squito/SPARK-7889-alternate.	2016-02-11 21:37:53 -06:00
Sasaki Toru	c2f21d8898	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template In spark-env.sh.template, there are multi-byte characters, this PR will remove it. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.	2016-02-11 09:30:36 +00:00
Sean Owen	29c547303f	[SPARK-12414][CORE] Remove closure serializer Remove spark.closure.serializer option and use JavaSerializer always CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be. Author: Sean Owen <sowen@cloudera.com> Closes #11150 from srowen/SPARK-12414.	2016-02-10 13:34:53 -08:00
Michael Gummelt	80cb963ad9	[SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode. This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027 In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone. This PR implements that resolution. This PR implements two high-level features. These two features are co-dependent, so they're implemented both here: - Mesos support for spark.executor.cores - Multiple executors per slave We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR. The contribution is my original work and I license the work to the project under the project's open source license. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #10993 from mgummelt/executor_sizing.	2016-02-10 10:53:33 -08:00
Luciano Resende	2dbb916440	[SPARK-13189] Cleanup build references to Scala 2.10 Author: Luciano Resende <lresende@apache.org> Closes #11092 from lresende/SPARK-13189.	2016-02-09 11:56:25 -08:00
Sebastián Ramírez	c882ec57de	[SPARK-13040][DOCS] Update JDBC deprecated SPARK_CLASSPATH documentation Update JDBC documentation based on http://stackoverflow.com/a/30947090/219530 as SPARK_CLASSPATH is deprecated. Also, that's how it worked, it didn't work with the SPARK_CLASSPATH or the --jars alone. This would solve issue: https://issues.apache.org/jira/browse/SPARK-13040 Author: Sebastián Ramírez <tiangolo@gmail.com> Closes #10948 from tiangolo/patch-docs-jdbc.	2016-02-09 08:49:34 +00:00
Luc Bourlier	0bb5b73387	[SPARK-13002][MESOS] Send initial request of executors for dyn allocation Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos. Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, only not the initial value. The `None` and `Some(0)` are internal details of the computation of resources to reserve, in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly, otherwise the Mesos backend scheduler will either create too many executors at launch, or not create any executors and not be able to recover from this state. Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore. This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode. Somebody knowing this part of the yarn support should review this change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #11047 from skyluc/issue/initial-dyn-alloc-2.	2016-02-05 14:37:42 -08:00
Bill Chambers	66e1383de2	[SPARK-13214][DOCS] update dynamicAllocation documentation Author: Bill Chambers <bill@databricks.com> Closes #11094 from anabranch/dynamic-docs.	2016-02-05 14:35:39 -08:00
Yuhao Yang	c2c956bcd1	[ML][DOC] fix wrong api link in ml onevsrest minor fix for api link in ml onevsrest Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11068 from hhbyyh/onevsrestDoc.	2016-02-03 21:19:44 -08:00
Timothy Chen	51b03b71ff	[SPARK-12463][SPARK-12464][SPARK-12465][SPARK-10647][MESOS] Fix zookeeper dir with mesos conf and add docs. Fix zookeeper dir configuration used in cluster mode, and also add documentation around these settings. Author: Timothy Chen <tnachen@gmail.com> Closes #10057 from tnachen/fix_mesos_dir.	2016-02-01 12:45:02 -08:00
Lewuathe	711ce048a2	[ML][MINOR] Invalid MulticlassClassification reference in ml-guide In [ml-guide](https://spark.apache.org/docs/latest/ml-guide.html#example-model-selection-via-cross-validation), there is invalid reference to `MulticlassClassificationEvaluator` apidoc. https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.MultiClassClassificationEvaluator Author: Lewuathe <lewuathe@me.com> Closes #10996 from Lewuathe/fix-typo-in-ml-guide.	2016-02-01 12:21:21 -08:00
Takeshi YAMAMURO	da9146c91a	[DOCS] Fix the jar location of datanucleus in sql-programming-guide.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10901 from maropu/DocFix.	2016-02-01 12:02:06 -08:00
Josh Rosen	289373b28c	[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.	2016-01-30 00:20:28 -08:00
James Lohse	c2204436a1	Provide same info as in spark-submit --help this is stated for --packages and --repositories. Without stating it for --jars, people expect a standard java classpath to work, with expansion and using a different delimiter than a comma. Currently this is only stated in the --help for spark-submit "Comma-separated list of local jars to include on the driver and executor classpaths." Author: James Lohse <jimlohse@users.noreply.github.com> Closes #10890 from jimlohse/patch-1.	2016-01-28 10:50:50 +00:00
Andrew	093291cf9b	[SPARK-1680][DOCS] Explain environment variables for running on YARN in cluster mode JIRA 1680 added a property called spark.yarn.appMasterEnv. This PR draws users' attention to this special case by adding an explanation in configuration.html#environment-variables Author: Andrew <weiner.andrew.j@gmail.com> Closes #10869 from weineran/branch-yarn-docs.	2016-01-27 09:31:44 +00:00
Shixiong Zhu	cbd507d69c	[SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project Since `actorStream` is an external project, we should add the linking and deploying instructions for it. A follow up PR of #10744 Author: Shixiong Zhu <shixiong@databricks.com> Closes #10856 from zsxwing/akka-link-instruction.	2016-01-26 11:31:54 -08:00
Sean Owen	649e9d0f5b	[SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable. CC rxin pwendell for API change; tdas since it also touches streaming. Author: Sean Owen <sowen@cloudera.com> Closes #10413 from srowen/SPARK-3369.	2016-01-26 11:55:28 +00:00
Yanbo Liang	dd2325d9a7	[SPARK-11965][ML][DOC] Update user guide for RFormula feature interactions Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10222 from yanboliang/spark-11965.	2016-01-25 11:52:26 -08:00
Sean Owen	aca2a01654	[SPARK-12760][DOCS] inaccurate description for difference between local vs cluster mode in closure handling Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode Author: Sean Owen <sowen@cloudera.com> Closes #10866 from srowen/SPARK-12760.	2016-01-23 11:45:12 +00:00
Mortada Mehyar	56f57f894e	[SPARK-12760][DOCS] invalid lambda expression in python example for … …local vs cluster srowen thanks for the PR at https://github.com/apache/spark/pull/10866! sorry it took me a while. This is related to https://github.com/apache/spark/pull/10866, basically the assignment in the lambda expression in the python example is actually invalid ``` In [1]: data = [1, 2, 3, 4, 5] In [2]: counter = 0 In [3]: rdd = sc.parallelize(data) In [4]: rdd.foreach(lambda x: counter += x) File "<ipython-input-4-fcb86c182bad>", line 1 rdd.foreach(lambda x: counter += x) ^ SyntaxError: invalid syntax ``` Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #10867 from mortada/doc_python_fix.	2016-01-23 11:36:33 +00:00
Shixiong Zhu	bc1babd63d	[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka. - Remove HttpFileServer - Remove Akka configs from SparkConf and SSLOptions - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it. - Update comments and docs Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.	2016-01-22 21:20:04 -08:00
felixcheung	85200c09ad	[SPARK-12534][DOC] update documentation to list command line equivalent to properties Several Spark properties equivalent to Spark submit command line options are missing. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10491 from felixcheung/sparksubmitdoc.	2016-01-21 16:30:20 +01:00
Sun Rui	1b2a918e59	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.	2016-01-20 21:08:15 -08:00
Shixiong Zhu	b7d74a602f	[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.	2016-01-20 13:55:41 -08:00
felixcheung	488bbb216c	[SPARK-12232][SPARKR] New R API for read.table to avoid name conflict shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.	2016-01-19 18:31:03 -08:00
scwf	43f1d59e17	[SPARK-2750][WEB UI] Add https support to the Web UI Author: scwf <wangfei1@huawei.com> Author: Marcelo Vanzin <vanzin@cloudera.com> Author: WangTaoTheTonic <wangtao111@huawei.com> Author: w00228970 <wangfei1@huawei.com> Closes #10238 from vanzin/SPARK-2750.	2016-01-19 14:49:55 -08:00
Shixiong Zhu	721845c1b6	[SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10822 from zsxwing/kinesis-doc.	2016-01-18 16:50:05 -08:00
Shixiong Zhu	a973f483f6	[SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10746 from zsxwing/flume-doc.	2016-01-18 15:38:03 -08:00
Jeff Lam	86972fa521	[SPARK-12722][DOCS] Fixed typo in Pipeline example http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline ``` val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model") ``` should be ``` val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model") ``` cc: jkbradley Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu> Closes #10769 from Agent007/SPARK-12722.	2016-01-16 10:41:40 +00:00
Josh Rosen	8dbbf3e75e	[SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version. /cc rxin srowen Author: Josh Rosen <joshrosen@databricks.com> Closes #10775 from JoshRosen/add-hadoop-2.7-profile.	2016-01-15 17:07:24 -08:00
Tom Graves	96fb894d4b	[SPARK-2930] clarify docs on using webhdfs with spark.yarn.access.nam… …enodes Author: Tom Graves <tgraves@yahoo-inc.com> Closes #10699 from tgravescs/SPARK-2930.	2016-01-15 13:11:27 +00:00
Joseph K. Bradley	20d8ef858a	[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example Fixed WSSSE computeCost in Python mllib KMeans user guide example by using new computeCost method API in Python. Author: Joseph K. Bradley <joseph@databricks.com> Closes #10707 from jkbradley/kmeans-doc-fix.	2016-01-13 18:01:29 -08:00
Luc Bourlier	cc91e21879	[SPARK-12805][MESOS] Fixes documentation on Mesos run modes The default run has changed, but the documentation didn't fully reflect the change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #10740 from skyluc/issue/mesos-modes-doc.	2016-01-13 11:45:13 -08:00
Sean Owen	9c7f34af37	[SPARK-5273][MLLIB][DOCS] Improve documentation examples for LinearRegression Use a much smaller step size in LinearRegressionWithSGD MLlib examples to achieve a reasonable RMSE. Our training folks hit this exact same issue when concocting an example and had the same solution. Author: Sean Owen <sowen@cloudera.com> Closes #10675 from srowen/SPARK-5273.	2016-01-12 12:13:32 +00:00
Brandon Bradley	a767ee8a05	[SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley <bradleytastic@gmail.com> Closes #10708 from blbradley/spark-12758.	2016-01-11 14:21:50 -08:00
Reynold Xin	5b0d544339	[SPARK-12735] Consolidate & move spark-ec2 to AMPLab managed repository. Author: Reynold Xin <rxin@databricks.com> Closes #10673 from rxin/SPARK-12735.	2016-01-09 20:28:20 -08:00
Sean Owen	659fd9d04b	[SPARK-4819] Remove Guava's "Optional" from public API Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`) See also https://github.com/apache/spark/pull/10512 Author: Sean Owen <sowen@cloudera.com> Closes #10513 from srowen/SPARK-4819.	2016-01-08 13:02:30 -08:00
Jeff Zhang	00d9261724	[DOCUMENTATION] doc fix of job scheduling spark.shuffle.service.enabled is spark application related configuration, it is not necessary to set it in yarn-site.xml Author: Jeff Zhang <zjffdu@apache.org> Closes #10657 from zjffdu/doc-fix.	2016-01-08 11:38:46 -08:00
Shixiong Zhu	c94199e977	[SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu <shixiong@databricks.com> Closes #10453 from zsxwing/streaming-conf.	2016-01-07 17:37:46 -08:00
Jacek Laskowski	8113dbda0b	[STREAMING][DOCS][EXAMPLES] Minor fixes Author: Jacek Laskowski <jacek@japila.pl> Closes #10603 from jaceklaskowski/streaming-actor-custom-receiver.	2016-01-07 00:27:13 -08:00
zzcclp	84e77a15df	[DOC] fix 'spark.memory.offHeap.enabled' default value to false modify 'spark.memory.offHeap.enabled' default value to false Author: zzcclp <xm_zzc@sina.com> Closes #10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.	2016-01-06 23:06:21 -08:00
Josh Rosen	8e19c7663a	[SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard to diagnose bugs. For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads. Author: Josh Rosen <joshrosen@databricks.com> Closes #10534 from JoshRosen/remove-ttl-based-cleaning.	2016-01-06 20:50:31 -08:00
BenFradet	f82ebb1522	[SPARK-12368][ML][DOC] Better doc for the binary classification evaluator's metricName For the BinaryClassificationEvaluator, the scaladoc doesn't mention that "areaUnderPR" is supported, only that the default is "areaUnderROC". Also, in the documentation, it is said that: "The default metric used to choose the best ParamMap can be overriden by the setMetric method in each of these evaluators." However, the method is called setMetricName. This PR aims to fix both issues. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10328 from BenFradet/SPARK-12368.	2016-01-06 12:01:05 -08:00
Yanbo Liang	1c6cf1a563	[SPARK-12570][ML][DOC] DecisionTreeRegressor: provide variance of prediction: user guide update Update user guide doc for ```DecisionTreeRegressor``` providing variance of prediction. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10594 from yanboliang/spark-12570.	2016-01-05 14:24:32 -08:00
felixcheung	8896ec9f02	[SPARKR][DOC] minor doc update for version in migration guide checked that the change is in Spark 1.6.0. shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10574 from felixcheung/rwritemodedoc.	2016-01-05 08:39:58 +05:30
Josh Rosen	6c83d938cc	[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection. This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different). This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons). Author: Josh Rosen <joshrosen@databricks.com> Closes #10519 from JoshRosen/jdbc-driver-precedence.	2016-01-04 10:39:42 -08:00
Reynold Xin	ee8f8d3184	[SPARK-12588] Remove HttpBroadcast in Spark 2.0. We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10531 from rxin/SPARK-12588.	2015-12-30 18:07:07 -08:00
Shixiong Zhu	20591afd79	[SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10385 from zsxwing/accumulator-broadcast-example.	2015-12-22 16:39:10 -08:00
Shixiong Zhu	93db50d1c2	[SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler Author: Shixiong Zhu <shixiong@databricks.com> Closes #10439 from zsxwing/kafka-message-handler-doc.	2015-12-22 15:33:30 -08:00
Reynold Xin	0a38637d05	[SPARK-11807] Remove support for Hadoop < 2.2 i.e. Hadoop 1 and Hadoop 2.0 Author: Reynold Xin <rxin@databricks.com> Closes #10404 from rxin/SPARK-11807.	2015-12-21 22:15:52 -08:00
Davies Liu	29cecd4a42	[SPARK-12388] change default compression to lz4 According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw 20% improvement on end-to-end time for a TPCDS query (Q4). [1] https://github.com/ning/jvm-compressor-benchmark/wiki cc rxin Author: Davies Liu <davies@databricks.com> Closes #10342 from davies/lz4.	2015-12-21 14:21:43 -08:00
Reynold Xin	284e29a870	[SPARK-11808] Remove Bagel. Author: Reynold Xin <rxin@databricks.com> Closes #10395 from rxin/SPARK-11808.	2015-12-19 22:40:35 -08:00
Reynold Xin	f496031bd2	Bump master version to 2.0.0-SNAPSHOT. Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.	2015-12-19 15:13:05 -08:00
gatorsmile	499ac3e69a	[SPARK-12091] [PYSPARK] Deprecate the JAVA-specific deserialized storage levels The current default storage level of Python persist API is MEMORY_ONLY_SER. This is different from the default level MEMORY_ONLY in the official document and RDD APIs. davies Is this inconsistency intentional? Thanks! Updates: Since the data is always serialized on the Python side, the storage levels of JAVA-specific deserialization are not removed, such as MEMORY_ONLY. Updates: Based on the reviewers' feedback. In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`, `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`. Author: gatorsmile <gatorsmile@gmail.com> Closes #10092 from gatorsmile/persistStorageLevel.	2015-12-18 20:06:05 -08:00
Burak Yavuz	2377b707f2	[SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs - Provide example on `message handler` - Provide bit on KPL record de-aggregation - Fix typos Author: Burak Yavuz <brkyvz@gmail.com> Closes #9970 from brkyvz/kinesis-docs.	2015-12-18 15:24:41 -08:00
Joseph K. Bradley	8148cc7a5c	[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6 No known breaking changes, but some deprecations and changes of behavior. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #10235 from jkbradley/mllib-guide-update-1.6.	2015-12-16 11:53:04 -08:00
Yu ISHIKAWA	7b6dc29d0e	[SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means This PR includes only an example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518.	2015-12-16 10:55:42 -08:00
Yu ISHIKAWA	26d70bd2b4	[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215.	2015-12-16 10:43:45 -08:00
Jeff Zhang	2eb5af5f0d	[SPARK-12318][SPARKR] Save mode in SparkR should be error by default shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.	2015-12-16 10:32:32 -08:00
Timothy Hunter	a6325fc401	[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent. Default view: <img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png"> When collapsed manually by the user: <img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png"> Disappears when column is too narrow: <img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png"> Can still be opened by the user if necessary: <img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png"> Author: Timothy Hunter <timhunter@databricks.com> Closes #10297 from thunterdb/12324.	2015-12-16 10:12:33 -08:00
jerryshao	63ccdef813	[SPARK-10123][DEPLOY] Support specifying deploy mode from configuration Please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10195 from jerryshao/SPARK-10123.	2015-12-15 18:24:23 -08:00
Timothy Chen	c2de99a7c3	[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode. Adding more documentation about submitting jobs with mesos cluster mode. Author: Timothy Chen <tnachen@gmail.com> Closes #10086 from tnachen/mesos_supervise_docs.	2015-12-15 18:20:00 -08:00
BenFradet	e25f1fe427	[MINOR][DOC] Fix broken word2vec link Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10282 from BenFradet/SPARK-12199.	2015-12-14 13:50:30 +00:00
Xusen Yin	98b212d36b	[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199.	2015-12-12 17:47:01 -08:00
BenFradet	aea676ca2d	[SPARK-12217][ML] Document invalid handling for StringIndexer Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example, input welcome. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10257 from BenFradet/SPARK-12217.	2015-12-11 15:43:00 -08:00
anabranch	aa305dcaf5	[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation Adding in Pipeline Import and Export Documentation. Author: anabranch <wac.chambers@gmail.com> Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #10179 from anabranch/master.	2015-12-11 12:55:56 -08:00
jerryshao	24d3357d66	[STREAMING][DOC][MINOR] Update the description of direct Kafka stream doc With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), now the Python API has the same functionalities compared to Scala/Java, so here changing the description to make it more precise. zsxwing tdas , please review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10246 from jerryshao/direct-kafka-doc-update.	2015-12-10 15:31:46 -08:00
Josh Rosen	23a9e62bad	[SPARK-12251] Document and improve off-heap memory configurations This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs. - Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6). - Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix. - Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion. - Document these configurations on the configuration page. Author: Josh Rosen <joshrosen@databricks.com> Closes #10237 from JoshRosen/SPARK-12251.	2015-12-10 15:29:04 -08:00
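The validation described in the SPARK-12251 entry above can be sketched as follows. This is a minimal illustration in Python, not Spark's actual implementation: the function name `validate_off_heap` is hypothetical, the configuration is modelled as a plain dictionary, and the size is assumed to be a plain integer number of bytes.

```python
def validate_off_heap(conf):
    """Reject off-heap memory that is enabled without a positive size.

    `conf` is a plain dict standing in for a Spark configuration (an
    assumption made for this sketch). Returns (enabled, size_in_bytes).
    """
    enabled = str(conf.get("spark.memory.offHeap.enabled", "false")).lower() == "true"
    # Assumption: the size is given as an integer number of bytes.
    size = int(conf.get("spark.memory.offHeap.size", 0))
    if enabled and size == 0:
        # Failing early is easier to diagnose than an immediate OOM at runtime.
        raise ValueError(
            "spark.memory.offHeap.size must be > 0 when "
            "spark.memory.offHeap.enabled is true"
        )
    return enabled, size


if __name__ == "__main__":
    print(validate_off_heap({
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": "1073741824",
    }))
```

The point of the check is to surface the misconfiguration when the settings are read, before any task tries to allocate off-heap memory.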
Marcelo Vanzin	4a46b8859d	[SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes. This avoids bringing up yet another HTTP server on the driver, and instead reuses the file server already managed by the driver's RpcEnv. As a bonus, the repl now inherits the security features of the network library. There's also a small change to create the directory for storing classes under the root temp dir for the application (instead of directly under java.io.tmpdir). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9923 from vanzin/SPARK-11563.	2015-12-10 13:26:30 -08:00
Timothy Hunter	2ecbe02d5b	[SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation. Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark). It also removes some files that I forgot to delete with #10207. Author: Timothy Hunter <timhunter@databricks.com> Closes #10234 from thunterdb/12212.	2015-12-10 12:50:46 -08:00
Yin Huai	ac8cdf1cdc	[SPARK-11678][SQL][DOCS] Document basePath in the programming guide. This PR adds document for `basePath`, which is a new parameter used by `HadoopFsRelation`. The compiled doc is shown below. ![image](https://cloud.githubusercontent.com/assets/2072857/11673132/1ba01192-9dcb-11e5-98d9-ac0b4e92e98c.png) JIRA: https://issues.apache.org/jira/browse/SPARK-11678 Author: Yin Huai <yhuai@databricks.com> Closes #10211 from yhuai/basePathDoc.	2015-12-09 18:09:36 -08:00
Andrew Ray	7a8e587dc0	[SPARK-12211][DOC][GRAPHX] Fix version number in graphx doc for migration from 1.1 Migration from 1.1 section added to the GraphX doc in 1.2.0 (see https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#migrating-from-spark-11) uses \{{site.SPARK_VERSION}} as the version where changes were introduced, it should be just 1.2. Author: Andrew Ray <ray.andrew@gmail.com> Closes #10206 from aray/graphx-doc-1.1-migration.	2015-12-09 17:16:01 -08:00
Xusen Yin	051c6a066f	[SPARK-11551][DOC] Replace example code in ml-features.md using include_example PR on behalf of somideshmukh, thanks! Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10219 from yinxusen/SPARK-11551.	2015-12-09 12:00:48 -08:00
Timothy Hunter	765c67f5f2	[SPARK-8517][ML][DOC] Reorganizes the spark.ml user guide This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested. Author: Timothy Hunter <timhunter@databricks.com> Closes #10207 from thunterdb/spark-8517.	2015-12-08 18:40:21 -08:00
Michael Armbrust	3959489423	[SPARK-12069][SQL] Update documentation with Datasets Author: Michael Armbrust <michael@databricks.com> Closes #10060 from marmbrus/docs.	2015-12-08 15:58:35 -08:00
BenFradet	06746b3005	[SPARK-12159][ML] Add user guide section for IndexToString transformer Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10166 from BenFradet/SPARK-12159.	2015-12-08 12:45:34 -08:00
Cheng Lian	da2012a0e1	[SPARK-11551][DOC][EXAMPLE] Revert PR #10002 This reverts PR #10002, commit `78209b0cca`. The original PR wasn't tested on Jenkins before being merged. Author: Cheng Lian <lian@databricks.com> Closes #10200 from liancheng/revert-pr-10002.	2015-12-08 19:18:59 +08:00
Yanbo Liang	4a39b5a1be	[SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example code Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10006 from yanboliang/spark-11958.	2015-12-07 23:50:57 -08:00
somideshmukh	78209b0cca	[SPARK-11551][DOC][EXAMPLE] Replace example code in ml-features.md using include_example Made a new patch containing only the markdown examples moved to example/folder. Only three Java code files were not shifted, since they contained compilation errors; these classes are 1) StandardScale 2) NormalizerExample 3) VectorIndexer Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes #10002 from somideshmukh/SomilBranch1.33.	2015-12-07 23:26:34 -08:00
Xusen Yin	871e85d9c1	[SPARK-11963][DOC] Add docs for QuantileDiscretizer https://issues.apache.org/jira/browse/SPARK-11963 Author: Xusen Yin <yinxusen@gmail.com> Closes #9962 from yinxusen/SPARK-11963.	2015-12-07 13:16:47 -08:00
rotems	f30373f5ee	[SPARK-12080][CORE] Kryo - Support multiple user registrators Author: rotems <roter> Closes #10078 from Botnaim/KryoMultipleCustomRegistrators.	2015-12-04 16:58:34 -08:00
felixcheung	43c575cb17	[SPARK-12116][SPARKR][DOCS] document how to workaround function name conflicts with dplyr shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10119 from felixcheung/rdocdplyrmasked.	2015-12-03 09:22:21 -08:00
Jeff Zhang	7470d9edbb	[DOCUMENTATION][MLLIB] typo in mllib doc \cc mengxr Author: Jeff Zhang <zjffdu@apache.org> Closes #10093 from zjffdu/mllib_typo.	2015-12-03 15:36:28 +00:00
Andrew Or	d96f8c997b	[SPARK-12081] Make unified memory manager work with small heaps The existing `spark.memory.fraction` (default 0.75) gives the system 25% of the space to work with. For small heaps, this is not enough: e.g. default 1GB leaves only 250MB system memory. This is especially a problem in local mode, where the driver and executor are crammed in the same JVM. Members of the community have reported driver OOM's in such cases. New proposal. We now reserve 300MB before taking the 75%. For 1GB JVMs, this leaves `(1024 - 300) * 0.75 = 543MB` for execution and storage. This is proposal (1) listed in the [JIRA](https://issues.apache.org/jira/browse/SPARK-12081). Author: Andrew Or <andrew@databricks.com> Closes #10081 from andrewor14/unified-memory-small-heaps.	2015-12-01 19:51:12 -08:00
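The arithmetic in the SPARK-12081 entry above can be checked with a short sketch. The 300MB reservation and the 0.75 fraction come from the commit message; the function and constant names are illustrative only and are not Spark APIs.

```python
RESERVED_MB = 300        # reserved before the fraction is applied (from the commit message)
MEMORY_FRACTION = 0.75   # default spark.memory.fraction quoted in the commit message


def unified_memory_mb(heap_mb, fraction=MEMORY_FRACTION, reserved_mb=RESERVED_MB):
    """Return the MB left for execution and storage under the proposal."""
    return (heap_mb - reserved_mb) * fraction


if __name__ == "__main__":
    # (1024 - 300) * 0.75
    print(unified_memory_mb(1024))  # prints 543.0
```

For a 1GB heap this reproduces the 543MB figure quoted in the message.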
Xusen Yin	e76431f886	[SPARK-11961][DOC] Add docs of ChiSqSelector https://issues.apache.org/jira/browse/SPARK-11961 Author: Xusen Yin <yinxusen@gmail.com> Closes #9965 from yinxusen/SPARK-11961.	2015-12-01 15:21:53 -08:00
woj-i	6a8cf80cc8	[SPARK-11821] Propagate Kerberos keytab for all environments andrewor14 the same PR as in branch 1.5 harishreedharan Author: woj-i <wojciechindyk@gmail.com> Closes #9859 from woj-i/master.	2015-12-01 11:05:45 -08:00
Josh Rosen	f73379be2b	[HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin. I accidentally omitted these as part of #10049.	2015-11-30 18:25:59 -08:00
Xusen Yin	e6dc89a339	[SPARK-12035] Add more debug information in include_example tag of Jekyll https://issues.apache.org/jira/browse/SPARK-12035 When debugging lots of example code files, like in https://github.com/apache/spark/pull/10002, it's hard to know which file causes errors due to the limited information in `include_example.rb`. With the filenames included, we can locate bugs easily. Author: Xusen Yin <yinxusen@gmail.com> Closes #10026 from yinxusen/SPARK-12035.	2015-11-30 17:18:44 -08:00
Josh Rosen	d3ca8cfac2	[SPARK-12000] Fix API doc generation issues This pull request fixes multiple issues with API doc generation. - Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures. - Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation. - Incorporate doc changes from thunterdb (in #10048). Closes #10048. Author: Josh Rosen <joshrosen@databricks.com> Author: Timothy Hunter <timhunter@databricks.com> Closes #10049 from JoshRosen/fix-doc-build.	2015-11-30 16:37:27 -08:00
Feynman Liang	5535888930	[SPARK-11960][MLLIB][DOC] User guide for streaming tests CC jkbradley mengxr josepablocam Author: Feynman Liang <feynman.liang@gmail.com> Closes #10005 from feynmanliang/streaming-test-user-guide.	2015-11-30 15:38:44 -08:00
Yuhao Yang	e232720a65	[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Original PR is reverted due to document build error. https://github.com/apache/spark/pull/9722 mengxr feynmanliang yinxusen Sorry for the troubling. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9974 from hhbyyh/ldaMLExample.	2015-11-30 14:56:51 -08:00
BenFradet	f2fbfa444f	[MINOR][DOCS] fixed list display in ml-ensembles The list in ml-ensembles.md wasn't properly formatted and, as a result, was looking like this: ![old](http://i.imgur.com/2ZhELLR.png) This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png) Author: BenFradet <benjamin.fradet@gmail.com> Closes #10025 from BenFradet/ml-ensembles-doc.	2015-11-30 13:02:08 -08:00
muxator	4376b5bea8	doc typo: "classificaion" -> "classification" Author: muxator <muxator@users.noreply.github.com> Closes #10008 from muxator/patch-1.	2015-11-26 18:52:20 -08:00
Jeff Zhang	67b6732088	[DOCUMENTATION] Fix minor doc error Author: Jeff Zhang <zjffdu@apache.org> Closes #9956 from zjffdu/dev_typo.	2015-11-25 11:37:42 -08:00
Yu ISHIKAWA	0dee44a664	[MINOR] Remove unnecessary spaces in `include_example.rb` Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9960 from yu-iskw/minor-remove-spaces.	2015-11-25 11:35:52 -08:00
Stephen Samuel	026ea2eab1	Updated sql programming guide to include jdbc fetch size Author: Stephen Samuel <sam@sksamuel.com> Closes #9377 from sksamuel/master.	2015-11-23 19:52:12 -08:00
Marcelo Vanzin	c2467dadae	[SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. This change abstracts the code that serves jars / files to executors so that each RpcEnv can have its own implementation; the akka version uses the existing HTTP-based file serving mechanism, while the netty versions uses the new stream support added to the network lib, which makes file transfers benefit from the easier security configuration of the network library, and should also reduce overhead overall. The change includes a small fix to TransportChannelHandler so that it propagates user events to downstream handlers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9530 from vanzin/SPARK-11140.	2015-11-23 13:54:19 -08:00
Luciano Resende	242be7daed	[SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version Author: Luciano Resende <lresende@apache.org> Closes #9892 from lresende/SPARK-11910.	2015-11-23 13:46:34 -08:00
jerryshao	5fd86e4fc2	[SPARK-7173][YARN] Add label expression support for application master Add label expression support for the AM to restrict it to run on a specific set of nodes. I tested it locally and it works fine. sryza and vanzin please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9800 from jerryshao/SPARK-7173.	2015-11-23 10:41:17 -08:00
Timothy Hunter	fc4b792d28	[SPARK-11835] Adds a sidebar menu to MLlib's documentation This PR adds a sidebar menu when browsing the user guide of MLlib. It uses a YAML file to describe the structure of the documentation. It should be trivial to adapt this to the other projects. ![screen shot 2015-11-18 at 4 46 12 pm](https://cloud.githubusercontent.com/assets/7594753/11259591/a55173f4-8e17-11e5-9340-0aed79d66262.png) Author: Timothy Hunter <timhunter@databricks.com> Closes #9826 from thunterdb/spark-11835.	2015-11-22 21:51:42 -08:00
Xiangrui Meng	a2dce22e0a	Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml" This reverts commit `e359d5dcf5`.	2015-11-20 16:51:47 -08:00
Vikas Nelamangala	ed47b1e660	[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example Author: Vikas Nelamangala <vikasnelamangala@Vikass-MacBook-Pro.local> Closes #9689 from vikasnp/master.	2015-11-20 15:18:41 -08:00
Yuhao Yang	e359d5dcf5	[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9722 from hhbyyh/ldaMLExample.	2015-11-20 09:57:09 -08:00
felixcheung	1a93323c5b	[SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR Added tests for function that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to SparkR programming guide. It would seem to me `table, sample, subset, filter, cov` not working are not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like as they are defined in base or stats they are missing the S3 generic, eg. ``` > methods("transform") [1] transform,ANY-method transform.data.frame [3] transform,DataFrame-method transform.default see '?methods' for accessing help and source code > methods("subset") [1] subset.data.frame subset,DataFrame-method subset.default [4] subset.matrix see '?methods' for accessing help and source code Warning message: In .S3methods(generic.function, class, parent.frame()) : function 'subset' appears not to be S3 generic; found functions that look like S3 methods ``` Any idea? More information on masking: http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm http://www.sfu.ca/~sweldon/howTo/guide4.pdf This is what the output doc looks like (minus css): ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png) Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.	2015-11-18 23:32:49 -08:00
Yanbo Liang	e222d75849	[SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example codes This PR includes: * Update SparkR:::glm, SparkR:::summary API docs. * Update SparkR machine learning user guide and example codes to show: * supporting feature interaction in R formula. * summary for gaussian GLM model. * coefficients for binomial GLM model. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9727 from yanboliang/spark-11684.	2015-11-18 13:30:29 -08:00
Reynold Xin	a416e41e28	[SPARK-11809] Switch the default Mesos mode to coarse-grained mode Based on my conversations with people, I believe the consensus is that the coarse-grained mode is more stable and easier to reason about. It is best to use that as the default rather than the more flaky fine-grained mode. Author: Reynold Xin <rxin@databricks.com> Closes #9795 from rxin/SPARK-11809.	2015-11-18 12:50:29 -08:00
Xusen Yin	9154f89bef	[SPARK-11728] Replace example code in ml-ensembles.md using include_example JIRA issue https://issues.apache.org/jira/browse/SPARK-11728. The ml-ensembles.md file contains `OneVsRestExample`. Instead of writing new code files of two `OneVsRestExample`s, I use two existing files in the examples directory, they are `OneVsRestExample.scala` and `JavaOneVsRestExample.scala`. Author: Xusen Yin <yinxusen@gmail.com> Closes #9716 from yinxusen/SPARK-11728.	2015-11-17 23:44:06 -08:00
Xusen Yin	328eb49e62	[SPARK-11729] Replace example code in ml-linear-methods.md using include_example JIRA link: https://issues.apache.org/jira/browse/SPARK-11729 Author: Xusen Yin <yinxusen@gmail.com> Closes #9713 from yinxusen/SPARK-11729.	2015-11-17 13:59:59 -08:00
Cheng Lian	7b1407c7b9	[SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one. Author: Cheng Lian <lian@databricks.com> Closes #9740 from liancheng/spark-11089.single-session-option.	2015-11-17 11:17:52 -08:00
Philipp Hoffmann	15cc36b778	[SPARK-11779][DOCS] Fix reference to deprecated MESOS_NATIVE_LIBRARY MESOS_NATIVE_LIBRARY was renamed in favor of MESOS_NATIVE_JAVA_LIBRARY. This commit fixes the reference in the documentation. Author: Philipp Hoffmann <mail@philipphoffmann.de> Closes #9768 from philipphoffmann/patch-2.	2015-11-17 14:13:13 +00:00
yangping.wu	7276fa9aa9	[SPARK-11751] Doc describe error in the "Spark Streaming Programming Guide" page In the [Task Launching Overheads](http://spark.apache.org/docs/latest/streaming-programming-guide.html#task-launching-overheads) section: > Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves. As we know, task serialization is configured by the spark.closure.serializer parameter, but currently only the Java serializer is supported. If we set spark.closure.serializer to org.apache.spark.serializer.KryoSerializer, an exception will be thrown. Author: yangping.wu <wyphao.2007@163.com> Closes #9734 from 397090770/397090770-patch-1.	2015-11-17 14:11:34 +00:00
Andrew Or	33a0ec9377	[SPARK-11710] Document new memory management model Author: Andrew Or <andrew@databricks.com> Closes #9676 from andrewor14/memory-management-docs.	2015-11-16 17:00:18 -08:00
Kai Jiang	9a73b33a9a	[MINOR][DOCS] typo in docs/configuration.md `<\code>` end tag missing backslash in docs/configuration.md{L308-L339} ref #8795 Author: Kai Jiang <jiangkai@gmail.com> Closes #9715 from vectorijk/minor-typo-docs.	2015-11-14 11:59:37 +00:00
Xusen Yin	912b94363b	[SPARK-11336] Add links to example codes https://issues.apache.org/jira/browse/SPARK-11336 mengxr I added, to each code example, a hyperlink to the Spark repository on GitHub and a note that the example exists in the Spark code repo. I removed the config key for changing the example code dir, since we assume all examples should be in spark/examples. The hyperlink cannot be used yet because Spark v1.6.0 has not been released, but it will work after the release, so this is not a problem. The original PR included two screenshots of the result. Author: Xusen Yin <yinxusen@gmail.com> Closes #9320 from yinxusen/SPARK-11336.	2015-11-13 13:14:25 -08:00
Yanbo Liang	99693fef0a	[SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, include: * Use libSVM data source for all example codes under examples/ml, and remove unused import. * Use libSVM data source for user guides under ml-*** which were omitted by #8697. * Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` at Java side, but the API doc and user guides misuse as ```sqlContext.read.format("libsvm").load(path)```. * Code cleanup. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9690 from yanboliang/spark-11723.	2015-11-13 08:43:05 -08:00
Rishabh Bhardwaj	61a28486cc	[SPARK-11445][DOCS] Replaced example code in mllib-ensembles.md using include_example I have made the required changes and tested. Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9407 from rishabhbhardwaj/SPARK-11445.	2015-11-13 08:36:46 -08:00
Yanbo Liang	ea5ae2705a	[SPARK-11629][ML][PYSPARK][DOC] Python example code for Multilayer Perceptron Classification Add Python example code for Multilayer Perceptron Classification, and make example code in user guide document testable. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9594 from yanboliang/spark-11629.	2015-11-12 21:29:43 -08:00
Andrew Or	12a0784ac0	[SPARK-11667] Update dynamic allocation docs to reflect supported cluster managers Author: Andrew Or <andrew@databricks.com> Closes #9637 from andrewor14/update-da-docs.	2015-11-12 15:48:42 -08:00
Andrew Or	cf38fc7551	[SPARK-11670] Fix incorrect kryo buffer default value in docs Author: Andrew Or <andrew@databricks.com> Closes #9638 from andrewor14/fix-kryo-docs.	2015-11-12 15:47:29 -08:00
Nick Evans	dd77e278b9	[SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python. Author: Nick Evans <me@nicolasevans.org> Closes #9289 from manygrams/update_kafka_direct_python_docs.	2015-11-11 13:29:30 -08:00
Josh Rosen	529a1d3380	[SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8. In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml. http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes. I also added a new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code which was compiled targeting Java 8. Author: Josh Rosen <joshrosen@databricks.com> Closes #9512 from JoshRosen/SPARK-6152.	2015-11-11 11:16:39 -08:00
Pravin Gadakh	638c51d938	[SPARK-11550][DOCS] Replace example code in mllib-optimization.md using include_example Author: Pravin Gadakh <pravingadakh177@gmail.com> Closes #9516 from pravingadakh/SPARK-11550.	2015-11-10 14:47:04 -08:00
Xusen Yin	a81f47ff74	[SPARK-11382] Replace example code in mllib-decision-tree.md using include_example https://issues.apache.org/jira/browse/SPARK-11382 B.T.W. I fix an error in naive_bayes_example.py. Author: Xusen Yin <yinxusen@gmail.com> Closes #9596 from yinxusen/SPARK-11382.	2015-11-10 10:05:53 -08:00
gatorsmile	2f38378856	[SPARK-11360][DOC] Loss of nullability when writing parquet files This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons. Author: gatorsmile <gatorsmile@gmail.com> Closes #9314 from gatorsmile/lossNull.	2015-11-09 16:06:48 -08:00
Rishabh Bhardwaj	b7720fa455	[SPARK-11548][DOCS] Replaced example code in mllib-collaborative-filtering.md using include_example Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9519 from rishabhbhardwaj/SPARK-11337.	2015-11-09 14:27:36 -08:00
sachin aggarwal	51d41e4b1a	[SPARK-11552][DOCS][Replaced example code in ml-decision-tree.md using include_example] I have tested it on my local, it is working fine, please review Author: sachin aggarwal <different.sachin@gmail.com> Closes #9539 from agsachin/SPARK-11552-real.	2015-11-09 14:25:42 -08:00
Bharat Lal	860ea0d386	[SPARK-11581][DOCS] Example mllib code in documentation incorrectly computes MSE Author: Bharat Lal <bharat.iisc@gmail.com> Closes #9560 from bharatl/SPARK-11581.	2015-11-09 11:33:01 -08:00
chriskang90	874cd66d4b	[DOCS] Fix typo for Python section on unifying Kafka streams 1) kafkaStreams is a list. The list should be unpacked when passing it into the streaming context union method, which accepts a variable number of streams. 2) print() should be pprint() for pyspark. This contribution is my original work, and I license the work to the project under the project's open source license. Author: chriskang90 <jckang@uchicago.edu> Closes #9545 from c-kang/streaming_python_typo.	2015-11-09 19:39:22 +01:00
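The first fix in the entry above, unpacking the list of streams before passing it to a method that accepts a variable number of streams, can be illustrated without a running Spark cluster. In this sketch `union` is a hypothetical stand-in written for illustration, and plain Python lists stand in for the Kafka streams.

```python
def union(*streams):
    """Hypothetical stand-in for a variadic union over streams.

    Each positional argument is treated as one stream (here, a list).
    """
    if not streams:
        raise ValueError("union needs at least one stream")
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


# kafka_streams is a list of streams, as in the commit message.
kafka_streams = [["a", "b"], ["c"], ["d", "e"]]

# Unpacking with * passes each stream as its own argument.
unified = union(*kafka_streams)

# Without unpacking, the whole list is treated as a single stream.
not_unpacked = union(kafka_streams)

if __name__ == "__main__":
    print(unified)
    print(not_unpacked)
```

With the `*`, the three lists are merged element by element; without it, the function receives one argument and returns the nested lists unchanged.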
Yanbo Liang	d50a66cc04	[SPARK-10689][ML][DOC] User guide and example code for AFTSurvivalRegression Add user guide and example code for ```AFTSurvivalRegression```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9491 from yanboliang/spark-10689.	2015-11-09 08:57:29 -08:00
Rohit Agarwal	b541b31630	[DOC][MINOR][SQL] Fix internal link It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9544 from mindprince/patch-2.	2015-11-09 13:28:00 +01:00
xin Wu	26739059bc	[SPARK-10046][SQL] Hive warehouse dir not set in current directory when not … Doc change to align with HiveConf default in terms of where to create `warehouse` directory. Author: xin Wu <xinwu@us.ibm.com> Closes #9365 from xwu0226/spark-10046-commit.	2015-11-08 12:28:19 -08:00
Rohit Agarwal	5c4e6d7ec9	[DOC][SQL] Remove redundant out-of-place python snippet This snippet seems to be mistakenly introduced at two places in #5348. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9540 from mindprince/patch-1.	2015-11-08 14:24:26 +00:00
Sean Owen	d981902101	[SPARK-11476][DOCS] Incorrect function referred to in MLib Random data generation documentation Fix Python example to use normalRDD as advertised Author: Sean Owen <sowen@cloudera.com> Closes #9529 from srowen/SPARK-11476.	2015-11-08 11:15:58 +00:00
Yanbo Liang	72634f27e3	[MINOR][ML][DOC] Rename weights to coefficients in user guide We should use ```coefficients``` rather than ```weights``` in the user guide so that newcomers get the conventional name from the outset. mengxr vectorijk Author: Yanbo Liang <ybliang8@gmail.com> Closes #9493 from yanboliang/docs-coefficients.	2015-11-05 08:59:06 -08:00
Josh Rosen	ce5e6a2849	[SPARK-11491] Update build to use Scala 2.10.5 Spark should build against Scala 2.10.5, since that includes a fix for Scaladoc that will fix doc snapshot publishing: https://issues.scala-lang.org/browse/SI-8479 Author: Josh Rosen <joshrosen@databricks.com> Closes #9450 from JoshRosen/upgrade-to-scala-2.10.5.	2015-11-04 16:58:38 -08:00
Wenchen Fan	e0fc9c7e59	[SPARK-11197][SQL] add doc for run SQL on files directly Author: Wenchen Fan <wenchen@databricks.com> Closes #9467 from cloud-fan/doc.	2015-11-04 09:33:30 -08:00
Xusen Yin	9b214cea89	[SPARK-11443] Reserve space lines The trim_codeblock(lines) function in include_example.rb removes some blank lines in the code. Author: Xusen Yin <yinxusen@gmail.com> Closes #9400 from yinxusen/SPARK-11443.	2015-11-04 08:36:55 -08:00
Pravin Gadakh	820064e613	[SPARK-11380][DOCS] Replace example code in mllib-frequent-pattern-mining.md using include_example Author: Pravin Gadakh <pravingadakh177@gmail.com> Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #9340 from pravingadakh/SPARK-11380.	2015-11-04 08:32:08 -08:00
lewuathe	d648a4ad54	[DOC] Missing link to R DataFrame API doc Author: lewuathe <lewuathe@me.com> Author: Lewuathe <lewuathe@me.com> Closes #9394 from Lewuathe/missing-link-to-R-dataframe.	2015-11-03 16:38:22 -08:00
felixcheung	a9676cc710	[SPARK-11407][SPARKR] Add doc for running from RStudio ![image](https://cloud.githubusercontent.com/assets/8969467/10871746/612ba44a-80a4-11e5-99a0-40b9931dee52.png) (This is without css, but you get the idea) shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9401 from felixcheung/rstudioprogrammingguide.	2015-11-03 11:53:10 -08:00
Rishabh Bhardwaj	2804674a7a	[SPARK-11383][DOCS] Replaced example code in mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example I have made the required changes in mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them. Kindly review it. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9353 from rishabhbhardwaj/SPARK-11383.	2015-11-02 14:03:50 -08:00
Sean Owen	643c49c75e	[SPARK-11305][DOCS] Remove Third-Party Hadoop Distributions Doc Page Remove Hadoop third party distro page, and move Hadoop cluster config info to configuration page CC pwendell Author: Sean Owen <sowen@cloudera.com> Closes #9298 from srowen/SPARK-11305.	2015-11-01 12:25:49 +00:00
felixcheung	bb5a2af034	[SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio Mapping spark.driver.memory from sparkEnvir to spark-submit commandline arguments. shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf? sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9290 from felixcheung/rdrivermem.	2015-10-30 13:51:32 -07:00
tedyu	f304f9c9a1	[SPARK-11318] Include hive profile in make-distribution.sh command Author: tedyu <yuzhihong@gmail.com> Closes #9281 from tedyu/master.	2015-10-29 15:02:13 +01:00
Mageswaran.D	fd9e345cee	Typo in mllib-evaluation-metrics.md Recall by threshold snippet was using "precisionByThreshold" Author: Mageswaran.D <mageswaran1989@gmail.com> Closes #9333 from Mageswaran1989/Typo_in_mllib-evaluation-metrics.md.	2015-10-28 08:46:30 -07:00
Xusen Yin	d77d198fcc	[SPARK-11297] Add new code tags mengxr https://issues.apache.org/jira/browse/SPARK-11297 Add new code tags to hold the same look and feel with previous documents. Author: Xusen Yin <yinxusen@gmail.com> Closes #9265 from yinxusen/SPARK-11297.	2015-10-26 23:53:41 -07:00
Xusen Yin	943d4fa204	[SPARK-11289][DOC] Substitute code examples in ML features extractors with include_example mengxr https://issues.apache.org/jira/browse/SPARK-11289 I make some changes in ML feature extractors. I.e. TF-IDF, Word2Vec, and CountVectorizer. I add new example code in spark/examples, hope it is the right place to add those examples. Author: Xusen Yin <yinxusen@gmail.com> Closes #9266 from yinxusen/SPARK-11289.	2015-10-26 21:17:53 -07:00
Josh Rosen	b67dc6a434	[SPARK-11299][DOC] Fix link to Scala DataFrame Functions reference The SQL programming guide's link to the DataFrame functions reference points to the wrong location; this patch fixes that. Author: Josh Rosen <joshrosen@databricks.com> Closes #9269 from JoshRosen/SPARK-11299.	2015-10-25 10:31:44 +01:00
Sun Rui	2462dbcce8	[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript. Add a new spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client modes. The existing spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster modes for both driver and workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395). BTW, [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate R shell on the local host. For your information, PYSPARK has two environment variables serving similar purpose: PYSPARK_PYTHON Python binary executable to use for PySpark in both driver and workers (default is `python`). PYSPARK_DRIVER_PYTHON Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON). PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the python executable for a python script. Author: Sun Rui <rui.sun@intel.com> Closes #9179 from sun-rui/SPARK-10971.	2015-10-23 21:38:04 -07:00
Xusen Yin	03ccb22080	[SPARK-10382] Make example code in user guide testable A POC code for making example code in user guide testable. mengxr We still need to talk about the labels in code. Author: Xusen Yin <yinxusen@gmail.com> Closes #9109 from yinxusen/SPARK-10382.	2015-10-23 08:31:01 -07:00
Rohan Bhanderi	16dc9f344c	Fix typo "Received" to "Receiver" in streaming-kafka-integration.md Removed typo on line 8 in markdown : "Received" -> "Receiver" Author: Rohan Bhanderi <rohan.bhanderi@sjsu.edu> Closes #9242 from RohanBhanderi/patch-1.	2015-10-23 01:10:46 -07:00
Josh Rosen	f6d06adf05	[SPARK-10708] Consolidate sort shuffle implementations There's a lot of duplication between SortShuffleManager and UnsafeShuffleManager. Given that these now provide the same set of functionality, now that UnsafeShuffleManager supports large records, I think that we should replace SortShuffleManager's serialized shuffle implementation with UnsafeShuffleManager's and should merge the two managers together. Author: Josh Rosen <joshrosen@databricks.com> Closes #8829 from JoshRosen/consolidate-sort-shuffle-implementations.	2015-10-22 09:46:30 -07:00
vundela	2f6dd634c1	[SPARK-11105] [YARN] Distribute log4j.properties to executors Currently the log4j.properties file is not uploaded to executors, which leads them to use the default values. This fix will make sure that the file is always uploaded to the distributed cache so that executors will use the latest settings. If the user specifies log configurations through --files then executors will pick up configs from --files instead of $SPARK_CONF_DIR/log4j.properties Author: vundela <vsr@cloudera.com> Author: Srinivasa Reddy Vundela <vsr@cloudera.com> Closes #9118 from vundela/master.	2015-10-20 11:12:28 -07:00
Lukasz Piepiora	a112d69fdc	[SPARK-11174] [DOCS] Fix typo in the GraphX programming guide This patch fixes a small typo in the GraphX programming guide Author: Lukasz Piepiora <lpiepiora@gmail.com> Closes #9160 from lpiepiora/11174-fix-typo-in-graphx-programming-guide.	2015-10-18 14:25:57 +01:00
Britta Weber	723aa75a9d	fix typo bellow -> below Author: Britta Weber <britta.weber@elasticsearch.com> Closes #9136 from brwe/typo-bellow.	2015-10-15 14:47:11 -07:00
Nick Pritchard	b591de7c07	[SPARK-11039][Documentation][Web UI] Document additional ui configurations Add documentation for configuration: - spark.sql.ui.retainedExecutions - spark.streaming.ui.retainedBatches Author: Nick Pritchard <nicholas.pritchard@falkonry.com> Closes #9052 from pnpritchard/SPARK-11039.	2015-10-15 12:45:37 -07:00
Andrew Or	b3ffac5178	[SPARK-10983] Unified memory manager This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced: - spark.memory.fraction (default 0.75): fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. - spark.memory.storageFraction (default 0.5): size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region. - spark.memory.useLegacyMode (default false): whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility. For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000. Author: Andrew Or <andrew@databricks.com> Closes #9084 from andrewor14/unified-memory-manager.	2015-10-13 13:49:59 -07:00
jerryshao	f97e9323b5	[SPARK-10739] [YARN] Add application attempt window for Spark on Yarn Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures. Author: jerryshao <sshao@hortonworks.com> Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits: 36eabdc [jerryshao] change the doc 7f9b77d [jerryshao] Style change 1c9afd0 [jerryshao] Address the comments caca695 [jerryshao] Add application attempt window for Spark on Yarn	2015-10-12 18:18:19 -07:00
Kay Ousterhout	091c2c3ecd	[SPARK-11056] Improve documentation of SBT build. This commit improves the documentation around building Spark to (1) recommend using SBT interactive mode to avoid the overhead of launching SBT and (2) refer to the wiki page that documents using SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each compile. cc srowen Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #9068 from kayousterhout/SPARK-11056.	2015-10-12 14:23:29 -07:00
Jean-Baptiste Onofré	60150cf00a	[SPARK-10883] Add a note about how to build Spark sub-modules (reactor) Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #8993 from jbonofre/SPARK-10883-2.	2015-10-08 11:38:39 +01:00
admackin	cd28139c9b	Akka framesize units should be specified 1.4 docs noted that the units were MB - I have assumed this is still the case Author: admackin <admackin@users.noreply.github.com> Closes #9025 from admackin/master.	2015-10-08 00:01:23 -07:00
Xin Ren	27cdde2ff8	[SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in `64743870f2/docs/mllib-feature-extraction.md` This JIRA is just for spark.mllib, not spark.ml. Please let me know if more work is needed, thanks a lot. Author: Xin Ren <iamshrek@126.com> Closes #8977 from keypointt/SPARK-10669.	2015-10-07 15:00:19 +01:00
Sean Owen	82bbc2a5f2	[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs. Follow-on to https://github.com/apache/spark/pull/8385 CC nssalian Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.	2015-10-04 09:31:52 +01:00
Yuhao Yang	9b9fe5f7bf	[SPARK-10670] [ML] [Doc] add api reference for ml doc jira: https://issues.apache.org/jira/browse/SPARK-10670 In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in `64743870f2/docs/ml-features.md` This JIRA is just for spark.ml, not spark.mllib Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8901 from hhbyyh/docAPI.	2015-09-28 22:40:02 -07:00
David Martin	b58249930d	Fix two mistakes in programming-guide page seperate -> separate sees -> see Author: David Martin <dmartinpro@users.noreply.github.com> Closes #8928 from dmartinpro/patch-1.	2015-09-28 10:41:39 +01:00
Bin Wang	fb4c7be747	add doc for spark.streaming.stopGracefullyOnShutdown Author: Bin Wang <wbin00@gmail.com> Closes #8898 from wb14123/doc.	2015-09-27 21:26:54 +01:00
Matt Hagen	558e9c7e60	[SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method. The Scala example under the "Example: Pipeline" heading in this document initializes the "test" variable to a DataFrame. Because test is already a DF, there is no need to call test.toDF as the example does in a subsequent line: model.transform(test.toDF). So, I removed the extraneous toDF invocation. Author: Matt Hagen <anonz3000@gmail.com> Closes #8875 from hagenhaus/SPARK-10663.	2015-09-22 21:14:25 -07:00
Akash Mishra	0bd0e5bed2	[SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value information for spark.mesos.constraints parameter. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8816 from SleepyThread/constraint-fix.	2015-09-22 00:14:27 -07:00
Marcelo Vanzin	97a99dde6e	[SPARK-10676] [DOCS] Add documentation for SASL encryption options. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8803 from vanzin/SPARK-10676.	2015-09-21 13:15:44 -07:00
Jacek Laskowski	ca9fe540fe	[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables * Backticks are processed properly in Spark Properties table * Removed unnecessary spaces * See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.	2015-09-21 19:46:39 +01:00
Josh Rosen	2117eea71e	[SPARK-10710] Remove ability to disable spilling in core and SQL It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.	2015-09-19 21:40:21 -07:00
Alexis Seigneurin	d83b6aae8b	Fixed links to the API Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941 Author: Alexis Seigneurin <alexis.seigneurin@gmail.com> Closes #8838 from aseigneurin/patch-2.	2015-09-19 12:01:22 +01:00
Kousuke Saruta	d507f9c0b7	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.	2015-09-19 01:59:36 -07:00
Reynold Xin	348d7c9a93	[SPARK-9808] Remove hash shuffle file consolidation. Author: Reynold Xin <rxin@databricks.com> Closes #8812 from rxin/SPARK-9808-1.	2015-09-18 13:48:41 -07:00
Reynold Xin	74d8f7dda8	Added <code> tag to documentation.	2015-09-17 22:46:13 -07:00
Felix Bechstein	9a56dcdf7f	docs/running-on-mesos.md: state default values in default column This PR simply uses the default value column for defaults. Author: Felix Bechstein <felix.bechstein@otto.de> Closes #8810 from felixb/fix_mesos_doc.	2015-09-17 22:42:46 -07:00
Michael Armbrust	e0dc2bc232	[SPARK-10650] Clean before building docs The [published docs for 1.5.0](http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/) have a bunch of test classes in them. The only way I can reproduce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <michael@databricks.com> Closes #8787 from marmbrus/testsInDocs.	2015-09-17 11:05:30 -07:00
yangping.wu	c88bb5df94	[SPARK-10660] Doc describe error in the "Running Spark on YARN" page In the Configuration section, the default values of spark.yarn.driver.memoryOverhead and spark.yarn.am.memoryOverhead should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively, because from Spark 1.4.0 the MEMORY_OVERHEAD_FACTOR is set to 0.10, not 0.07. Author: yangping.wu <wyphao.2007@163.com> Closes #8797 from 397090770/SparkOnYarnDocError.	2015-09-17 09:52:40 -07:00
Joseph K. Bradley	b921fe4dc0	[SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically. E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics. * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec” * Clean up Binarizer user guide a little. * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place. * spark.ml Word2Vec user guide: clean up grammar/writing * Chi Sq Feature Selector docs: Improve text in doc. CC: mengxr feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8752 from jkbradley/mlguide-fixes-1.5.	2015-09-15 19:43:26 -07:00
Jacek Laskowski	416003b264	[DOCS] Small fixes to Spark on Yarn doc * a follow-up to `16b6d18613` as `--num-executors` flag is not supported. * links + formatting Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8762 from jaceklaskowski/docs-spark-on-yarn.	2015-09-15 20:42:33 +01:00
Reynold Xin	09b7e7c198	Update version to 1.6.0-SNAPSHOT. Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.	2015-09-15 00:54:20 -07:00
Jacek Laskowski	833be73314	Small fixes to docs Links work now properly + consistent use of Spark standalone cluster (Spark uppercase + lowercase the rest -- seems agreed in the other places in the docs). Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8759 from jaceklaskowski/docs-submitting-apps.	2015-09-14 23:40:29 -07:00
Kousuke Saruta	cf2821ef5f	[SPARK-10584] [DOC] [SQL] Documentation about spark.sql.hive.metastore.version is wrong. The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.	2015-09-14 12:06:23 -07:00
Sean Owen	1dc614b874	[SPARK-10222] [GRAPHX] [DOCS] More thoroughly deprecate Bagel in favor of GraphX Finish deprecating Bagel; remove reference to nonexistent example Author: Sean Owen <sowen@cloudera.com> Closes #8731 from srowen/SPARK-10222.	2015-09-13 08:36:46 +01:00
y-shimizu	c268ca4ddd	[SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils I fixed to use LIBSVM data source in the example code in spark.ml instead of MLUtils Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8697 from y-shimizu/SPARK-10518.	2015-09-11 08:27:30 -07:00
Akash Mishra	a5ef2d0600	[SPARK-10514] [MESOS] waiting for min no of total cores acquired by Spark by implementing the sufficientResourcesRegistered method spark.scheduler.minRegisteredResourcesRatio configuration parameter works for YARN mode but not for Mesos Coarse grained mode. If the parameter specified default value of 0 will be set for spark.scheduler.minRegisteredResourcesRatio in base class and this method will always return true. There are no existing test for YARN mode too. Hence not added test for the same. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8672 from SleepyThread/master.	2015-09-10 12:04:02 -07:00
Holden Karau	a76bde9dae	[SPARK-10469] [DOC] Try and document the three options From JIRA: Add documentation for tungsten-sort. From the mailing list "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html(Currenlty there are only 'sort' and 'hash' two options)." Author: Holden Karau <holden@pigscanfly.ca> Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.	2015-09-10 11:49:53 -07:00
Sean Paradiso	1dc7548c59	[MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be 0.0 (original: 1.0) Small typo in the example for `LabelledPoint` in the MLLib docs. Author: Sean Paradiso <seanparadiso@gmail.com> Closes #8680 from sparadiso/docs_mllib_smalltypo.	2015-09-09 22:09:33 -07:00
Yuhao Yang	91a577d277	[SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide jira: https://issues.apache.org/jira/browse/SPARK-10249 update user guide since python support added. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8620 from hhbyyh/swPyDocExample.	2015-09-08 22:33:23 -07:00
Tathagata Das	52b24a602a	[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure	2015-09-08 14:54:43 -07:00
Jacek Laskowski	6ceed852ab	Docs small fixes Author: Jacek Laskowski <jacek@japila.pl> Closes #8629 from jaceklaskowski/docs-fixes.	2015-09-08 14:38:10 +01:00
Stephen Hopper	9d8e838d88	[DOC] Added R to the list of languages with "high-level API" support in the main README. Author: Stephen Hopper <shopper@shopper-osx.local> Closes #8646 from enragedginger/master.	2015-09-08 14:36:34 +01:00
Reynold Xin	5ffe752b59	[SPARK-9767] Remove ConnectionManager. We introduced the Netty network module for shuffle in Spark 1.2, and have turned it on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, it would be 1 yr for which ConnectionManager is off by default. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.	2015-09-07 10:42:30 -10:00
Tathagata Das	7a4f326c00	[SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming guides and python docs - Fixed information around Python API tags in streaming programming guides - Added missing stuff in python docs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8595 from tdas/SPARK-10440.	2015-09-04 23:16:39 -10:00
Timothy Chen	b087d23e28	[SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. Support running pyspark with cluster mode on Mesos! This doesn't upload any scripts, so running on a remote Mesos requires the user to specify the script from an available URI. Author: Timothy Chen <tnachen@gmail.com> Closes #8349 from tnachen/mesos_python.	2015-09-04 15:21:31 -07:00
Tom Graves	49aff7b9ad	[SPARK-10432] spark.port.maxRetries documentation is unclear Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.	2015-09-03 13:46:16 -07:00
zhuol	ec01280533	[SPARK-4223] [CORE] Support * in acls. SPARK-4223. Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. Manual tests to verify that: "*" works for any user in: a. Spark ui: view and kill stage. Done. b. Spark history server. Done. c. Yarn application killing. Done. Author: zhuol <zhuol@yahoo-inc.com> Closes #8398 from zhuoliu/4223.	2015-09-01 11:14:59 -10:00
Sean Owen	3f63bd6023	[SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring scripts Migrate Apache download closer.cgi refs to new closer.lua This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately. Author: Sean Owen <sowen@cloudera.com> Closes #8557 from srowen/SPARK-10398.	2015-09-01 20:06:01 +01:00
Xiangrui Meng	ca69fc8efd	[SPARK-10331] [MLLIB] Update example code in ml-guide * The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean. * assume `sqlContext` is available * fix some minor issues from previous code review jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8518 from mengxr/SPARK-10331.	2015-08-29 23:57:09 -07:00
Xiangrui Meng	905fbe498b	[SPARK-10348] [MLLIB] updates ml-guide * replace `ML Dataset` by `DataFrame` to unify the abstraction * ML algorithms -> pipeline components to describe the main concept * remove Scala API doc links from the main guide * `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide * modified lines break at 100 chars or periods jkbradley feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8517 from mengxr/SPARK-10348.	2015-08-29 23:26:23 -07:00
GuoQiang Li	5369be8068	[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.	2015-08-29 13:20:27 -07:00
martinzapletal	e8ea5bafee	[SPARK-9910] [ML] User guide for train validation split Author: martinzapletal <zapletal-martin@email.cz> Closes #8377 from zapletal-martin/SPARK-9910.	2015-08-28 21:03:48 -07:00
Xiangrui Meng	88032ecaf0	[SPARK-9671] [MLLIB] re-org user guide and add migration guide This PR updates the MLlib user guide and adds migration guide for 1.4->1.5. * merge migration guide for `spark.mllib` and `spark.ml` packages * remove dependency section from `spark.ml` guide * move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml` * move Sam's talk to footnote to make the section focus on dependencies Minor changes to code examples and other wording will be in a separate PR. jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8498 from mengxr/SPARK-9671.	2015-08-28 13:53:31 -07:00
Yuhao Yang	e2a843090c	[SPARK-9890] [DOC] [ML] User guide for CountVectorizer jira: https://issues.apache.org/jira/browse/SPARK-9890 document with Scala and java examples Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8487 from hhbyyh/cvDoc.	2015-08-28 08:00:44 -07:00
Keiji Yoshida	18294cd871	Fix DynamodDB/DynamoDB typo in Kinesis Integration doc Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8501 from yosssi/patch-1.	2015-08-28 09:36:50 +01:00
Feynman Liang	af0e1249b1	[SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide * Adds user guide for `LinearRegressionSummary` * Fixes unresolved issues in #8197 CC jkbradley mengxr Author: Feynman Liang <fliang@databricks.com> Closes #8491 from feynmanliang/SPARK-9905.	2015-08-27 21:55:20 -07:00
MechCoder	30734d45fb	[SPARK-9911] [DOC] [ML] Update Userguide for Evaluator I added a small note about the different types of evaluator and the metrics used. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #8304 from MechCoder/multiclass_evaluator.	2015-08-27 21:44:06 -07:00
Yin Huai	b3dd569ad4	[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path https://issues.apache.org/jira/browse/SPARK-10287 After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.	2015-08-27 16:11:25 -07:00
Feynman Liang	5bfe9e1111	[SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java compatibility test * Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine * Cleans up scaladocs for public methods * Adds test for Java compatibility * Follow up Python user guide code example is tracked by SPARK-10249 Author: Feynman Liang <fliang@databricks.com> Closes #8436 from feynmanliang/SPARK-10230.	2015-08-27 16:10:37 -07:00
MechCoder	c94ecdfc5b	[SPARK-9906] [ML] User guide for LogisticRegressionSummary User guide for LogisticRegression summaries Author: MechCoder <manojkumarsivaraj334@gmail.com> Author: Manoj Kumar <mks542@nyu.edu> Author: Feynman Liang <fliang@databricks.com> Closes #8197 from MechCoder/log_summary_user_guide.	2015-08-27 15:33:43 -07:00
Yuhao Yang	6185cdd2af	[SPARK-9901] User guide for RowMatrix Tall-and-skinny QR jira: https://issues.apache.org/jira/browse/SPARK-9901 The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8462 from hhbyyh/qrDoc.	2015-08-27 13:57:20 -07:00
CodingCat	84baa5e9b5	[SPARK-10315] remove document on spark.akka.failure-detector.threshold https://issues.apache.org/jira/browse/SPARK-10315 this parameter is not used any longer and there is some mistake in the current document , should be 'akka.remote.watch-failure-detector.threshold' Author: CodingCat <zhunansjtu@gmail.com> Closes #8483 from CodingCat/SPARK_10315.	2015-08-27 20:19:09 +01:00
Michael Armbrust	dc86a227e4	[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide Author: Michael Armbrust <michael@databricks.com> Closes #8441 from marmbrus/documentation.	2015-08-27 11:45:15 -07:00
Moussa Taifi	9625d13d57	[DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics Fix Typo in exactly once semantics [Semantics of output operations] link Author: Moussa Taifi <moutai10@gmail.com> Closes #8468 from moutai/patch-3.	2015-08-27 10:34:47 +01:00
Cheng Lian	0fac144f6b	[SPARK-9424] [SQL] Parquet programming guide updates for 1.5 Author: Cheng Lian <lian@databricks.com> Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.	2015-08-26 18:14:54 -07:00
Feynman Liang	125205cdb3	[SPARK-9888] [MLLIB] User guide for new LDA features * Adds two new sections to LDA's user guide; one for each optimizer/model * Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization) * Cleans up a TODO and sets a default parameter in LDA code jkbradley hhbyyh Author: Feynman Liang <fliang@databricks.com> Closes #8254 from feynmanliang/SPARK-9888.	2015-08-25 17:39:20 -07:00
Yuhao Yang	b37f0cc1b4	[SPARK-8531] [ML] Update ML user guide for MinMaxScaler jira: https://issues.apache.org/jira/browse/SPARK-8531 Update ML user guide for MinMaxScaler Author: Yuhao Yang <hhbyyh@gmail.com> Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com> Closes #7211 from hhbyyh/minmaxdoc.	2015-08-25 10:54:03 -07:00
Joseph K. Bradley	13db11cb08	[SPARK-10061] [DOC] ML ensemble docs User guide for spark.ml GBTs and Random Forests. The examples are copied from the decision tree guide and modified to run. I caught some issues I had somehow missed in the tree guide as well. I have run all examples, including Java ones. (Of course, I thought I had previously as well...) CC: mengxr manishamde yanboliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8369 from jkbradley/ml-ensemble-docs.	2015-08-24 15:38:54 -07:00
Keiji Yoshida	623c675fde	Update streaming-programming-guide.md Update `See the Scala example` to `See the Java example`. Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8376 from yosssi/patch-1.	2015-08-23 11:04:29 +01:00
Keiji Yoshida	46fcb9e0db	Update programming-guide.md Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`. Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8372 from yosssi/patch-1.	2015-08-22 02:38:10 -07:00
Xusen Yin	630a994e6a	[SPARK-9893] User guide with Java test suite for VectorSlicer Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer. Note that Python version does not support selecting by names now. Author: Xusen Yin <yinxusen@gmail.com> Closes #8267 from yinxusen/SPARK-9893.	2015-08-21 16:30:12 -07:00
Alexander Ulanov	dcfe0c5cde	[SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier Added user guide for multilayer perceptron classifier: - Simplified description of the multilayer perceptron classifier - Example code for Scala and Java Author: Alexander Ulanov <nashb@yandex.ru> Closes #8262 from avulanov/SPARK-9846-mlpc-docs.	2015-08-20 20:02:27 -07:00
Eric Liang	8e0a072f78	[SPARK-9895] User Guide for RFormula Feature Transformer mengxr Author: Eric Liang <ekl@databricks.com> Closes #8293 from ericl/docs-2.	2015-08-19 15:43:08 -07:00
Marcelo Vanzin	5fd53c64bb	[SPARK-9833] [YARN] Add options to disable delegation token retrieval. This allows skipping the code that tries to talk to Hive and HBase to fetch delegation tokens, in case that somehow conflicts with the application being run. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8134 from vanzin/SPARK-9833.	2015-08-19 10:51:59 -07:00
Yanbo Liang	802b5b8791	[SPARK-10084] [MLLIB] [DOC] Add Python example for mllib FP-growth user guide 1, Add Python example for mllib FP-growth user guide. 2, Correct mistakes of Scala and Java examples. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8279 from yanboliang/spark-10084.	2015-08-19 08:53:34 -07:00
Joseph K. Bradley	39e4ebd521	[SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guide New user guide section ml-decision-tree.md, including code examples. I have run all examples, including the Java ones. CC: manishamde yanboliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #8244 from jkbradley/ml-dt-docs.	2015-08-19 07:38:27 -07:00
lewuathe	ba2a07e2b6	[SPARK-9977] [DOCS] Update documentation for StringIndexer By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label. I think it is better to make it explicit on documentation. Author: lewuathe <lewuathe@me.com> Closes #8205 from Lewuathe/SPARK-9977.	2015-08-19 09:54:03 +01:00
Sean Owen	f141efeafb	[SPARK-10070] [DOCS] Remove Guava dependencies in user guides `Lists.newArrayList` -> `Arrays.asList` CC jkbradley feynmanliang Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond. Author: Sean Owen <sowen@cloudera.com> Closes #8272 from srowen/SPARK-10070.	2015-08-19 09:41:09 +01:00
Bill Chambers	b23c4d3ffc	Fix Broken Link Link was broken because it included tick marks. Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #8302 from anabranch/patch-1.	2015-08-19 00:05:01 -07:00
Alexander Ulanov	1c843e2848	[SPARK-9508] GraphX Pregel docs update with new Pregel code SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be modified accordingly since it lists the old Pregel code Author: Alexander Ulanov <nashb@yandex.ru> Closes #7831 from avulanov/SPARK-9508-pregel-doc2.	2015-08-18 22:13:52 -07:00
Davies Liu	de3223872a	[SPARK-9705] [DOC] fix docs about Python version cc JoshRosen Author: Davies Liu <davies@databricks.com> Closes #8245 from davies/python_doc.	2015-08-18 22:11:27 -07:00
Feynman Liang	badf7fa650	[SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT mengxr jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.	2015-08-18 17:54:49 -07:00
Dennis Huo	9b731fad2b	[SPARK-9782] [YARN] Support YARN application tags via SparkConf Add a new test case in yarn/ClientSuite which checks how the various SparkConf and ClientArguments propagate into the ApplicationSubmissionContext. Author: Dennis Huo <dhuo@google.com> Closes #8072 from dennishuo/dhuo-yarn-application-tags.	2015-08-18 14:34:20 -07:00
Piotr Migdal	8bae9015b7	[SPARK-10085] [MLLIB] [DOCS] removed unnecessary numpy array import See https://issues.apache.org/jira/browse/SPARK-10085 Author: Piotr Migdal <pmigdal@gmail.com> Closes #8284 from stared/spark-10085.	2015-08-18 12:59:28 -07:00
Yanbo Liang	747c2ba800	[SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide Add Python example for mllib LDAModel user guide Author: Yanbo Liang <ybliang8@gmail.com> Closes #8227 from yanboliang/spark-10032.	2015-08-18 12:56:36 -07:00

1758 commits