Commit graph

1150 commits

Author SHA1 Message Date
Deron Eriksson fcbcba66c9 [SPARK-1564] [DOCS] Added Javascript to Javadocs to create badges for tags like :: Experimental ::
Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags

Author: Deron Eriksson <deron@us.ibm.com>

Closes #7169 from deroneriksson/SPARK-1564_JavaDocs_badges and squashes the following commits:

a8353db [Deron Eriksson] added license headers to api-docs.css and api-javadocs.css
07feb07 [Deron Eriksson] added linebreaks to make jquery more readable when adding html badge tags
65b4930 [Deron Eriksson] Modified copy_api_dirs.rb and created api-javadocs.js and api-javadocs.css files in order to add badges to javadoc files for :: Experimental ::, :: DeveloperApi ::, and :: AlphaComponent :: tags
2015-07-02 13:55:53 -07:00
Yanbo Liang 0a468a46bf [SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering
Add Python user guide for PowerIterationClustering

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #7155 from yanboliang/spark-8758 and squashes the following commits:

18d803b [Yanbo Liang] address comments
dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
2015-07-02 09:59:54 -07:00
Ilya Ganelin 3697232b7d [SPARK-3071] Increase default driver memory
I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.

Please let me know if I've missed anything.

Will the spark-shell use the value within the command line builder during instantiation?

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:

4074164 [Ilya Ganelin] String fix
271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
273b6e9 [Ilya Ganelin] Test fix
fd67721 [Ilya Ganelin] Update JavaUtils.java
26cc177 [Ilya Ganelin] test fix
e5db35d [Ilya Ganelin] Fixed test failure
39732a1 [Ilya Ganelin] merge fix
a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
19b6f25 [Ilya Ganelin] Missed one doc update
2698a3d [Ilya Ganelin] Updated default value for driver memory
2015-07-01 23:11:02 -07:00
zsxwing 75b9fe4c5f [SPARK-8378] [STREAMING] Add the Python API for Flume
Author: zsxwing <zsxwing@gmail.com>

Closes #6830 from zsxwing/flume-python and squashes the following commits:

78dfdac [zsxwing] Fix the compile error in the test code
f1bf3c0 [zsxwing] Address TD's comments
0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
e93736b [zsxwing] Fix the test case for determine_modules_to_test
9d5821e [zsxwing] Fix pyspark_core dependencies
f9ee681 [zsxwing] Merge branch 'master' into flume-python
7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
b96b0de [zsxwing] Merge branch 'master' into flume-python
ce85e83 [zsxwing] Fix incompatible issues for Python 3
01cbb3d [zsxwing] Add import sys
152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
14ba0ff [zsxwing] Add flume-assembly for sbt building
b8d5551 [zsxwing] Merge branch 'master' into flume-python
4762c34 [zsxwing] Fix the doc
0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
9f33873 [zsxwing] Add the Python API for Flume
2015-07-01 11:59:24 -07:00
Yuhao Yang 2012913355 [SPARK-8308] [MLLIB] add missing save load for python example
jira: https://issues.apache.org/jira/browse/SPARK-8308

1. add some missing save/load in python examples. , LogisticRegression, LinearRegression and NaiveBayes
2. tune down iterations for MatrixFactorization, since current number will trigger StackOverflow for default java configuration (>1M)

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6760 from hhbyyh/docUpdate and squashes the following commits:

9bd3383 [Yuhao Yang] update scala example
8a44692 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
077cbb8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
3e948dc [Yuhao Yang] add missing save load for python example
2015-07-01 11:17:56 -07:00
sethah 8d23587f1d [SPARK-7739] [MLLIB] Improve ChiSqSelector example code in user guide
Author: sethah <seth.hendrickson16@gmail.com>

Closes #7029 from sethah/working_on_SPARK-7739 and squashes the following commits:

ef96916 [sethah] Fixing some style issues
efea1f8 [sethah] adding clarification to ChiSqSelector example
2015-06-30 16:28:25 -07:00
Tijo Thomas 9213f73a8e [SPARK-8615] [DOCUMENTATION] Fixed Sample deprecated code
Modified the deprecated jdbc api in the documentation.

Author: Tijo Thomas <tijoparacka@gmail.com>

Closes #7039 from tijoparacka/JIRA_8615 and squashes the following commits:

6e73b8a [Tijo Thomas] Reverted new lines
4042fcf [Tijo Thomas] updated to sql documentation
a27949c [Tijo Thomas] Fixed Sample deprecated code
2015-06-30 10:50:45 -07:00
MechCoder 45281664e0 [SPARK-4127] [MLLIB] [PYSPARK] Python bindings for StreamingLinearRegressionWithSGD
Python bindings for StreamingLinearRegressionWithSGD

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6744 from MechCoder/spark-4127 and squashes the following commits:

d8f6457 [MechCoder] Moved StreamingLinearAlgorithm to pyspark.mllib.regression
d47cc24 [MechCoder] Inherit from StreamingLinearAlgorithm
1b4ddd6 [MechCoder] minor
4de6c68 [MechCoder] Minor refactor
5e85a3b [MechCoder] Add tests for simultaneous training and prediction
fb27889 [MechCoder] Add example and docs
505380b [MechCoder] Add tests
d42bdae [MechCoder] [SPARK-4127] Python bindings for StreamingLinearRegressionWithSGD
2015-06-30 10:25:59 -07:00
Neelesh Srinivas Salian d48e78934a [SPARK-3629] [YARN] [DOCS]: Improvement of the "Running Spark on YARN" document
As per the description in the JIRA, I moved the contents of the page and added a few additional content.

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #6924 from nssalian/SPARK-3629 and squashes the following commits:

944b7a0 [Neelesh Srinivas Salian] Changed the lines about deploy-mode and added backticks to all parameters
40dbc0b [Neelesh Srinivas Salian] Changed dfs to HDFS, deploy-mode in backticks and updated the master yarn line
9cbc072 [Neelesh Srinivas Salian] Updated a few lines in the Launching Spark on YARN Section
8e8db7f [Neelesh Srinivas Salian] Removed the changes in this commit to help clearly distinguish movement from update
151c298 [Neelesh Srinivas Salian] SPARK-3629: Improvement of the Spark on YARN document
2015-06-27 09:07:10 +03:00
Rosstin b5a6663da2 [SPARK-8639] [DOCS] Fixed Minor Typos in Documentation
Ticket: [SPARK-8639](https://issues.apache.org/jira/browse/SPARK-8639)

fixed minor typos in docs/README.md and docs/api.md

Author: Rosstin <asterazul@gmail.com>

Closes #7046 from Rosstin/SPARK-8639 and squashes the following commits:

6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
2015-06-27 08:47:00 +03:00
Marcelo Vanzin 37bf76a2de [SPARK-8302] Support heterogeneous cluster install paths on YARN.
Some users have Hadoop installations on different paths across
their cluster. Currently, that makes it hard to set up some
configuration in Spark since that requires hardcoding paths to
jar files or native libraries, which wouldn't work on such a cluster.

This change introduces a couple of YARN-specific configurations
that instruct the backend to replace certain paths when launching
remote processes. That way, if the configuration says the Spark
jar is in "/spark/spark.jar", and also says that "/spark" should be
replaced with "{{SPARK_INSTALL_DIR}}", YARN will start containers
in the NMs with "{{SPARK_INSTALL_DIR}}/spark.jar" as the location
of the jar.

Coupled with YARN's environment whitelist (which allows certain
env variables to be exposed to containers), this allows users to
support such heterogeneous environments, as long as a single
replacement is enough. (Otherwise, this feature would need to be
extended to support multiple path replacements.)

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6752 from vanzin/SPARK-8302 and squashes the following commits:

4bff8d4 [Marcelo Vanzin] Add docs, rename configs.
0aa2a02 [Marcelo Vanzin] Only do replacement for paths that need it.
2e9cc9d [Marcelo Vanzin] Style.
a5e1f68 [Marcelo Vanzin] [SPARK-8302] Support heterogeneous cluster install paths on YARN.
2015-06-26 08:45:22 -05:00
Holden Karau 43e66192f4 [SPARK-8506] Add pakages to R context created through init.
Author: Holden Karau <holden@pigscanfly.ca>

Closes #6928 from holdenk/SPARK-8506-sparkr-does-not-provide-an-easy-way-to-depend-on-spark-packages-when-performing-init-from-inside-of-r and squashes the following commits:

b60dd63 [Holden Karau] Add an example with the spark-csv package
fa8bc92 [Holden Karau] typo: sparm -> spark
865a90c [Holden Karau] strip spaces for comparision
c7a4471 [Holden Karau] Add some documentation
c1a9233 [Holden Karau] refactor for testing
c818556 [Holden Karau] Add pakages to R
2015-06-24 11:55:20 -07:00
Cheng Lian 111d6b9b8a [SPARK-8139] [SQL] Updates docs and comments of data sources and Parquet output committer options
This PR only applies to master branch (1.5.0-SNAPSHOT) since it references `org.apache.parquet` classes which only appear in Parquet 1.7.0.

Author: Cheng Lian <lian@databricks.com>

Closes #6683 from liancheng/output-committer-docs and squashes the following commits:

b4648b8 [Cheng Lian] Removes spark.sql.sources.outputCommitterClass as it's not a public option
ee63923 [Cheng Lian] Updates docs and comments of data sources and Parquet output committer options
2015-06-23 17:24:26 -07:00
Cheng Lian d96d7b5574 [DOC] [SQL] Addes Hive metastore Parquet table conversion section
This PR adds a section about Hive metastore Parquet table conversion. It documents:

1. Schema reconciliation rules introduced in #5214 (see [this comment] [1] in #5188)
2. Metadata refreshing requirement introduced in #5339

[1]: https://github.com/apache/spark/pull/5188#issuecomment-86531248

Author: Cheng Lian <lian@databricks.com>

Closes #5348 from liancheng/sql-doc-parquet-conversion and squashes the following commits:

42ae0d0 [Cheng Lian] Adds Python `refreshTable` snippet
4c9847d [Cheng Lian] Resorts to SQL for Python metadata refreshing snippet
756e660 [Cheng Lian] Adds Python snippet for metadata refreshing
50675db [Cheng Lian] Addes Hive metastore Parquet table conversion section
2015-06-23 14:19:21 -07:00
Joseph K. Bradley a1894422ad [SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4
Reorganized docs a bit.  Added migration guides.

**Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```?  It would be a lot.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:

4bf26d6 [Joseph K. Bradley] tiny fix
8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
2015-06-21 16:25:25 -07:00
cody koeninger b305e377fb [SPARK-8390] [STREAMING] [KAFKA] fix docs related to HasOffsetRanges
Author: cody koeninger <cody@koeninger.org>

Closes #6863 from koeninger/SPARK-8390 and squashes the following commits:

26a06bd [cody koeninger] Merge branch 'master' into SPARK-8390
3744492 [cody koeninger] [Streaming][Kafka][SPARK-8390] doc changes per TD, test to make sure approach shown in docs actually compiles + runs
b108c9d [cody koeninger] [Streaming][Kafka][SPARK-8390] further doc fixes, clean up spacing
bb4336b [cody koeninger] [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges, cleanup
3f3c57a [cody koeninger] [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api
2015-06-19 17:18:31 -07:00
MechCoder 54976e55e3 [SPARK-4118] [MLLIB] [PYSPARK] Python bindings for StreamingKMeans
Python bindings for StreamingKMeans

Will change status to MRG once docs, tests and examples are updated.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6499 from MechCoder/spark-4118 and squashes the following commits:

7722d16 [MechCoder] minor style fixes
51052d3 [MechCoder] Doc fixes
2061a76 [MechCoder] Add tests for simultaneous training and prediction Minor style fixes
81482fd [MechCoder] minor
5d9fe61 [MechCoder] predictOn should take into account the latest model
8ab9e89 [MechCoder] Fix Python3 error
a9817df [MechCoder] Better tests and minor fixes
c80e451 [MechCoder] Add ignore_unicode_prefix
ee8ce16 [MechCoder] Update tests, doc and examples
4b1481f [MechCoder] Some changes and tests
d8b066a [MechCoder] [SPARK-4118] [MLlib] [PySpark] Python bindings for StreamingKMeans
2015-06-19 12:23:15 -07:00
Sean Owen 4be53d0395 [SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files
Clarify what may cause long-running Spark apps to preserve shuffle files

Author: Sean Owen <sowen@cloudera.com>

Closes #6901 from srowen/SPARK-5836 and squashes the following commits:

a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files
2015-06-19 11:03:04 -07:00
Jihong MA ebd363aecd [SPARK-7265] Improving documentation for Spark SQL Hive support
Please review this pull request.

Author: Jihong MA <linlin200605@gmail.com>

Closes #5933 from JihongMA/SPARK-7265 and squashes the following commits:

dfaa971 [Jihong MA] SPARK-7265 minor fix of the content
ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation
9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master'
d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master'
7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master'
9c84695 [Jihong MA] SPARK-7265 address review comment
a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support
2015-06-19 14:06:49 +02:00
Lars Francke 4ce3bab89f [SPARK-8462] [DOCS] Documentation fixes for Spark SQL
This fixes various minor documentation issues on the Spark SQL page

Author: Lars Francke <lars.francke@gmail.com>

Closes #6890 from lfrancke/SPARK-8462 and squashes the following commits:

dd7e302 [Lars Francke] Merge branch 'master' into SPARK-8462
34eff2c [Lars Francke] Minor documentation fixes
2015-06-18 19:40:32 -07:00
zsxwing 24e53793b4 [SPARK-8376] [DOCS] Add common lang3 to the Spark Flume Sink doc
Commons Lang 3 has been added as one of the dependencies of Spark Flume Sink since #5703. This PR updates the doc for it.

Author: zsxwing <zsxwing@gmail.com>

Closes #6829 from zsxwing/flume-sink-dep and squashes the following commits:

f8617f0 [zsxwing] Add common lang3 to the Spark Flume Sink doc
2015-06-18 16:00:27 -07:00
Josh Rosen 44c931f006 [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers
This patch uses [AnchorJS](https://bryanbraun.github.io/anchorjs/) to show deep anchor links when hovering over headers in the Spark documentation. For example:

![image](https://cloud.githubusercontent.com/assets/50748/8240800/1502f85c-15ba-11e5-819a-97b231370a39.png)

This makes it easier for users to link to specific sections of the documentation.

I also removed some dead Javascript which isn't used in our current docs (it was introduced for the old AMPCamp training, but isn't used anymore).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits:

e59d8a7 [Josh Rosen] Suppress underline on hover
f518b6a [Josh Rosen] Turn on for all headers, since we use H1s in a bunch of places
a9fec01 [Josh Rosen] Add anchor links when hovering over headers; remove some dead JS code
2015-06-18 15:10:09 -07:00
Neelesh Srinivas Salian ddc5baf17d [SPARK-8320] [STREAMING] Add example in streaming programming guide that shows union of multiple input streams
Added python code to https://spark.apache.org/docs/latest/streaming-programming-guide.html
to the Level of Parallelism in Data Receiving section.

Please review and let me know if there are any additional changes that are needed.

Thank you.

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #6862 from nssalian/SPARK-8320 and squashes the following commits:

4bfd126 [Neelesh Srinivas Salian] Changed loop structure to be more in line with Python style
e5345de [Neelesh Srinivas Salian] Changes to kafak append, for loop and show to print()
3fc5c6d [Neelesh Srinivas Salian] SPARK-8320
2015-06-18 09:44:36 -07:00
zsxwing 78a430ea4d [SPARK-7961][SQL]Refactor SQLConf to display better error message
1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`.
2. Verify the value when setting a configuration if this is in SQLConf.
3. Use `SET -v` to display all public configurations.

Author: zsxwing <zsxwing@gmail.com>

Closes #6747 from zsxwing/sqlconf and squashes the following commits:

7d09bad [zsxwing] Use SQLConfEntry in HiveContext
49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext
e014f53 [zsxwing] Merge branch 'master' into sqlconf
93dad8e [zsxwing] Fix the unit tests
cf950c1 [zsxwing] Fix the code style and tests
3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style
a2f4add [zsxwing] getConf will return the default value if a config is not set
037b1db [zsxwing] Add schema to SetCommand
0520c3c [zsxwing] Merge branch 'master' into sqlconf
7afb0ec [zsxwing] Fix the configurations about HiveThriftServer
7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString'
5e95b10 [zsxwing] Add enumConf
c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString
4abd807 [zsxwing] Fix the test for 'set -v'
6e47e56 [zsxwing] Fix the compilation error
8973ced [zsxwing] Remove floatConf
1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead
99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string
88a03cc [zsxwing] Add new lines between confs and return types
ce7c6c8 [zsxwing] Remove seqConf
f3c1b33 [zsxwing] Refactor SQLConf to display better error message
2015-06-17 23:22:54 -07:00
MechCoder 22732e1eca [SPARK-7605] [MLLIB] [PYSPARK] Python API for ElementwiseProduct
Python API for org.apache.spark.mllib.feature.ElementwiseProduct

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6346 from MechCoder/spark-7605 and squashes the following commits:

79d1ef5 [MechCoder] Consistent and support list / array types
5f81d81 [MechCoder] [SPARK-7605] [MLlib] Python API for ElementwiseProduct
2015-06-17 22:08:38 -07:00
Sean Owen f005be0273 [SPARK-8395] [DOCS] start-slave.sh docs incorrect
start-slave.sh no longer takes a worker # param in 1.4+

Author: Sean Owen <sowen@cloudera.com>

Closes #6855 from srowen/SPARK-8395 and squashes the following commits:

300278e [Sean Owen] start-slave.sh no longer takes a worker # param in 1.4+
2015-06-17 13:31:10 -07:00
Radek Ostrowski 4bd10fd509 [SQL] [DOC] improved a comment
[SQL][DOC] I found it a bit confusing when I came across it for the first time in the docs

Author: Radek Ostrowski <dest.hawaii@gmail.com>
Author: radek <radek@radeks-MacBook-Pro-2.local>

Closes #6332 from radek1st/master and squashes the following commits:

dae3347 [Radek Ostrowski] fixed typo
c76bb3a [radek] improved a comment
2015-06-16 21:04:26 +01:00
Moussa Taifi dc455b8833 [SPARK-DOCS] [SPARK-SQL] Update sql-programming-guide.md
Typo in thriftserver section

Author: Moussa Taifi <moutai10@gmail.com>

Closes #6847 from moutai/patch-1 and squashes the following commits:

1bd29df [Moussa Taifi] Update sql-programming-guide.md
2015-06-16 20:59:22 +01:00
Peter Hoffmann f3f2a4397d fix read/write mixup
Author: Peter Hoffmann <ph@peter-hoffmann.com>

Closes #6815 from hoffmann/patch-1 and squashes the following commits:

2abb6da [Peter Hoffmann] fix read/write mixup
2015-06-14 11:41:16 -07:00
Mike Dusenberry 35d1267cf8 [Spark-8343] [Streaming] [Docs] Improve Spark Streaming Guides.
This improves the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6801 from dusenberrymw/SPARK-8343_Improve_Spark_Streaming_Guides_MERGED and squashes the following commits:

6688090 [Mike Dusenberry] Improvements to the Spark Streaming Custom Receiver Guide, including slight rewording of confusing sections, and fixing typos & missing words.
436fbd8 [Mike Dusenberry] Bunch of improvements to the Spark Streaming Guide, including fixing broken links, slight rewording of confusing sections, fixing typos & missing words, etc.
2015-06-13 21:22:46 -07:00
Tathagata Das e9471d3414 [SPARK-7284] [STREAMING] Updated streaming documentation
- Kinesis API updated
- Kafka version updated, and Python API for Direct Kafka added
- Added SQLContext.getOrCreate()
- Added information on how to get partitionId in foreachRDD

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6781 from tdas/SPARK-7284 and squashes the following commits:

aac7be0 [Tathagata Das] Added information on how to get partition id
a66ec22 [Tathagata Das] Complete the line incomplete line,
a92ca39 [Tathagata Das] Updated streaming documentation
2015-06-12 15:22:59 -07:00
Marcelo Vanzin 9cbdf31ec1 [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:

49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
2015-06-11 15:29:03 -07:00
Hossein 30ebf1a233 [SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable
Read number of threads for RBackend from configuration.

[SPARK-8282] #comment Linking with JIRA

Author: Hossein <hossein@databricks.com>

Closes #6730 from falaki/SPARK-8282 and squashes the following commits:

33b3d98 [Hossein] Documented new config parameter
70f2a9c [Hossein] Fixing import
ec44225 [Hossein] Read number of threads for RBackend from configuration
2015-06-10 13:19:44 -07:00
Cheng Lian 8f7308f9c4 [SQL] [MINOR] Fixes a minor Java example error in SQL programming guide
Author: Cheng Lian <lian@databricks.com>

Closes #6749 from liancheng/java-sample-fix and squashes the following commits:

5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide
2015-06-10 11:48:14 -07:00
Patrick Wendell 6e4fb0c9e8 [SPARK-6511] [DOCUMENTATION] Explain how to use Hadoop provided builds
This provides preliminary documentation pointing out how to use the
Hadoop free builds. I am hoping over time this list can grow to
include most of the popular Hadoop distributions.

Getting more people using these builds will help us long term reduce
the number of binaries we build.

Author: Patrick Wendell <patrick@databricks.com>

Closes #6729 from pwendell/hadoop-provided and squashes the following commits:

1113b76 [Patrick Wendell] [SPARK-6511] [Documentation] Explain how to use Hadoop provided builds
2015-06-09 16:14:21 -07:00
FavioVazquez 490d5a72ec [SPARK-8274] [DOCUMENTATION-MLLIB] Fix wrong URLs in MLlib Frequent Pattern Mining Documentation
There is a mistake in the URLs of the Scala section of FP-Growth in the MLlib Frequent Pattern Mining documentation. The URL points to https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowth.html which is the Java's API, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth

There's another mistake in the FP-GrowthModel in the same section, the link points, again, to the Java's API https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/fpm/FPGrowthModel.html, the link should point to the Scala API https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowthModel

Author: FavioVazquez <favio.vazquezp@gmail.com>

Closes #6722 from FavioVazquez/fix-wrog-urls-mllib-fpgrowth and squashes the following commits:

e1ca54d [FavioVazquez] - Fixed wrong URLs in MLlib Frequent Pattern Mining, FP-Growth Scala section
ad882a3 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
f27a20b [FavioVazquez] Merge remote-tracking branch 'upstream/master'
9af7074 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
edab1ef [FavioVazquez] Merge remote-tracking branch 'upstream/master'
b2e2f8c [FavioVazquez] Merge remote-tracking branch 'upstream/master'
2015-06-09 15:02:18 +01:00
Kousuke Saruta e6fb6cedf3 [STREAMING] [DOC] Remove duplicated description about WAL
I noticed there is a duplicated description about WAL.

```
To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming. To ensure zero data loss, enable the Write Ahead Logs (introduced in Spark 1.2).
```

Let's remove the duplication.

I don't file this issue in JIRA because it's minor.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #6719 from sarutak/remove-multiple-description and squashes the following commits:

cc9bb21 [Kousuke Saruta] Removed duplicated description about WAL
2015-06-09 12:19:01 +01:00
Liang-Chi Hsieh 03ef6be9ce [SPARK-7939] [SQL] Add conf to enable/disable partition column type inference
JIRA: https://issues.apache.org/jira/browse/SPARK-7939

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6503 from viirya/disable_partition_type_inference and squashes the following commits:

3e90470 [Liang-Chi Hsieh] Default to enable type inference and update docs.
455edb1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into disable_partition_type_inference
9a57933 [Liang-Chi Hsieh] Add conf to enable/disable partition column type inference.
2015-06-08 17:50:38 +08:00
Daoyuan Wang 10fc2f6f51 [SPARK-4761] [DOC] [SQL] kryo default setting in SQL Thrift server
this is a follow up of #3621

/cc liancheng pwendell

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #6639 from adrian-wang/kryodoc and squashes the following commits:

3c4b1cf [Daoyuan Wang] [DOC] kryo default setting in SQL Thrift server
2015-06-08 01:07:50 -07:00
Sean Owen e84815dc33 [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+
Update build to use Java 7, and remove some comments and special-case support for Java 6.

Author: Sean Owen <sowen@cloudera.com>

Closes #6265 from srowen/SPARK-7733 and squashes the following commits:

59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
2015-06-07 20:18:13 +01:00
Yijie Shen 835f1380d9 [DOC] [TYPO] Fix typo in standalone deploy scripts description
Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #6691 from yijieshen/patch-2 and squashes the following commits:

b40a4b0 [Yijie Shen] [DOC][TYPO] Fix typo in standalone deploy scripts description
2015-06-07 15:30:37 +01:00
Hari Shreedharan 3285a51121 [SPARK-7955] [CORE] Ensure executors with cached RDD blocks are not re…
…moved if dynamic allocation is enabled.

This is a work in progress. This patch ensures that an executor that has cached RDD blocks are not removed,
but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
approach, and if it makes sense then I will look at choosing another executor to remove. No testing has been done either.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:

dddf1eb [Hari Shreedharan] Minor configuration description update.
10130e2 [Hari Shreedharan] Fix compile issue.
5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
875916a [Hari Shreedharan] Make some code more readable.
39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
ec2fd7e [Hari Shreedharan] Add file missed in last commit
5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
ae932ff [Hari Shreedharan] Fix config param name.
272969d [Hari Shreedharan] Fix seconds to millis bug.
5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
2015-06-06 21:13:26 -07:00
Luca Martinetti 4060526cd3 [SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort
Add documentation for spark.sql.planner.externalSort

Author: Luca Martinetti <luca@luca.io>

Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:

985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort
2015-06-05 13:40:11 -07:00
Akhil Das 019dc9f558 [STREAMING] Update streaming-kafka-integration.md
Fixed the broken links (Examples) in the documentation.

Author: Akhil Das <akhld@darktech.ca>

Closes #6666 from akhld/patch-2 and squashes the following commits:

2228b83 [Akhil Das] Update streaming-kafka-integration.md
2015-06-05 14:23:23 +02:00
Hari Shreedharan 566cb59479 [HOTFIX] History Server API docs error fix.
Minor error in the monitoring docs. Also made indentation changes in `ApiRootResource`

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6628 from harishreedharan/eventlog-formatting and squashes the following commits:

a12553d [Hari Shreedharan] Javadoc updates.
ca399b6 [Hari Shreedharan] [HOTFIX] History Server API docs error fix.
2015-06-03 16:53:57 -07:00
Hari Shreedharan d2a86eb8f0 [SPARK-7161] [HISTORY SERVER] Provide REST api to download event logs fro...
...m History Server

This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt.

This also add an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:

221cc26 [Hari Shreedharan] Update docs with new API information.
a131be6 [Hari Shreedharan] Fix style issues.
5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by rafactoring interfaces.
5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
0b66948 [Hari Shreedharan] Minor formatting/import fixes.
4fc518c [Hari Shreedharan] Fix rat failures.
a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
fd6ab00 [Hari Shreedharan] Fix style issues
32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
7b362b2 [Hari Shreedharan] Almost working.
3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
2015-06-03 13:43:13 -05:00
Patrick Wendell 2c4d550eda [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
Author: Patrick Wendell <patrick@databricks.com>

Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:

2f42d02 [Patrick Wendell] A few more excludes
4bebcf0 [Patrick Wendell] Update to RC4
61aaf46 [Patrick Wendell] Using new release candidate
55f1610 [Patrick Wendell] Another exclude
04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
2015-06-03 10:11:27 -07:00
Yuhao Yang 43adbd5611 [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc
jira: https://issues.apache.org/jira/browse/SPARK-8043

I found some issues during testing the save/load examples in markdown Documents, as a part of 1.4 QA plan

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:

a01a206 [Yuhao Yang] fix for Gaussian mixture
2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
2015-06-02 23:15:38 -07:00
Mike Dusenberry ad06727fe9 [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples.
Updating ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists.

mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:

6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from thos of the Model.
d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
2015-06-02 12:38:14 -07:00
Xiangrui Meng 0221c7f0ef [SPARK-7582] [MLLIB] user guide for StringIndexer
This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:

4bba4f1 [Xiangrui Meng] fix example
ba1cd1b [Xiangrui Meng] fix style
7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
2015-06-01 22:03:29 -07:00