Commit graph

1225 commits

Author SHA1 Message Date
Keiji Yoshida 623c675fde Update streaming-programming-guide.md
Update `See the Scala example` to `See the Java example`.

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8376 from yosssi/patch-1.
2015-08-23 11:04:29 +01:00
Keiji Yoshida 46fcb9e0db Update programming-guide.md
Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`.

Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>

Closes #8372 from yosssi/patch-1.
2015-08-22 02:38:10 -07:00
Xusen Yin 630a994e6a [SPARK-9893] User guide with Java test suite for VectorSlicer
Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer.

Note that Python version does not support selecting by names now.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #8267 from yinxusen/SPARK-9893.
2015-08-21 16:30:12 -07:00
Alexander Ulanov dcfe0c5cde [SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier
Added user guide for multilayer perceptron classifier:
  - Simplified description of the multilayer perceptron classifier
  - Example code for Scala and Java

Author: Alexander Ulanov <nashb@yandex.ru>

Closes #8262 from avulanov/SPARK-9846-mlpc-docs.
2015-08-20 20:02:27 -07:00
Eric Liang 8e0a072f78 [SPARK-9895] User Guide for RFormula Feature Transformer
mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8293 from ericl/docs-2.
2015-08-19 15:43:08 -07:00
Marcelo Vanzin 5fd53c64bb [SPARK-9833] [YARN] Add options to disable delegation token retrieval.
This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8134 from vanzin/SPARK-9833.
2015-08-19 10:51:59 -07:00
Yanbo Liang 802b5b8791 [SPARK-10084] [MLLIB] [DOC] Add Python example for mllib FP-growth user guide
1, Add Python example for mllib FP-growth user guide.
2, Correct mistakes of Scala and Java examples.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8279 from yanboliang/spark-10084.
2015-08-19 08:53:34 -07:00
Joseph K. Bradley 39e4ebd521 [SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guide
New user guide section ml-decision-tree.md, including code examples.

I have run all examples, including the Java ones.

CC: manishamde yanboliang mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8244 from jkbradley/ml-dt-docs.
2015-08-19 07:38:27 -07:00
lewuathe ba2a07e2b6 [SPARK-9977] [DOCS] Update documentation for StringIndexer
By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label.
I think it is better to make it explicit on documentation.

Author: lewuathe <lewuathe@me.com>

Closes #8205 from Lewuathe/SPARK-9977.
2015-08-19 09:54:03 +01:00
Sean Owen f141efeafb [SPARK-10070] [DOCS] Remove Guava dependencies in user guides
`Lists.newArrayList` -> `Arrays.asList`

CC jkbradley feynmanliang

Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond.

Author: Sean Owen <sowen@cloudera.com>

Closes #8272 from srowen/SPARK-10070.
2015-08-19 09:41:09 +01:00
Bill Chambers b23c4d3ffc Fix Broken Link
Link was broken because it included tick marks.

Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #8302 from anabranch/patch-1.
2015-08-19 00:05:01 -07:00
Alexander Ulanov 1c843e2848 [SPARK-9508] GraphX Pregel docs update with new Pregel code
SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be modified accordingly since it lists the old Pregel code

Author: Alexander Ulanov <nashb@yandex.ru>

Closes #7831 from avulanov/SPARK-9508-pregel-doc2.
2015-08-18 22:13:52 -07:00
Davies Liu de3223872a [SPARK-9705] [DOC] fix docs about Python version
cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #8245 from davies/python_doc.
2015-08-18 22:11:27 -07:00
Feynman Liang badf7fa650 [SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
2015-08-18 17:54:49 -07:00
Dennis Huo 9b731fad2b [SPARK-9782] [YARN] Support YARN application tags via SparkConf
Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.

Author: Dennis Huo <dhuo@google.com>

Closes #8072 from dennishuo/dhuo-yarn-application-tags.
2015-08-18 14:34:20 -07:00
Piotr Migdal 8bae9015b7 [SPARK-10085] [MLLIB] [DOCS] removed unnecessary numpy array import
See https://issues.apache.org/jira/browse/SPARK-10085

Author: Piotr Migdal <pmigdal@gmail.com>

Closes #8284 from stared/spark-10085.
2015-08-18 12:59:28 -07:00
Yanbo Liang 747c2ba800 [SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide
Add Python example for mllib LDAModel user guide

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8227 from yanboliang/spark-10032.
2015-08-18 12:56:36 -07:00
Yanbo Liang f4fa61effe [SPARK-10029] [MLLIB] [DOC] Add Python examples for mllib IsotonicRegression user guide
Add Python examples for mllib IsotonicRegression user guide

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8225 from yanboliang/spark-10029.
2015-08-18 12:55:36 -07:00
Feynman Liang f5ea391290 [SPARK-9900] [MLLIB] User guide for Association Rules
Updates FPM user guide to include Association Rules.

Author: Feynman Liang <fliang@databricks.com>

Closes #8207 from feynmanliang/SPARK-9900-arules.
2015-08-18 12:53:57 -07:00
jose.cambronero c90c605dc6 [SPARK-9902] [MLLIB] Add Java and Python examples to user guide for 1-sample KS test
added doc examples for python.

Author: jose.cambronero <jose.cambronero@cloudera.com>

Closes #8154 from josepablocam/spark_9902.
2015-08-17 19:09:45 -07:00
Sandy Ryza f9d1a92aa1 [SPARK-7707] User guide and example code for KernelDensity
Author: Sandy Ryza <sandy@cloudera.com>

Closes #8230 from sryza/sandy-spark-7707.
2015-08-17 17:57:51 -07:00
Feynman Liang 0b6b017613 [SPARK-9898] [MLLIB] Prefix Span user guide
Adds user guide for `PrefixSpan`, including Scala and Java example code.

mengxr zhangjiajin

Author: Feynman Liang <fliang@databricks.com>

Closes #8253 from feynmanliang/SPARK-9898.
2015-08-17 17:53:24 -07:00
Yanbo Liang 0076e82123 [SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ml.feature.ElementwiseProduct
Add Python API, user guide and example for ml.feature.ElementwiseProduct.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8061 from yanboliang/SPARK-9768.
2015-08-17 17:25:41 -07:00
Feynman Liang fdaf17f63f [SPARK-10068] [MLLIB] Adds links to MLlib types, algos, utilities listing
mengxr jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8255 from feynmanliang/SPARK-10068.
2015-08-17 15:42:14 -07:00
Reynold Xin e5fd60415f [SPARK-9934] Deprecate NIO ConnectionManager.
Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0.

Author: Reynold Xin <rxin@databricks.com>

Closes #8162 from rxin/SPARK-9934.
2015-08-14 20:55:32 -07:00
Rosstin 7a539ef3b1 [SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transformer, and Param
Added ml-guide Python Example: Estimator, Transformer, and Param
/docs/_site/ml-guide.html

Author: Rosstin <asterazul@gmail.com>

Closes #8081 from Rosstin/SPARK-8965.
2015-08-13 09:18:39 -07:00
Niranjan Padmanabhan 738f353988 [SPARK-9092] Fixed incompatibility when both num-executors and dynamic...
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager in not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.
2015-08-12 16:10:21 -07:00
Yuhao Yang 66d87c1d76 [SPARK-7583] [MLLIB] User guide update for RegexTokenizer
jira: https://issues.apache.org/jira/browse/SPARK-7583

User guide update for RegexTokenizer

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7828 from hhbyyh/regexTokenizerDoc.
2015-08-12 09:35:32 -07:00
Timothy Chen 741a29f989 [SPARK-9575] [MESOS] Add docuemntation around Mesos shuffle service.
andrewor14

Author: Timothy Chen <tnachen@gmail.com>

Closes #7907 from tnachen/mesos_shuffle.
2015-08-11 23:33:22 -07:00
Timothy Chen 5c99d8bf98 [SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos
Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory.

Author: Timothy Chen <tnachen@gmail.com>

Closes #7195 from tnachen/mesos_files.
2015-08-11 23:26:33 -07:00
Eric Liang 74a293f453 [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5
This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc.

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8085 from ericl/docs.
2015-08-11 21:26:03 -07:00
Prabeesh K 853809e948 [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python
This PR is based on #4229, thanks prabeesh.

Closes #4229

Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>

Closes #7833 from zsxwing/pr4229 and squashes the following commits:

9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python
2015-08-10 16:33:23 -07:00
Mahmoud Lababidi d285212756 Fixed AtmoicReference<> Example
Author: Mahmoud Lababidi <lababidi@gmail.com>

Closes #8076 from lababidi/master and squashes the following commits:

af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example
2015-08-10 13:02:01 -07:00
Jeff Zhang fe12277b40 Fix doc typo
Straightforward fix on doc typo

Author: Jeff Zhang <zjffdu@apache.org>

Closes #8019 from zjffdu/master and squashes the following commits:

aed6e64 [Jeff Zhang] Fix doc typo
2015-08-06 21:03:47 -07:00
Davies Liu 17284db314 [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe
spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing.

cc marmbrus rxin

Author: Davies Liu <davies@databricks.com>

Closes #7998 from davies/tungsten and squashes the following commits:

c1c16da [Davies Liu] update doc
1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe

(cherry picked from commit 4e70e8256c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-08-06 19:42:02 -07:00
Davies Liu 49b1504fe3 Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe"
This reverts commit 4e70e8256c.
2015-08-06 17:36:12 -07:00
Davies Liu 4e70e8256c [SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe
spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing.

cc marmbrus rxin

Author: Davies Liu <davies@databricks.com>

Closes #7998 from davies/tungsten and squashes the following commits:

c1c16da [Davies Liu] update doc
1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe
2015-08-06 17:30:31 -07:00
Sean Owen 0d7aac99da [SPARK-9641] [DOCS] spark.shuffle.service.port is not documented
Document spark.shuffle.service.{enabled,port}

CC sryza tgravescs
This is pretty minimal; is there more to say here about the service?

Author: Sean Owen <sowen@cloudera.com>

Closes #7991 from srowen/SPARK-9641 and squashes the following commits:

3bb946e [Sean Owen] Add link to docs for setup and config of external shuffle service
2302e01 [Sean Owen] Document spark.shuffle.service.{enabled,port}
2015-08-06 19:29:42 +01:00
Mike Dusenberry 34dcf10104 [SPARK-6486] [MLLIB] [PYTHON] Add BlockMatrix to PySpark.
mengxr This adds the `BlockMatrix` to PySpark.  I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.
2015-08-05 07:40:50 -07:00
Namit Katariya 1bf608b5ef [SPARK-9601] [DOCS] Fix JavaPairDStream signature for stream-stream and windowed join in streaming guide doc
Author: Namit Katariya <katariya.namit@gmail.com>

Closes #7935 from namitk/SPARK-9601 and squashes the following commits:

03b5784 [Namit Katariya] [SPARK-9601] Fix signature of JavaPairDStream for stream-stream and windowed join in streaming guide doc
2015-08-05 01:07:33 -07:00
Reynold Xin f7abd6bec9 Update docs/README.md to put all prereqs together.
This pull request groups all the prereq requirements into a single section.

cc srowen shivaram

Author: Reynold Xin <rxin@databricks.com>

Closes #7951 from rxin/readme-docs and squashes the following commits:

ab7ded0 [Reynold Xin] Updated docs/README.md to put all prereqs together.
2015-08-04 22:17:14 -07:00
Mike Dusenberry 571d5b5363 [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark.
This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix distributed matrices to PySpark.  Each distributed matrix class acts as a wrapper around the Scala/Java counterpart by maintaining a reference to the Java object.  New distributed matrices can be created using factory methods added to DistributedMatrices, which creates the Java distributed matrix and then wraps it with the corresponding PySpark class.  This design allows for simple conversion between the various distributed matrices, and lets us re-use the Scala code.  Serialization between Python and Java is implemented using DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity.  Associated documentation and unit-tests have also been added.  To facilitate code review, this PR implements access to the rows/entries as RDDs, the number of rows & columns, and conversions between the various distributed matrices (not including BlockMatrix), and does not implement the other linear algebra functions of the matrices, although this will be very simple to add now.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7554 from dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark and squashes the following commits:

bb039cb [Mike Dusenberry] Minor documentation update.
b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to make it even cleaner.  Now, we allow the 'rows' parameter in the constructors to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a Java matrix object, wrap it, and then store that.  If 'rows' is a Java matrix object of the correct type, we just wrap and store that directly.  This is only for internal usage, and publicly, we still require 'rows' to be an RDD.  We no longer store the 'rows' RDD, and instead just compute it from the Java object when needed.  The point of this is that when we do matrix conversions, we do the conversion on the Scala/Java side, which returns a Java object, so we should use that directly, but exposing 'java_matrix' parameter in the public API is not ideal. This non-public feature of allowing 'rows' to be a Java matrix object is documented in the '__init__' constructor docstrings, which are not part of the generated public API, and doctests are also included.
7f0dcb6 [Mike Dusenberry] Updating module docstring.
cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)' rather than 'SQLContext.getOrCreate', as the later doesn't guarantee that the SparkContext will be the same as for the matrix.rows data.
687e345 [Mike Dusenberry] Improving conversion performance.  This adds an optional 'java_matrix' parameter to the constructors, and pulls the conversion logic out into a '_create_from_java' function. Now, if the constructors are given a valid Java distributed matrix object as 'java_matrix', they will store those internally, rather than create a new one on the Scala/Java side.
3e50b6e [Mike Dusenberry] Moving the distributed matrices to pyspark.mllib.linalg.distributed.
308f197 [Mike Dusenberry] Using properties for better documentation.
1633f86 [Mike Dusenberry] Minor documentation cleanup.
f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from DistributedMatrix.
ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner.
3fd4016 [Mike Dusenberry] Updating docstrings.
27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors for each distributed matrix.
a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using lists instead of DenseVectors explicitly.
d19b0ba [Mike Dusenberry] Updating code and documentation to note that a vector-like object (numpy array, list, etc.) can be used in place of explicit Vector object, and adding conversions when necessary to RowMatrix construction.
4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and MatrixEntry.
c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry functions.
329638b [Mike Dusenberry] Moving the Experimental tag to the top of each docstring.
0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated rows/entries RDDs within the various tests.
c0900df [Mike Dusenberry] Adding the colons that were accidentally not inserted.
4ad6819 [Mike Dusenberry] Documenting the  and  parameters.
3b854b9 [Mike Dusenberry] Minor updates to documentation.
10046e8 [Mike Dusenberry] Updating documentation to use class constructors instead of the removed DistributedMatrices factory methods.
119018d [Mike Dusenberry] Adding static  methods to each of the distributed matrix classes to consolidate conversion logic.
4d7af86 [Mike Dusenberry] Adding type checks to the constructors.  Although it is slightly verbose, it is better for the user to have a good error message than a cryptic stacktrace.
93b6a3d [Mike Dusenberry] Pulling the DistributedMatrices Python class out of this pull request.
f6f3c68 [Mike Dusenberry] Pulling the DistributedMatrices Scala class out of this pull request.
6a3ecb7 [Mike Dusenberry] Updating pattern matching.
08f287b [Mike Dusenberry] Slight reformatting of the documentation.
a245dc0 [Mike Dusenberry] Updating Python doctests for compatability between Python 2 & 3. Since Python 3 removed the idea of a separate 'long' type, all values that would have been outputted as a 'long' (ex: '4L') will now be treated as an 'int' and outputed as one (ex: '4').  The doctests now explicitly convert to ints so that both Python 2 and 3 will have the same output.  This is fine since the values are all small, and thus can be easily represented as ints.
4d3a37e [Mike Dusenberry] Reformatting a few long Python doctest lines.
7e3ca16 [Mike Dusenberry] Fixing long lines.
f721ead [Mike Dusenberry] Updating documentation for each of the distributed matrices.
ab0e8b6 [Mike Dusenberry] Updating unit test to be more useful.
dda2f89 [Mike Dusenberry] Added wrappers for the conversions between the various distributed matrices.  Added logic to be able to access the rows/entries of the distributed matrices, which requires serialization through DataFrames for IndexedRowMatrix and CoordinateMatrix types. Added unit tests.
0cd7166 [Mike Dusenberry] Implemented the CoordinateMatrix API in PySpark, following the idea of the IndexedRowMatrix API, including using DataFrames for serialization.
3c369cb [Mike Dusenberry] Updating the architecture a bit to make conversions between the various distributed matrix types easier.  The different distributed matrix classes are now only wrappers around the Java objects, and take the Java object as an argument during construction.  This way, we can call  for example on an , which returns a reference to a Java RowMatrix object, and then construct a PySpark RowMatrix object wrapped around the Java object.  This is analogous to the behavior of PySpark RDDs and DataFrames.  We now delegate creation of the various distributed matrices from scratch in PySpark to the factory methods on .
4bdd09b [Mike Dusenberry] Implemented the IndexedRowMatrix API in PySpark, following the idea of the RowMatrix API.  Note that for the IndexedRowMatrix, we use DataFrames to serialize the data between Python and Scala/Java, so we accept PySpark RDDs, then convert to a DataFrame, then convert back to RDDs on the Scala/Java side before constructing the IndexedRowMatrix.
23bf1ec [Mike Dusenberry] Updating documentation to add PySpark RowMatrix. Inserting newline above doctest so that it renders properly in API docs.
b194623 [Mike Dusenberry] Updating design to have a PySpark RowMatrix simply create and keep a reference to a wrapper over a Java RowMatrix.  Updating DistributedMatrices factory methods to accept numRows and numCols with default values.  Updating PySpark DistributedMatrices factory method to simply create a PySpark RowMatrix. Adding additional doctests for numRows and numCols parameters.
bc2d220 [Mike Dusenberry] Adding unit tests for RowMatrix methods.
d7e316f [Mike Dusenberry] Implemented the RowMatrix API in PySpark by doing the following: Added a DistributedMatrices class to contain factory methods for creating the various distributed matrices.  Added a factory method for creating a RowMatrix from an RDD of Vectors.  Added a createRowMatrix function to the PythonMLlibAPI to interface with the factory method.  Added DistributedMatrix, DistributedMatrices, and RowMatrix classes to the pyspark.mllib.linalg api.
2015-08-04 16:30:03 -07:00
Sean Owen 0afa6fbf52 [SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build
Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too

Author: Sean Owen <sowen@cloudera.com>

Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits:

73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too
2015-08-04 13:48:22 +09:00
Shivaram Venkataraman 7abaaad5b1 Add a prerequisites section for building docs
This puts all the install commands that need to be run in one section instead of being spread over many paragraphs

cc rxin

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #7912 from shivaram/docs-setup-readme and squashes the following commits:

cf7a204 [Shivaram Venkataraman] Add a prerequisites section for building docs
2015-08-03 17:00:59 -07:00
Yanbo Liang 8ca287ebbd [SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples
Add ml.PCA user guide document and code examples for Scala/Java/Python.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #7522 from yanboliang/ml-pca-md and squashes the following commits:

60dec05 [Yanbo Liang] address comments
f992abe [Yanbo Liang] Add ml.PCA doc and examples
2015-08-03 13:58:00 -07:00
Kousuke Saruta ba1c4e138d [SPARK-9558][DOCS]Update docs to follow the increase of memory defaults.
Now the memory defaults of master and slave in Standalone mode and History Server is 1g, not 512m. So let's update docs.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #7896 from sarutak/update-doc-for-daemon-memory and squashes the following commits:

a77626c [Kousuke Saruta] Fix docs to follow the update of increase of memory defaults
2015-08-03 12:53:44 -07:00
KaiXinXiaoLei 536d2adc12 [SPARK-9535][SQL][DOCS] Modify document for codegen.
#7142 made codegen enabled by default so let's modify the corresponding documents.

Closes #7142

Author: KaiXinXiaoLei <huleilei1@huawei.com>
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #7863 from sarutak/SPARK-9535 and squashes the following commits:

0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled
3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535
4ee531d [KaiXinXiaoLei] delete space
4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort
d624cf8 [KaiXinXiaoLei] sql config is wrong
2015-08-02 20:04:21 -07:00
Sean Owen 873ab0f969 [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement
Use print(x) not print x for Python 3 in eval examples
CC sethah mengxr -- just wanted to close this out before 1.5

Author: Sean Owen <sowen@cloudera.com>

Closes #7822 from srowen/SPARK-9490 and squashes the following commits:

01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
2015-07-31 13:45:28 -07:00
CodingCat c0686668ae [SPARK-9202] capping maximum number of executor&driver information kept in Worker
https://issues.apache.org/jira/browse/SPARK-9202

Author: CodingCat <zhunansjtu@gmail.com>

Closes #7714 from CodingCat/SPARK-9202 and squashes the following commits:

23977fb [CodingCat] add comments about why we don't synchronize finishedExecutors & finishedDrivers
dc9772d [CodingCat] addressing the comments
e125241 [CodingCat] stylistic fix
80bfe52 [CodingCat] fix JsonProtocolSuite
d7d9485 [CodingCat] styistic fix and respect insert ordering
031755f [CodingCat] add license info & stylistic fix
c3b5361 [CodingCat] test cases and docs
c557b3a [CodingCat] applications are fine
9cac751 [CodingCat] application is fine...
ad87ed7 [CodingCat] trimFinishedExecutorsAndDrivers
2015-07-31 20:27:00 +01:00
zsxwing 3afc1de89c [SPARK-8564] [STREAMING] Add the Python API for Kinesis
This PR adds the Python API for Kinesis, including a Python example and a simple unit test.

Author: zsxwing <zsxwing@gmail.com>

Closes #6955 from zsxwing/kinesis-python and squashes the following commits:

e42e471 [zsxwing] Merge branch 'master' into kinesis-python
455f7ea [zsxwing] Remove streaming_kinesis_asl_assembly module and simply add the source folder to streaming_kinesis_asl module
32e6451 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
5082d28 [zsxwing] Fix the syntax error for Python 2.6
fca416b [zsxwing] Fix wrong comparison
96670ff [zsxwing] Fix the compilation error after merging master
756a128 [zsxwing] Merge branch 'master' into kinesis-python
6c37395 [zsxwing] Print stack trace for debug
7c5cfb0 [zsxwing] RUN_KINESIS_TESTS -> ENABLE_KINESIS_TESTS
cc9d071 [zsxwing] Fix the python test errors
466b425 [zsxwing] Add python tests for Kinesis
e33d505 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python
3da2601 [zsxwing] Fix the kinesis folder
687446b [zsxwing] Fix the error message and the maven output path
add2beb [zsxwing] Merge branch 'master' into kinesis-python
4957c0b [zsxwing] Add the Python API for Kinesis
2015-07-31 12:09:48 -07:00