ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Keiji Yoshida	623c675fde	Update streaming-programming-guide.md Update `See the Scala example` to `See the Java example`. Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8376 from yosssi/patch-1.	2015-08-23 11:04:29 +01:00
Keiji Yoshida	46fcb9e0db	Update programming-guide.md Update `lineLengths.persist();` to `lineLengths.persist(StorageLevel.MEMORY_ONLY());` because `JavaRDD#persist` needs a parameter of `StorageLevel`. Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8372 from yosssi/patch-1.	2015-08-22 02:38:10 -07:00
Xusen Yin	630a994e6a	[SPARK-9893] User guide with Java test suite for VectorSlicer Add user guide for `VectorSlicer`, with Java test suite and Python version VectorSlicer. Note that Python version does not support selecting by names now. Author: Xusen Yin <yinxusen@gmail.com> Closes #8267 from yinxusen/SPARK-9893.	2015-08-21 16:30:12 -07:00
Alexander Ulanov	dcfe0c5cde	[SPARK-9846] [DOCS] User guide for Multilayer Perceptron Classifier Added user guide for multilayer perceptron classifier: - Simplified description of the multilayer perceptron classifier - Example code for Scala and Java Author: Alexander Ulanov <nashb@yandex.ru> Closes #8262 from avulanov/SPARK-9846-mlpc-docs.	2015-08-20 20:02:27 -07:00
Eric Liang	8e0a072f78	[SPARK-9895] User Guide for RFormula Feature Transformer mengxr Author: Eric Liang <ekl@databricks.com> Closes #8293 from ericl/docs-2.	2015-08-19 15:43:08 -07:00
Marcelo Vanzin	5fd53c64bb	[SPARK-9833] [YARN] Add options to disable delegation token retrieval. This allows skipping the code that tries to talk to Hive and HBase to fetch delegation tokens, in case that somehow conflicts with the application being run. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8134 from vanzin/SPARK-9833.	2015-08-19 10:51:59 -07:00
Yanbo Liang	802b5b8791	[SPARK-10084] [MLLIB] [DOC] Add Python example for mllib FP-growth user guide 1, Add Python example for mllib FP-growth user guide. 2, Correct mistakes of Scala and Java examples. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8279 from yanboliang/spark-10084.	2015-08-19 08:53:34 -07:00
Joseph K. Bradley	39e4ebd521	[SPARK-10060] [ML] [DOC] spark.ml DecisionTree user guide New user guide section ml-decision-tree.md, including code examples. I have run all examples, including the Java ones. CC: manishamde yanboliang mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #8244 from jkbradley/ml-dt-docs.	2015-08-19 07:38:27 -07:00
lewuathe	ba2a07e2b6	[SPARK-9977] [DOCS] Update documentation for StringIndexer By using `StringIndexer`, we can obtain indexed label on new column. So a following estimator should use this new column through pipeline if it wants to use string indexed label. I think it is better to make it explicit on documentation. Author: lewuathe <lewuathe@me.com> Closes #8205 from Lewuathe/SPARK-9977.	2015-08-19 09:54:03 +01:00
Sean Owen	f141efeafb	[SPARK-10070] [DOCS] Remove Guava dependencies in user guides `Lists.newArrayList` -> `Arrays.asList` CC jkbradley feynmanliang Anybody into replacing usages of `Lists.newArrayList` in the examples / source code too? this method isn't useful in Java 7 and beyond. Author: Sean Owen <sowen@cloudera.com> Closes #8272 from srowen/SPARK-10070.	2015-08-19 09:41:09 +01:00
Bill Chambers	b23c4d3ffc	Fix Broken Link Link was broken because it included tick marks. Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #8302 from anabranch/patch-1.	2015-08-19 00:05:01 -07:00
Alexander Ulanov	1c843e2848	[SPARK-9508] GraphX Pregel docs update with new Pregel code SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be modified accordingly since it lists the old Pregel code Author: Alexander Ulanov <nashb@yandex.ru> Closes #7831 from avulanov/SPARK-9508-pregel-doc2.	2015-08-18 22:13:52 -07:00
Davies Liu	de3223872a	[SPARK-9705] [DOC] fix docs about Python version cc JoshRosen Author: Davies Liu <davies@databricks.com> Closes #8245 from davies/python_doc.	2015-08-18 22:11:27 -07:00
Feynman Liang	badf7fa650	[SPARK-8473] [SPARK-9889] [ML] User guide and example code for DCT mengxr jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.	2015-08-18 17:54:49 -07:00
Dennis Huo	9b731fad2b	[SPARK-9782] [YARN] Support YARN application tags via SparkConf Add a new test case in yarn/ClientSuite which checks how the various SparkConf and ClientArguments propagate into the ApplicationSubmissionContext. Author: Dennis Huo <dhuo@google.com> Closes #8072 from dennishuo/dhuo-yarn-application-tags.	2015-08-18 14:34:20 -07:00
Piotr Migdal	8bae9015b7	[SPARK-10085] [MLLIB] [DOCS] removed unnecessary numpy array import See https://issues.apache.org/jira/browse/SPARK-10085 Author: Piotr Migdal <pmigdal@gmail.com> Closes #8284 from stared/spark-10085.	2015-08-18 12:59:28 -07:00
Yanbo Liang	747c2ba800	[SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide Add Python example for mllib LDAModel user guide Author: Yanbo Liang <ybliang8@gmail.com> Closes #8227 from yanboliang/spark-10032.	2015-08-18 12:56:36 -07:00
Yanbo Liang	f4fa61effe	[SPARK-10029] [MLLIB] [DOC] Add Python examples for mllib IsotonicRegression user guide Add Python examples for mllib IsotonicRegression user guide Author: Yanbo Liang <ybliang8@gmail.com> Closes #8225 from yanboliang/spark-10029.	2015-08-18 12:55:36 -07:00
Feynman Liang	f5ea391290	[SPARK-9900] [MLLIB] User guide for Association Rules Updates FPM user guide to include Association Rules. Author: Feynman Liang <fliang@databricks.com> Closes #8207 from feynmanliang/SPARK-9900-arules.	2015-08-18 12:53:57 -07:00
jose.cambronero	c90c605dc6	[SPARK-9902] [MLLIB] Add Java and Python examples to user guide for 1-sample KS test added doc examples for python. Author: jose.cambronero <jose.cambronero@cloudera.com> Closes #8154 from josepablocam/spark_9902.	2015-08-17 19:09:45 -07:00
Sandy Ryza	f9d1a92aa1	[SPARK-7707] User guide and example code for KernelDensity Author: Sandy Ryza <sandy@cloudera.com> Closes #8230 from sryza/sandy-spark-7707.	2015-08-17 17:57:51 -07:00
Feynman Liang	0b6b017613	[SPARK-9898] [MLLIB] Prefix Span user guide Adds user guide for `PrefixSpan`, including Scala and Java example code. mengxr zhangjiajin Author: Feynman Liang <fliang@databricks.com> Closes #8253 from feynmanliang/SPARK-9898.	2015-08-17 17:53:24 -07:00
Yanbo Liang	0076e82123	[SPARK-9768] [PYSPARK] [ML] Add Python API and user guide for ml.feature.ElementwiseProduct Add Python API, user guide and example for ml.feature.ElementwiseProduct. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8061 from yanboliang/SPARK-9768.	2015-08-17 17:25:41 -07:00
Feynman Liang	fdaf17f63f	[SPARK-10068] [MLLIB] Adds links to MLlib types, algos, utilities listing mengxr jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8255 from feynmanliang/SPARK-10068.	2015-08-17 15:42:14 -07:00
Reynold Xin	e5fd60415f	[SPARK-9934] Deprecate NIO ConnectionManager. Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0. Author: Reynold Xin <rxin@databricks.com> Closes #8162 from rxin/SPARK-9934.	2015-08-14 20:55:32 -07:00
Rosstin	7a539ef3b1	[SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transformer, and Param Added ml-guide Python Example: Estimator, Transformer, and Param /docs/_site/ml-guide.html Author: Rosstin <asterazul@gmail.com> Closes #8081 from Rosstin/SPARK-8965.	2015-08-13 09:18:39 -07:00
Niranjan Padmanabhan	738f353988	[SPARK-9092] Fixed incompatibility when both num-executors and dynamic... … allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager in not initialized in the SparkContext. Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com> Closes #7657 from neurons/SPARK-9092.	2015-08-12 16:10:21 -07:00
Yuhao Yang	66d87c1d76	[SPARK-7583] [MLLIB] User guide update for RegexTokenizer jira: https://issues.apache.org/jira/browse/SPARK-7583 User guide update for RegexTokenizer Author: Yuhao Yang <hhbyyh@gmail.com> Closes #7828 from hhbyyh/regexTokenizerDoc.	2015-08-12 09:35:32 -07:00
Timothy Chen	741a29f989	[SPARK-9575] [MESOS] Add docuemntation around Mesos shuffle service. andrewor14 Author: Timothy Chen <tnachen@gmail.com> Closes #7907 from tnachen/mesos_shuffle.	2015-08-11 23:33:22 -07:00
Timothy Chen	5c99d8bf98	[SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory. Author: Timothy Chen <tnachen@gmail.com> Closes #7195 from tnachen/mesos_files.	2015-08-11 23:26:33 -07:00
Eric Liang	74a293f453	[SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5 This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8085 from ericl/docs.	2015-08-11 21:26:03 -07:00
Prabeesh K	853809e948	[SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python This PR is based on #4229, thanks prabeesh. Closes #4229 Author: Prabeesh K <prabsmails@gmail.com> Author: zsxwing <zsxwing@gmail.com> Author: prabs <prabsmails@gmail.com> Author: Prabeesh K <prabeesh.k@namshi.com> Closes #7833 from zsxwing/pr4229 and squashes the following commits: 9570bec [zsxwing] Fix the variable name and check null in finally 4a9c79e [zsxwing] Fix pom.xml indentation abf5f18 [zsxwing] Merge branch 'master' into pr4229 935615c [zsxwing] Fix the flaky MQTT tests 47278c5 [zsxwing] Include the project class files 478f844 [zsxwing] Add unpack 5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests 734db99 [zsxwing] Merge branch 'master' into pr4229 126608a [Prabeesh K] address the comments b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229 d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test a6747cb [Prabeesh K] wait for starting the receiver before publishing data 87fc677 [Prabeesh K] address the comments: 97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt 80474d1 [Prabeesh K] fix 1f0cfe9 [Prabeesh K] python style fix e1ee016 [Prabeesh K] scala style fix a5a8f9f [Prabeesh K] added Python test 9767d82 [Prabeesh K] implemented Python-friendly class a11968b [Prabeesh K] fixed python style 795ec27 [Prabeesh K] address comments ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly 3f4df12 [Prabeesh K] updated version b34c3c1 [prabs] adress comments 3aa7fff [prabs] Added Python streaming mqtt word count example b7d42ff [prabs] Mqtt streaming support in Python	2015-08-10 16:33:23 -07:00
Mahmoud Lababidi	d285212756	Fixed AtmoicReference<> Example Author: Mahmoud Lababidi <lababidi@gmail.com> Closes #8076 from lababidi/master and squashes the following commits: af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example	2015-08-10 13:02:01 -07:00
Jeff Zhang	fe12277b40	Fix doc typo Straightforward fix on doc typo Author: Jeff Zhang <zjffdu@apache.org> Closes #8019 from zjffdu/master and squashes the following commits: aed6e64 [Jeff Zhang] Fix doc typo	2015-08-06 21:03:47 -07:00
Davies Liu	17284db314	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe (cherry picked from commit `4e70e8256c`) Signed-off-by: Reynold Xin <rxin@databricks.com>	2015-08-06 19:42:02 -07:00
Davies Liu	49b1504fe3	Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe" This reverts commit `4e70e8256c`.	2015-08-06 17:36:12 -07:00
Davies Liu	4e70e8256c	[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen/unsafe spark.sql.tungsten.enabled will be the default value for both codegen and unsafe, they are kept internally for debug/testing. cc marmbrus rxin Author: Davies Liu <davies@databricks.com> Closes #7998 from davies/tungsten and squashes the following commits: c1c16da [Davies Liu] update doc 1a47be1 [Davies Liu] use tungsten.enabled for both of codegen/unsafe	2015-08-06 17:30:31 -07:00
Sean Owen	0d7aac99da	[SPARK-9641] [DOCS] spark.shuffle.service.port is not documented Document spark.shuffle.service.{enabled,port} CC sryza tgravescs This is pretty minimal; is there more to say here about the service? Author: Sean Owen <sowen@cloudera.com> Closes #7991 from srowen/SPARK-9641 and squashes the following commits: 3bb946e [Sean Owen] Add link to docs for setup and config of external shuffle service 2302e01 [Sean Owen] Document spark.shuffle.service.{enabled,port}	2015-08-06 19:29:42 +01:00
Mike Dusenberry	34dcf10104	[SPARK-6486] [MLLIB] [PYTHON] Add BlockMatrix to PySpark. mengxr This adds the `BlockMatrix` to PySpark. I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished. Author: Mike Dusenberry <mwdusenb@us.ibm.com> Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits: 27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes. ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix. b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation. c014002 [Mike Dusenberry] Using properties for better documentation. 3bda6ab [Mike Dusenberry] Adding documentation. 8fb3095 [Mike Dusenberry] Small cleanup. e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.	2015-08-05 07:40:50 -07:00
Namit Katariya	1bf608b5ef	[SPARK-9601] [DOCS] Fix JavaPairDStream signature for stream-stream and windowed join in streaming guide doc Author: Namit Katariya <katariya.namit@gmail.com> Closes #7935 from namitk/SPARK-9601 and squashes the following commits: 03b5784 [Namit Katariya] [SPARK-9601] Fix signature of JavaPairDStream for stream-stream and windowed join in streaming guide doc	2015-08-05 01:07:33 -07:00
Reynold Xin	f7abd6bec9	Update docs/README.md to put all prereqs together. This pull request groups all the prereq requirements into a single section. cc srowen shivaram Author: Reynold Xin <rxin@databricks.com> Closes #7951 from rxin/readme-docs and squashes the following commits: ab7ded0 [Reynold Xin] Updated docs/README.md to put all prereqs together.	2015-08-04 22:17:14 -07:00
Mike Dusenberry	571d5b5363	[SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark. This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix distributed matrices to PySpark. Each distributed matrix class acts as a wrapper around the Scala/Java counterpart by maintaining a reference to the Java object. New distributed matrices can be created using factory methods added to DistributedMatrices, which creates the Java distributed matrix and then wraps it with the corresponding PySpark class. This design allows for simple conversion between the various distributed matrices, and lets us re-use the Scala code. Serialization between Python and Java is implemented using DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity. Associated documentation and unit-tests have also been added. To facilitate code review, this PR implements access to the rows/entries as RDDs, the number of rows & columns, and conversions between the various distributed matrices (not including BlockMatrix), and does not implement the other linear algebra functions of the matrices, although this will be very simple to add now. Author: Mike Dusenberry <mwdusenb@us.ibm.com> Closes #7554 from dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark and squashes the following commits: bb039cb [Mike Dusenberry] Minor documentation update. b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to make it even cleaner. Now, we allow the 'rows' parameter in the constructors to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a Java matrix object, wrap it, and then store that. If 'rows' is a Java matrix object of the correct type, we just wrap and store that directly. This is only for internal usage, and publicly, we still require 'rows' to be an RDD. We no longer store the 'rows' RDD, and instead just compute it from the Java object when needed. 
The point of this is that when we do matrix conversions, we do the conversion on the Scala/Java side, which returns a Java object, so we should use that directly, but exposing 'java_matrix' parameter in the public API is not ideal. This non-public feature of allowing 'rows' to be a Java matrix object is documented in the '__init__' constructor docstrings, which are not part of the generated public API, and doctests are also included. 7f0dcb6 [Mike Dusenberry] Updating module docstring. cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)' rather than 'SQLContext.getOrCreate', as the later doesn't guarantee that the SparkContext will be the same as for the matrix.rows data. 687e345 [Mike Dusenberry] Improving conversion performance. This adds an optional 'java_matrix' parameter to the constructors, and pulls the conversion logic out into a '_create_from_java' function. Now, if the constructors are given a valid Java distributed matrix object as 'java_matrix', they will store those internally, rather than create a new one on the Scala/Java side. 3e50b6e [Mike Dusenberry] Moving the distributed matrices to pyspark.mllib.linalg.distributed. 308f197 [Mike Dusenberry] Using properties for better documentation. 1633f86 [Mike Dusenberry] Minor documentation cleanup. f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from DistributedMatrix. ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner. 3fd4016 [Mike Dusenberry] Updating docstrings. 27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors for each distributed matrix. a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using lists instead of DenseVectors explicitly. d19b0ba [Mike Dusenberry] Updating code and documentation to note that a vector-like object (numpy array, list, etc.) can be used in place of explicit Vector object, and adding conversions when necessary to RowMatrix construction. 
4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and MatrixEntry. c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry functions. 329638b [Mike Dusenberry] Moving the Experimental tag to the top of each docstring. 0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated rows/entries RDDs within the various tests. c0900df [Mike Dusenberry] Adding the colons that were accidentally not inserted. 4ad6819 [Mike Dusenberry] Documenting the and parameters. 3b854b9 [Mike Dusenberry] Minor updates to documentation. 10046e8 [Mike Dusenberry] Updating documentation to use class constructors instead of the removed DistributedMatrices factory methods. 119018d [Mike Dusenberry] Adding static methods to each of the distributed matrix classes to consolidate conversion logic. 4d7af86 [Mike Dusenberry] Adding type checks to the constructors. Although it is slightly verbose, it is better for the user to have a good error message than a cryptic stacktrace. 93b6a3d [Mike Dusenberry] Pulling the DistributedMatrices Python class out of this pull request. f6f3c68 [Mike Dusenberry] Pulling the DistributedMatrices Scala class out of this pull request. 6a3ecb7 [Mike Dusenberry] Updating pattern matching. 08f287b [Mike Dusenberry] Slight reformatting of the documentation. a245dc0 [Mike Dusenberry] Updating Python doctests for compatability between Python 2 & 3. Since Python 3 removed the idea of a separate 'long' type, all values that would have been outputted as a 'long' (ex: '4L') will now be treated as an 'int' and outputed as one (ex: '4'). The doctests now explicitly convert to ints so that both Python 2 and 3 will have the same output. This is fine since the values are all small, and thus can be easily represented as ints. 
4d3a37e [Mike Dusenberry] Reformatting a few long Python doctest lines. 7e3ca16 [Mike Dusenberry] Fixing long lines. f721ead [Mike Dusenberry] Updating documentation for each of the distributed matrices. ab0e8b6 [Mike Dusenberry] Updating unit test to be more useful. dda2f89 [Mike Dusenberry] Added wrappers for the conversions between the various distributed matrices. Added logic to be able to access the rows/entries of the distributed matrices, which requires serialization through DataFrames for IndexedRowMatrix and CoordinateMatrix types. Added unit tests. 0cd7166 [Mike Dusenberry] Implemented the CoordinateMatrix API in PySpark, following the idea of the IndexedRowMatrix API, including using DataFrames for serialization. 3c369cb [Mike Dusenberry] Updating the architecture a bit to make conversions between the various distributed matrix types easier. The different distributed matrix classes are now only wrappers around the Java objects, and take the Java object as an argument during construction. This way, we can call for example on an , which returns a reference to a Java RowMatrix object, and then construct a PySpark RowMatrix object wrapped around the Java object. This is analogous to the behavior of PySpark RDDs and DataFrames. We now delegate creation of the various distributed matrices from scratch in PySpark to the factory methods on . 4bdd09b [Mike Dusenberry] Implemented the IndexedRowMatrix API in PySpark, following the idea of the RowMatrix API. Note that for the IndexedRowMatrix, we use DataFrames to serialize the data between Python and Scala/Java, so we accept PySpark RDDs, then convert to a DataFrame, then convert back to RDDs on the Scala/Java side before constructing the IndexedRowMatrix. 23bf1ec [Mike Dusenberry] Updating documentation to add PySpark RowMatrix. Inserting newline above doctest so that it renders properly in API docs. 
b194623 [Mike Dusenberry] Updating design to have a PySpark RowMatrix simply create and keep a reference to a wrapper over a Java RowMatrix. Updating DistributedMatrices factory methods to accept numRows and numCols with default values. Updating PySpark DistributedMatrices factory method to simply create a PySpark RowMatrix. Adding additional doctests for numRows and numCols parameters. bc2d220 [Mike Dusenberry] Adding unit tests for RowMatrix methods. d7e316f [Mike Dusenberry] Implemented the RowMatrix API in PySpark by doing the following: Added a DistributedMatrices class to contain factory methods for creating the various distributed matrices. Added a factory method for creating a RowMatrix from an RDD of Vectors. Added a createRowMatrix function to the PythonMLlibAPI to interface with the factory method. Added DistributedMatrix, DistributedMatrices, and RowMatrix classes to the pyspark.mllib.linalg api.	2015-08-04 16:30:03 -07:00
Sean Owen	0afa6fbf52	[SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too Author: Sean Owen <sowen@cloudera.com> Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits: 73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too	2015-08-04 13:48:22 +09:00
Shivaram Venkataraman	7abaaad5b1	Add a prerequisites section for building docs This puts all the install commands that need to be run in one section instead of being spread over many paragraphs cc rxin Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #7912 from shivaram/docs-setup-readme and squashes the following commits: cf7a204 [Shivaram Venkataraman] Add a prerequisites section for building docs	2015-08-03 17:00:59 -07:00
Yanbo Liang	8ca287ebbd	[SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples Add ml.PCA user guide document and code examples for Scala/Java/Python. Author: Yanbo Liang <ybliang8@gmail.com> Closes #7522 from yanboliang/ml-pca-md and squashes the following commits: 60dec05 [Yanbo Liang] address comments f992abe [Yanbo Liang] Add ml.PCA doc and examples	2015-08-03 13:58:00 -07:00
Kousuke Saruta	ba1c4e138d	[SPARK-9558][DOCS]Update docs to follow the increase of memory defaults. Now the memory defaults of master and slave in Standalone mode and History Server is 1g, not 512m. So let's update docs. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #7896 from sarutak/update-doc-for-daemon-memory and squashes the following commits: a77626c [Kousuke Saruta] Fix docs to follow the update of increase of memory defaults	2015-08-03 12:53:44 -07:00
KaiXinXiaoLei	536d2adc12	[SPARK-9535][SQL][DOCS] Modify document for codegen. #7142 made codegen enabled by default so let's modify the corresponding documents. Closes #7142 Author: KaiXinXiaoLei <huleilei1@huawei.com> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #7863 from sarutak/SPARK-9535 and squashes the following commits: 0884424 [Kousuke Saruta] Removed a line which mentioned about the effect of codegen enabled 3c11af0 [Kousuke Saruta] Merge branch 'sqlconfig' of https://github.com/KaiXinXiaoLei/spark into SPARK-9535 4ee531d [KaiXinXiaoLei] delete space 4cfd11d [KaiXinXiaoLei] change spark.sql.planner.externalSort d624cf8 [KaiXinXiaoLei] sql config is wrong	2015-08-02 20:04:21 -07:00
Sean Owen	873ab0f969	[SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement Use print(x) not print x for Python 3 in eval examples CC sethah mengxr -- just wanted to close this out before 1.5 Author: Sean Owen <sowen@cloudera.com> Closes #7822 from srowen/SPARK-9490 and squashes the following commits: 01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples	2015-07-31 13:45:28 -07:00
CodingCat	c0686668ae	[SPARK-9202] capping maximum number of executor&driver information kept in Worker https://issues.apache.org/jira/browse/SPARK-9202 Author: CodingCat <zhunansjtu@gmail.com> Closes #7714 from CodingCat/SPARK-9202 and squashes the following commits: 23977fb [CodingCat] add comments about why we don't synchronize finishedExecutors & finishedDrivers dc9772d [CodingCat] addressing the comments e125241 [CodingCat] stylistic fix 80bfe52 [CodingCat] fix JsonProtocolSuite d7d9485 [CodingCat] styistic fix and respect insert ordering 031755f [CodingCat] add license info & stylistic fix c3b5361 [CodingCat] test cases and docs c557b3a [CodingCat] applications are fine 9cac751 [CodingCat] application is fine... ad87ed7 [CodingCat] trimFinishedExecutorsAndDrivers	2015-07-31 20:27:00 +01:00
zsxwing	3afc1de89c	[SPARK-8564] [STREAMING] Add the Python API for Kinesis This PR adds the Python API for Kinesis, including a Python example and a simple unit test. Author: zsxwing <zsxwing@gmail.com> Closes #6955 from zsxwing/kinesis-python and squashes the following commits: e42e471 [zsxwing] Merge branch 'master' into kinesis-python 455f7ea [zsxwing] Remove streaming_kinesis_asl_assembly module and simply add the source folder to streaming_kinesis_asl module 32e6451 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python 5082d28 [zsxwing] Fix the syntax error for Python 2.6 fca416b [zsxwing] Fix wrong comparison 96670ff [zsxwing] Fix the compilation error after merging master 756a128 [zsxwing] Merge branch 'master' into kinesis-python 6c37395 [zsxwing] Print stack trace for debug 7c5cfb0 [zsxwing] RUN_KINESIS_TESTS -> ENABLE_KINESIS_TESTS cc9d071 [zsxwing] Fix the python test errors 466b425 [zsxwing] Add python tests for Kinesis e33d505 [zsxwing] Merge remote-tracking branch 'origin/master' into kinesis-python 3da2601 [zsxwing] Fix the kinesis folder 687446b [zsxwing] Fix the error message and the maven output path add2beb [zsxwing] Merge branch 'master' into kinesis-python 4957c0b [zsxwing] Add the Python API for Kinesis	2015-07-31 12:09:48 -07:00

1225 commits