Commit graph

980 commits

Yanbo Liang f92b7b98e9 [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver
[SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) added a ```WeightedLeastSquares``` solver ("normal") to ```LinearRegression``` with L2 regularization in Scala and R; the Python ML ```LinearRegression``` should also support setting the solver ("auto", "normal", "l-bfgs").
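
A hedged usage sketch of the resulting Python API (the DataFrame `training` is assumed):
```python
from pyspark.ml.regression import LinearRegression

# "normal" selects the WeightedLeastSquares path; "auto" lets Spark choose
lr = LinearRegression(maxIter=10, regParam=0.1, solver="normal")
model = lr.fit(training)  # training: DataFrame with "features"/"label" columns
```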

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9328 from yanboliang/spark-11367.
2015-10-28 08:54:20 -07:00
Sean Owen 826e1e304b [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases
Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python mixture model test.

Supersedes https://github.com/apache/spark/pull/9293

Author: Sean Owen <sowen@cloudera.com>

Closes #9309 from srowen/SPARK-11302.2.
2015-10-27 23:07:37 -07:00
vectorijk 9dba5fb2b5 [SPARK-10024][PYSPARK] Python API RF and GBT related params clear up
Implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for the Python API in pyspark/ml/{classification, regression}.py.

Author: vectorijk <jiangkai@gmail.com>

Closes #9233 from vectorijk/spark-10024.
2015-10-27 13:55:03 -07:00
Mike Dusenberry 3bdbbc6c97 [SPARK-6488][MLLIB][PYTHON] Support addition/multiplication in PySpark's BlockMatrix
This PR adds addition and multiplication to PySpark's `BlockMatrix` class via `add` and `multiply` functions.
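
A hedged usage sketch of the two new methods (`sc` is an active SparkContext):
```python
from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix

blocks = sc.parallelize([((0, 0), Matrices.dense(2, 2, [1.0, 2.0, 3.0, 4.0]))])
mat = BlockMatrix(blocks, 2, 2)
summed = mat.add(mat)        # element-wise addition
product = mat.multiply(mat)  # matrix multiplication
```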

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #9139 from dusenberrymw/SPARK-6488_Add_Addition_and_Multiplication_to_PySpark_BlockMatrix.
2015-10-27 11:05:14 -07:00
Nick Evans 8f888eea1a [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API
jerryshao tdas

I know this is kind of minor, and I know you all are busy, but this brings this class in line with the `OffsetRange` class, and makes tests a little more concise.

Instead of doing something like:
```
assert topic_and_partition_instance._topic == "foo"
assert topic_and_partition_instance._partition == 0
```

You can do something like:
```
assert topic_and_partition_instance == TopicAndPartition("foo", 0)
```

Before:
```
>>> from pyspark.streaming.kafka import TopicAndPartition
>>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
False
```

After:
```
>>> from pyspark.streaming.kafka import TopicAndPartition
>>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
True
```

I couldn't find any tests - am I missing something?

Author: Nick Evans <me@nicolasevans.org>

Closes #9236 from manygrams/topic_and_partition_equality.
2015-10-27 01:29:06 -07:00
noelsmith 5d4f6abec4 [SPARK-10271][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.clustering
Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings).

Added since to methods + "versionadded::" to classes (derived from the git file history in pyspark).
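
A minimal sketch of such a decorator (the real pyspark version may differ); the `or ""` is the tweak that handles functions without docstrings:
```python
def since(version):
    def deco(f):
        # append a Sphinx versionadded note to the function's docstring
        f.__doc__ = (f.__doc__ or "") + "\n\n.. versionadded:: %s" % version
        return f
    return deco
```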

Author: noelsmith <mail@noelsmith.com>

Closes #8627 from noel-smith/SPARK-10271-since-mllib-clustering.
2015-10-26 21:28:18 -07:00
Jeff Zhang 05c4bdb579 [SPARK-11279][PYSPARK] Add DataFrame#toDF in PySpark
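
A hedged usage sketch of the added method (`df` is an assumed two-column DataFrame):
```python
# toDF returns a new DataFrame with all columns renamed in one call
renamed = df.toDF("price", "quantity")
```
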
Author: Jeff Zhang <zjffdu@apache.org>

Closes #9248 from zjffdu/SPARK-11279.
2015-10-26 09:25:19 +01:00
Yu ISHIKAWA 282a15f78e [SPARK-10277] [MLLIB] [PYSPARK] Add @since annotation to pyspark.mllib.regression
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8684 from yu-iskw/SPARK-10277.
2015-10-23 08:43:49 -07:00
Gábor Lipták 163d53e829 [SPARK-7021] Add JUnit output for Python unit tests
WIP

Author: Gábor Lipták <gliptak@gmail.com>

Closes #8323 from gliptak/SPARK-7021.
2015-10-22 15:27:11 -07:00
Jeff Zhang 5cdea7d1e5 [SPARK-11205][PYSPARK] Delegate to scala DataFrame API rather than print in python

No test needed. Verify it manually in pyspark shell

Author: Jeff Zhang <zjffdu@apache.org>

Closes #9177 from zjffdu/SPARK-11205.
2015-10-20 23:58:27 -07:00
Xiangrui Meng 135ade9050 [MINOR][ML] fix doc warnings
Without an empty line, sphinx will treat the doctest as part of the docstring. holdenk

~~~
/Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "label|raw |vectors | +-----+---------------+-------------------------+ |0 |[a, b, c] |(3,[0,1,2],[1.0,1.0,1.0])".
/Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "1 |[a, b, b, c, a]|(3,[0,1,2],[2.0,2.0,1.0])".
~~~
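
A minimal illustration of the kind of fix this implies (assumed, not taken from the patch) — a blank line separating the prose from the doctest:
```python
class CountVectorizer(object):
    """Extracts a vocabulary from document collections.

    >>> # the blank line above keeps sphinx from mis-parsing the doctest
    """
```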

Author: Xiangrui Meng <meng@databricks.com>

Closes #9188 from mengxr/py-count-vec-doc-fix.
2015-10-20 18:38:06 -07:00
Holden Karau aea7142c98 [SPARK-10767][PYSPARK] Make pyspark shared params codegen more consistent
Namely "." shows up in some places in the template when using the param docstring and not in others

Author: Holden Karau <holden@pigscanfly.ca>

Closes #9017 from holdenk/SPARK-10767-Make-pyspark-shared-params-codegen-more-consistent.
2015-10-20 16:51:32 -07:00
noelsmith 04521ea067 [SPARK-10269][PYSPARK][MLLIB] Add @since annotation to pyspark.mllib.classification
Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings).

Added since to methods + "versionadded::" to classes derived from the file history.

Note - some methods are inherited from the regression module (i.e. LinearModel.intercept) so these won't have version numbers in the API docs until that module is updated.

Author: noelsmith <mail@noelsmith.com>

Closes #8626 from noel-smith/SPARK-10269-since-mlib-classification.
2015-10-20 16:14:20 -07:00
noelsmith 82e9d9c81b [SPARK-10272][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.evaluation
Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings).

Added since to public methods + "versionadded::" to classes (derived from the git file history in pyspark).

Note - I also added the tags to MultilabelMetrics even though it isn't declared as public in the __all__ statement... if that's incorrect, I'll remove them.

Author: noelsmith <mail@noelsmith.com>

Closes #8628 from noel-smith/SPARK-10272-since-mllib-evalutation.
2015-10-20 15:05:02 -07:00
Holden Karau e18b571c33 [SPARK-10447][SPARK-3842][PYSPARK] upgrade pyspark to py4j0.9
Upgrade to Py4j0.9

Author: Holden Karau <holden@pigscanfly.ca>
Author: Holden Karau <holden@us.ibm.com>

Closes #8615 from holdenk/SPARK-10447-upgrade-pyspark-to-py4j0.9.
2015-10-20 10:52:49 -07:00
Davies Liu 232d7f8d42 [SPARK-11114][PYSPARK] add getOrCreate for SparkContext/SQLContext in Python
Also added SQLContext.newSession()
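
A hedged usage sketch of the new entry points:
```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()          # reuses the active context if any
sqlContext = SQLContext.getOrCreate(sc)  # likewise for the SQL context
fresh = sqlContext.newSession()          # isolated configuration and UDF registry
```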

Author: Davies Liu <davies@databricks.com>

Closes #9122 from davies/py_create.
2015-10-19 16:18:20 -07:00
Brennon York d3180c25d8 [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python
This commit refactors the `run-tests-jenkins` script into Python. This refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes.

From the original PR description (by brennonyork):

Currently a few things are left out that could, and I think should, be smaller JIRAs after this.

1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
2. The PR tests are still written in bash. I opted to not change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
3. All of the linting scripts are still in bash as well; those would also be good to add as follow-on JIRAs.

Closes #7401.

Author: Brennon York <brennon.york@capitalone.com>

Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
2015-10-18 22:45:27 -07:00
Mahmoud Lababidi a337c235a1 [SPARK-11158][SQL] Modified _verify_type() to be more informative on Errors by presenting the Object
The _verify_type() function raised errors on type-conversion issues but left out the object in question. The object is now included in the error message, sparing the user from having to debug through to find the value that failed the type conversion.

The use case for me was a Pandas DataFrame that contained 'nan' as values for columns of Strings.
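
A hedged sketch of the error-message pattern; the body is illustrative, not the exact pyspark internals:
```python
def _verify_type(obj, expected_type):
    # include the offending object itself in the error, not just the types
    if not isinstance(obj, expected_type):
        raise TypeError("%s can not accept object %r in type %s"
                        % (expected_type, obj, type(obj)))
```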

Author: Mahmoud Lababidi <mahmoud@thehumangeo.com>
Author: Mahmoud Lababidi <lababidi@gmail.com>

Closes #9149 from lababidi/master.
2015-10-18 11:39:19 -07:00
Koert Kuipers 57f83e36d6 [SPARK-10185] [SQL] Feat sql comma separated paths
Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider.

Author: Koert Kuipers <koert@tresata.com>

Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.
2015-10-17 14:56:24 -07:00
zero323 8ac71d62d9 [SPARK-11084] [ML] [PYTHON] Check if index can contain non-zero value before binary search
At the moment `SparseVector.__getitem__` executes `np.searchsorted` first and checks whether the result is in the expected range afterwards. It is possible to check whether the index can contain a non-zero value before executing `np.searchsorted`.
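
A hedged sketch of the early-exit idea, using standalone names rather than the actual `__getitem__` body:
```python
import numpy as np

def sparse_get(inds, vals, index):
    # inds: sorted indices of non-zero entries; vals: their values
    if inds.size == 0 or index > inds[-1]:
        return 0.0  # cannot be a stored entry; skip np.searchsorted entirely
    i = np.searchsorted(inds, index)
    return float(vals[i]) if inds[i] == index else 0.0
```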

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9098 from zero323/sparse_vector_getitem_improved.
2015-10-16 15:53:26 -07:00
Bhargav Mangipudi 1ec0a0dc28 [SPARK-11050] [MLLIB] PySpark SparseVector can return wrong index in error message

For negative indices in the SparseVector, we update the index value. If the index is invalid at this point, the error message reports the incorrect *updated* index instead of the original one. This change fixes that.
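
A hedged sketch of the fix pattern (a hypothetical helper, not the patch itself):
```python
def normalize_index(index, size):
    original = index          # keep the caller's index for error reporting
    if index < 0:
        index += size
    if not 0 <= index < size:
        raise IndexError("index %d out of range" % original)
    return index
```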

Author: Bhargav Mangipudi <bhargav.mangipudi@gmail.com>

Closes #9069 from bhargav/spark-10759.
2015-10-16 14:36:05 -07:00
Joseph K. Bradley c75f058b72 [PYTHON] [MINOR] List modules in PySpark tests when given bad name
Output list of supported modules for python tests in error message when given bad module name.

CC: davies

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #9088 from jkbradley/python-tests-modules.
2015-10-13 12:03:46 -07:00
Ashwin Shankar 2e572c4135 [SPARK-8170] [PYTHON] Add signal handler to trap Ctrl-C in pyspark and cancel all running jobs
This patch adds a signal handler to trap Ctrl-C and cancel running jobs.
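
A hedged sketch of the approach (`sc` is an active SparkContext; `cancelAllJobs()` is its existing job-cancellation method):
```python
import signal

def interrupt_handler(signum, frame):
    sc.cancelAllJobs()        # cancel everything scheduled or running
    raise KeyboardInterrupt

signal.signal(signal.SIGINT, interrupt_handler)
```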

Author: Ashwin Shankar <ashankar@netflix.com>

Closes #9033 from ashwinshankar77/master.
2015-10-12 11:06:21 -07:00
Vladimir Vladimirov c1b4ce4326 [SPARK-10535] Sync up API for matrix factorization model between Scala and PySpark
Support for recommendUsersForProducts and recommendProductsForUsers in matrix factorization model for PySpark
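
A hedged usage sketch (`model` is a trained MatrixFactorizationModel):
```python
top_products = model.recommendProductsForUsers(10)  # top 10 products per user
top_users = model.recommendUsersForProducts(10)     # top 10 users per product
```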

Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>

Closes #8700 from smartkiwi/SPARK-10535_.
2015-10-09 14:16:13 -07:00
Bryan Cutler 5410747a84 [SPARK-10959] [PYSPARK] StreamingLogisticRegressionWithSGD does not train with given regParam and convergenceTol parameters
These params were being passed into the StreamingLogisticRegressionWithSGD constructor but not transferred to the call for model training; the same applied to StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value.
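
A hedged sketch of the fix pattern — forwarding the parameters to the training call as named arguments — with illustrative names:
```python
from pyspark.mllib.classification import LogisticRegressionWithSGD

def update_model(rdd, num_iterations, step, reg_param, convergence_tol):
    # pass regParam and convergenceTol through instead of dropping them
    return LogisticRegressionWithSGD.train(
        rdd, num_iterations, step,
        regParam=reg_param,
        convergenceTol=convergence_tol)
```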

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes #9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.
2015-10-08 22:21:07 -07:00
zero323 8e67882b90 [SPARK-10973] [ML] [PYTHON] __getitem__ method throws IndexError exception when we…
__getitem__ method throws an IndexError exception when we try to access an index after the last non-zero entry:

    from pyspark.mllib.linalg import Vectors
    sv = Vectors.sparse(5, {1: 3})
    sv[0]
    ## 0.0
    sv[1]
    ## 3.0
    sv[2]
    ## Traceback (most recent call last):
    ##   File "<stdin>", line 1, in <module>
    ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
    ##     row_ind = inds[insert_index]
    ## IndexError: index out of bounds

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9009 from zero323/sparse_vector_index_error.
2015-10-08 18:34:15 -07:00
Holden Karau 3aff0866a8 [SPARK-9774] [ML] [PYSPARK] Add python api for ml regression isotonicregression
Add the Python API for isotonicregression.
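
A hedged usage sketch (`df` is an assumed DataFrame with "features" and "label" columns):
```python
from pyspark.ml.regression import IsotonicRegression

ir = IsotonicRegression()
model = ir.fit(df)
predictions = model.transform(df)
```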

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.
2015-10-07 17:50:35 -07:00
Evan Chen da936fbb74 [SPARK-10779] [PYSPARK] [MLLIB] Set initialModel for KMeans model in PySpark (spark.mllib)
Provide initialModel param for pyspark.mllib.clustering.KMeans
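
A hedged usage sketch (`rdd` is an assumed RDD of feature vectors):
```python
from pyspark.mllib.clustering import KMeans, KMeansModel

# seed training with a previously computed set of cluster centers
initial = KMeansModel([[0.0, 0.0], [1.0, 1.0]])
model = KMeans.train(rdd, k=2, initialModel=initial)
```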

Author: Evan Chen <chene@us.ibm.com>

Closes #8967 from evanyc15/SPARK-10779-pyspark-mllib.
2015-10-07 15:04:53 -07:00
Xiangrui Meng 5e035403d4 [SPARK-10957] [ML] setParams changes quantileProbabilities unexpectly in PySpark's AFTSurvivalRegression
If user doesn't specify `quantileProbs` in `setParams`, it will get reset to the default value. We don't need special handling here. vectorijk yanboliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #9001 from mengxr/SPARK-10957.
2015-10-06 14:58:42 -07:00
vectorijk 5952bdb7df [SPARK-10688] [ML] [PYSPARK] Python API for AFTSurvivalRegression
Implement Python API for AFTSurvivalRegression

Author: vectorijk <jiangkai@gmail.com>

Closes #8926 from vectorijk/spark-10688.
2015-10-06 12:43:28 -07:00
asokadiggs c1ad373f26 [SPARK-10782] [PYTHON] Update dropDuplicates documentation
Documentation for dropDuplicates() and drop_duplicates() is one and the same.  Resolved the error in the example for drop_duplicates using the same approach used for groupby and groupBy, by indicating that dropDuplicates and drop_duplicates are aliases.
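
A short illustration of the aliasing (`df` is an assumed DataFrame):
```python
# the two names are aliases, so both calls behave identically
df.dropDuplicates(["name", "height"])
df.drop_duplicates(["name", "height"])
```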

Author: asokadiggs <asoka.diggs@intel.com>

Closes #8930 from asokadiggs/jira-10782.
2015-09-29 17:45:18 -04:00
Erik Shilts 7d399c9daa [SPARK-6919] [PYSPARK] Add asDict method to StatCounter
Add method to easily convert a StatCounter instance into a Python dict

https://issues.apache.org/jira/browse/SPARK-6919

Note: This is my original work and the existing Spark license applies.
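
A hedged usage sketch (`sc` is an active SparkContext):
```python
stats = sc.parallelize([1.0, 2.0, 3.0]).stats()
print(stats.asDict())  # e.g. {'count': 3, 'mean': 2.0, 'sum': 6.0, ...}
```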

Author: Erik Shilts <erik.shilts@opower.com>

Closes #5516 from eshilts/statcounter-asdict.
2015-09-29 13:38:15 -07:00
noelsmith ab41864f91 [SPARK-10415] [PYSPARK] [MLLIB] [DOCS] Enhance Navigation Sidebar in PySpark API
These are CSS/JavaScript changes to make navigation in the PySpark API docs a bit simpler by adding the following to the sidebar:

* Classes
* Functions
* Tags to highlight experimental features

![screen shot 2015-09-02 at 08 50 12](https://cloud.githubusercontent.com/assets/11915197/9634781/301f853a-518b-11e5-8d5c-fda202f6202f.png)

Online example here: https://dl.dropboxusercontent.com/u/20821334/pyspark-api-nav-enhance/pyspark.mllib.html

(The contribution is my original work and I license the work to the project under the project's open source license.)

Author: noelsmith <mail@noelsmith.com>

Closes #8571 from noel-smith/pyspark-api-nav-enhance.
2015-09-29 13:25:38 -07:00
Eric Liang 922338812c [SPARK-9681] [ML] Support R feature interactions in RFormula
This integrates the Interaction feature transformer with SparkR R formula support (i.e. support for `:`).

To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
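
A hedged PySpark sketch of the `:` interaction syntax (`df` and its columns are assumed):
```python
from pyspark.ml.feature import RFormula

formula = RFormula(formula="label ~ age + height + age:height")
output = formula.fit(df).transform(df)
```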

mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #8830 from ericl/interaction-2.
2015-09-25 00:43:22 -07:00
Reynold Xin 9952217749 [SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame.
Python DataFrame.head/take now requires scanning all the partitions. This pull request changes them to delegate the actual implementation to Scala DataFrame (by calling DataFrame.take).

This is more of a hack for fixing this issue in 1.5.1. A more proper fix is to change executeCollect and executeTake to return InternalRow rather than Row, and thus eliminate the extra round-trip conversion.

Author: Reynold Xin <rxin@databricks.com>

Closes #8876 from rxin/SPARK-10731.
2015-09-23 16:43:21 -07:00
Liang-Chi Hsieh 1fcefef069 [SPARK-10446][SQL] Support to specify join type when calling join with usingColumns
JIRA: https://issues.apache.org/jira/browse/SPARK-10446

Currently the method `join(right: DataFrame, usingColumns: Seq[String])` only supports inner join. It is more convenient to have it support other join types.
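
A hedged PySpark equivalent of the new capability (`df1`/`df2` are assumed DataFrames):
```python
# join on the shared "id" column with an explicit join type
joined = df1.join(df2, ["id"], "left_outer")
```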

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #8600 from viirya/usingcolumns_df.
2015-09-21 23:46:00 -07:00
Jian Feng 0180b849db [SPARK-10577] [PYSPARK] DataFrame hint for broadcast join
https://issues.apache.org/jira/browse/SPARK-10577
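
A hedged usage sketch of the hint (`large_df`/`small_df` are assumed DataFrames):
```python
from pyspark.sql.functions import broadcast

# mark the smaller side for a broadcast join
joined = large_df.join(broadcast(small_df), "id")
```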

Author: Jian Feng <jzhang.chs@gmail.com>

Closes #8801 from Jianfeng-chs/master.
2015-09-21 23:36:41 -07:00
Sean Owen bf20d6c9f9 [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file
Remove the ._SUCCESS.crc hidden file, which may cause problems in the distribution tar archive and is not used.

Author: Sean Owen <sowen@cloudera.com>

Closes #8846 from srowen/SPARK-10716.
2015-09-21 23:29:59 -07:00
Holden Karau 1cd6741572 [SPARK-9821] [PYSPARK] pyspark-reduceByKey-should-take-a-custom-partitioner
from the issue:

In Scala, I can supply a custom partitioner to reduceByKey (and other aggregation/repartitioning methods like aggregateByKey and combineByKey), but as far as I can tell from the PySpark API, there's no way to do the same in Python.
Here's an example of my code in Scala:
weblogs.map(s => (getFileType(s), 1)).reduceByKey(new FileTypePartitioner(),_+_)
But I can't figure out how to do the same in Python. The closest I can get is to call repartition before reduceByKey like so:
weblogs.map(lambda s: (getFileType(s), 1)).partitionBy(3,hash_filetype).reduceByKey(lambda v1,v2: v1+v2).collect()
But that defeats the purpose, because I'm shuffling twice instead of once, so my performance is worse instead of better.
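
With this change the quoted example needs only one shuffle; a hedged sketch (`weblogs`, `getFileType`, and `hash_filetype` are the hypothetical names from the example above):
```python
counts = (weblogs
          .map(lambda s: (getFileType(s), 1))
          .reduceByKey(lambda a, b: a + b,
                       numPartitions=3,
                       partitionFunc=hash_filetype))  # custom partitioner, one shuffle
```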

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8569 from holdenk/SPARK-9821-pyspark-reduceByKey-should-take-a-custom-partitioner.
2015-09-21 23:21:24 -07:00
noelsmith 7c4f852bfc [DOC] [PYSPARK] [MLLIB] Added newlines to docstrings to fix parameter formatting
Added newlines before `:param ...:` and `:return:` markup. Without these, parameter lists aren't formatted correctly in the API docs, i.e.:

![screen shot 2015-09-21 at 21 49 26](https://cloud.githubusercontent.com/assets/11915197/10004686/de3c41d4-60aa-11e5-9c50-a46dcb51243f.png)

.. looks like this once newline is added:

![screen shot 2015-09-21 at 21 50 14](https://cloud.githubusercontent.com/assets/11915197/10004706/f86bfb08-60aa-11e5-8524-ae4436713502.png)
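
A minimal sketch of the corrected layout (assumed, not taken from the patch) — note the blank line before the `:param` block:
```python
def predict(self, x):
    """Predicts the label for a feature vector.

    :param x: the feature vector to score
    :return: the predicted label
    """
```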

Author: noelsmith <mail@noelsmith.com>

Closes #8851 from noel-smith/docstring-missing-newline-fix.
2015-09-21 14:24:19 -07:00
Holden Karau ba882db6f4 [SPARK-9769] [ML] [PY] add python api for countvectorizermodel
From JIRA: Add Python API, user guide and example for ml.feature.CountVectorizerModel
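
A hedged usage sketch of the resulting API (`df` is an assumed DataFrame with a "words" column of token arrays):
```python
from pyspark.ml.feature import CountVectorizer

cv = CountVectorizer(inputCol="words", outputCol="features")
model = cv.fit(df)
result = model.transform(df)
```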

Author: Holden Karau <holden@pigscanfly.ca>

Closes #8561 from holdenk/SPARK-9769-add-python-api-for-countvectorizermodel.
2015-09-21 13:06:23 -07:00
vinodkc 0144039517 [SPARK-10631] [DOCUMENTATION, MLLIB, PYSPARK] Added documentation for few APIs
There are some missing API docs in pyspark.mllib.linalg.Vector (including DenseVector and SparseVector). We should add them based on their Scala counterparts.

Author: vinodkc <vinod.kc.in@gmail.com>

Closes #8834 from vinodkc/fix_SPARK-10631.
2015-09-20 22:55:24 -07:00
Josh Rosen 2117eea71e [SPARK-10710] Remove ability to disable spilling in core and SQL
It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.

This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
2015-09-19 21:40:21 -07:00
Yanbo Liang 35e8ab9390 [SPARK-10615] [PYSPARK] change assertEquals to assertEqual
As ```assertEquals``` is deprecated, we need to change ```assertEquals``` to ```assertEqual``` in the existing python unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8814 from yanboliang/spark-10615.
2015-09-18 09:53:52 -07:00
Liang-Chi Hsieh 136c77d8bb [SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys
JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
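
A hedged illustration of the underlying idea, not the exact patch:
```python
# keep the Python-side hash within Java's 32-bit int range before it
# crosses to the JVM (assumed simplification of the real fix)
def portable_hash_32(x):
    return hash(x) & 0x7FFFFFFF
```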

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #8796 from viirya/fix-pyrdd-lookup.
2015-09-17 10:02:15 -07:00
Yu ISHIKAWA 268088b899 [SPARK-10282] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.recommendation
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8692 from yu-iskw/SPARK-10282.
2015-09-17 08:51:19 -07:00
Yu ISHIKAWA c74d38fd8f [SPARK-10274] [MLLIB] Add @since annotation to pyspark.mllib.fpm
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8665 from yu-iskw/SPARK-10274.
2015-09-17 08:50:46 -07:00
Yu ISHIKAWA 4a0b56e8db [SPARK-10279] [MLLIB] [PYSPARK] [DOCS] Add @since annotation to pyspark.mllib.util
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8689 from yu-iskw/SPARK-10279.
2015-09-17 08:50:00 -07:00
Yu ISHIKAWA 39b44cb52e [SPARK-10278] [MLLIB] [PYSPARK] Add @since annotation to pyspark.mllib.tree
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8685 from yu-iskw/SPARK-10278.
2015-09-17 08:48:45 -07:00
Yu ISHIKAWA 0ded87a4d4 [SPARK-10281] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.clustering
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8691 from yu-iskw/SPARK-10281.
2015-09-17 08:47:21 -07:00