Commit graph

8 commits

Author SHA1 Message Date
zero323 8ac71d62d9 [SPARK-11084] [ML] [PYTHON] Check if index can contain non-zero value before binary search
At this moment `SparseVector.__getitem__` executes `np.searchsorted` first and checks if result is in an expected range after that. It is possible to check if index can contain non-zero value before executing `np.searchsorted`.

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9098 from zero323/sparse_vector_getitem_improved.
2015-10-16 15:53:26 -07:00
Bhargav Mangipudi 1ec0a0dc28 [SPARK-11050] [MLLIB] PySpark SparseVector can return wrong index in e…
…rror message

For negative indices in the SparseVector, we update the index value. If we have an incorrect index
at this point, the error message has the incorrect *updated* index instead of the original one. This
change contains the fix for the same.

Author: Bhargav Mangipudi <bhargav.mangipudi@gmail.com>

Closes #9069 from bhargav/spark-10759.
2015-10-16 14:36:05 -07:00
zero323 8e67882b90 [SPARK-10973] [ML] [PYTHON] __gettitem__ method throws IndexError exception when we…
__gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry

    from pyspark.mllib.linalg import Vectors
    sv = Vectors.sparse(5, {1: 3})
    sv[0]
    ## 0.0
    sv[1]
    ## 3.0
    sv[2]
    ## Traceback (most recent call last):
    ##   File "<stdin>", line 1, in <module>
    ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
    ##     row_ind = inds[insert_index]
    ## IndexError: index out of bounds

Author: zero323 <matthew.szymkiewicz@gmail.com>

Closes #9009 from zero323/sparse_vector_index_error.
2015-10-08 18:34:15 -07:00
noelsmith 7c4f852bfc [DOC] [PYSPARK] [MLLIB] Added newlines to docstrings to fix parameter formatting
Added newlines before `:param ...:` and `:return:` markup. Without these, parameter lists aren't formatted correctly in the API docs. I.e:

![screen shot 2015-09-21 at 21 49 26](https://cloud.githubusercontent.com/assets/11915197/10004686/de3c41d4-60aa-11e5-9c50-a46dcb51243f.png)

.. looks like this once newline is added:

![screen shot 2015-09-21 at 21 50 14](https://cloud.githubusercontent.com/assets/11915197/10004706/f86bfb08-60aa-11e5-8524-ae4436713502.png)

Author: noelsmith <mail@noelsmith.com>

Closes #8851 from noel-smith/docstring-missing-newline-fix.
2015-09-21 14:24:19 -07:00
vinodkc 0144039517 [SPARK-10631] [DOCUMENTATION, MLLIB, PYSPARK] Added documentation for few APIs
There are some missing API docs in pyspark.mllib.linalg.Vector (including DenseVector and SparseVector). We should add them based on their Scala counterparts.

Author: vinodkc <vinod.kc.in@gmail.com>

Closes #8834 from vinodkc/fix_SPARK-10631.
2015-09-20 22:55:24 -07:00
Vinod K C 95b6a8103f [SPARK-10516] [ MLLIB] Added values property in DenseVector
Author: Vinod K C <vinod.kc@huawei.com>

Closes #8682 from vinodkc/fix_SPARK-10516.
2015-09-15 23:25:51 -07:00
Yanbo Liang 4ae4d54794 [SPARK-9793] [MLLIB] [PYSPARK] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly
PySpark DenseVector, SparseVector ```__eq__``` method should use semantics equality, and DenseVector can compared with SparseVector.
Implement PySpark DenseVector, SparseVector ```__hash__``` method based on the first 16 entries. That will make PySpark Vector objects can be used in collections.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8166 from yanboliang/spark-9793.
2015-09-14 21:37:43 -07:00
Xiangrui Meng ca71cc8c8b [SPARK-9408] [PYSPARK] [MLLIB] Refactor linalg.py to /linalg
This is based on MechCoder 's PR https://github.com/apache/spark/pull/7731. Hopefully it could pass tests. MechCoder I tried to make minimal changes. If this passes Jenkins, we can merge this one first and then try to move `__init__.py` to `local.py` in a separate PR.

Closes #7731

Author: Xiangrui Meng <meng@databricks.com>

Closes #7746 from mengxr/SPARK-9408 and squashes the following commits:

0e05a3b [Xiangrui Meng] merge master
1135551 [Xiangrui Meng] add a comment for str(...)
c48cae0 [Xiangrui Meng] update tests
173a805 [Xiangrui Meng] move linalg.py to linalg/__init__.py
2015-07-30 16:57:38 -07:00
Renamed from python/pyspark/mllib/linalg.py (Browse further)