Commit graph

5 commits

Author SHA1 Message Date
Burak Yavuz f5ff4a84c4 [SPARK-7383] [ML] Feature Parity in PySpark for ml.features
Implemented python wrappers for Scala functions that don't exist in `ml.features`

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5991 from brkyvz/ml-feat-PR and squashes the following commits:

adcca55 [Burak Yavuz] add regex tokenizer to __all__
b91cb44 [Burak Yavuz] addressed comments
bd39fd2 [Burak Yavuz] remove addition
b82bd7c [Burak Yavuz] Parity in PySpark for ml.features
2015-05-08 11:14:39 -07:00
Burak Yavuz 9e2ffb1328 [SPARK-7388] [SPARK-7383] wrapper for VectorAssembler in Python
The wrapper required the implementation of the `ArrayParam`, because `Array[T]` is hard to obtain from Python. `ArrayParam` has an extra function called `wCast` which is an internal function to obtain `Array[T]` from `Seq[T]`

Author: Burak Yavuz <brkyvz@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #5930 from brkyvz/ml-feat and squashes the following commits:

73e745f [Burak Yavuz] Merge pull request #3 from mengxr/SPARK-7388
c221db9 [Xiangrui Meng] overload StringArrayParam.w
c81072d [Burak Yavuz] addressed comments
99c2ebf [Burak Yavuz] add to python_shared_params
39ecb07 [Burak Yavuz] fix scalastyle
7f7ea2a [Burak Yavuz] [SPARK-7388][SPARK-7383] wrapper for VectorAssembler in Python
2015-05-07 10:25:41 -07:00
Xiangrui Meng ee374e89cd [SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark
This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz

Author: Xiangrui Meng <meng@databricks.com>

Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:

25d7451 [Xiangrui Meng] fix tests in python 3
babdde7 [Xiangrui Meng] fix doc
cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
2015-05-05 11:45:37 -07:00
Xiangrui Meng 57cd1e86d1 [SPARK-6893][ML] default pipeline parameter handling in python
Same as #5431 but for Python. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #5534 from mengxr/SPARK-6893 and squashes the following commits:

d3b519b [Xiangrui Meng] address comments
ebaccc6 [Xiangrui Meng] style update
fce244e [Xiangrui Meng] update explainParams with test
4d6b07a [Xiangrui Meng] add tests
5294500 [Xiangrui Meng] update default param handling in python
2015-04-15 23:49:42 -07:00
Xiangrui Meng e80dc1c5a8 [SPARK-4586][MLLIB] Python API for ML pipeline and parameters
This PR adds Python API for ML pipeline and parameters. The design doc can be found on the JIRA page. It includes transformers and an estimator to demo the simple text classification example code.

TODO:
- [x] handle parameters in LRModel
- [x] unit tests
- [x] missing some docs

CC: davies jkbradley

Author: Xiangrui Meng <meng@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #4151 from mengxr/SPARK-4586 and squashes the following commits:

415268e [Xiangrui Meng] remove inherit_doc from __init__
edbd6fe [Xiangrui Meng] move Identifiable to ml.util
44c2405 [Xiangrui Meng] Merge pull request #2 from davies/ml
dd1256b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
14ae7e2 [Davies Liu] fix docs
54ca7df [Davies Liu] fix tests
78638df [Davies Liu] Merge branch 'SPARK-4586' of github.com:mengxr/spark into ml
fc59a02 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
1dca16a [Davies Liu] refactor
090b3a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into ml
0882513 [Xiangrui Meng] update doc style
a4f4dbf [Xiangrui Meng] add unit test for LR
7521d1c [Xiangrui Meng] add unit tests to HashingTF and Tokenizer
ba0ba1e [Xiangrui Meng] add unit tests for pipeline
0586c7b [Xiangrui Meng] add more comments to the example
5153cff [Xiangrui Meng] simplify java models
036ca04 [Xiangrui Meng] gen numFeatures
46fa147 [Xiangrui Meng] update mllib/pom.xml to include python files in the assembly
1dcc17e [Xiangrui Meng] update code gen and make param appear in the doc
f66ba0c [Xiangrui Meng] make params a property
d5efd34 [Xiangrui Meng] update doc conf and move embedded param map to instance attribute
f4d0fe6 [Xiangrui Meng] use LabeledDocument and Document in example
05e3e40 [Xiangrui Meng] update example
d3e8dbe [Xiangrui Meng] more docs optimize pipeline.fit impl
56de571 [Xiangrui Meng] fix style
d0c5bb8 [Xiangrui Meng] a working copy
bce72f4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
17ecfb9 [Xiangrui Meng] code gen for shared params
d9ea77c [Xiangrui Meng] update doc
c18dca1 [Xiangrui Meng] make the example working
dadd84e [Xiangrui Meng] add base classes and docs
a3015cf [Xiangrui Meng] add Estimator and Transformer
46eea43 [Xiangrui Meng] a pipeline in python
33b68e0 [Xiangrui Meng] a working LR
2015-01-28 17:14:23 -08:00