Frank Dai
a3da468d8b
Merge remote-tracking branch 'upstream/master' into code-style
2014-01-14 15:29:17 +08:00
Patrick Wendell
fdaabdc673
Merge pull request #380 from mateiz/py-bayes
...
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-13 23:08:26 -08:00
Frank Dai
c2852cf42e
Indent two spaces
2014-01-14 14:59:01 +08:00
Frank Dai
12386b3eea
Since getLong() and getInt() have side effects, restore the parentheses, and remove an empty line
2014-01-14 14:53:10 +08:00
Frank Dai
0d94d74edf
Code clean up for mllib
2014-01-14 14:37:26 +08:00
Henry Saputra
91a563608e
Merge branch 'master' into remove_simpleredundantreturn_scala
2014-01-12 10:34:13 -08:00
Henry Saputra
93a65e5fde
Remove simple redundant return statement for Scala methods/functions:
...
-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
2014-01-12 10:30:04 -08:00
Matei Zaharia
f00e949f84
Added Java unit test, data, and main method for Naive Bayes
...
Also fixes mains of a few other algorithms to print the final model
2014-01-11 22:30:48 -08:00
Matei Zaharia
9a0dfdf868
Add Naive Bayes to Python MLlib, and some API fixes
...
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-11 22:30:48 -08:00
jerryshao
cbfbc01938
Fix a small problem in ALS where configuration didn't work
2014-01-11 16:22:45 +08:00
Reza Zadeh
21c8a54c08
Merge remote-tracking branch 'upstream/master' into sparsesvd
...
Conflicts:
docs/mllib-guide.md
2014-01-09 22:45:32 -08:00
Reza Zadeh
7d7490b67b
More sparse matrix usage.
2014-01-07 17:16:17 -08:00
Hossein Falaki
3a8beb46cb
Merge branch 'master' into MatrixFactorizationModel-fix
2014-01-07 15:22:42 -08:00
Hossein Falaki
04132ea9b2
Added Rating deserializer
2014-01-06 12:19:08 -08:00
Hossein Falaki
11a93fb5a8
Added serializing method for Rating object
2014-01-06 12:18:03 -08:00
Xusen Yin
05e6d5b454
Added GradientDescentSuite
2014-01-06 16:54:00 +08:00
Xusen Yin
a72107284a
fix logistic loss bug
2014-01-06 12:30:17 +08:00
Reynold Xin
d43ad3ef2c
Merge pull request #292 from soulmachine/naive-bayes
...
standard Naive Bayes classifier
Implements the standard Naive Bayes classifier. This is an updated version of #288, which was closed because of mistaken operations.
2014-01-04 16:29:30 -08:00
Hossein Falaki
8d0c2f7399
Added python binding for bulk recommendation
2014-01-04 16:23:17 -08:00
Reza Zadeh
06c0f7628a
use SparseMatrix everywhere
2014-01-04 14:28:07 -08:00
Reza Zadeh
cdff9fc858
prettify
2014-01-04 12:44:04 -08:00
Reza Zadeh
e9bd6cb51d
new example file
2014-01-04 12:33:22 -08:00
Reza Zadeh
8bfcce1ad8
fix tests
2014-01-04 11:52:42 -08:00
Reza Zadeh
35adc72794
set methods
2014-01-04 11:30:36 -08:00
Reza Zadeh
73daa700bd
add k parameter
2014-01-04 01:52:28 -08:00
Reza Zadeh
26a74f0c41
using decomposed matrix struct now
2014-01-04 00:38:53 -08:00
Reza Zadeh
d2d5e5e062
new return struct
2014-01-04 00:15:04 -08:00
Reza Zadeh
7f631dd2a9
start using matrixentry
2014-01-03 22:17:24 -08:00
Reza Zadeh
6bcdb762a1
rename sparsesvd.scala
2014-01-03 21:55:38 -08:00
Reza Zadeh
b059a2a00c
New matrix entry file
2014-01-03 21:54:57 -08:00
Hossein Falaki
dfe57fa84c
Removed unnecessary blank line
2014-01-03 15:40:53 -08:00
Hossein Falaki
2c1cba851c
Added unit tests for bulk prediction in MatrixFactorizationModel
2014-01-03 15:35:20 -08:00
Hossein Falaki
67f937ec22
Added a method to enable bulk prediction
2014-01-03 15:34:16 -08:00
Reza Zadeh
e617ae2dad
fix error message
2014-01-02 01:51:38 -08:00
Reza Zadeh
61405785bc
Merge remote-tracking branch 'upstream/master' into sparsesvd
2014-01-02 01:50:30 -08:00
Reza Zadeh
2612164f85
more docs yay
2014-01-01 20:22:29 -08:00
Reza Zadeh
915d53f8ac
javadoc for sparsesvd
2014-01-01 20:20:16 -08:00
Reza Zadeh
185c882606
tweaks to docs
2014-01-01 19:53:14 -08:00
Lian, Cheng
dd6033e685
Aggregated all sample points to driver without any shuffle
2014-01-02 01:38:24 +08:00
Lian, Cheng
6d0e2e86df
Response to comments from Reynold, Ameet and Evan
...
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Lian, Cheng
f150b6e76c
Response to Reynold's comments
2013-12-29 17:13:01 +08:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Reza Zadeh
ae5102acc0
large scale considerations
2013-12-27 04:15:13 -05:00
Reza Zadeh
642ab5c1e1
initial large scale testing begin
2013-12-27 01:51:19 -05:00
Reza Zadeh
3369c2d487
cleanup documentation
2013-12-27 00:41:46 -05:00
Reza Zadeh
bdb5037987
add all tests
2013-12-27 00:36:41 -05:00
Reza Zadeh
fa1e8d8cbf
test for truncated svd
2013-12-27 00:34:59 -05:00
Reza Zadeh
16de5268e3
full rank matrix test added
2013-12-26 23:21:57 -05:00
Lian, Cheng
d7086dc28a
Added Apache license header to NaiveBayesSuite
2013-12-27 08:20:41 +08:00
Reza Zadeh
fe1a132d40
Main method added for svd
2013-12-26 18:13:21 -05:00
Reza Zadeh
1a21ba2967
new main file
2013-12-26 18:09:33 -05:00
Reza Zadeh
6c3674cd23
Object to hold the svd methods
2013-12-26 17:39:25 -05:00
Reza Zadeh
6e740cc901
Some documentation
2013-12-26 16:12:40 -05:00
Lian, Cheng
654f42174a
Reformatted some lines commented by Matei
2013-12-27 04:45:04 +08:00
Reza Zadeh
1a173f00bd
Initial files - no tests
2013-12-26 15:01:03 -05:00
Lian, Cheng
c0337c5bbf
Let reduceByKey take care of local combine
...
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
Lian, Cheng
3bb714eaa3
Refactored NaiveBayes
...
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00
Frank Dai
3dc655aa19
standard Naive Bayes classifier
2013-12-25 16:50:42 +08:00
Tor Myklebust
4e821390bc
Scala stubs for updated Python bindings.
2013-12-25 00:09:00 -05:00
Tor Myklebust
58e2a7d6d4
Move PythonMLLibAPI into its own package.
2013-12-24 16:48:40 -05:00
Tor Myklebust
2402180b32
Fix error message ugliness.
2013-12-24 16:18:33 -05:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tor Myklebust
20f85eca3d
Java stubs for ALSModel.
2013-12-21 14:54:13 -05:00
Tor Myklebust
b454fdc2eb
Javadocs; also, declare some things private.
2013-12-20 02:10:21 -05:00
Tor Myklebust
b835ddf3df
Licence notice.
2013-12-20 01:55:03 -05:00
Tor Myklebust
f99970e8cd
Scala classification and clustering stubs; matrix serialization/deserialization.
2013-12-20 00:12:22 -05:00
Tor Myklebust
ded67ee90c
Bindings for linear, Lasso, and ridge regression.
2013-12-19 22:42:12 -05:00
Tor Myklebust
2a41c9aad3
Un-semicolon PythonMLLibAPI.
2013-12-19 21:27:11 -05:00
Tor Myklebust
95915f8b3b
First cut at python mllib bindings. Only LinearRegression is supported.
2013-12-19 01:29:09 -05:00
Mark Hamstra
09ed7ddfa0
Use scala.binary.version in POMs
2013-12-15 12:39:58 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Prashant Sharma
7ad6921ae0
Incorporated Patrick's feedback comment on #211 and made Maven build/dep-resolution at least a bit faster.
2013-12-07 12:45:57 +05:30
Prashant Sharma
44fd30d3fb
Merge branch 'master' into scala-2.10-wip
...
Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
project/SparkBuild.scala
2013-11-25 18:10:54 +05:30
Marek Kolodziej
22724659db
Make XORShiftRandom explicit in KMeans and roll it back for RDD
2013-11-20 07:03:36 -05:00
Marek Kolodziej
99cfe89c68
Updates to reflect pull request code review
2013-11-18 22:00:36 -05:00
Marek Kolodziej
09bdfe3b16
XORShift RNG with unit tests and benchmark
...
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
2013-11-18 15:21:43 -05:00
Prashant Sharma
026ab75661
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10
2013-10-10 09:42:55 +05:30
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
...
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so changed to the shorter Scala version for artifacts and made it a property in the POM.
2013-10-08 11:29:40 +05:30
Nick Pentreath
a5e58b8f98
Merge branch 'master' into implicit-als
2013-10-07 11:46:17 +02:00
Nick Pentreath
b0f5f4d441
Bumping up test matrix size to eliminate random failures
2013-10-07 11:44:22 +02:00
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Nick Pentreath
c6ceaeae50
Style fix using 'if' rather than 'match' on boolean
2013-10-04 13:52:53 +02:00
Nick Pentreath
6a7836cddc
Fixing closing brace indentation
2013-10-04 13:33:01 +02:00
Nick Pentreath
0bd9b373d1
Reverting to using comma-delimited split
2013-10-04 13:30:33 +02:00
Nick Pentreath
1cbdcb9cb6
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-10-04 13:25:34 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Nick Pentreath
d952f04c8e
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-09-23 13:07:40 +02:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00
Ameet Talwalkar
81a8bd46ac
response to PR comments
2013-09-08 19:21:30 -07:00
Nick Pentreath
737f01a1ef
Adding algorithm for implicit feedback data to ALS
2013-09-06 14:45:05 +02:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
5701eb92c7
Fix some URLs
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Shivaram Venkataraman
adc700582b
Fix broken build by removing addIntercept
2013-08-30 00:16:32 -07:00
Evan Sparks
016787de32
Merge pull request #863 from shivaram/etrain-ridge
...
Adding linear regression and refactoring Ridge regression to use SGD
2013-08-29 22:15:14 -07:00
Evan Sparks
852d810787
Merge pull request #819 from shivaram/sgd-cleanup
...
Change SVM to use {0,1} labels
2013-08-29 22:13:15 -07:00
Shivaram Venkataraman
dc06b52879
Add an option to turn off data validation, test it.
...
Also moves addIntercept to have default true to make it similar
to validateData option
2013-08-25 23:14:35 -07:00
Shivaram Venkataraman
b8c50a0642
Center & scale variables in Ridge, Lasso.
...
Also add a unit test that checks if ridge regression lowers
cross-validation error.
2013-08-25 22:24:27 -07:00
Matei Zaharia
215c13dd41
Fix code style and a nondeterministic RDD issue in ALS
2013-08-22 16:13:46 -07:00
Matei Zaharia
46ea0c1b47
Merge pull request #814 from holdenk/master
...
Create fewer instances of the random class during ALS initialization.
2013-08-22 15:57:28 -07:00
Jey Kottalam
23f4622aff
Remove redundant dependencies from POMs
2013-08-18 18:53:57 -07:00
Evan Sparks
07fe910669
Fixing typos in Java tests, and addressing alignment issues.
2013-08-18 15:03:13 -07:00
Evan Sparks
b291db712e
Centralizing linear data generator and mllib regression tests to use it.
2013-08-18 15:03:13 -07:00
Evan Sparks
b659af83d3
Adding Linear Regression, and refactoring Ridge Regression.
2013-08-18 15:03:13 -07:00
Jey Kottalam
ad580b94d5
Maven build now also works with YARN
2013-08-16 13:50:12 -07:00
Jey Kottalam
9dd15fe700
Don't mark hadoop-client as 'provided'
2013-08-16 13:50:12 -07:00
Jey Kottalam
11b42a84db
Maven build now works with CDH hadoop-2.0.0-mr1
2013-08-16 13:50:12 -07:00
Jey Kottalam
353fab2440
Initial changes to make Maven build agnostic of hadoop version
2013-08-16 13:50:12 -07:00
Holden Karau
8fc40818d7
Fix
2013-08-15 23:08:48 -07:00
Shivaram Venkataraman
c874625354
Specify label format in LogisticRegression.
2013-08-13 16:55:53 -07:00
Shivaram Venkataraman
0ab6ff4c32
Fix SVM model and unit test to work with {0,1}.
...
Also rename validateFuncs to validators.
2013-08-13 13:57:06 -07:00
Shivaram Venkataraman
654087194d
Change SVM to use {0,1} labels.
...
Also add a data validation check to make sure classification labels
are always 0 or 1 and add an appropriate test case.
2013-08-13 11:44:47 -07:00
Holden Karau
d145da818e
Code review feedback :)
2013-08-12 22:13:08 -07:00
Holden Karau
705c9ace2a
Use fewer instances of the random class during ALS setup
2013-08-12 22:08:36 -07:00
Matei Zaharia
9e02da2763
Merge pull request #812 from shivaram/maven-mllib-tests
...
Create SparkContext in beforeAll for MLLib tests
2013-08-12 20:22:27 -07:00
Shivaram Venkataraman
4935a2558b
Clean up scaladoc in ML Lib.
...
Also build and copy ML Lib scaladoc in Spark docs build.
Some more minor cleanup with respect to naming, test locations etc.
2013-08-11 19:02:43 -07:00
Shivaram Venkataraman
ecc9bfe377
Create SparkContext in beforeAll for MLLib tests
...
This overcomes test failures that occur using Maven
2013-08-11 17:04:00 -07:00
Evan Sparks
ff9ebfabb4
Merge pull request #762 from shivaram/sgd-cleanup
...
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Shivaram Venkataraman
a65a6ed514
Fix GLM code review comments and move java tests
2013-08-10 18:54:10 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
...
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Reynold Xin
01f20a941e
Fixed a typo in mllib inline documentation.
2013-08-08 16:42:54 -07:00
Shivaram Venkataraman
2812e72200
Add setters for optimizer, gradient in SGD.
...
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman
e1a209f791
Remove Java-specific constructor for Rating.
...
The Scala constructor works for native Java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Shivaram Venkataraman
338b7a7455
Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
...
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
...
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman
6caec3f441
Add a test case for random initialization.
...
Also work around a bug where double[][] class cast fails
2013-08-06 16:35:47 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
Ginger Smith
bf7033f3eb
fixing formatting, style, and input
2013-08-05 21:26:24 -07:00
Ginger Smith
8c8947e2b6
fixing formatting
2013-08-05 11:22:18 -07:00
Shivaram Venkataraman
7388e27668
Move implicit arg to constructor for Java access.
2013-08-03 18:08:43 -07:00
Ginger Smith
4ab4df5edb
adding matrix factorization data generator
2013-08-02 22:22:36 -07:00
Shivaram Venkataraman
00339cc032
Refactor optimizers and create GLMs
...
This change refactors the structure of GLMs to use mixins which maintain
a similar interface to other ML lib algorithms. This change also creates
an Optimizer trait which allows GLMs to be extended to use other optimization
techniques.
2013-08-02 19:15:34 -07:00
Matei Zaharia
abfa9e6f70
Increase Kryo buffer size in ALS since some arrays become big
2013-08-02 16:17:32 -07:00
shivaram
58756b72f1
Merge pull request #761 from mateiz/kmeans-generator
...
Add data generator for K-means
2013-07-31 23:45:41 -07:00
Matei Zaharia
52dba89261
Turn on caching in KMeans.main
2013-07-31 23:08:12 -07:00
Matei Zaharia
b2b86c2575
Merge pull request #753 from shivaram/glm-refactor
...
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Matei Zaharia
f607ffb9e1
Added data generator for K-means
...
Also made it possible to specify the number of runs in KMeans.main().
2013-07-31 14:31:07 -07:00
Shivaram Venkataraman
cef178873b
Refactor SGD options into a new class.
...
This refactoring pulls out code shared between SVM, Lasso, LR into
a common GradientDescentOpts class. Some style cleanup as well
2013-07-31 14:15:17 -07:00
Matei Zaharia
9a444cffe7
Use the Char version of split() instead of the String one for efficiency
2013-07-31 11:28:39 -07:00
Shivaram Venkataraman
48851d4dd9
Add bagel, mllib to SBT assembly.
...
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Reynold Xin
366f7735eb
Minor style cleanup of mllib.
2013-07-30 13:59:32 -07:00
Reynold Xin
47011e6854
Use a tighter bound in logistic regression unit test's prediction validation.
2013-07-30 13:58:23 -07:00
Reynold Xin
e35966ae9a
Renamed Classification.scala to ClassificationModel.scala and Regression.scala to RegressionModel.scala
2013-07-30 13:28:31 -07:00
Ameet Talwalkar
e4387ddf5d
made SimpleUpdater consistent with other updaters
2013-07-29 22:21:50 -07:00
Shivaram Venkataraman
3ca9faa341
Clarify how regVal is computed in Updater docs
2013-07-29 18:37:28 -07:00
Shivaram Venkataraman
07da72b451
Remove duplicate loss history and clarify why.
...
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Xinghao
2b2630ba3c
Style fix
...
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao
07f17439a5
Fix validatePrediction functions for Classification models
...
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao
3a8d07df8c
Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator
2013-07-29 09:20:26 -07:00
Xinghao
75f3757300
Fix rounding error in LogisticRegression.scala
2013-07-29 09:19:56 -07:00
Xinghao
c823ee1e2b
Replace map-reduce with dot operator using DoubleMatrix
2013-07-28 22:17:53 -07:00
Xinghao
96e04f4cb7
Fixed SVM and LR train functions to take Int instead of Double for Classification
2013-07-28 22:12:39 -07:00
Xinghao
9398dced03
Changed Classification to return Int instead of Double
...
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao
67de051bbb
SVMSuite and LassoSuite rewritten to closely follow LogisticRegressionSuite
2013-07-28 21:09:56 -07:00
Xinghao
29e042940a
Move data generators to util
2013-07-28 20:39:52 -07:00
Xinghao
ccfa362dde
Change *_LocalRandomSGD to *LocalRandomSGD
2013-07-28 10:33:57 -07:00
Xinghao
b0bbc7f6a8
Resolve conflicts with master, removed regParam for LogisticRegression
2013-07-26 18:57:39 -07:00
Xinghao
071afe2a33
New files from merge with master
2013-07-26 18:21:20 -07:00
Xinghao
10fd3949e6
Making ClassificationModel serializable
2013-07-26 17:49:11 -07:00
Xinghao
f0a1f95228
Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD
2013-07-26 17:36:14 -07:00
Xinghao
f74a03c6d8
Multiple changes
...
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate L1 regularization leads to
soft thresholding proximal function
2013-07-26 17:29:44 -07:00
Xinghao
eef678703e
Adding SVM and Lasso, moving LogisticRegression to classification from regression
...
Also, add regularization parameter to SGD
2013-07-24 15:32:50 -07:00
Reynold Xin
2210e8ccf8
Use a different validation dataset for Logistic Regression prediction testing.
2013-07-23 12:52:15 -07:00
Reynold Xin
87a9dd898f
Made RegressionModel serializable and added unit tests to make sure predict methods would work.
2013-07-23 12:13:27 -07:00
Matei Zaharia
c40f0f21f1
Merge pull request #711 from shivaram/ml-generators
...
Move ML lib data generator files to util/
2013-07-19 13:33:04 -07:00
Shivaram Venkataraman
2c9ea56db4
Rename classes to be called DataGenerator
2013-07-18 11:57:14 -07:00
Shivaram Venkataraman
7ab1170503
Refactor data generators to have a function that can be used in tests.
2013-07-18 11:55:19 -07:00
Shivaram Venkataraman
217667174e
Return Array[Double] from SGD instead of DoubleMatrix
2013-07-17 16:08:34 -07:00
Shivaram Venkataraman
45f3c85518
Change weights to be Array[Double] in LR model.
...
Also ensure weights are initialized to a column vector.
2013-07-17 16:03:29 -07:00
Shivaram Venkataraman
3bf9897136
Rename loss -> stochasticLoss and add a note to explain why we have multiple train methods.
2013-07-17 14:20:24 -07:00
Shivaram Venkataraman
64b88e039a
Move ML lib data generator files to util/
2013-07-17 14:11:44 -07:00
Shivaram Venkataraman
84fa20c2a1
Allow initial weight vectors in LogisticRegression.
...
Also move LogisticGradient to the LogisticRegression file and fix the
unit tests log path.
2013-07-17 14:04:05 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Matei Zaharia
4698a0d688
Shuffle ratings in a more efficient way at start of ALS
2013-07-15 02:54:11 +00:00
Matei Zaharia
ed7fd501cf
Make number of blocks in ALS configurable and lower the default
2013-07-15 00:30:10 +00:00
Matei Zaharia
931e4c96ef
Fix a comment
2013-07-14 08:03:13 +00:00
Matei Zaharia
c5c38d1987
Some optimizations to loading phase of ALS
2013-07-14 07:59:50 +00:00
Ameet Talwalkar
bf4c9a5e0f
renamed with labeled prefix
2013-07-08 14:37:42 -07:00
ryanlecompte
be123aa6ef
update to use ListBuffer, faster than Vector for append operations
2013-07-07 15:35:06 -07:00
ryanlecompte
f78f8d0b41
fix formatting and use Vector instead of List to maintain order
2013-07-06 16:46:53 -07:00
ryanlecompte
757e56dfc7
make binSearch a tail-recursive method
2013-07-05 19:54:28 -07:00
Matei Zaharia
8bbe907556
Replaced string constants in test
2013-07-05 17:25:23 -07:00
Matei Zaharia
653043beb6
Renamed files to match package
2013-07-05 17:18:55 -07:00
Matei Zaharia
de67deeaab
Addressed style comments from Ryan LeCompte
2013-07-05 17:16:49 -07:00
Matei Zaharia
43b24635ee
Renamed ML package to MLlib and added it to classpath
2013-07-05 11:38:53 -07:00