Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
...
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLlib interface works from Java. Changes include:
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman
6caec3f441
Add a test case for random initialization.
...
Also work around a bug where the double[][] class cast fails
2013-08-06 16:35:47 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
Ginger Smith
bf7033f3eb
fixing formatting, style, and input
2013-08-05 21:26:24 -07:00
Ginger Smith
8c8947e2b6
fixing formatting
2013-08-05 11:22:18 -07:00
Shivaram Venkataraman
7388e27668
Move implicit arg to constructor for Java access.
2013-08-03 18:08:43 -07:00
Ginger Smith
4ab4df5edb
adding matrix factorization data generator
2013-08-02 22:22:36 -07:00
Shivaram Venkataraman
00339cc032
Refactor optimizers and create GLMs
...
This change refactors the structure of GLMs to use mixins which maintain
a similar interface to other ML lib algorithms. This change also creates
an Optimizer trait which allows GLMs to be extended to use other optimization
techniques.
2013-08-02 19:15:34 -07:00
Matei Zaharia
abfa9e6f70
Increase Kryo buffer size in ALS since some arrays become big
2013-08-02 16:17:32 -07:00
shivaram
58756b72f1
Merge pull request #761 from mateiz/kmeans-generator
...
Add data generator for K-means
2013-07-31 23:45:41 -07:00
Matei Zaharia
52dba89261
Turn on caching in KMeans.main
2013-07-31 23:08:12 -07:00
Matei Zaharia
b2b86c2575
Merge pull request #753 from shivaram/glm-refactor
...
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Matei Zaharia
f607ffb9e1
Added data generator for K-means
...
Also made it possible to specify the number of runs in KMeans.main().
2013-07-31 14:31:07 -07:00
Shivaram Venkataraman
cef178873b
Refactor SGD options into a new class.
...
This refactoring pulls out code shared between SVM, Lasso, LR into
a common GradientDescentOpts class. Some style cleanup as well
2013-07-31 14:15:17 -07:00
Matei Zaharia
9a444cffe7
Use the Char version of split() instead of the String one for efficiency
2013-07-31 11:28:39 -07:00
Shivaram Venkataraman
48851d4dd9
Add bagel, mllib to SBT assembly.
...
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Reynold Xin
366f7735eb
Minor style cleanup of mllib.
2013-07-30 13:59:32 -07:00
Reynold Xin
47011e6854
Use a tighter bound in logistic regression unit test's prediction validation.
2013-07-30 13:58:23 -07:00
Reynold Xin
e35966ae9a
Renamed Classification.scala to ClassificationModel.scala and Regression.scala to RegressionModel.scala
2013-07-30 13:28:31 -07:00
Ameet Talwalkar
e4387ddf5d
made SimpleUpdater consistent with other updaters
2013-07-29 22:21:50 -07:00
Shivaram Venkataraman
3ca9faa341
Clarify how regVal is computed in Updater docs
2013-07-29 18:37:28 -07:00
Shivaram Venkataraman
07da72b451
Remove duplicate loss history and clarify why.
...
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Xinghao
2b2630ba3c
Style fix
...
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao
07f17439a5
Fix validatePrediction functions for Classification models
...
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao
3a8d07df8c
Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator
2013-07-29 09:20:26 -07:00
Xinghao
75f3757300
Fix rounding error in LogisticRegression.scala
2013-07-29 09:19:56 -07:00
Xinghao
c823ee1e2b
Replace map-reduce with dot operator using DoubleMatrix
2013-07-28 22:17:53 -07:00
Xinghao
96e04f4cb7
Fixed SVM and LR train functions to take Int instead of Double for Classification
2013-07-28 22:12:39 -07:00
Xinghao
9398dced03
Changed Classification to return Int instead of Double
...
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao
67de051bbb
SVMSuite and LassoSuite rewritten to follow closely with LogisticRegressionSuite
2013-07-28 21:09:56 -07:00
Xinghao
29e042940a
Move data generators to util
2013-07-28 20:39:52 -07:00
Xinghao
ccfa362dde
Change *_LocalRandomSGD to *LocalRandomSGD
2013-07-28 10:33:57 -07:00
Xinghao
b0bbc7f6a8
Resolve conflicts with master, removed regParam for LogisticRegression
2013-07-26 18:57:39 -07:00
Xinghao
071afe2a33
New files from merge with master
2013-07-26 18:21:20 -07:00
Xinghao
10fd3949e6
Making ClassificationModel serializable
2013-07-26 17:49:11 -07:00
Xinghao
f0a1f95228
Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD
2013-07-26 17:36:14 -07:00
Xinghao
f74a03c6d8
Multiple changes
...
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate L1 regularization leads to
soft thresholding proximal function
2013-07-26 17:29:44 -07:00
Xinghao
eef678703e
Adding SVM and Lasso, moving LogisticRegression to classification from regression
...
Also, add regularization parameter to SGD
2013-07-24 15:32:50 -07:00
Reynold Xin
2210e8ccf8
Use a different validation dataset for Logistic Regression prediction testing.
2013-07-23 12:52:15 -07:00
Reynold Xin
87a9dd898f
Made RegressionModel serializable and added unit tests to make sure predict methods would work.
2013-07-23 12:13:27 -07:00
Matei Zaharia
c40f0f21f1
Merge pull request #711 from shivaram/ml-generators
...
Move ML lib data generator files to util/
2013-07-19 13:33:04 -07:00
Shivaram Venkataraman
2c9ea56db4
Rename classes to be called DataGenerator
2013-07-18 11:57:14 -07:00
Shivaram Venkataraman
7ab1170503
Refactor data generators to have a function that can be used in tests.
2013-07-18 11:55:19 -07:00
Shivaram Venkataraman
217667174e
Return Array[Double] from SGD instead of DoubleMatrix
2013-07-17 16:08:34 -07:00
Shivaram Venkataraman
45f3c85518
Change weights to be Array[Double] in LR model.
...
Also ensure weights are initialized to a column vector.
2013-07-17 16:03:29 -07:00
Shivaram Venkataraman
3bf9897136
Rename loss -> stochasticLoss and add a note to explain why we have
...
multiple train methods.
2013-07-17 14:20:24 -07:00
Shivaram Venkataraman
64b88e039a
Move ML lib data generator files to util/
2013-07-17 14:11:44 -07:00
Shivaram Venkataraman
84fa20c2a1
Allow initial weight vectors in LogisticRegression.
...
Also move LogisticGradient to the LogisticRegression file and fix the
unit tests log path.
2013-07-17 14:04:05 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Matei Zaharia
4698a0d688
Shuffle ratings in a more efficient way at start of ALS
2013-07-15 02:54:11 +00:00
Matei Zaharia
ed7fd501cf
Make number of blocks in ALS configurable and lower the default
2013-07-15 00:30:10 +00:00
Matei Zaharia
931e4c96ef
Fix a comment
2013-07-14 08:03:13 +00:00
Matei Zaharia
c5c38d1987
Some optimizations to loading phase of ALS
2013-07-14 07:59:50 +00:00
Ameet Talwalkar
bf4c9a5e0f
renamed with labeled prefix
2013-07-08 14:37:42 -07:00
ryanlecompte
be123aa6ef
update to use ListBuffer, faster than Vector for append operations
2013-07-07 15:35:06 -07:00
ryanlecompte
f78f8d0b41
fix formatting and use Vector instead of List to maintain order
2013-07-06 16:46:53 -07:00
ryanlecompte
757e56dfc7
make binSearch a tail-recursive method
2013-07-05 19:54:28 -07:00
Matei Zaharia
8bbe907556
Replaced string constants in test
2013-07-05 17:25:23 -07:00
Matei Zaharia
653043beb6
Renamed files to match package
2013-07-05 17:18:55 -07:00
Matei Zaharia
de67deeaab
Addressed style comments from Ryan LeCompte
2013-07-05 17:16:49 -07:00
Matei Zaharia
43b24635ee
Renamed ML package to MLlib and added it to classpath
2013-07-05 11:38:53 -07:00