Shivaram Venkataraman
cef178873b
Refactor SGD options into a new class.
...
This refactoring pulls out code shared between SVM, Lasso, LR into
a common GradientDescentOpts class. Some style cleanup as well.
2013-07-31 14:15:17 -07:00
Matei Zaharia
9a444cffe7
Use the Char version of split() instead of the String one for efficiency
2013-07-31 11:28:39 -07:00
Shivaram Venkataraman
48851d4dd9
Add bagel, mllib to SBT assembly.
...
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Reynold Xin
366f7735eb
Minor style cleanup of mllib.
2013-07-30 13:59:32 -07:00
Reynold Xin
47011e6854
Use a tighter bound in logistic regression unit test's prediction validation.
2013-07-30 13:58:23 -07:00
Reynold Xin
e35966ae9a
Renamed Classification.scala to ClassificationModel.scala and Regression.scala to RegressionModel.scala
2013-07-30 13:28:31 -07:00
Ameet Talwalkar
e4387ddf5d
made SimpleUpdater consistent with other updaters
2013-07-29 22:21:50 -07:00
Shivaram Venkataraman
3ca9faa341
Clarify how regVal is computed in Updater docs
2013-07-29 18:37:28 -07:00
Shivaram Venkataraman
07da72b451
Remove duplicate loss history and clarify why.
...
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Xinghao
2b2630ba3c
Style fix
...
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao
07f17439a5
Fix validatePrediction functions for Classification models
...
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao
3a8d07df8c
Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator
2013-07-29 09:20:26 -07:00
Xinghao
75f3757300
Fix rounding error in LogisticRegression.scala
2013-07-29 09:19:56 -07:00
Xinghao
c823ee1e2b
Replace map-reduce with dot operator using DoubleMatrix
2013-07-28 22:17:53 -07:00
Xinghao
96e04f4cb7
Fixed SVM and LR train functions to take Int instead of Double for Classification
2013-07-28 22:12:39 -07:00
Xinghao
9398dced03
Changed Classification to return Int instead of Double
...
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao
67de051bbb
SVMSuite and LassoSuite rewritten to closely follow LogisticRegressionSuite
2013-07-28 21:09:56 -07:00
Xinghao
29e042940a
Move data generators to util
2013-07-28 20:39:52 -07:00
Xinghao
ccfa362dde
Change *_LocalRandomSGD to *LocalRandomSGD
2013-07-28 10:33:57 -07:00
Xinghao
b0bbc7f6a8
Resolve conflicts with master, removed regParam for LogisticRegression
2013-07-26 18:57:39 -07:00
Xinghao
071afe2a33
New files from merge with master
2013-07-26 18:21:20 -07:00
Xinghao
10fd3949e6
Making ClassificationModel serializable
2013-07-26 17:49:11 -07:00
Xinghao
f0a1f95228
Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD
2013-07-26 17:36:14 -07:00
Xinghao
f74a03c6d8
Multiple changes
...
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate that L1 regularization leads to
the soft-thresholding proximal function
2013-07-26 17:29:44 -07:00
Xinghao
eef678703e
Adding SVM and Lasso, moving LogisticRegression to classification from regression
...
Also, add regularization parameter to SGD
2013-07-24 15:32:50 -07:00
Reynold Xin
2210e8ccf8
Use a different validation dataset for Logistic Regression prediction testing.
2013-07-23 12:52:15 -07:00
Reynold Xin
87a9dd898f
Made RegressionModel serializable and added unit tests to make sure predict methods would work.
2013-07-23 12:13:27 -07:00
Matei Zaharia
c40f0f21f1
Merge pull request #711 from shivaram/ml-generators
...
Move ML lib data generator files to util/
2013-07-19 13:33:04 -07:00
Shivaram Venkataraman
2c9ea56db4
Rename classes to be called DataGenerator
2013-07-18 11:57:14 -07:00
Shivaram Venkataraman
7ab1170503
Refactor data generators to have a function that can be used in tests.
2013-07-18 11:55:19 -07:00
Shivaram Venkataraman
217667174e
Return Array[Double] from SGD instead of DoubleMatrix
2013-07-17 16:08:34 -07:00
Shivaram Venkataraman
45f3c85518
Change weights to be Array[Double] in LR model.
...
Also ensure weights are initialized to a column vector.
2013-07-17 16:03:29 -07:00
Shivaram Venkataraman
3bf9897136
Rename loss -> stochasticLoss and add a note to explain why we have
...
multiple train methods.
2013-07-17 14:20:24 -07:00
Shivaram Venkataraman
64b88e039a
Move ML lib data generator files to util/
2013-07-17 14:11:44 -07:00
Shivaram Venkataraman
84fa20c2a1
Allow initial weight vectors in LogisticRegression.
...
Also move LogisticGradient to the LogisticRegression file and fix the
unit tests log path.
2013-07-17 14:04:05 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Matei Zaharia
4698a0d688
Shuffle ratings in a more efficient way at start of ALS
2013-07-15 02:54:11 +00:00
Matei Zaharia
ed7fd501cf
Make number of blocks in ALS configurable and lower the default
2013-07-15 00:30:10 +00:00
Matei Zaharia
931e4c96ef
Fix a comment
2013-07-14 08:03:13 +00:00
Matei Zaharia
c5c38d1987
Some optimizations to loading phase of ALS
2013-07-14 07:59:50 +00:00
Ameet Talwalkar
bf4c9a5e0f
renamed with labeled prefix
2013-07-08 14:37:42 -07:00
ryanlecompte
be123aa6ef
update to use ListBuffer, faster than Vector for append operations
2013-07-07 15:35:06 -07:00
ryanlecompte
f78f8d0b41
fix formatting and use Vector instead of List to maintain order
2013-07-06 16:46:53 -07:00
ryanlecompte
757e56dfc7
make binSearch a tail-recursive method
2013-07-05 19:54:28 -07:00
Matei Zaharia
8bbe907556
Replaced string constants in test
2013-07-05 17:25:23 -07:00
Matei Zaharia
653043beb6
Renamed files to match package
2013-07-05 17:18:55 -07:00
Matei Zaharia
de67deeaab
Addressed style comments from Ryan LeCompte
2013-07-05 17:16:49 -07:00
Matei Zaharia
43b24635ee
Renamed ML package to MLlib and added it to classpath
2013-07-05 11:38:53 -07:00