Nick Pentreath
c6ceaeae50
Style fix using 'if' rather than 'match' on boolean
2013-10-04 13:52:53 +02:00
Nick Pentreath
6a7836cddc
Fixing closing brace indentation
2013-10-04 13:33:01 +02:00
Nick Pentreath
0bd9b373d1
Reverting to using comma-delimited split
2013-10-04 13:30:33 +02:00
Nick Pentreath
1cbdcb9cb6
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-10-04 13:25:34 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Nick Pentreath
d952f04c8e
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-09-23 13:07:40 +02:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00
Ameet Talwalkar
81a8bd46ac
response to PR comments
2013-09-08 19:21:30 -07:00
Nick Pentreath
737f01a1ef
Adding algorithm for implicit feedback data to ALS
2013-09-06 14:45:05 +02:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
5701eb92c7
Fix some URLs
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Shivaram Venkataraman
adc700582b
Fix broken build by removing addIntercept
2013-08-30 00:16:32 -07:00
Evan Sparks
016787de32
Merge pull request #863 from shivaram/etrain-ridge
...
Adding linear regression and refactoring Ridge regression to use SGD
2013-08-29 22:15:14 -07:00
Evan Sparks
852d810787
Merge pull request #819 from shivaram/sgd-cleanup
...
Change SVM to use {0,1} labels
2013-08-29 22:13:15 -07:00
Shivaram Venkataraman
dc06b52879
Add an option to turn off data validation, test it.
...
Also moves addIntercept to have default true to make it similar
to validateData option
2013-08-25 23:14:35 -07:00
Shivaram Venkataraman
b8c50a0642
Center & scale variables in Ridge, Lasso.
...
Also add a unit test that checks if ridge regression lowers
cross-validation error.
2013-08-25 22:24:27 -07:00
Matei Zaharia
215c13dd41
Fix code style and a nondeterministic RDD issue in ALS
2013-08-22 16:13:46 -07:00
Matei Zaharia
46ea0c1b47
Merge pull request #814 from holdenk/master
...
Create fewer instances of the random class during ALS initialization.
2013-08-22 15:57:28 -07:00
Jey Kottalam
23f4622aff
Remove redundant dependencies from POMs
2013-08-18 18:53:57 -07:00
Evan Sparks
07fe910669
Fixing typos in Java tests, and addressing alignment issues.
2013-08-18 15:03:13 -07:00
Evan Sparks
b291db712e
Centralizing linear data generator and mllib regression tests to use it.
2013-08-18 15:03:13 -07:00
Evan Sparks
b659af83d3
Adding Linear Regression, and refactoring Ridge Regression.
2013-08-18 15:03:13 -07:00
Jey Kottalam
ad580b94d5
Maven build now also works with YARN
2013-08-16 13:50:12 -07:00
Jey Kottalam
9dd15fe700
Don't mark hadoop-client as 'provided'
2013-08-16 13:50:12 -07:00
Jey Kottalam
11b42a84db
Maven build now works with CDH hadoop-2.0.0-mr1
2013-08-16 13:50:12 -07:00
Jey Kottalam
353fab2440
Initial changes to make Maven build agnostic of hadoop version
2013-08-16 13:50:12 -07:00
Holden Karau
8fc40818d7
Fix
2013-08-15 23:08:48 -07:00
Shivaram Venkataraman
c874625354
Specify label format in LogisticRegression.
2013-08-13 16:55:53 -07:00
Shivaram Venkataraman
0ab6ff4c32
Fix SVM model and unit test to work with {0,1}.
...
Also rename validateFuncs to validators.
2013-08-13 13:57:06 -07:00
Shivaram Venkataraman
654087194d
Change SVM to use {0,1} labels.
...
Also add a data validation check to make sure classification labels
are always 0 or 1 and add an appropriate test case.
2013-08-13 11:44:47 -07:00
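The label check described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the MLlib code: the function name `validate_binary_labels` is hypothetical, and it assumes labels arrive as a plain iterable of numbers rather than an RDD.

```python
def validate_binary_labels(labels):
    """Return True only if every label is exactly 0 or 1.

    Hypothetical helper for illustration; the real validator in MLlib
    operates on an RDD of labeled points and is not reproduced here.
    """
    return all(label == 0 or label == 1 for label in labels)
```

A caller would run such a check before training and reject the input if it returns False, which is the behaviour a "turn off data validation" option would skip.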
Holden Karau
d145da818e
Code review feedback :)
2013-08-12 22:13:08 -07:00
Holden Karau
705c9ace2a
Use fewer instances of the random class during ALS setup
2013-08-12 22:08:36 -07:00
Matei Zaharia
9e02da2763
Merge pull request #812 from shivaram/maven-mllib-tests
...
Create SparkContext in beforeAll for MLLib tests
2013-08-12 20:22:27 -07:00
Shivaram Venkataraman
4935a2558b
Clean up scaladoc in ML Lib.
...
Also build and copy ML Lib scaladoc in Spark docs build.
Some more minor cleanup with respect to naming, test locations etc.
2013-08-11 19:02:43 -07:00
Shivaram Venkataraman
ecc9bfe377
Create SparkContext in beforeAll for MLLib tests
...
This overcomes test failures that occur using Maven
2013-08-11 17:04:00 -07:00
Evan Sparks
ff9ebfabb4
Merge pull request #762 from shivaram/sgd-cleanup
...
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Shivaram Venkataraman
a65a6ed514
Fix GLM code review comments and move java tests
2013-08-10 18:54:10 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
...
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Reynold Xin
01f20a941e
Fixed a typo in mllib inline documentation.
2013-08-08 16:42:54 -07:00
Shivaram Venkataraman
2812e72200
Add setters for optimizer, gradient in SGD.
...
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman
e1a209f791
Remove Java-specific constructor for Rating.
...
The Scala constructor works for native Java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Shivaram Venkataraman
338b7a7455
Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
...
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
...
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman
6caec3f441
Add a test case for random initialization.
...
Also work around a bug where double[][] class cast fails
2013-08-06 16:35:47 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
KMeans init
2013-08-06 15:43:46 -07:00
Ginger Smith
bf7033f3eb
fixing formatting, style, and input
2013-08-05 21:26:24 -07:00
Ginger Smith
8c8947e2b6
fixing formatting
2013-08-05 11:22:18 -07:00
Shivaram Venkataraman
7388e27668
Move implicit arg to constructor for Java access.
2013-08-03 18:08:43 -07:00
Ginger Smith
4ab4df5edb
adding matrix factorization data generator
2013-08-02 22:22:36 -07:00
Shivaram Venkataraman
00339cc032
Refactor optimizers and create GLMs
...
This change refactors the structure of GLMs to use mixins which maintain
a similar interface to other ML lib algorithms. This change also creates
an Optimizer trait which allows GLMs to be extended to use other optimization
techniques.
2013-08-02 19:15:34 -07:00
Matei Zaharia
abfa9e6f70
Increase Kryo buffer size in ALS since some arrays become big
2013-08-02 16:17:32 -07:00
shivaram
58756b72f1
Merge pull request #761 from mateiz/kmeans-generator
...
Add data generator for K-means
2013-07-31 23:45:41 -07:00
Matei Zaharia
52dba89261
Turn on caching in KMeans.main
2013-07-31 23:08:12 -07:00
Matei Zaharia
b2b86c2575
Merge pull request #753 from shivaram/glm-refactor
...
Build changes for ML lib
2013-07-31 15:51:39 -07:00
Matei Zaharia
f607ffb9e1
Added data generator for K-means
...
Also made it possible to specify the number of runs in KMeans.main().
2013-07-31 14:31:07 -07:00
Shivaram Venkataraman
cef178873b
Refactor SGD options into a new class.
...
This refactoring pulls out code shared between SVM, Lasso, LR into
a common GradientDescentOpts class. Some style cleanup as well
2013-07-31 14:15:17 -07:00
Matei Zaharia
9a444cffe7
Use the Char version of split() instead of the String one for efficiency
2013-07-31 11:28:39 -07:00
Shivaram Venkataraman
48851d4dd9
Add bagel, mllib to SBT assembly.
...
Also add jblas dependency to mllib pom.xml
2013-07-30 14:03:15 -07:00
Reynold Xin
366f7735eb
Minor style cleanup of mllib.
2013-07-30 13:59:32 -07:00
Reynold Xin
47011e6854
Use a tighter bound in logistic regression unit test's prediction validation.
2013-07-30 13:58:23 -07:00
Reynold Xin
e35966ae9a
Renamed Classification.scala to ClassificationModel.scala and Regression.scala to RegressionModel.scala
2013-07-30 13:28:31 -07:00
Ameet Talwalkar
e4387ddf5d
made SimpleUpdater consistent with other updaters
2013-07-29 22:21:50 -07:00
Shivaram Venkataraman
3ca9faa341
Clarify how regVal is computed in Updater docs
2013-07-29 18:37:28 -07:00
Shivaram Venkataraman
07da72b451
Remove duplicate loss history and clarify why.
...
Also some minor style fixes.
2013-07-29 16:25:17 -07:00
Xinghao
2b2630ba3c
Style fix
...
Lines shortened to < 100 characters
2013-07-29 09:22:49 -07:00
Xinghao
07f17439a5
Fix validatePrediction functions for Classification models
...
Classifiers return categorical (Int) values that should be compared
directly
2013-07-29 09:22:31 -07:00
Xinghao
3a8d07df8c
Deleting extra LogisticRegressionGenerator and RidgeRegressionGenerator
2013-07-29 09:20:26 -07:00
Xinghao
75f3757300
Fix rounding error in LogisticRegression.scala
2013-07-29 09:19:56 -07:00
Xinghao
c823ee1e2b
Replace map-reduce with dot operator using DoubleMatrix
2013-07-28 22:17:53 -07:00
Xinghao
96e04f4cb7
Fixed SVM and LR train functions to take Int instead of Double for Classification
2013-07-28 22:12:39 -07:00
Xinghao
9398dced03
Changed Classification to return Int instead of Double
...
Also minor changes to formatting and comments
2013-07-28 21:39:19 -07:00
Xinghao
67de051bbb
SVMSuite and LassoSuite rewritten to follow closely with LogisticRegressionSuite
2013-07-28 21:09:56 -07:00
Xinghao
29e042940a
Move data generators to util
2013-07-28 20:39:52 -07:00
Xinghao
ccfa362dde
Change *_LocalRandomSGD to *LocalRandomSGD
2013-07-28 10:33:57 -07:00
Xinghao
b0bbc7f6a8
Resolve conflicts with master, removed regParam for LogisticRegression
2013-07-26 18:57:39 -07:00
Xinghao
071afe2a33
New files from merge with master
2013-07-26 18:21:20 -07:00
Xinghao
10fd3949e6
Making ClassificationModel serializable
2013-07-26 17:49:11 -07:00
Xinghao
f0a1f95228
Rename LogisticRegression, SVM and Lasso to *_LocalRandomSGD
2013-07-26 17:36:14 -07:00
Xinghao
f74a03c6d8
Multiple changes
...
- Changed LogisticRegression regularization parameter to 0
- Removed println from SVM predict function
- Fixed "Lasso" -> "SVM" in SVMGenerator
- Added comment in Updater.scala to indicate L1 regularization leads to
soft thresholding proximal function
2013-07-26 17:29:44 -07:00
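The comment mentioned in the commit above refers to a standard result: the proximal operator of the L1 penalty is soft thresholding. A minimal sketch in Python, assuming a plain list of weights and a fixed step size; the names `soft_threshold` and `l1_update` are hypothetical and this is not the code in Updater.scala:

```python
def soft_threshold(w, shrinkage):
    """Proximal operator of shrinkage * |w| for a single weight.

    Shrinks w toward zero by `shrinkage` and clamps it to zero
    if it would cross the origin.
    """
    if w > shrinkage:
        return w - shrinkage
    if w < -shrinkage:
        return w + shrinkage
    return 0.0


def l1_update(weights, gradient, step_size, reg_param):
    """One proximal gradient step: plain gradient step, then soft thresholding.

    All parameter names are assumptions made for this illustration.
    """
    shrinkage = step_size * reg_param
    return [soft_threshold(w - step_size * g, shrinkage)
            for w, g in zip(weights, gradient)]
```

Weights whose magnitude after the gradient step is at most `step_size * reg_param` become exactly zero, which is why L1 regularization tends to produce sparse models.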
Xinghao
eef678703e
Adding SVM and Lasso, moving LogisticRegression to classification from regression
...
Also, add regularization parameter to SGD
2013-07-24 15:32:50 -07:00
Reynold Xin
2210e8ccf8
Use a different validation dataset for Logistic Regression prediction testing.
2013-07-23 12:52:15 -07:00
Reynold Xin
87a9dd898f
Made RegressionModel serializable and added unit tests to make sure predict methods would work.
2013-07-23 12:13:27 -07:00
Matei Zaharia
c40f0f21f1
Merge pull request #711 from shivaram/ml-generators
...
Move ML lib data generator files to util/
2013-07-19 13:33:04 -07:00
Shivaram Venkataraman
2c9ea56db4
Rename classes to be called DataGenerator
2013-07-18 11:57:14 -07:00
Shivaram Venkataraman
7ab1170503
Refactor data generators to have a function that can be used in tests.
2013-07-18 11:55:19 -07:00
Shivaram Venkataraman
217667174e
Return Array[Double] from SGD instead of DoubleMatrix
2013-07-17 16:08:34 -07:00
Shivaram Venkataraman
45f3c85518
Change weights to be Array[Double] in LR model.
...
Also ensure weights are initialized to a column vector.
2013-07-17 16:03:29 -07:00
Shivaram Venkataraman
3bf9897136
Rename loss -> stochasticLoss and add a note to explain why we have multiple train methods.
2013-07-17 14:20:24 -07:00
Shivaram Venkataraman
64b88e039a
Move ML lib data generator files to util/
2013-07-17 14:11:44 -07:00
Shivaram Venkataraman
84fa20c2a1
Allow initial weight vectors in LogisticRegression.
...
Also move LogisticGradient to the LogisticRegression file and fix the
unit tests log path.
2013-07-17 14:04:05 -07:00
Matei Zaharia
af3c9d5042
Add Apache license headers and LICENSE and NOTICE files
2013-07-16 17:21:33 -07:00
Matei Zaharia
4698a0d688
Shuffle ratings in a more efficient way at start of ALS
2013-07-15 02:54:11 +00:00
Matei Zaharia
ed7fd501cf
Make number of blocks in ALS configurable and lower the default
2013-07-15 00:30:10 +00:00
Matei Zaharia
931e4c96ef
Fix a comment
2013-07-14 08:03:13 +00:00
Matei Zaharia
c5c38d1987
Some optimizations to loading phase of ALS
2013-07-14 07:59:50 +00:00
Ameet Talwalkar
bf4c9a5e0f
renamed with labeled prefix
2013-07-08 14:37:42 -07:00
ryanlecompte
be123aa6ef
update to use ListBuffer, faster than Vector for append operations
2013-07-07 15:35:06 -07:00
ryanlecompte
f78f8d0b41
fix formatting and use Vector instead of List to maintain order
2013-07-06 16:46:53 -07:00
ryanlecompte
757e56dfc7
make binSearch a tail-recursive method
2013-07-05 19:54:28 -07:00
Matei Zaharia
8bbe907556
Replaced string constants in test
2013-07-05 17:25:23 -07:00
Matei Zaharia
653043beb6
Renamed files to match package
2013-07-05 17:18:55 -07:00
Matei Zaharia
de67deeaab
Addressed style comments from Ryan LeCompte
2013-07-05 17:16:49 -07:00
Matei Zaharia
43b24635ee
Renamed ML package to MLlib and added it to classpath
2013-07-05 11:38:53 -07:00