Nick Pentreath
b0f5f4d441
Bumping up test matrix size to eliminate random failures
2013-10-07 11:44:22 +02:00
Nick Pentreath
c6ceaeae50
Style fix using 'if' rather than 'match' on boolean
2013-10-04 13:52:53 +02:00
Nick Pentreath
6a7836cddc
Fixing closing brace indentation
2013-10-04 13:33:01 +02:00
Nick Pentreath
0bd9b373d1
Reverting to using comma-delimited split
2013-10-04 13:30:33 +02:00
Nick Pentreath
d952f04c8e
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-09-23 13:07:40 +02:00
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00
Nick Pentreath
737f01a1ef
Adding algorithm for implicit feedback data to ALS
2013-09-06 14:45:05 +02:00
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00
Matei Zaharia
0a8cc30921
Move some classes to more appropriate packages:
...
* RDD, *RDDFunctions -> org.apache.spark.rdd
* Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
* JavaSerializer, KryoSerializer -> org.apache.spark.serializer
2013-09-01 14:13:16 -07:00
Matei Zaharia
46eecd110a
Initial work to rename package to org.apache.spark
2013-09-01 14:13:13 -07:00
Shivaram Venkataraman
adc700582b
Fix broken build by removing addIntercept
2013-08-30 00:16:32 -07:00
Evan Sparks
016787de32
Merge pull request #863 from shivaram/etrain-ridge
...
Adding linear regression and refactoring Ridge regression to use SGD
2013-08-29 22:15:14 -07:00
Evan Sparks
852d810787
Merge pull request #819 from shivaram/sgd-cleanup
...
Change SVM to use {0,1} labels
2013-08-29 22:13:15 -07:00
Shivaram Venkataraman
dc06b52879
Add an option to turn off data validation, test it.
...
Also makes addIntercept default to true, to make it similar
to the validateData option
2013-08-25 23:14:35 -07:00
Shivaram Venkataraman
b8c50a0642
Center & scale variables in Ridge, Lasso.
...
Also add a unit test that checks if ridge regression lowers
cross-validation error.
2013-08-25 22:24:27 -07:00
Matei Zaharia
215c13dd41
Fix code style and a nondeterministic RDD issue in ALS
2013-08-22 16:13:46 -07:00
Evan Sparks
07fe910669
Fixing typos in Java tests, and addressing alignment issues.
2013-08-18 15:03:13 -07:00
Evan Sparks
b291db712e
Centralizing linear data generator and mllib regression tests to use it.
2013-08-18 15:03:13 -07:00
Evan Sparks
b659af83d3
Adding Linear Regression, and refactoring Ridge Regression.
2013-08-18 15:03:13 -07:00
Holden Karau
8fc40818d7
Fix
2013-08-15 23:08:48 -07:00
Shivaram Venkataraman
c874625354
Specify label format in LogisticRegression.
2013-08-13 16:55:53 -07:00
Shivaram Venkataraman
0ab6ff4c32
Fix SVM model and unit test to work with {0,1}.
...
Also rename validateFuncs to validators.
2013-08-13 13:57:06 -07:00
Shivaram Venkataraman
654087194d
Change SVM to use {0,1} labels.
...
Also add a data validation check to make sure classification labels
are always 0 or 1 and add an appropriate test case.
2013-08-13 11:44:47 -07:00
Holden Karau
d145da818e
Code review feedback :)
2013-08-12 22:13:08 -07:00
Holden Karau
705c9ace2a
Use fewer instances of the Random class during ALS setup
2013-08-12 22:08:36 -07:00
Matei Zaharia
9e02da2763
Merge pull request #812 from shivaram/maven-mllib-tests
...
Create SparkContext in beforeAll for MLLib tests
2013-08-12 20:22:27 -07:00
Shivaram Venkataraman
4935a2558b
Clean up scaladoc in ML Lib.
...
Also build and copy ML Lib scaladoc in Spark docs build.
Some more minor cleanup with respect to naming, test locations etc.
2013-08-11 19:02:43 -07:00
Shivaram Venkataraman
ecc9bfe377
Create SparkContext in beforeAll for MLLib tests
...
This overcomes test failures that occur using Maven
2013-08-11 17:04:00 -07:00
Evan Sparks
ff9ebfabb4
Merge pull request #762 from shivaram/sgd-cleanup
...
Refactor SGD options into a new class.
2013-08-11 10:52:55 -07:00
Shivaram Venkataraman
a65a6ed514
Fix GLM code review comments and move java tests
2013-08-10 18:54:10 -07:00
Matei Zaharia
cd247ba5bb
Merge pull request #786 from shivaram/mllib-java
...
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Reynold Xin
01f20a941e
Fixed a typo in mllib inline documentation.
2013-08-08 16:42:54 -07:00
Shivaram Venkataraman
2812e72200
Add setters for optimizer, gradient in SGD.
...
Also remove java-specific constructor for LabeledPoint.
2013-08-08 16:24:31 -07:00
Shivaram Venkataraman
e1a209f791
Remove Java-specific constructor for Rating.
...
The Scala constructor works for native Java types. Modify examples
to match this.
2013-08-08 14:36:02 -07:00
Shivaram Venkataraman
338b7a7455
Merge branch 'master' of git://github.com/mesos/spark into sgd-cleanup
...
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
2013-08-06 21:21:55 -07:00
Shivaram Venkataraman
7db69d56f2
Refactor GLM algorithms and add Java tests
...
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
  pattern works
2013-08-06 17:23:22 -07:00
Shivaram Venkataraman
6caec3f441
Add a test case for random initialization.
...
Also work around a bug where a double[][] class cast fails
2013-08-06 16:35:47 -07:00
Shivaram Venkataraman
471fbadd0c
Java examples, tests for KMeans and ALS
...
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double), making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert a Scala RDD to a Java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
Ginger Smith
bf7033f3eb
fixing formatting, style, and input
2013-08-05 21:26:24 -07:00
Ginger Smith
8c8947e2b6
fixing formatting
2013-08-05 11:22:18 -07:00
Shivaram Venkataraman
7388e27668
Move implicit arg to constructor for Java access.
2013-08-03 18:08:43 -07:00
Ginger Smith
4ab4df5edb
adding matrix factorization data generator
2013-08-02 22:22:36 -07:00
Shivaram Venkataraman
00339cc032
Refactor optimizers and create GLMs
...
This change refactors the structure of GLMs to use mixins which maintain
a similar interface to other ML lib algorithms. This change also creates
an Optimizer trait which allows GLMs to be extended to use other optimization
techniques.
2013-08-02 19:15:34 -07:00
Matei Zaharia
abfa9e6f70
Increase Kryo buffer size in ALS since some arrays become big
2013-08-02 16:17:32 -07:00
shivaram
58756b72f1
Merge pull request #761 from mateiz/kmeans-generator
...
Add data generator for K-means
2013-07-31 23:45:41 -07:00
Matei Zaharia
52dba89261
Turn on caching in KMeans.main
2013-07-31 23:08:12 -07:00
Matei Zaharia
f607ffb9e1
Added data generator for K-means
...
Also made it possible to specify the number of runs in KMeans.main().
2013-07-31 14:31:07 -07:00
Shivaram Venkataraman
cef178873b
Refactor SGD options into a new class.
...
This refactoring pulls out code shared between SVM, Lasso, LR into
a common GradientDescentOpts class. Some style cleanup as well
2013-07-31 14:15:17 -07:00
Matei Zaharia
9a444cffe7
Use the Char version of split() instead of the String one for efficiency
2013-07-31 11:28:39 -07:00
Reynold Xin
366f7735eb
Minor style cleanup of mllib.
2013-07-30 13:59:32 -07:00