Matei Zaharia
d3ce898b8e
Scaffolding and model for K-means
2013-07-05 11:13:46 -07:00
Matei Zaharia
3c046a6eca
Some small fixes to ALS.
2013-07-05 11:13:46 -07:00
Matei Zaharia
6f0ebb2db2
Remove unused import
2013-07-05 11:13:46 -07:00
Matei Zaharia
d903b3887f
Initial implementation of Alternating Least Squares.
...
Includes unit tests and sample data to run on.
2013-07-05 11:13:46 -07:00
Matei Zaharia
05be233ce2
Removed dependency on Apache Commons Math
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
39ed41652b
Move to regression, util and gradient packages
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
43b398db6a
Fix logistic regression to not center data.
...
Also add a feature to get the intercept correct and test these
using a small unit test.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6dd3a816c8
Use a private constructor instead of private vars
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
76acc9fe9d
Make regression arguments private and add method to predict one point
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
3a6924cb8f
Clean up some comments.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6aadaf4d71
Move normalization to MLUtils and remove Regression trait.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
2d0e64900e
Convert regression classes to builder pattern.
...
Remove extraneous methods and classes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
ed32ec2b3b
Update test based on interface changes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
fd137bd7c6
Address Reynold's comments. Also use a builder pattern to construct the regression classes.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
48770419bd
Add random data used for LR testing.
...
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
282c8ed788
Add LogisticRegression using StochasticGradientDescent.
...
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
b9d9b6f981
Add a unit test for Ridge Regression
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
4dc13bf5be
Revert back to closed form CV error
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
c8169c0a33
Add LPSA data set.
...
Data from
http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
c070decb8e
Add methods to normalize the data before training
...
Also update the model appropriately after training.
2013-07-05 11:13:45 -07:00
Reynold Xin
6a9a9a364c
Minor clean up of the RidgeRegression code. I am not even sure why I did
...
this :s.
2013-07-05 11:13:45 -07:00
Matei Zaharia
729e463f64
Import RidgeRegression example
...
Conflicts:
run
2013-07-05 11:13:41 -07:00
Matei Zaharia
6ad85d0918
Merge pull request #677 from jerryshao/fix_stage_clean
...
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-04 21:32:29 -07:00
jerryshao
e4ff544a8d
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-05 10:34:45 +08:00
Matei Zaharia
6d60fe571a
Merge pull request #666 from c0s/master
...
hbase dependency is missed in hadoop2-yarn profile of examples module
2013-07-01 18:24:03 -07:00
Konstantin Boudnik
6fdbc68f2c
Fixing missed hbase dependency in examples hadoop2-yarn profile
2013-07-01 17:45:07 -07:00
root
7cd490ef5b
Clarify that PySpark is not supported on Windows
2013-07-01 06:26:43 +00:00
root
ec31e68d5d
Fixed PySpark perf regression by not using socket.makefile(), and improved
...
debuggability by letting "print" statements show up in the executor's stderr
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root
3296d132b6
Fix performance bug with new Python code not using buffered streams
2013-07-01 06:25:43 +00:00
Matei Zaharia
39ae073b5c
Increase SLF4j version in Maven too
2013-06-30 17:11:14 -07:00
Matei Zaharia
5bbd0eec84
Update docs on SCALA_LIBRARY_PATH
2013-06-30 17:00:40 -07:00
Matei Zaharia
03d0b858c8
Made use of spark.executor.memory setting consistent and documented it
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Matei Zaharia
ccfe953a4d
Merge pull request #577 from skumargithub/master
...
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia
5cfcd3c336
Remove Twitter4J specific repo since it's in Maven central
2013-06-29 15:37:27 -07:00
Matei Zaharia
4358acfe07
Initialize Twitter4J OAuth from system properties instead of prompting
2013-06-29 15:25:06 -07:00
Matei Zaharia
1667158544
Merge remote-tracking branch 'mrpotes/master'
2013-06-29 14:36:09 -07:00
Matei Zaharia
50ca17635a
Merge pull request #664 from pwendell/test-fix
...
Removing incorrect test statement
2013-06-27 22:24:52 -07:00
Matei Zaharia
4974b658ed
Look at JAVA_HOME before PATH to determine Java executable
2013-06-27 22:16:40 -07:00
Patrick Wendell
c767e74370
Removing incorrect test statement
2013-06-27 21:48:58 -07:00
Matei Zaharia
aea727f68d
Simplify Python docs a little to do substring search
2013-06-26 21:15:09 -07:00
Matei Zaharia
03906f7f0a
Fixes to compute-classpath on Windows
2013-06-26 17:40:22 -07:00
Matei Zaharia
e49bc8ca8c
Merge pull request #663 from stephenh/option_and_getenv
...
Be cute with Option and getenv.
2013-06-26 11:13:33 -07:00
Stephen Haberman
d7011632d1
Wrap lines.
2013-06-26 12:35:57 -05:00
Stephen Haberman
d11025dc6a
Be cute with Option and getenv.
2013-06-26 09:53:35 -05:00
Matei Zaharia
32370da4e4
Don't use forward slash in exclusion for JAR signature files
2013-06-25 22:08:19 -04:00
Matei Zaharia
9f0d913295
Refactored tests to share SparkContexts in some of them
...
Creating these seems to take a while and clutters the output with Akka
stuff, so it would be nice to share them.
2013-06-25 19:18:30 -04:00
Matei Zaharia
2bd04c3513
Formatting
2013-06-25 18:37:14 -04:00
Matei Zaharia
f2263350ed
Added a local-cluster mode test to ReplSuite
2013-06-25 18:35:35 -04:00
Matei Zaharia
6c8d1b2ca6
Fix computation of classpath when we launch java directly
...
The previous version assumed that a CLASSPATH environment variable was
set by the "run" script when launching the process that starts the
ExecutorRunner, but unfortunately this is not true in tests. Instead, we
factor the classpath calculation into an external script and call that.
NOTE: This includes a Windows version but hasn't yet been tested there.
2013-06-25 18:21:00 -04:00
James Phillpotts
176193b1e8
Fix usage and parameter extraction
2013-06-25 23:06:15 +01:00