3012 commits

Author SHA1 Message Date
Matei Zaharia 6f0ebb2db2 Remove unused import 2013-07-05 11:13:46 -07:00
Matei Zaharia d903b3887f Initial implementation of Alternating Least Squares.
Includes unit tests and sample data to run on.
2013-07-05 11:13:46 -07:00
Matei Zaharia 05be233ce2 Removed dependency on Apache Commons Math 2013-07-05 11:13:46 -07:00
Shivaram Venkataraman 39ed41652b Move to regression, util and gradient packages 2013-07-05 11:13:46 -07:00
Shivaram Venkataraman 43b398db6a Fix logistic regression to not center data.
Also add a feature to get the intercept correct and test these
using a small unit test.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 6dd3a816c8 Use a private constructor instead of private vars 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 76acc9fe9d Make regression arguments private and add method to predict one point 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 3a6924cb8f Clean up some comments. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 6aadaf4d71 Move normalization to MLUtils and remove Regression trait. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 2d0e64900e Convert regression classes to builder pattern.
Remove extraneous methods and classes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman ed32ec2b3b Update test based on interface changes 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman fd137bd7c6 Address Reynold's comments. Also use a builder pattern to construct the regression classes. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 48770419bd Add random data used for LR testing.
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 282c8ed788 Add LogisticRegression using StochasticGradientDescent.
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman b9d9b6f981 Add a unit test for Ridge Regression 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 4dc13bf5be Revert back to closed form CV error 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman c8169c0a33 Add LPSA data set.
Data from
http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman c070decb8e Add methods to normalize the data before training
Also update the model appropriately after training.
2013-07-05 11:13:45 -07:00
Reynold Xin 6a9a9a364c Minor clean up of the RidgeRegression code. I am not even sure why I did
this :s.
2013-07-05 11:13:45 -07:00
Matei Zaharia 729e463f64 Import RidgeRegression example
Conflicts:
	run
2013-07-05 11:13:41 -07:00
Matei Zaharia 6ad85d0918 Merge pull request #677 from jerryshao/fix_stage_clean
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-04 21:32:29 -07:00
jerryshao e4ff544a8d Clean StageToInfos periodically when spark.cleaner.ttl is enabled 2013-07-05 10:34:45 +08:00
Matei Zaharia 6d60fe571a Merge pull request #666 from c0s/master
hbase dependency is missed in hadoop2-yarn profile of examples module
2013-07-01 18:24:03 -07:00
Konstantin Boudnik 6fdbc68f2c Fixing missed hbase dependency in examples hadoop2-yarn profile 2013-07-01 17:45:07 -07:00
root 7cd490ef5b Clarify that PySpark is not supported on Windows 2013-07-01 06:26:43 +00:00
root ec31e68d5d Fixed PySpark perf regression by not using socket.makefile(), and improved
debuggability by letting "print" statements show up in the executor's stderr

Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root 3296d132b6 Fix performance bug with new Python code not using buffered streams 2013-07-01 06:25:43 +00:00
Matei Zaharia 39ae073b5c Increase SLF4j version in Maven too 2013-06-30 17:11:14 -07:00
Matei Zaharia 5bbd0eec84 Update docs on SCALA_LIBRARY_PATH 2013-06-30 17:00:40 -07:00
Matei Zaharia 03d0b858c8 Made use of spark.executor.memory setting consistent and documented it
Conflicts:

	core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Matei Zaharia ccfe953a4d Merge pull request #577 from skumargithub/master
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia 5cfcd3c336 Remove Twitter4J specific repo since it's in Maven central 2013-06-29 15:37:27 -07:00
Matei Zaharia 4358acfe07 Initialize Twitter4J OAuth from system properties instead of prompting 2013-06-29 15:25:06 -07:00
Matei Zaharia 1667158544 Merge remote-tracking branch 'mrpotes/master' 2013-06-29 14:36:09 -07:00
Matei Zaharia 50ca17635a Merge pull request #664 from pwendell/test-fix
Removing incorrect test statement
2013-06-27 22:24:52 -07:00
Matei Zaharia 4974b658ed Look at JAVA_HOME before PATH to determine Java executable 2013-06-27 22:16:40 -07:00
Patrick Wendell c767e74370 Removing incorrect test statement 2013-06-27 21:48:58 -07:00
Matei Zaharia aea727f68d Simplify Python docs a little to do substring search 2013-06-26 21:15:09 -07:00
Matei Zaharia 03906f7f0a Fixes to compute-classpath on Windows 2013-06-26 17:40:22 -07:00
Matei Zaharia e49bc8ca8c Merge pull request #663 from stephenh/option_and_getenv
Be cute with Option and getenv.
2013-06-26 11:13:33 -07:00
Stephen Haberman d7011632d1 Wrap lines. 2013-06-26 12:35:57 -05:00
Stephen Haberman d11025dc6a Be cute with Option and getenv. 2013-06-26 09:53:35 -05:00
Matei Zaharia 32370da4e4 Don't use forward slash in exclusion for JAR signature files 2013-06-25 22:08:19 -04:00
Matei Zaharia 9f0d913295 Refactored tests to share SparkContexts in some of them
Creating these seems to take a while and clutters the output with Akka
stuff, so it would be nice to share them.
2013-06-25 19:18:30 -04:00
Matei Zaharia 2bd04c3513 Formatting 2013-06-25 18:37:14 -04:00
Matei Zaharia f2263350ed Added a local-cluster mode test to ReplSuite 2013-06-25 18:35:35 -04:00
Matei Zaharia 6c8d1b2ca6 Fix computation of classpath when we launch java directly
The previous version assumed that a CLASSPATH environment variable was
set by the "run" script when launching the process that starts the
ExecutorRunner, but unfortunately this is not true in tests. Instead, we
factor the classpath calculation into an external script and call that.

NOTE: This includes a Windows version but hasn't yet been tested there.
2013-06-25 18:21:00 -04:00
James Phillpotts 176193b1e8 Fix usage and parameter extraction 2013-06-25 23:06:15 +01:00
James Phillpotts 366572edca Include a default OAuth implementation, and update examples and JavaStreamingContext 2013-06-25 22:59:34 +01:00
Matei Zaharia 15b00914c5 Some fixes to the launch-java-directly change:
- Split SPARK_JAVA_OPTS into multiple command-line arguments if it
  contains spaces; this splitting follows quoting rules in bash
- Add the Scala JARs to the classpath if they're not in the CLASSPATH
  variable because the ExecutorRunner is launched with "scala" (this can
  happen when using local-cluster URLs in spark-shell)
2013-06-25 17:17:27 -04:00