ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
shivaram	bf1311e6d2	Merge pull request #678 from mateiz/ml-examples Start of ML package	2013-07-05 17:32:44 -07:00
Matei Zaharia	8bbe907556	Replaced string constants in test	2013-07-05 17:25:23 -07:00
Matei Zaharia	653043beb6	Renamed files to match package	2013-07-05 17:18:55 -07:00
Matei Zaharia	de67deeaab	Addressed style comments from Ryan LeCompte	2013-07-05 17:16:49 -07:00
Matei Zaharia	43b24635ee	Renamed ML package to MLlib and added it to classpath	2013-07-05 11:38:53 -07:00
Matei Zaharia	399bd65ef5	Fixed compile error due to merge	2013-07-05 11:27:06 -07:00
Shivaram Venkataraman	0e33c88cbd	Rename package gradient to optimization	2013-07-05 11:15:19 -07:00
Shivaram Venkataraman	09f187a400	Add top-level methods for regression methods. Also add multiple versions of them to make it easier to call them from java.	2013-07-05 11:15:19 -07:00
Matei Zaharia	9441d3ef09	Use random seeds for K-means and ALS, and increase tolerance in tests Random seeds make more sense by default for a machine learning library because other libraries behave the same way (people expect to be able to run the algorithm multiple times and get a better answer), but we can add configuration later if needed. Tests that depend on specific seed choices seem brittle.	2013-07-05 11:15:19 -07:00
Matei Zaharia	e7d49388e3	Added unit test for K-means, and fixed some bugs	2013-07-05 11:15:19 -07:00
Matei Zaharia	652ea0f1d8	Allow RDD.takeSample to give samples bigger than the RDD Before, when withReplacement was set to true, we would not get a sample bigger than the RDD's count(). Conflicts: core/src/main/scala/spark/RDD.scala core/src/test/scala/spark/RDDSuite.scala	2013-07-05 11:15:13 -07:00
Matei Zaharia	cffe3340c5	Fix logistic regression test failure and test suite cleanup	2013-07-05 11:13:46 -07:00
Shivaram Venkataraman	496c7548bb	Change test to use fewer iterations	2013-07-05 11:13:46 -07:00
Matei Zaharia	52f491125e	Implementation of k-means and k-means||	2013-07-05 11:13:46 -07:00
Matei Zaharia	39684eafe3	Formatting	2013-07-05 11:13:46 -07:00
Matei Zaharia	6586c5e28b	Added a SparkContext accessor to RDD	2013-07-05 11:13:46 -07:00
Matei Zaharia	43dae967d7	Renamed "als" package to "recommendation"	2013-07-05 11:13:46 -07:00
Matei Zaharia	d3ce898b8e	Scaffolding and model for K-means	2013-07-05 11:13:46 -07:00
Matei Zaharia	3c046a6eca	Some small fixes to ALS.	2013-07-05 11:13:46 -07:00
Matei Zaharia	6f0ebb2db2	Remove unused import	2013-07-05 11:13:46 -07:00
Matei Zaharia	d903b3887f	Initial implementation of Alternating Least Squares. Includes unit tests and sample data to run on.	2013-07-05 11:13:46 -07:00
Matei Zaharia	05be233ce2	Removed dependency on Apache Commons Math	2013-07-05 11:13:46 -07:00
Shivaram Venkataraman	39ed41652b	Move to regression, util and gradient packages	2013-07-05 11:13:46 -07:00
Shivaram Venkataraman	43b398db6a	Fix logistic regression to not center data. Also add a feature to get the intercept correct and test these using a small unit test.	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	6dd3a816c8	Use a private constructor instead of private vars	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	76acc9fe9d	Make regression arguments private and add method to predict one point	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	3a6924cb8f	Clean up some comments.	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	6aadaf4d71	Move normalization to MLUtils and remove Regression trait.	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	2d0e64900e	Convert regression classes to builder pattern. Remove extraneous methods and classes	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	ed32ec2b3b	Update test based on interface changes	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	fd137bd7c6	Address Reynold's comments. Also use a builder pattern to construct the regression classes.	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	48770419bd	Add random data used for LR testing. Verified that results match with glm in R	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	282c8ed788	Add LogisticRegression using StochasticGradientDescent. Also refactor RidgeRegression and LogisticRegression to re-use code and update the test as well	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	b9d9b6f981	Add a unit test for Ridge Regression	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	4dc13bf5be	Revert back to closed form CV error	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	c8169c0a33	Add LPSA data set. Data from http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data	2013-07-05 11:13:45 -07:00
Shivaram Venkataraman	c070decb8e	Add methods to normalize the data before training Also update model after training based appropriately.	2013-07-05 11:13:45 -07:00
Reynold Xin	6a9a9a364c	Minor clean up of the RidgeRegression code. I am not even sure why I did this :s.	2013-07-05 11:13:45 -07:00
Matei Zaharia	729e463f64	Import RidgeRegression example Conflicts: run	2013-07-05 11:13:41 -07:00
Matei Zaharia	6ad85d0918	Merge pull request #677 from jerryshao/fix_stage_clean Clean StageToInfos periodically when spark.cleaner.ttl is enabled	2013-07-04 21:32:29 -07:00
jerryshao	e4ff544a8d	Clean StageToInfos periodically when spark.cleaner.ttl is enabled	2013-07-05 10:34:45 +08:00
Matei Zaharia	6d60fe571a	Merge pull request #666 from c0s/master hbase dependency is missed in hadoop2-yarn profile of examples module	2013-07-01 18:24:03 -07:00
Konstantin Boudnik	6fdbc68f2c	Fixing missed hbase dependency in examples hadoop2-yarn profile	2013-07-01 17:45:07 -07:00
root	7cd490ef5b	Clarify that PySpark is not supported on Windows	2013-07-01 06:26:43 +00:00
root	ec31e68d5d	Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala	2013-07-01 06:26:31 +00:00
root	3296d132b6	Fix performance bug with new Python code not using buffered streams	2013-07-01 06:25:43 +00:00
Matei Zaharia	39ae073b5c	Increase SLF4j version in Maven too	2013-06-30 17:11:14 -07:00
Matei Zaharia	5bbd0eec84	Update docs on SCALA_LIBRARY_PATH	2013-06-30 17:00:40 -07:00
Matei Zaharia	03d0b858c8	Made use of spark.executor.memory setting consistent and documented it Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-06-30 15:46:46 -07:00
Matei Zaharia	ccfe953a4d	Merge pull request #577 from skumargithub/master Example of cumulative counting using updateStateByKey	2013-06-29 17:57:53 -07:00

3031 commits