Commit graph

3227 commits

Author SHA1 Message Date
Matei Zaharia ebe1efc862 Merge remote-tracking branch 'pwendell/ui-updates' 2013-07-06 16:46:15 -07:00
Matei Zaharia fd6665122b Fix some other references to Cloudera Avro and updated Avro version 2013-07-06 16:45:15 -07:00
Patrick Wendell 32b9d21a97 Fix occasional failure in UI listener.
If a task fails before the metrics are initialized, it remains possible
that the metrics field will be `None`. This patch accounts for that possibility
by keeping metrics as an `Option` at all times.
2013-07-06 16:40:02 -07:00
Matei Zaharia 22161887ee Merge pull request #676 from c0s/asf-avro
Use standard ASF published avro module instead of a proprietary built one
2013-07-06 16:18:15 -07:00
Matei Zaharia 1ffadb2d9e Merge remote-tracking branch 'pwendell/ui-updates'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia 94871e4703 Merge pull request #655 from tgravescs/master
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Matei Zaharia 3f918b33f8 Merge pull request #672 from holdenk/master
s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning
2013-07-06 12:45:18 -07:00
Matei Zaharia 2a36e5449b Merge pull request #673 from xiajunluan/master
Add config template file for fair scheduler feature
2013-07-06 12:43:21 -07:00
Matei Zaharia 7ba7fa110b Merge pull request #674 from liancheng/master
Bug fix: SPARK-789
2013-07-06 11:45:08 -07:00
Matei Zaharia f4416a1d7e Merge pull request #681 from BlackNiuza/memory_leak
Remove active job from idToActiveJob when job finished or aborted
2013-07-06 11:41:58 -07:00
BlackNiuza 44a2440039 Remove active job from idToActiveJob when job finished or aborted 2013-07-07 01:33:09 +08:00
Patrick Wendell 37abe84212 Tracking some task metrics even during failures. 2013-07-06 09:19:59 -07:00
Matei Zaharia e063e29af8 Merge pull request #680 from tdas/master
Fixed major performance bug in Network Receiver
2013-07-05 21:54:52 -07:00
Tathagata Das 280418ac45 Reduced the number of Iterator to ArrayBuffer copies in NetworkReceiver. 2013-07-05 21:38:21 -07:00
ryanlecompte 757e56dfc7 make binSearch a tail-recursive method 2013-07-05 19:54:28 -07:00
shivaram bf1311e6d2 Merge pull request #678 from mateiz/ml-examples
Start of ML package
2013-07-05 17:32:44 -07:00
Matei Zaharia 8bbe907556 Replaced string constants in test 2013-07-05 17:25:23 -07:00
Patrick Wendell 84b7fc54e6 Enforcing correct sort order for formatted strings 2013-07-05 17:21:08 -07:00
Matei Zaharia 653043beb6 Renamed files to match package 2013-07-05 17:18:55 -07:00
Matei Zaharia de67deeaab Addressed style comments from Ryan LeCompte 2013-07-05 17:16:49 -07:00
Matei Zaharia 43b24635ee Renamed ML package to MLlib and added it to classpath 2013-07-05 11:38:53 -07:00
Matei Zaharia 399bd65ef5 Fixed compile error due to merge 2013-07-05 11:27:06 -07:00
Shivaram Venkataraman 0e33c88cbd Rename package gradient to optimization 2013-07-05 11:15:19 -07:00
Shivaram Venkataraman 09f187a400 Add top-level methods for regression methods.
Also add multiple versions of them to make it easier to call them from Java.
2013-07-05 11:15:19 -07:00
Matei Zaharia 9441d3ef09 Use random seeds for K-means and ALS, and increase tolerance in tests
Random seeds make more sense by default for a machine learning library
because other libraries behave the same way (people expect to be able to
run the algorithm multiple times and get a better answer), but we can
add configuration later if needed. Tests that depend on specific seed
choices seem brittle.
2013-07-05 11:15:19 -07:00
Matei Zaharia e7d49388e3 Added unit test for K-means, and fixed some bugs 2013-07-05 11:15:19 -07:00
Matei Zaharia 652ea0f1d8 Allow RDD.takeSample to give samples bigger than the RDD
Before, when withReplacement was set to true, we would not get a sample
bigger than the RDD's count().

Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/test/scala/spark/RDDSuite.scala
2013-07-05 11:15:13 -07:00
Matei Zaharia cffe3340c5 Fix logistic regression test failure and test suite cleanup 2013-07-05 11:13:46 -07:00
Shivaram Venkataraman 496c7548bb Change test to use fewer iterations 2013-07-05 11:13:46 -07:00
Matei Zaharia 52f491125e Implementation of k-means and k-means|| 2013-07-05 11:13:46 -07:00
Matei Zaharia 39684eafe3 Formatting 2013-07-05 11:13:46 -07:00
Matei Zaharia 6586c5e28b Added a SparkContext accessor to RDD 2013-07-05 11:13:46 -07:00
Matei Zaharia 43dae967d7 Renamed "als" package to "recommendation" 2013-07-05 11:13:46 -07:00
Matei Zaharia d3ce898b8e Scaffolding and model for K-means 2013-07-05 11:13:46 -07:00
Matei Zaharia 3c046a6eca Some small fixes to ALS. 2013-07-05 11:13:46 -07:00
Matei Zaharia 6f0ebb2db2 Remove unused import 2013-07-05 11:13:46 -07:00
Matei Zaharia d903b3887f Initial implementation of Alternating Least Squares.
Includes unit tests and sample data to run on.
2013-07-05 11:13:46 -07:00
Matei Zaharia 05be233ce2 Removed dependency on Apache Commons Math 2013-07-05 11:13:46 -07:00
Shivaram Venkataraman 39ed41652b Move to regression, util and gradient packages 2013-07-05 11:13:46 -07:00
Shivaram Venkataraman 43b398db6a Fix logistic regression to not center data.
Also add a feature to get the intercept correct and test these
using a small unit test.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 6dd3a816c8 Use a private constructor instead of private vars 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 76acc9fe9d Make regression arguments private and add method to predict one point 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 3a6924cb8f Clean up some comments. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 6aadaf4d71 Move normalization to MLUtils and remove Regression trait. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 2d0e64900e Convert regression classes to builder pattern.
Remove extraneous methods and classes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman ed32ec2b3b Update test based on interface changes 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman fd137bd7c6 Address Reynold's comments. Also use a builder pattern to construct the regression classes. 2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 48770419bd Add random data used for LR testing.
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman 282c8ed788 Add LogisticRegression using StochasticGradientDescent.
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman b9d9b6f981 Add a unit test for Ridge Regression 2013-07-05 11:13:45 -07:00