Matei Zaharia
fd6665122b
Fix some other references to Cloudera Avro and update Avro version
2013-07-06 16:45:15 -07:00
Patrick Wendell
32b9d21a97
Fix occasional failure in UI listener.
...
If a task fails before the metrics are initialized, it remains possible
that the metrics field will be `None`. This patch accounts for that possibility
by keeping metrics as an `Option` at all times.
2013-07-06 16:40:02 -07:00
Matei Zaharia
22161887ee
Merge pull request #676 from c0s/asf-avro
...
Use standard ASF published avro module instead of a proprietary built one
2013-07-06 16:18:15 -07:00
Matei Zaharia
1ffadb2d9e
Merge remote-tracking branch 'pwendell/ui-updates'
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
core/src/main/scala/spark/util/AkkaUtils.scala
pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia
94871e4703
Merge pull request #655 from tgravescs/master
...
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Matei Zaharia
3f918b33f8
Merge pull request #672 from holdenk/master
...
s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning
2013-07-06 12:45:18 -07:00
Matei Zaharia
2a36e5449b
Merge pull request #673 from xiajunluan/master
...
Add config template file for fair scheduler feature
2013-07-06 12:43:21 -07:00
Matei Zaharia
7ba7fa110b
Merge pull request #674 from liancheng/master
...
Bug fix: SPARK-789
2013-07-06 11:45:08 -07:00
Matei Zaharia
f4416a1d7e
Merge pull request #681 from BlackNiuza/memory_leak
...
Remove active job from idToActiveJob when job finished or aborted
2013-07-06 11:41:58 -07:00
BlackNiuza
44a2440039
Remove active job from idToActiveJob when job finished or aborted
2013-07-07 01:33:09 +08:00
Patrick Wendell
37abe84212
Tracking some task metrics even during failures.
2013-07-06 09:19:59 -07:00
Matei Zaharia
e063e29af8
Merge pull request #680 from tdas/master
...
Fixed major performance bug in Network Receiver
2013-07-05 21:54:52 -07:00
Tathagata Das
280418ac45
Reduced the number of Iterator to ArrayBuffer copies in NetworkReceiver.
2013-07-05 21:38:21 -07:00
ryanlecompte
757e56dfc7
make binSearch a tail-recursive method
2013-07-05 19:54:28 -07:00
shivaram
bf1311e6d2
Merge pull request #678 from mateiz/ml-examples
...
Start of ML package
2013-07-05 17:32:44 -07:00
Matei Zaharia
8bbe907556
Replaced string constants in test
2013-07-05 17:25:23 -07:00
Patrick Wendell
84b7fc54e6
Enforcing correct sort order for formatted strings
2013-07-05 17:21:08 -07:00
Matei Zaharia
653043beb6
Renamed files to match package
2013-07-05 17:18:55 -07:00
Matei Zaharia
de67deeaab
Addressed style comments from Ryan LeCompte
2013-07-05 17:16:49 -07:00
Matei Zaharia
43b24635ee
Renamed ML package to MLlib and added it to classpath
2013-07-05 11:38:53 -07:00
Matei Zaharia
399bd65ef5
Fixed compile error due to merge
2013-07-05 11:27:06 -07:00
Shivaram Venkataraman
0e33c88cbd
Rename package gradient to optimization
2013-07-05 11:15:19 -07:00
Shivaram Venkataraman
09f187a400
Add top-level methods for regression methods.
...
Also add multiple versions of them to make it easier to call them from Java.
2013-07-05 11:15:19 -07:00
Matei Zaharia
9441d3ef09
Use random seeds for K-means and ALS, and increase tolerance in tests
...
Random seeds make more sense by default for a machine learning library
because other libraries behave the same way (people expect to be able to
run the algorithm multiple times and get a better answer), but we can
add configuration later if needed. Tests that depend on specific seed
choices seem brittle.
2013-07-05 11:15:19 -07:00
Matei Zaharia
e7d49388e3
Added unit test for K-means, and fixed some bugs
2013-07-05 11:15:19 -07:00
Matei Zaharia
652ea0f1d8
Allow RDD.takeSample to give samples bigger than the RDD
...
Before, when withReplacement was set to true, we would not get a sample
bigger than the RDD's count().
Conflicts:
core/src/main/scala/spark/RDD.scala
core/src/test/scala/spark/RDDSuite.scala
2013-07-05 11:15:13 -07:00
Matei Zaharia
cffe3340c5
Fix logistic regression test failure and test suite cleanup
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
496c7548bb
Change test to use fewer iterations
2013-07-05 11:13:46 -07:00
Matei Zaharia
52f491125e
Implementation of k-means and k-means||
2013-07-05 11:13:46 -07:00
Matei Zaharia
39684eafe3
Formatting
2013-07-05 11:13:46 -07:00
Matei Zaharia
6586c5e28b
Added a SparkContext accessor to RDD
2013-07-05 11:13:46 -07:00
Matei Zaharia
43dae967d7
Renamed "als" package to "recommendation"
2013-07-05 11:13:46 -07:00
Matei Zaharia
d3ce898b8e
Scaffolding and model for K-means
2013-07-05 11:13:46 -07:00
Matei Zaharia
3c046a6eca
Some small fixes to ALS.
2013-07-05 11:13:46 -07:00
Matei Zaharia
6f0ebb2db2
Remove unused import
2013-07-05 11:13:46 -07:00
Matei Zaharia
d903b3887f
Initial implementation of Alternating Least Squares.
...
Includes unit tests and sample data to run on.
2013-07-05 11:13:46 -07:00
Matei Zaharia
05be233ce2
Removed dependency on Apache Commons Math
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
39ed41652b
Move to regression, util and gradient packages
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
43b398db6a
Fix logistic regression to not center data.
...
Also add a feature to get the intercept correct and test these
using a small unit test.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6dd3a816c8
Use a private constructor instead of private vars
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
76acc9fe9d
Make regression arguments private and add method to predict one point
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
3a6924cb8f
Clean up some comments.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6aadaf4d71
Move normalization to MLUtils and remove Regression trait.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
2d0e64900e
Convert regression classes to builder pattern.
...
Remove extraneous methods and classes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
ed32ec2b3b
Update test based on interface changes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
fd137bd7c6
Address Reynold's comments. Also use a builder pattern to construct the regression classes.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
48770419bd
Add random data used for LR testing.
...
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
282c8ed788
Add LogisticRegression using StochasticGradientDescent.
...
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
b9d9b6f981
Add a unit test for Ridge Regression
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
4dc13bf5be
Revert back to closed form CV error
2013-07-05 11:13:45 -07:00