Matei Zaharia
52f491125e
Implementation of k-means and k-means||
2013-07-05 11:13:46 -07:00
Matei Zaharia
39684eafe3
Formatting
2013-07-05 11:13:46 -07:00
Matei Zaharia
6586c5e28b
Added a SparkContext accessor to RDD
2013-07-05 11:13:46 -07:00
Matei Zaharia
43dae967d7
Renamed "als" package to "recommendation"
2013-07-05 11:13:46 -07:00
Matei Zaharia
d3ce898b8e
Scaffolding and model for K-means
2013-07-05 11:13:46 -07:00
Matei Zaharia
3c046a6eca
Some small fixes to ALS.
2013-07-05 11:13:46 -07:00
Matei Zaharia
6f0ebb2db2
Remove unused import
2013-07-05 11:13:46 -07:00
Matei Zaharia
d903b3887f
Initial implementation of Alternating Least Squares.
...
Includes unit tests and sample data to run on.
2013-07-05 11:13:46 -07:00
Matei Zaharia
05be233ce2
Removed dependency on Apache Commons Math
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
39ed41652b
Move to regression, util and gradient packages
2013-07-05 11:13:46 -07:00
Shivaram Venkataraman
43b398db6a
Fix logistic regression to not center data.
...
Also add a feature to get the intercept correct and test these
using a small unit test.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6dd3a816c8
Use a private constructor instead of private vars
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
76acc9fe9d
Make regression arguments private and add method to predict one point
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
3a6924cb8f
Clean up some comments.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
6aadaf4d71
Move normalization to MLUtils and remove Regression trait.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
2d0e64900e
Convert regression classes to builder pattern.
...
Remove extraneous methods and classes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
ed32ec2b3b
Update test based on interface changes
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
fd137bd7c6
Address Reynold's comments. Also use a builder pattern to construct the regression classes.
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
48770419bd
Add random data used for LR testing.
...
Verified that results match with glm in R
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
282c8ed788
Add LogisticRegression using StochasticGradientDescent.
...
Also refactor RidgeRegression and LogisticRegression to re-use code
and update the test as well
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
b9d9b6f981
Add a unit test for Ridge Regression
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
4dc13bf5be
Revert back to closed form CV error
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
c8169c0a33
Add LPSA data set.
...
Data from
http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data
2013-07-05 11:13:45 -07:00
Shivaram Venkataraman
c070decb8e
Add methods to normalize the data before training
...
Also update the model appropriately after training.
2013-07-05 11:13:45 -07:00
Reynold Xin
6a9a9a364c
Minor clean up of the RidgeRegression code. I am not even sure why I did this :s.
2013-07-05 11:13:45 -07:00
Matei Zaharia
729e463f64
Import RidgeRegression example
...
Conflicts:
run
2013-07-05 11:13:41 -07:00
Matei Zaharia
6ad85d0918
Merge pull request #677 from jerryshao/fix_stage_clean
...
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-04 21:32:29 -07:00
jerryshao
e4ff544a8d
Clean StageToInfos periodically when spark.cleaner.ttl is enabled
2013-07-05 10:34:45 +08:00
Konstantin Boudnik
7687ed5292
Use standard ASF published avro module instead of a proprietary built one
2013-07-04 13:48:33 -07:00
Lian Cheng
c0c3155c3c
Bug fix: SPARK-789
...
https://spark-project.atlassian.net/browse/SPARK-789
2013-07-05 00:54:10 +08:00
Andrew xia
6ccfb73ca9
Add fair scheduler config template file
2013-07-04 19:19:44 +08:00
Holden Karau
0f06d6217d
s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning
2013-07-04 01:05:39 -07:00
Mingfei
04567a1771
update guava version from 11.0.1 to 14.0.1
2013-07-03 17:43:37 +08:00
Prashant Sharma
a5f1f6a907
Merge branch 'master' into master-merge
...
Conflicts:
core/pom.xml
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/RDDCheckpointData.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/api/python/PythonRDD.scala
core/src/main/scala/spark/deploy/client/Client.scala
core/src/main/scala/spark/deploy/master/MasterWebUI.scala
core/src/main/scala/spark/deploy/worker/Worker.scala
core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/ZippedRDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/BlockManagerMasterActor.scala
core/src/main/scala/spark/storage/BlockManagerUI.scala
core/src/main/scala/spark/util/AkkaUtils.scala
core/src/test/scala/spark/SizeEstimatorSuite.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/spark/repl/SparkILoop.scala
repl/src/test/scala/spark/repl/ReplSuite.scala
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/util/MasterFailureTest.scala
2013-07-03 11:43:26 +05:30
Y.CORP.YAHOO.COM\tgraves
923cf92900
Rework from pull request. Removed --user option from Spark on Yarn Client, made the use of the JAVA_HOME environment variable conditional on whether it is set, and created addCredentials in each of the SparkHadoopUtil classes to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Patrick Wendell
39e2325675
Removing dead code
2013-07-02 16:28:40 -07:00
Patrick Wendell
8ca1cc1786
Adding truncation for log files
2013-07-02 16:10:50 -07:00
Matei Zaharia
6d60fe571a
Merge pull request #666 from c0s/master
...
hbase dependency is missing from hadoop2-yarn profile of examples module
2013-07-01 18:24:03 -07:00
Konstantin Boudnik
6fdbc68f2c
Fixing missing hbase dependency in examples hadoop2-yarn profile
2013-07-01 17:45:07 -07:00
Patrick Wendell
9a42d04efa
Throw exception for missing resource
2013-07-01 14:43:13 -07:00
Patrick Wendell
1025d7d1ef
Package refactoring
2013-07-01 14:40:53 -07:00
Patrick Wendell
30b9034241
Fixing bug where logs aren't shown
2013-07-01 13:48:01 -07:00
Patrick Wendell
8688689387
Various formatting changes
2013-07-01 13:40:12 -07:00
Patrick Wendell
735c951a09
Adding test script
2013-07-01 09:33:22 -07:00
Patrick Wendell
5de326db7d
Print exception message
2013-07-01 09:19:45 -07:00
root
7cd490ef5b
Clarify that PySpark is not supported on Windows
2013-07-01 06:26:43 +00:00
root
ec31e68d5d
Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr
...
Conflicts:
core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root
3296d132b6
Fix performance bug with new Python code not using buffered streams
2013-07-01 06:25:43 +00:00
Matei Zaharia
39ae073b5c
Increase SLF4J version in Maven too
2013-06-30 17:11:14 -07:00
Matei Zaharia
5bbd0eec84
Update docs on SCALA_LIBRARY_PATH
2013-06-30 17:00:40 -07:00