Sean Owen
e91ad3f164
Correct L2 regularized weight update with canonical form
2014-01-18 12:53:01 +00:00
Reynold Xin
84595ea3e2
Merge pull request #414 from soulmachine/code-style
Code clean up for mllib
* Removed unnecessary parentheses
* Removed unused imports
* Simplified `filter...size()` to `count ...`
* Removed comments for obsolete parameters
2014-01-15 20:15:29 -08:00
Frank Dai
57fcfc75b3
Added parentheses because getDouble() also has a side effect
2014-01-14 18:56:11 +08:00
Patrick Wendell
23034798d7
Add missing header files
2014-01-14 01:17:13 -08:00
Frank Dai
a3da468d8b
Merge remote-tracking branch 'upstream/master' into code-style
2014-01-14 15:29:17 +08:00
Patrick Wendell
fdaabdc673
Merge pull request #380 from mateiz/py-bayes
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-13 23:08:26 -08:00
Frank Dai
c2852cf42e
Indent two spaces
2014-01-14 14:59:01 +08:00
Frank Dai
12386b3eea
Since getLong() and getInt() have side effects, restore their parentheses, and remove an empty line
2014-01-14 14:53:10 +08:00
Frank Dai
0d94d74edf
Code clean up for mllib
2014-01-14 14:37:26 +08:00
Henry Saputra
91a563608e
Merge branch 'master' into remove_simpleredundantreturn_scala
2014-01-12 10:34:13 -08:00
Henry Saputra
93a65e5fde
Remove simple redundant return statement for Scala methods/functions:
-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
2014-01-12 10:30:04 -08:00
Matei Zaharia
f00e949f84
Added Java unit test, data, and main method for Naive Bayes
Also fixes mains of a few other algorithms to print the final model
2014-01-11 22:30:48 -08:00
Matei Zaharia
9a0dfdf868
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-11 22:30:48 -08:00
jerryshao
cbfbc01938
Fix small problem in ALS where configure didn't work
2014-01-11 16:22:45 +08:00
Hossein Falaki
3a8beb46cb
Merge branch 'master' into MatrixFactorizationModel-fix
2014-01-07 15:22:42 -08:00
Hossein Falaki
04132ea9b2
Added Rating deserializer
2014-01-06 12:19:08 -08:00
Hossein Falaki
11a93fb5a8
Added serializing method for Rating object
2014-01-06 12:18:03 -08:00
Xusen Yin
05e6d5b454
Added GradientDescentSuite
2014-01-06 16:54:00 +08:00
Xusen Yin
a72107284a
fix logistic loss bug
2014-01-06 12:30:17 +08:00
Reynold Xin
d43ad3ef2c
Merge pull request #292 from soulmachine/naive-bayes
standard Naive Bayes classifier
Implements the standard Naive Bayes classifier. This is an updated version of #288, which was closed by mistake.
2014-01-04 16:29:30 -08:00
Hossein Falaki
8d0c2f7399
Added python binding for bulk recommendation
2014-01-04 16:23:17 -08:00
Hossein Falaki
dfe57fa84c
Removed unnecessary blank line
2014-01-03 15:40:53 -08:00
Hossein Falaki
2c1cba851c
Added unit tests for bulk prediction in MatrixFactorizationModel
2014-01-03 15:35:20 -08:00
Hossein Falaki
67f937ec22
Added a method to enable bulk prediction
2014-01-03 15:34:16 -08:00
Lian, Cheng
dd6033e685
Aggregated all sample points to driver without any shuffle
2014-01-02 01:38:24 +08:00
Lian, Cheng
6d0e2e86df
Response to comments from Reynold, Ameet and Evan
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Lian, Cheng
f150b6e76c
Response to Reynold's comments
2013-12-29 17:13:01 +08:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Lian, Cheng
d7086dc28a
Added Apache license header to NaiveBayesSuite
2013-12-27 08:20:41 +08:00
Lian, Cheng
654f42174a
Reformatted some lines commented by Matei
2013-12-27 04:45:04 +08:00
Lian, Cheng
c0337c5bbf
Let reduceByKey take care of local combine
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
Lian, Cheng
3bb714eaa3
Refactored NaiveBayes
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00
Frank Dai
3dc655aa19
standard Naive Bayes classifier
2013-12-25 16:50:42 +08:00
Tor Myklebust
4e821390bc
Scala stubs for updated Python bindings.
2013-12-25 00:09:00 -05:00
Tor Myklebust
58e2a7d6d4
Move PythonMLLibAPI into its own package.
2013-12-24 16:48:40 -05:00
Tor Myklebust
2402180b32
Fix error message ugliness.
2013-12-24 16:18:33 -05:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tor Myklebust
20f85eca3d
Java stubs for ALSModel.
2013-12-21 14:54:13 -05:00
Tor Myklebust
b454fdc2eb
Javadocs; also, declare some things private.
2013-12-20 02:10:21 -05:00
Tor Myklebust
b835ddf3df
Licence notice.
2013-12-20 01:55:03 -05:00
Tor Myklebust
f99970e8cd
Scala classification and clustering stubs; matrix serialization/deserialization.
2013-12-20 00:12:22 -05:00
Tor Myklebust
ded67ee90c
Bindings for linear, Lasso, and ridge regression.
2013-12-19 22:42:12 -05:00
Tor Myklebust
2a41c9aad3
Un-semicolon PythonMLLibAPI.
2013-12-19 21:27:11 -05:00
Tor Myklebust
95915f8b3b
First cut at python mllib bindings. Only LinearRegression is supported.
2013-12-19 01:29:09 -05:00
Mark Hamstra
09ed7ddfa0
Use scala.binary.version in POMs
2013-12-15 12:39:58 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Prashant Sharma
7ad6921ae0
Incorporated Patrick's feedback comment on #211 and made Maven build/dep-resolution at least a bit faster.
2013-12-07 12:45:57 +05:30
Prashant Sharma
44fd30d3fb
Merge branch 'master' into scala-2.10-wip
Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
project/SparkBuild.scala
2013-11-25 18:10:54 +05:30
Marek Kolodziej
22724659db
Make XORShiftRandom explicit in KMeans and roll it back for RDD
2013-11-20 07:03:36 -05:00