Reza Zadeh
eb2d8c431f
replace this.type with SVD
2014-01-17 13:57:27 -08:00
Reza Zadeh
cb13b15a60
use 0-indexing
2014-01-17 13:55:42 -08:00
Reynold Xin
84595ea3e2
Merge pull request #414 from soulmachine/code-style
...
Code clean up for mllib
* Removed unnecessary parentheses
* Removed unused imports
* Simplified `filter...size()` to `count ...`
* Removed comments for obsolete parameters
2014-01-15 20:15:29 -08:00
Frank Dai
57fcfc75b3
Added parentheses because getDouble() also has a side effect
2014-01-14 18:56:11 +08:00
Patrick Wendell
23034798d7
Add missing header files
2014-01-14 01:17:13 -08:00
Reza Zadeh
845e568fad
Merge remote-tracking branch 'upstream/master' into sparsesvd
2014-01-13 23:52:34 -08:00
Frank Dai
a3da468d8b
Merge remote-tracking branch 'upstream/master' into code-style
2014-01-14 15:29:17 +08:00
Patrick Wendell
fdaabdc673
Merge pull request #380 from mateiz/py-bayes
...
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-13 23:08:26 -08:00
Frank Dai
c2852cf42e
Indent two spaces
2014-01-14 14:59:01 +08:00
Frank Dai
12386b3eea
Since getLong() and getInt() have side effects, restore parentheses and remove an empty line
2014-01-14 14:53:10 +08:00
Frank Dai
0d94d74edf
Code clean up for mllib
2014-01-14 14:37:26 +08:00
Henry Saputra
91a563608e
Merge branch 'master' into remove_simpleredundantreturn_scala
2014-01-12 10:34:13 -08:00
Henry Saputra
93a65e5fde
Remove simple redundant return statement for Scala methods/functions:
...
-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
2014-01-12 10:30:04 -08:00
Matei Zaharia
f00e949f84
Added Java unit test, data, and main method for Naive Bayes
...
Also fixes mains of a few other algorithms to print the final model
2014-01-11 22:30:48 -08:00
Matei Zaharia
9a0dfdf868
Add Naive Bayes to Python MLlib, and some API fixes
...
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
2014-01-11 22:30:48 -08:00
jerryshao
cbfbc01938
Fix small problem in ALS where configuration didn't work
2014-01-11 16:22:45 +08:00
Reza Zadeh
21c8a54c08
Merge remote-tracking branch 'upstream/master' into sparsesvd
...
Conflicts:
docs/mllib-guide.md
2014-01-09 22:45:32 -08:00
Reza Zadeh
7d7490b67b
More sparse matrix usage.
2014-01-07 17:16:17 -08:00
Hossein Falaki
3a8beb46cb
Merge branch 'master' into MatrixFactorizationModel-fix
2014-01-07 15:22:42 -08:00
Hossein Falaki
04132ea9b2
Added Rating deserializer
2014-01-06 12:19:08 -08:00
Hossein Falaki
11a93fb5a8
Added serializing method for Rating object
2014-01-06 12:18:03 -08:00
Xusen Yin
05e6d5b454
Added GradientDescentSuite
2014-01-06 16:54:00 +08:00
Xusen Yin
a72107284a
fix logistic loss bug
2014-01-06 12:30:17 +08:00
Reynold Xin
d43ad3ef2c
Merge pull request #292 from soulmachine/naive-bayes
...
standard Naive Bayes classifier
Implements the standard Naive Bayes classifier. This is an updated version of #288, which was closed because of operational mistakes.
2014-01-04 16:29:30 -08:00
Hossein Falaki
8d0c2f7399
Added python binding for bulk recommendation
2014-01-04 16:23:17 -08:00
Reza Zadeh
06c0f7628a
use SparseMatrix everywhere
2014-01-04 14:28:07 -08:00
Reza Zadeh
cdff9fc858
prettify
2014-01-04 12:44:04 -08:00
Reza Zadeh
e9bd6cb51d
new example file
2014-01-04 12:33:22 -08:00
Reza Zadeh
8bfcce1ad8
fix tests
2014-01-04 11:52:42 -08:00
Reza Zadeh
35adc72794
set methods
2014-01-04 11:30:36 -08:00
Reza Zadeh
73daa700bd
add k parameter
2014-01-04 01:52:28 -08:00
Reza Zadeh
26a74f0c41
using decomposed matrix struct now
2014-01-04 00:38:53 -08:00
Reza Zadeh
d2d5e5e062
new return struct
2014-01-04 00:15:04 -08:00
Reza Zadeh
7f631dd2a9
start using matrixentry
2014-01-03 22:17:24 -08:00
Reza Zadeh
6bcdb762a1
rename sparsesvd.scala
2014-01-03 21:55:38 -08:00
Reza Zadeh
b059a2a00c
New matrix entry file
2014-01-03 21:54:57 -08:00
Hossein Falaki
dfe57fa84c
Removed unnecessary blank line
2014-01-03 15:40:53 -08:00
Hossein Falaki
2c1cba851c
Added unit tests for bulk prediction in MatrixFactorizationModel
2014-01-03 15:35:20 -08:00
Hossein Falaki
67f937ec22
Added a method to enable bulk prediction
2014-01-03 15:34:16 -08:00
Reza Zadeh
e617ae2dad
fix error message
2014-01-02 01:51:38 -08:00
Reza Zadeh
61405785bc
Merge remote-tracking branch 'upstream/master' into sparsesvd
2014-01-02 01:50:30 -08:00
Reza Zadeh
2612164f85
more docs yay
2014-01-01 20:22:29 -08:00
Reza Zadeh
915d53f8ac
javadoc for sparsesvd
2014-01-01 20:20:16 -08:00
Reza Zadeh
185c882606
tweaks to docs
2014-01-01 19:53:14 -08:00
Lian, Cheng
dd6033e685
Aggregated all sample points to driver without any shuffle
2014-01-02 01:38:24 +08:00
Lian, Cheng
6d0e2e86df
Response to comments from Reynold, Ameet and Evan
...
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
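The last bullet of the commit above (foldByKey instead of reduceByKey to avoid modifying input data) can be illustrated with a plain-Python analogy. This is only a sketch of the general hazard, not the commit's Scala code and not Spark's API: `add_in_place` is a hypothetical combiner that mutates its accumulator, and `functools.reduce` stands in for the two styles of aggregation. Without a zero value, the first input element becomes the accumulator and is overwritten; with a fresh zero value, the inputs are left untouched.

```python
from functools import reduce


def add_in_place(acc, x):
    """Hypothetical combiner: adds x into acc element-wise and returns acc."""
    for i, v in enumerate(x):
        acc[i] += v
    return acc


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# reduce-style: no zero value, so the first input row is used as the
# accumulator and gets mutated by the combiner.
inputs_reduce = [row[:] for row in data]
total_reduce = reduce(add_in_place, inputs_reduce)

# fold-style: a fresh zero value is the accumulator, so every input row
# is only read, never written.
inputs_fold = [row[:] for row in data]
total_fold = reduce(add_in_place, inputs_fold, [0.0, 0.0])

print(total_reduce == total_fold)  # both aggregations give the same sum
print(inputs_reduce[0])            # first row was overwritten with the total
print(inputs_fold[0])              # first row is still the original value
```

Both calls compute the same sum; the difference is only which object absorbs the mutation, which is why a fold with a zero value is the safer choice when the combiner updates its first argument in place.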
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Lian, Cheng
f150b6e76c
Response to Reynold's comments
2013-12-29 17:13:01 +08:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Reza Zadeh
ae5102acc0
large scale considerations
2013-12-27 04:15:13 -05:00
Reza Zadeh
642ab5c1e1
initial large scale testing begin
2013-12-27 01:51:19 -05:00
Reza Zadeh
3369c2d487
cleanup documentation
2013-12-27 00:41:46 -05:00
Reza Zadeh
bdb5037987
add all tests
2013-12-27 00:36:41 -05:00
Reza Zadeh
fa1e8d8cbf
test for truncated svd
2013-12-27 00:34:59 -05:00
Reza Zadeh
16de5268e3
full rank matrix test added
2013-12-26 23:21:57 -05:00
Lian, Cheng
d7086dc28a
Added Apache license header to NaiveBayesSuite
2013-12-27 08:20:41 +08:00
Reza Zadeh
fe1a132d40
Main method added for svd
2013-12-26 18:13:21 -05:00
Reza Zadeh
1a21ba2967
new main file
2013-12-26 18:09:33 -05:00
Reza Zadeh
6c3674cd23
Object to hold the svd methods
2013-12-26 17:39:25 -05:00
Reza Zadeh
6e740cc901
Some documentation
2013-12-26 16:12:40 -05:00
Lian, Cheng
654f42174a
Reformatted some lines commented by Matei
2013-12-27 04:45:04 +08:00
Reza Zadeh
1a173f00bd
Initial files - no tests
2013-12-26 15:01:03 -05:00
Lian, Cheng
c0337c5bbf
Let reduceByKey take care of local combine
...
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
Lian, Cheng
3bb714eaa3
Refactored NaiveBayes
...
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00
Frank Dai
3dc655aa19
standard Naive Bayes classifier
2013-12-25 16:50:42 +08:00
Tor Myklebust
4e821390bc
Scala stubs for updated Python bindings.
2013-12-25 00:09:00 -05:00
Tor Myklebust
58e2a7d6d4
Move PythonMLLibAPI into its own package.
2013-12-24 16:48:40 -05:00
Tor Myklebust
2402180b32
Fix error message ugliness.
2013-12-24 16:18:33 -05:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tor Myklebust
20f85eca3d
Java stubs for ALSModel.
2013-12-21 14:54:13 -05:00
Tor Myklebust
b454fdc2eb
Javadocs; also, declare some things private.
2013-12-20 02:10:21 -05:00
Tor Myklebust
b835ddf3df
Licence notice.
2013-12-20 01:55:03 -05:00
Tor Myklebust
f99970e8cd
Scala classification and clustering stubs; matrix serialization/deserialization.
2013-12-20 00:12:22 -05:00
Tor Myklebust
ded67ee90c
Bindings for linear, Lasso, and ridge regression.
2013-12-19 22:42:12 -05:00
Tor Myklebust
2a41c9aad3
Un-semicolon PythonMLLibAPI.
2013-12-19 21:27:11 -05:00
Tor Myklebust
95915f8b3b
First cut at python mllib bindings. Only LinearRegression is supported.
2013-12-19 01:29:09 -05:00
Mark Hamstra
09ed7ddfa0
Use scala.binary.version in POMs
2013-12-15 12:39:58 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Prashant Sharma
7ad6921ae0
Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.
2013-12-07 12:45:57 +05:30
Prashant Sharma
44fd30d3fb
Merge branch 'master' into scala-2.10-wip
...
Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
project/SparkBuild.scala
2013-11-25 18:10:54 +05:30
Marek Kolodziej
22724659db
Make XORShiftRandom explicit in KMeans and roll it back for RDD
2013-11-20 07:03:36 -05:00
Marek Kolodziej
99cfe89c68
Updates to reflect pull request code review
2013-11-18 22:00:36 -05:00
Marek Kolodziej
09bdfe3b16
XORShift RNG with unit tests and benchmark
...
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
2013-11-18 15:21:43 -05:00
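For readers unfamiliar with the technique named in the commit above, the following is a minimal sketch of a 64-bit xorshift generator in Python. It is an illustration only, not the code of `XORShiftRandom`: the class name `XorShift64`, its methods, and the shift triple (13, 7, 17), one of Marsaglia's published 64-bit triples, are assumptions made for this example, and the triple used in the commit may differ.

```python
MASK64 = (1 << 64) - 1  # keep the state within 64 bits, as a fixed-width long would


class XorShift64:
    """Illustrative xorshift generator; names and shift constants are assumptions."""

    def __init__(self, seed):
        state = seed & MASK64
        if state == 0:
            # An all-zero state is a fixed point of xorshift and would emit only zeros.
            raise ValueError("seed must be non-zero modulo 2**64")
        self.state = state

    def next_u64(self):
        """Advance the state with three shift-and-xor steps and return it."""
        x = self.state
        x ^= (x << 13) & MASK64
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self.state = x
        return x

    def next_float(self):
        """Return a float in [0, 1) built from the top 53 bits of the next value."""
        return (self.next_u64() >> 11) / float(1 << 53)


rng = XorShift64(42)
samples = [rng.next_float() for _ in range(5)]
print(samples)
```

The generator needs only a handful of shifts and xors per value, which is the usual reason for choosing it over a general-purpose RNG in inner loops; it is not suitable for cryptographic use.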
Prashant Sharma
026ab75661
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10
2013-10-10 09:42:55 +05:30
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
...
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so changed to the shorter Scala version for artifacts and made it a property in the POM.
2013-10-08 11:29:40 +05:30
Nick Pentreath
a5e58b8f98
Merge branch 'master' into implicit-als
2013-10-07 11:46:17 +02:00
Nick Pentreath
b0f5f4d441
Bumping up test matrix size to eliminate random failures
2013-10-07 11:44:22 +02:00
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Nick Pentreath
c6ceaeae50
Style fix using 'if' rather than 'match' on boolean
2013-10-04 13:52:53 +02:00
Nick Pentreath
6a7836cddc
Fixing closing brace indentation
2013-10-04 13:33:01 +02:00
Nick Pentreath
0bd9b373d1
Reverting to using comma-delimited split
2013-10-04 13:30:33 +02:00
Nick Pentreath
1cbdcb9cb6
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-10-04 13:25:34 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Nick Pentreath
d952f04c8e
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-09-23 13:07:40 +02:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00