Reynold Xin
d43ad3ef2c
Merge pull request #292 from soulmachine/naive-bayes
...
standard Naive Bayes classifier
Implements the standard Naive Bayes classifier. This is an updated version of #288, which was closed because of a mistaken operation.
2014-01-04 16:29:30 -08:00
Lian, Cheng
dd6033e685
Aggregated all sample points to driver without any shuffle
2014-01-02 01:38:24 +08:00
Lian, Cheng
6d0e2e86df
Response to comments from Reynold, Ameet and Evan
...
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
2013-12-30 22:46:32 +08:00
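The reasoning behind the reduceByKey-to-foldByKey change in the commit above can be sketched without Spark. This is a minimal illustration, not the code from the commit: it assumes the combine function adds vectors in place, uses plain Scala collections in place of an RDD, and uses Array[Double] as a stand-in for DoubleMatrix. The names FoldVsReduce, addInPlace, sumWithReduce and sumWithFold are hypothetical.

```scala
object FoldVsReduce {
  // In-place vector addition: mutates `acc` and returns it.
  def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) { acc(i) += x(i); i += 1 }
    acc
  }

  // Reduce-style aggregation: the first input element becomes the
  // accumulator, so an in-place combine overwrites that input element.
  def sumWithReduce(points: Seq[Array[Double]]): Array[Double] =
    points.reduce((a, b) => addInPlace(a, b))

  // Fold-style aggregation: the accumulator is a fresh zero vector,
  // so the input elements are only ever read.
  def sumWithFold(points: Seq[Array[Double]], dim: Int): Array[Double] =
    points.foldLeft(new Array[Double](dim))(addInPlace)

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0)
    val b = Array(10.0, 20.0)
    sumWithReduce(Seq(a, b))
    println(a.mkString(","))  // 11.0,22.0 (input was modified)

    val c = Array(1.0, 2.0)
    val d = Array(10.0, 20.0)
    sumWithFold(Seq(c, d), 2)
    println(c.mkString(","))  // 1.0,2.0 (input left intact)
  }
}
```

With an immutable combine function either style would be safe; the fold form matters only when the combine step mutates its accumulator to avoid allocating a new vector per sample.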
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Lian, Cheng
f150b6e76c
Response to Reynold's comments
2013-12-29 17:13:01 +08:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Lian, Cheng
d7086dc28a
Added Apache license header to NaiveBayesSuite
2013-12-27 08:20:41 +08:00
Lian, Cheng
654f42174a
Reformatted some lines commented by Matei
2013-12-27 04:45:04 +08:00
Lian, Cheng
c0337c5bbf
Let reduceByKey take care of local combine
...
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
Lian, Cheng
3bb714eaa3
Refactored NaiveBayes
...
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00
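The idea of minimizing shuffle output by combining within each partition first can be sketched as follows. This is an assumption-laden illustration rather than the refactored NaiveBayes code: partitions are modeled as nested Scala sequences instead of an RDD, samples are (label, features) pairs, and LocalCombine, Stats, combinePartition, merge and aggregate are hypothetical names.

```scala
object LocalCombine {
  // Per-label statistics: (sample count, element-wise feature sum).
  type Stats = (Int, Array[Double])

  // Collapse one partition to a single Stats record per label, which is
  // what a function passed to mapPartitions could do before any shuffle.
  def combinePartition(part: Iterator[(Int, Array[Double])]): Iterator[(Int, Stats)] = {
    val acc = scala.collection.mutable.Map.empty[Int, Stats]
    part.foreach { case (label, features) =>
      val (n, sum) = acc.getOrElse(label, (0, new Array[Double](features.length)))
      var i = 0
      while (i < sum.length) { sum(i) += features(i); i += 1 }
      acc(label) = (n + 1, sum)
    }
    acc.iterator
  }

  // Merge two partial results for the same label without mutating either.
  def merge(a: Stats, b: Stats): Stats =
    (a._1 + b._1, a._2.zip(b._2).map { case (x, y) => x + y })

  // Combine locally per partition, then merge the small partial results.
  def aggregate(partitions: Seq[Seq[(Int, Array[Double])]]): Map[Int, Stats] =
    partitions
      .flatMap(p => combinePartition(p.iterator))
      .groupBy(_._1)
      .map { case (label, recs) => label -> recs.map(_._2).reduce((a, b) => merge(a, b)) }
}
```

Each partition emits at most one record per label rather than one per sample, so the amount of data crossing partition boundaries depends on the number of labels, not the number of samples.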
Frank Dai
3dc655aa19
standard Naive Bayes classifier
2013-12-25 16:50:42 +08:00
Tor Myklebust
4e821390bc
Scala stubs for updated Python bindings.
2013-12-25 00:09:00 -05:00
Tor Myklebust
58e2a7d6d4
Move PythonMLLibAPI into its own package.
2013-12-24 16:48:40 -05:00
Tor Myklebust
2402180b32
Fix error message ugliness.
2013-12-24 16:18:33 -05:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tor Myklebust
20f85eca3d
Java stubs for ALSModel.
2013-12-21 14:54:13 -05:00
Tor Myklebust
b454fdc2eb
Javadocs; also, declare some things private.
2013-12-20 02:10:21 -05:00
Tor Myklebust
b835ddf3df
Licence notice.
2013-12-20 01:55:03 -05:00
Tor Myklebust
f99970e8cd
Scala classification and clustering stubs; matrix serialization/deserialization.
2013-12-20 00:12:22 -05:00
Tor Myklebust
ded67ee90c
Bindings for linear, Lasso, and ridge regression.
2013-12-19 22:42:12 -05:00
Tor Myklebust
2a41c9aad3
Un-semicolon PythonMLLibAPI.
2013-12-19 21:27:11 -05:00
Tor Myklebust
95915f8b3b
First cut at python mllib bindings. Only LinearRegression is supported.
2013-12-19 01:29:09 -05:00
Mark Hamstra
09ed7ddfa0
Use scala.binary.version in POMs
2013-12-15 12:39:58 -08:00
Prashant Sharma
17db6a9041
Style fixes and addressed review comments at #221
2013-12-10 11:47:16 +05:30
Prashant Sharma
7ad6921ae0
Incorporated Patrick's feedback comment on #211 and made Maven build/dep-resolution at least a bit faster.
2013-12-07 12:45:57 +05:30
Prashant Sharma
44fd30d3fb
Merge branch 'master' into scala-2.10-wip
...
Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
project/SparkBuild.scala
2013-11-25 18:10:54 +05:30
Marek Kolodziej
22724659db
Make XORShiftRandom explicit in KMeans and roll it back for RDD
2013-11-20 07:03:36 -05:00
Marek Kolodziej
99cfe89c68
Updates to reflect pull request code review
2013-11-18 22:00:36 -05:00
Marek Kolodziej
09bdfe3b16
XORShift RNG with unit tests and benchmark
...
To run the unit test, start the SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
2013-11-18 15:21:43 -05:00
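For readers unfamiliar with the technique, a generic 64-bit xorshift generator with a small timing helper might look like the sketch below. It is not the XORShiftRandom class from the commit above: the shift triple (13, 7, 17) is one of Marsaglia's published choices and is assumed here for illustration, and the XorShift64 class, its zero-seed fallback constant, and this benchmark signature are all hypothetical.

```scala
final class XorShift64(seed: Long) {
  // The state must never be zero, or every later output is zero too.
  // The fallback constant is an arbitrary nonzero value.
  private var state: Long = if (seed == 0L) 0x2545F4914F6CDD1DL else seed

  // One xorshift step: three shift-and-xor operations on the state.
  def nextLong(): Long = {
    var x = state
    x ^= x << 13
    x ^= x >>> 7
    x ^= x << 17
    state = x
    x
  }
}

object XorShift64 {
  // Times `n` draws. Returns (elapsed milliseconds, xor of all draws);
  // returning the xor keeps the loop from being optimized away.
  def benchmark(n: Int): (Long, Long) = {
    val rng = new XorShift64(42L)
    var sink = 0L
    var i = 0
    val start = System.nanoTime()
    while (i < n) { sink ^= rng.nextLong(); i += 1 }
    ((System.nanoTime() - start) / 1000000L, sink)
  }
}
```

A call such as XorShift64.benchmark(1000000) gives a rough timing only; a fair comparison against java.util.Random would need warm-up runs and repeated measurements.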
Prashant Sharma
026ab75661
Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10
2013-10-10 09:42:55 +05:30
Prashant Sharma
26860639c5
Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
project/SparkBuild.scala
2013-10-10 09:42:23 +05:30
Prashant Sharma
7be75682b9
Merge branch 'master' into wip-merge-master
...
Conflicts:
bagel/pom.xml
core/pom.xml
core/src/test/scala/org/apache/spark/ui/UISuite.scala
examples/pom.xml
mllib/pom.xml
pom.xml
project/SparkBuild.scala
repl/pom.xml
streaming/pom.xml
tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts,
so changed to the shorter Scala version for artifacts and made it a property in the POM.
2013-10-08 11:29:40 +05:30
Nick Pentreath
a5e58b8f98
Merge branch 'master' into implicit-als
2013-10-07 11:46:17 +02:00
Nick Pentreath
b0f5f4d441
Bumping up test matrix size to eliminate random failures
2013-10-07 11:44:22 +02:00
Patrick Wendell
aa9fb84994
Merging build changes in from 0.8
2013-10-05 22:07:00 -07:00
Martin Weindel
e09f4a9601
fixed some warnings
2013-10-05 23:08:23 +02:00
Nick Pentreath
c6ceaeae50
Style fix using 'if' rather than 'match' on boolean
2013-10-04 13:52:53 +02:00
Nick Pentreath
6a7836cddc
Fixing closing brace indentation
2013-10-04 13:33:01 +02:00
Nick Pentreath
0bd9b373d1
Reverting to using comma-delimited split
2013-10-04 13:30:33 +02:00
Nick Pentreath
1cbdcb9cb6
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-10-04 13:25:34 +02:00
Prashant Sharma
5829692885
Merge branch 'master' into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
docs/_config.yml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
2013-10-01 11:57:24 +05:30
Prashant Sharma
7ff4c2d399
fixed maven build for scala 2.10
2013-09-26 10:48:24 +05:30
Patrick Wendell
6079721fa1
Update build version in master
2013-09-24 11:41:51 -07:00
Nick Pentreath
d952f04c8e
Merge remote-tracking branch 'upstream/master' into implicit-als
2013-09-23 13:07:40 +02:00
Prashant Sharma
383e151fd7
Merge branch 'master' of git://github.com/mesos/spark into scala-2.10
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
project/SparkBuild.scala
2013-09-15 10:55:12 +05:30
Matei Zaharia
7a5c4b647b
Small tweaks to MLlib docs
2013-09-08 21:47:24 -07:00
Ameet Talwalkar
81a8bd46ac
Response to PR comments
2013-09-08 19:21:30 -07:00
Nick Pentreath
737f01a1ef
Adding algorithm for implicit feedback data to ALS
2013-09-06 14:45:05 +02:00
Prashant Sharma
4106ae9fbf
Merged with master
2013-09-06 17:53:01 +05:30
Matei Zaharia
12b2f1f9c9
Add missing license headers found with RAT
2013-09-02 12:23:03 -07:00