Commit graph

5197 commits

Author SHA1 Message Date
Patrick Wendell 19672dca32 Merge pull request #305 from kayousterhout/line_spacing
Fixed >100char lines in DAGScheduler.scala

There's no changed functionality here -- only line spacing and one grammatical fix in a comment.
2013-12-27 13:37:10 -08:00
Tathagata Das 271e3237f3 Minor changes in comments and strings to address comments in PR 289. 2013-12-27 12:26:57 -08:00
Kay Ousterhout 0c71ffe924 Style fixes as per Reynold's review 2013-12-27 12:19:38 -08:00
Kay Ousterhout 8c81068e16 Fixed >100char lines in DAGScheduler.scala 2013-12-27 11:36:54 -08:00
Binh Nguyen 2c5bade4ee Fix failed unit tests
Also clean up a bit.
2013-12-27 11:24:30 -08:00
Kay Ousterhout baaabcedc9 Removed unused failed and causeOfFailure variables 2013-12-27 11:12:36 -08:00
Reynold Xin 7be1e57786 Merge pull request #298 from aarondav/minor
Minor: Decrease margin of left side of Log page

Before
![before](https://f.cloud.github.com/assets/1400247/1812647/1a4be53e-6e87-11e3-9d5b-f851274be0e9.png)

After
![after](https://f.cloud.github.com/assets/1400247/1812648/1ca1ea2c-6e87-11e3-946c-31be9258f450.png)

It's a start anyway...
2013-12-26 23:41:40 -10:00
Reynold Xin 7d811ba6f2 Merge pull request #302 from pwendell/SPARK-1007
SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10

Reported by Qiuzhuang Lian
2013-12-26 23:39:58 -10:00
Patrick Wendell 0cc1e0d43d SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10 2013-12-26 23:21:08 -08:00
Lian, Cheng d7086dc28a Added Apache license header to NaiveBayesSuite 2013-12-27 08:20:41 +08:00
Matei Zaharia 5e69fc5bb4 Merge pull request #295 from markhamstra/JobProgressListenerNPE
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
2013-12-26 19:10:39 -05:00
Patrick Wendell 55c8bb741c Intermediate clean-up of tests to appease jenkins 2013-12-26 15:43:15 -08:00
Aaron Davidson 4f2fb761b0 Decrease margin of left side of log page 2013-12-26 15:38:45 -08:00
Patrick Wendell 5c1b4f6405 Minor fixes 2013-12-26 14:39:39 -08:00
Tathagata Das 5fde4566ea Added Apache boilerplate and class docs to PartitionerAwareUnionRDD. 2013-12-26 14:33:37 -08:00
Tathagata Das 577c8cc834 Removed unnecessary options from WindowedDStream. 2013-12-26 14:17:16 -08:00
Tathagata Das 3618d70b2a Added warning if filestream adds files with no data in them (file RDDs have 0 partitions). 2013-12-26 12:45:40 -08:00
Lian, Cheng 654f42174a Reformatted some lines commented by Matei 2013-12-27 04:45:04 +08:00
Patrick Wendell c23d640516 Addressing smaller changes from Aaron's review 2013-12-26 12:38:39 -08:00
Tathagata Das be64719138 Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored). 2013-12-26 12:33:12 -08:00
Tathagata Das 3579647cdc Merge branch 'apache-master' into window-improvement 2013-12-26 12:12:10 -08:00
Patrick Wendell da20270b83 Merge pull request #1 from aarondav/driver
Refactor DriverClient to be more Actor-based
2013-12-26 12:11:52 -08:00
Patrick Wendell a97ad55c45 Removing accidental file 2013-12-26 12:11:28 -08:00
Tathagata Das c4a54f51b5 Merge branch 'master' into window-improvement 2013-12-26 12:03:11 -08:00
Patrick Wendell 5938cfc153 Updated approach to driver restarting 2013-12-26 12:02:19 -08:00
Matei Zaharia e240bad03b Merge pull request #296 from witgo/master
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
2013-12-26 12:30:48 -05:00
Tathagata Das 069cb14bdc Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow. 2013-12-26 02:58:29 -08:00
Tathagata Das bacc65cf28 Removed slack time in file stream and added better handling of failures due to FileNotFound exceptions. 2013-12-26 10:18:46 +00:00
liguoqiang b662c88a24 fix this import order 2013-12-26 15:49:33 +08:00
Mark Hamstra c529dceaff Avoid a lump of coal (NPE) in JobProgressListener's stocking. 2013-12-25 23:10:02 -08:00
Matei Zaharia c344ed04c7 Merge pull request #283 from tmyklebu/master
Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py.  The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub.  The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside.  The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux.  There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order.
2013-12-26 01:31:06 -05:00
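
The byte-order risk mentioned in the #283 description can be illustrated with a small sketch. This is not the wire format used by PythonMLLibAPI.scala or mllib.py; the layout (an 8-byte length header followed by IEEE-754 doubles) and the names `pack_doubles` and `unpack_doubles` are assumptions made for illustration only. It shows why both sides of such a binding should agree on an explicit byte order rather than each relying on its own native default.

```python
import struct


def pack_doubles(values, byte_order="<"):
    """Serialise floats as an 8-byte count followed by 8-byte doubles.

    byte_order is a struct prefix: "<" is little-endian, ">" is big-endian.
    The layout is hypothetical, not the one used by the real bindings.
    """
    header = struct.pack(byte_order + "q", len(values))
    body = struct.pack(byte_order + "%dd" % len(values), *values)
    return header + body


def unpack_doubles(data, byte_order="<"):
    """Inverse of pack_doubles; must be called with the same byte order."""
    (count,) = struct.unpack_from(byte_order + "q", data, 0)
    return list(struct.unpack_from(byte_order + "%dd" % count, data, 8))


vec = [1.0, 2.5, -3.0]
wire = pack_doubles(vec, "<")

# Same byte order on both sides: the vector round-trips exactly.
assert unpack_doubles(wire, "<") == vec

# Mismatched byte order: the length header alone is misread as 3 * 2**56,
# so a reader would try to consume far more data than was written.
(misread_count,) = struct.unpack_from(">q", wire, 0)
assert misread_count == 3 * 2**56
```

Fixing the prefix to `"<"` or `">"` on the Python side, and setting the matching order explicitly on the JVM side, would remove the dependence on the platform's native order.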
liguoqiang 2bd76f693d Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn 2013-12-26 11:10:35 +08:00
liguoqiang 14fcef72db Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn 2013-12-26 11:05:07 +08:00
Tathagata Das 94479673eb Fixed bug in PartitionAwareUnionRDD 2013-12-26 00:07:45 +00:00
Tor Myklebust 9cbcf81453 Remove commented code in __init__.py. 2013-12-25 14:12:42 -05:00
Tor Myklebust 5e71354cb7 Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib. 2013-12-25 14:10:55 -05:00
Aaron Davidson 61372b11f4 Refactor DriverClient to be more Actor-based 2013-12-25 10:55:25 -08:00
Matei Zaharia 56094bcd8d Merge pull request #290 from ash211/patch-3
Typo: avaiable -> available
2013-12-25 13:14:33 -05:00
Lian, Cheng c0337c5bbf Let reduceByKey take care of local combine
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
Reynold Xin 4842a07da8 Merge pull request #287 from azuryyu/master
Fixed job name in the java streaming example.
2013-12-25 01:52:15 -08:00
Patrick Wendell bbc362833b Removing un-used variable 2013-12-25 01:38:57 -08:00
Patrick Wendell 18ad419b52 Small fix from rebase 2013-12-25 01:22:38 -08:00
Patrick Wendell 55f833803a Minor bug fix 2013-12-25 01:19:25 -08:00
Patrick Wendell c9c0f745af Minor style clean-up 2013-12-25 01:19:25 -08:00
Patrick Wendell b2b7514ba3 Import clean-up (yay Aaron) 2013-12-25 01:19:25 -08:00
Patrick Wendell d5f23e0083 Adding scheduling and reporting based on cores 2013-12-25 01:19:01 -08:00
Patrick Wendell 760823d393 Adding better option parsing 2013-12-25 01:19:01 -08:00
Patrick Wendell 6a4acc4c2d Initial cut at driver submission. 2013-12-25 01:19:01 -08:00
Patrick Wendell 1070b566d4 Renaming Client => AppClient 2013-12-25 01:17:01 -08:00
Lian, Cheng 3bb714eaa3 Refactored NaiveBayes
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00
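
The idea behind the NaiveBayes commits above (local combine, then less shuffle output) can be sketched without Spark. The following is plain Python, not the NaiveBayes code itself. The data and the names `combine_partition` and `merge_partials` are hypothetical. Each partition is aggregated locally first, so only one record per distinct key per partition would need to cross the network.

```python
from collections import defaultdict


def combine_partition(records):
    """Sum values per key within a single partition (the local step)."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def merge_partials(partials):
    """Merge the per-partition totals (the step after the shuffle)."""
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


# Two hypothetical partitions holding six (key, count) records in total.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("c", 1)],
]

partials = [combine_partition(p) for p in partitions]
records_before = sum(len(p) for p in partitions)
records_after = sum(len(p) for p in partials)
result = merge_partials(partials)

assert records_before == 6
assert records_after == 4
assert result == {"a": 2, "b": 3, "c": 1}
```

The saving grows with the number of repeated keys per partition; with all-distinct keys the local step changes nothing.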