Patrick Wendell
19672dca32
Merge pull request #305 from kayousterhout/line_spacing
Fixed >100char lines in DAGScheduler.scala
There's no changed functionality here -- only line spacing and one grammatical fix in a comment.
2013-12-27 13:37:10 -08:00
Tathagata Das
271e3237f3
Minor changes in comments and strings to address comments in PR 289.
2013-12-27 12:26:57 -08:00
Kay Ousterhout
0c71ffe924
Style fixes as per Reynold's review
2013-12-27 12:19:38 -08:00
Kay Ousterhout
8c81068e16
Fixed >100char lines in DAGScheduler.scala
2013-12-27 11:36:54 -08:00
Binh Nguyen
2c5bade4ee
Fix failed unit tests
Also clean up a bit.
2013-12-27 11:24:30 -08:00
Kay Ousterhout
baaabcedc9
Removed unused failed and causeOfFailure variables
2013-12-27 11:12:36 -08:00
Reynold Xin
7be1e57786
Merge pull request #298 from aarondav/minor
Minor: Decrease margin of left side of Log page
Before
![before](https://f.cloud.github.com/assets/1400247/1812647/1a4be53e-6e87-11e3-9d5b-f851274be0e9.png)
After
![after](https://f.cloud.github.com/assets/1400247/1812648/1ca1ea2c-6e87-11e3-946c-31be9258f450.png)
It's a start anyway...
2013-12-26 23:41:40 -10:00
Reynold Xin
7d811ba6f2
Merge pull request #302 from pwendell/SPARK-1007
SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10
Reported by Qiuzhuang Lian
2013-12-26 23:39:58 -10:00
Patrick Wendell
0cc1e0d43d
SPARK-1007: spark-class2.cmd should change SCALA_VERSION to be 2.10
2013-12-26 23:21:08 -08:00
Lian, Cheng
d7086dc28a
Added Apache license header to NaiveBayesSuite
2013-12-27 08:20:41 +08:00
Matei Zaharia
5e69fc5bb4
Merge pull request #295 from markhamstra/JobProgressListenerNPE
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
2013-12-26 19:10:39 -05:00
Patrick Wendell
55c8bb741c
Intermediate clean-up of tests to appease jenkins
2013-12-26 15:43:15 -08:00
Aaron Davidson
4f2fb761b0
Decrease margin of left side of log page
2013-12-26 15:38:45 -08:00
Patrick Wendell
5c1b4f6405
Minor fixes
2013-12-26 14:39:39 -08:00
Tathagata Das
5fde4566ea
Added Apache boilerplate and class docs to PartitionerAwareUnionRDD.
2013-12-26 14:33:37 -08:00
Tathagata Das
577c8cc834
Removed unnecessary options from WindowedDStream.
2013-12-26 14:17:16 -08:00
Tathagata Das
3618d70b2a
Added warning if filestream adds files with no data in them (file RDDs have 0 partitions).
2013-12-26 12:45:40 -08:00
Lian, Cheng
654f42174a
Reformatted some lines commented on by Matei
2013-12-27 04:45:04 +08:00
Patrick Wendell
c23d640516
Addressing smaller changes from Aaron's review
2013-12-26 12:38:39 -08:00
Tathagata Das
be64719138
Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).
2013-12-26 12:33:12 -08:00
Tathagata Das
3579647cdc
Merge branch 'apache-master' into window-improvement
2013-12-26 12:12:10 -08:00
Patrick Wendell
da20270b83
Merge pull request #1 from aarondav/driver
Refactor DriverClient to be more Actor-based
2013-12-26 12:11:52 -08:00
Patrick Wendell
a97ad55c45
Removing accidental file
2013-12-26 12:11:28 -08:00
Tathagata Das
c4a54f51b5
Merge branch 'master' into window-improvement
2013-12-26 12:03:11 -08:00
Patrick Wendell
5938cfc153
Updated approach to driver restarting
2013-12-26 12:02:19 -08:00
Matei Zaharia
e240bad03b
Merge pull request #296 from witgo/master
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
2013-12-26 12:30:48 -05:00
Tathagata Das
069cb14bdc
Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow.
2013-12-26 02:58:29 -08:00
Tathagata Das
bacc65cf28
Removed slack time in file stream and added better handling of failures caused by FileNotFound exceptions.
2013-12-26 10:18:46 +00:00
liguoqiang
b662c88a24
Fix import order
2013-12-26 15:49:33 +08:00
Mark Hamstra
c529dceaff
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
2013-12-25 23:10:02 -08:00
Matei Zaharia
c344ed04c7
Merge pull request #283 from tmyklebu/master
Python bindings for mllib
This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.
For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's byte order differs from java.nio.ByteBuffer's idea of the native byte order.
2013-12-26 01:31:06 -05:00
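The endianness risk described in the bindings PR can be made concrete with a small sketch. It uses Python's standard `struct` module rather than the actual PythonMLLibAPI.scala / mllib.py wire format. The helpers `serialize_doubles` and `deserialize_doubles` are hypothetical names for illustration.

The sketch shows that bytes written in one order and read in another come back as garbage. Pinning an explicit order on both sides avoids relying on each runtime's notion of "native". On the JVM side, note that a `java.nio.ByteBuffer` is big-endian by default. It only follows the platform order if `order(ByteOrder.nativeOrder())` is called.

```python
import struct

def serialize_doubles(values, byteorder):
    """Hypothetical helper: pack floats as 8-byte IEEE 754 doubles.

    byteorder is a struct prefix: '<' little-endian, '>' big-endian,
    '=' native order of the machine running this code.
    """
    return struct.pack(f"{byteorder}{len(values)}d", *values)

def deserialize_doubles(data, byteorder):
    """Hypothetical helper: unpack 8-byte doubles with the given byte-order prefix."""
    count = len(data) // 8
    return list(struct.unpack(f"{byteorder}{count}d", data))

vec = [1.0, 2.5, -3.0]

# Writer and reader agree on an explicit order: the round trip is exact.
print(deserialize_doubles(serialize_doubles(vec, "<"), "<") == vec)  # prints True

# Writer uses little-endian, reader assumes big-endian: the values come back garbled.
print(deserialize_doubles(serialize_doubles(vec, "<"), ">") == vec)  # prints False
```

Fixing one explicit order on both sides trades a possible byte swap on some platforms for portability.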
liguoqiang
2bd76f693d
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn
2013-12-26 11:10:35 +08:00
liguoqiang
14fcef72db
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn
2013-12-26 11:05:07 +08:00
Tathagata Das
94479673eb
Fixed bug in PartitionerAwareUnionRDD
2013-12-26 00:07:45 +00:00
Tor Myklebust
9cbcf81453
Remove commented code in __init__.py.
2013-12-25 14:12:42 -05:00
Tor Myklebust
5e71354cb7
Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib.
2013-12-25 14:10:55 -05:00
Aaron Davidson
61372b11f4
Refactor DriverClient to be more Actor-based
2013-12-25 10:55:25 -08:00
Matei Zaharia
56094bcd8d
Merge pull request #290 from ash211/patch-3
Typo: avaiable -> available
2013-12-25 13:14:33 -05:00
Lian, Cheng
c0337c5bbf
Let reduceByKey take care of local combine
Also refactored some heavy FP code to improve readability and reduce memory footprint.
2013-12-25 22:45:57 +08:00
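The "local combine" that reduceByKey handles refers to map-side combining. Spark's `reduceByKey` merges values for the same key within each partition before the shuffle, so fewer records cross the network. The sketch below simulates this in plain Python, assuming partitions are just lists of `(key, value)` pairs. It is not Spark's implementation, and the function names are hypothetical.

```python
import operator

def shuffle_records_without_combine(partitions):
    """Hypothetical: groupByKey-style, so every (key, value) pair is shuffled."""
    return [pair for part in partitions for pair in part]

def shuffle_records_with_combine(partitions, merge):
    """Hypothetical: reduceByKey-style. Each partition merges values per key
    locally first, so at most one record per key per partition is shuffled."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = merge(local[key], value) if key in local else value
        shuffled.extend(local.items())
    return shuffled

def reduce_after_shuffle(records, merge):
    """Final per-key merge on the reduce side."""
    result = {}
    for key, value in records:
        result[key] = merge(result[key], value) if key in result else value
    return result

parts = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("a", 1), ("b", 1)]]
print(len(shuffle_records_without_combine(parts)))              # prints 6
print(len(shuffle_records_with_combine(parts, operator.add)))   # prints 4
print(reduce_after_shuffle(shuffle_records_with_combine(parts, operator.add), operator.add))
# prints {'a': 4, 'b': 2}
```

The final result is the same either way. Only the number of shuffled records changes. This requires the merge function to be associative, which addition is.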
Reynold Xin
4842a07da8
Merge pull request #287 from azuryyu/master
Fixed job name in the java streaming example.
2013-12-25 01:52:15 -08:00
Patrick Wendell
bbc362833b
Removing un-used variable
2013-12-25 01:38:57 -08:00
Patrick Wendell
18ad419b52
Small fix from rebase
2013-12-25 01:22:38 -08:00
Patrick Wendell
55f833803a
Minor bug fix
2013-12-25 01:19:25 -08:00
Patrick Wendell
c9c0f745af
Minor style clean-up
2013-12-25 01:19:25 -08:00
Patrick Wendell
b2b7514ba3
Import clean-up (yay Aaron)
2013-12-25 01:19:25 -08:00
Patrick Wendell
d5f23e0083
Adding scheduling and reporting based on cores
2013-12-25 01:19:01 -08:00
Patrick Wendell
760823d393
Adding better option parsing
2013-12-25 01:19:01 -08:00
Patrick Wendell
6a4acc4c2d
Initial cut at driver submission.
2013-12-25 01:19:01 -08:00
Patrick Wendell
1070b566d4
Renaming Client => AppClient
2013-12-25 01:17:01 -08:00
Lian, Cheng
3bb714eaa3
Refactored NaiveBayes
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
2013-12-25 17:15:38 +08:00