Commit graph

5053 commits

Author SHA1 Message Date
Reza Zadeh 3369c2d487 cleanup documentation 2013-12-27 00:41:46 -05:00
Reza Zadeh bdb5037987 add all tests 2013-12-27 00:36:41 -05:00
Reza Zadeh fa1e8d8cbf test for truncated svd 2013-12-27 00:34:59 -05:00
Reza Zadeh 16de5268e3 full rank matrix test added 2013-12-26 23:21:57 -05:00
Matei Zaharia 5e69fc5bb4 Merge pull request #295 from markhamstra/JobProgressListenerNPE
Avoid a lump of coal (NPE) in JobProgressListener's stocking.
2013-12-26 19:10:39 -05:00
Aaron Davidson 4f2fb761b0 Decrease margin of left side of log page 2013-12-26 15:38:45 -08:00
Reza Zadeh fe1a132d40 Main method added for svd 2013-12-26 18:13:21 -05:00
Reza Zadeh 1a21ba2967 new main file 2013-12-26 18:09:33 -05:00
Reza Zadeh 6c3674cd23 Object to hold the svd methods 2013-12-26 17:39:25 -05:00
Reza Zadeh 6e740cc901 Some documentation 2013-12-26 16:12:40 -05:00
Tathagata Das 3618d70b2a Added warning if filestream adds files with no data in them (file RDDs have 0 partitions). 2013-12-26 12:45:40 -08:00
Tathagata Das be64719138 Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored). 2013-12-26 12:33:12 -08:00
Reza Zadeh 1a173f00bd Initial files - no tests 2013-12-26 15:01:03 -05:00
Matei Zaharia e240bad03b Merge pull request #296 from witgo/master
Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn package
2013-12-26 12:30:48 -05:00
Tathagata Das bacc65cf28 Removed slack time in file stream and added better handling of failures caused by FileNotFound exceptions. 2013-12-26 10:18:46 +00:00
liguoqiang b662c88a24 fix this import order 2013-12-26 15:49:33 +08:00
Mark Hamstra c529dceaff Avoid a lump of coal (NPE) in JobProgressListener's stocking. 2013-12-25 23:10:02 -08:00
Matei Zaharia c344ed04c7 Merge pull request #283 from tmyklebu/master
Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py.  The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub.  The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside.  The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux.  There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's byte order differs from java.nio.ByteBuffer's idea of the native byte order.
2013-12-26 01:31:06 -05:00
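
The byte-order risk noted in the pull request above can be illustrated with a minimal Python sketch. The wire format here (an 8-byte length header followed by 8-byte doubles) and the function names are hypothetical, chosen only for illustration; they are not the format used by PythonMLLibAPI.scala. The point is that fixing the byte order explicitly on the Python side ("<" means little-endian in the struct module) removes the dependence on either side's native order, provided the JVM reader sets the same order, for example with ByteBuffer.order(ByteOrder.LITTLE_ENDIAN).

```python
import struct


def serialize_vector(values):
    """Pack floats as an 8-byte length header plus 8-byte doubles.

    Hypothetical format for illustration. "<" forces little-endian,
    so the result does not depend on the machine's native byte order.
    """
    header = struct.pack("<q", len(values))
    body = struct.pack("<%dd" % len(values), *values)
    return header + body


def deserialize_vector(data):
    """Inverse of serialize_vector, reading with the same fixed order."""
    (count,) = struct.unpack_from("<q", data, 0)
    return list(struct.unpack_from("<%dd" % count, data, 8))


payload = serialize_vector([1.0, -2.5, 3.25])
roundtrip = deserialize_vector(payload)
```

If both ends instead relied on their own native order, the same bytes could decode to different numbers on a platform where the two disagree.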
liguoqiang 2bd76f693d Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn 2013-12-26 11:10:35 +08:00
liguoqiang 14fcef72db Renamed ClusterScheduler to TaskSchedulerImpl for yarn and new-yarn 2013-12-26 11:05:07 +08:00
Tor Myklebust 9cbcf81453 Remove commented code in __init__.py. 2013-12-25 14:12:42 -05:00
Tor Myklebust 5e71354cb7 Fix copypasta in __init__.py. Don't import anything directly into pyspark.mllib. 2013-12-25 14:10:55 -05:00
Matei Zaharia 56094bcd8d Merge pull request #290 from ash211/patch-3
Typo: avaiable -> available
2013-12-25 13:14:33 -05:00
Reynold Xin 4842a07da8 Merge pull request #287 from azuryyu/master
Fixed job name in the java streaming example.
2013-12-25 01:52:15 -08:00
Tor Myklebust 02208a175c Initial weights in Scala are ones; do that too. Also fix some errors. 2013-12-25 00:53:48 -05:00
Tor Myklebust 4e821390bc Scala stubs for updated Python bindings. 2013-12-25 00:09:00 -05:00
Tor Myklebust 05163057a1 Split the mllib bindings into a whole bunch of modules and rename some things. 2013-12-25 00:08:05 -05:00
Andrew Ash 3665c722b5 Typo: avaiable -> available 2013-12-24 17:25:04 -08:00
Patrick Wendell 85a344b4f0 Merge pull request #127 from kayousterhout/consolidate_schedulers
Deduplicate Local and Cluster schedulers.

The code in LocalScheduler/LocalTaskSetManager was nearly identical
to the code in ClusterScheduler/ClusterTaskSetManager. The redundancy
made updating the schedulers unnecessarily painful and error-prone.
This commit combines the two into a single TaskScheduler/TaskSetManager.

Unfortunately the diff makes this change look much more invasive than it is -- TaskScheduler.scala is only superficially changed (names updated, overrides removed) from the old ClusterScheduler.scala, and the same with TaskSetManager.scala.

Thanks @rxin for suggesting this change!
2013-12-24 16:35:06 -08:00
Binh Nguyen 786f393a98 Fix imports order 2013-12-24 14:59:30 -08:00
Binh Nguyen 9115a5de62 Remove import * and fix some formatting 2013-12-24 14:59:30 -08:00
Binh Nguyen 040dd3ecd5 upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final 2013-12-24 14:58:18 -08:00
Patrick Wendell c2dd6bcd6e Merge pull request #279 from aarondav/shuffle-cleanup0
Clean up shuffle files once their metadata is gone

Previously, we would only clean the in-memory metadata for consolidated shuffle files.

Additionally, fixes a bug where the Metadata Cleaner was ignoring type-specific TTLs.
2013-12-24 14:36:47 -08:00
Kay Ousterhout 1efe3adf56 Responded to Reynold's style comments 2013-12-24 14:18:39 -08:00
Tathagata Das d4dfab503a Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289. 2013-12-24 14:01:13 -08:00
Tor Myklebust 86e38c4942 Remove useless line from test stub. 2013-12-24 16:49:31 -05:00
Tor Myklebust 4efec6eb94 Python change for move of PythonMLLibAPI. 2013-12-24 16:49:03 -05:00
Tor Myklebust 58e2a7d6d4 Move PythonMLLibAPI into its own package. 2013-12-24 16:48:40 -05:00
Matei Zaharia 3bf7c708d3 Merge pull request #275 from ueshin/wip/changeclasspathorder
Change the order of CLASSPATH.

SPARK_TOOLS_JAR should be placed after CLASSPATH or at least after
SPARK_CLASSPATH.

If SPARK_TOOLS_JAR is placed before CLASSPATH, all assembled classes and
resources in spark-tools-assembly.jar beat those in CLASSPATH or
SPARK_CLASSPATH, which might be replaced by customized versions.
2013-12-24 16:37:13 -05:00
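
The ordering argument in the pull request above rests on first-match-wins lookup: when two classpath entries provide the same class, the earlier entry is the one loaded. A small Python model of that rule follows. The class names and the "custom-classes" entry are hypothetical, and the function only mimics the lookup order; it is not how the JVM or the Spark launch scripts are implemented.

```python
def resolve(class_name, classpath):
    """Return the name of the first entry that provides class_name.

    classpath is an ordered list of (entry_name, provided_classes)
    pairs; earlier entries shadow later ones.
    """
    for entry_name, provided in classpath:
        if class_name in provided:
            return entry_name
    return None


# Hypothetical contents: both entries provide org.example.Util.
tools_jar = ("spark-tools-assembly.jar", {"org.example.Util", "org.example.Tool"})
custom = ("custom-classes", {"org.example.Util"})

# Tools jar first: its bundled copy shadows the customized one.
before = resolve("org.example.Util", [tools_jar, custom])

# Tools jar last: the customized copy wins.
after = resolve("org.example.Util", [custom, tools_jar])
```

With the tools jar listed last, its own classes such as org.example.Tool still resolve, while anything the user has overridden is taken from the earlier entry.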
Tor Myklebust 2402180b32 Fix error message ugliness. 2013-12-24 16:18:33 -05:00
Tathagata Das 9f79fd89dc Merge branch 'apache-master' into filestream-fix 2013-12-24 11:38:17 -08:00
azuryyu 66b7bea7f8 Make App report interval configurable during 'run on Yarn' 2013-12-24 18:16:49 +08:00
azuryyu a8bb86389d Fixed job name in the java streaming example. 2013-12-24 16:52:20 +08:00
Reynold Xin d63856c361 Merge pull request #286 from rxin/build
Show full stack trace and time taken in unit tests.
2013-12-23 22:07:26 -08:00
Reynold Xin fc80b2e693 Show full stack trace and time taken in unit tests. 2013-12-23 21:20:20 -08:00
Matei Zaharia 23a9ae6be3 Merge pull request #277 from tdas/scheduler-update
Refactored the streaming scheduler and added StreamingListener interface

- Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also, moved all the scheduler related code from spark.streaming to spark.streaming.scheduler package.
- Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListenerSuite to test it.
- Refactored streaming TestSuiteBase for deduping code in the other streaming testsuites.
2013-12-24 00:08:48 -05:00
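
The listener arrangement described in the pull request above can be sketched as a plain observer pattern. This is an illustration in Python with hypothetical method names (on_batch_completed, add_listener, report_batch); it does not reproduce the Scala StreamingListener API, only the idea that the scheduler notifies registered listeners and one listener accumulates batch processing time statistics.

```python
class StreamingListener:
    """Base listener; subclasses override the callbacks they need."""

    def on_batch_completed(self, batch_time, processing_ms):
        pass


class StatsListener(StreamingListener):
    """Collects per-batch processing times and reports their mean."""

    def __init__(self):
        self.processing_times = []

    def on_batch_completed(self, batch_time, processing_ms):
        self.processing_times.append(processing_ms)

    def mean_processing_ms(self):
        return sum(self.processing_times) / len(self.processing_times)


class JobScheduler:
    """Stand-in for the scheduler: it only forwards batch events here."""

    def __init__(self):
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def report_batch(self, batch_time, processing_ms):
        for listener in self.listeners:
            listener.on_batch_completed(batch_time, processing_ms)


scheduler = JobScheduler()
stats = StatsListener()
scheduler.add_listener(stats)
scheduler.report_batch(batch_time=1000, processing_ms=100)
scheduler.report_batch(batch_time=2000, processing_ms=300)
mean_ms = stats.mean_processing_ms()
```

Keeping the statistics in a listener rather than in the scheduler means new kinds of reporting can be added without touching the scheduling code.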
Tathagata Das 0af7f84c8e Minor formatting fixes. 2013-12-23 17:47:16 -08:00
Tathagata Das 8ca14a1e51 Updated testsuites to work with the slack time of file stream. 2013-12-23 16:27:00 -08:00
Tathagata Das b31e91f927 Merge branch 'scheduler-update' into filestream-fix 2013-12-23 15:59:15 -08:00
Tathagata Das 6eaa050549 Minor change for PR 277. 2013-12-23 15:55:45 -08:00