Commit graph

1951 commits

Author SHA1 Message Date
Tathagata Das 934ecc829a Removed streaming-env.sh.template 2013-01-06 14:15:07 -08:00
Stephen Haberman 8dc06069fe Rename RDD.tupleBy to keyBy. 2013-01-06 15:21:45 -06:00
Matei Zaharia 8fd3a70c18 Add PairRDD.keys() and values() to Java API 2013-01-05 22:46:45 -05:00
Matei Zaharia b1663752c6 Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia 0982572519 Add methods called just 'accumulator' for int/double in Java API 2013-01-05 22:11:28 -05:00
Matei Zaharia 86af64b0a6 Fix Accumulators in Java, and add a test for them 2013-01-05 20:55:17 -05:00
Matei Zaharia ecf9c08901 Fix Accumulators in Java, and add a test for them 2013-01-05 20:54:08 -05:00
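The three accumulator commits above fix the Java-side wrappers and add plainly-named `accumulator` methods for int and double. As a hedged sketch of the underlying API (Scala shown for brevity, even though the commits target the Java API; names assume the 2013-era `spark` package):

```scala
import spark.SparkContext
import spark.SparkContext._  // implicit AccumulatorParam instances

object AccumulatorExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "AccumulatorExample")
    // An accumulator is written from tasks and read back on the driver.
    val sum = sc.accumulator(0)
    sc.parallelize(1 to 100).foreach(x => sum += x)
    println(sum.value)  // 5050, read on the driver
  }
}
```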
Stephen Haberman 1fdb6946b5 Add RDD.tupleBy. 2013-01-05 13:07:59 -06:00
Stephen Haberman 6a0db3b449 Fix typo. 2013-01-05 12:56:17 -06:00
Matei Zaharia 7ab9f09140 Merge pull request #352 from stephenh/collect
Add RDD.collect(PartialFunction).
2013-01-05 10:17:20 -08:00
Stephen Haberman f4e6b9361f Add RDD.collect(PartialFunction). 2013-01-05 12:14:08 -06:00
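A hedged sketch of the overload this commit adds: `collect` with a `PartialFunction` filters and maps in one pass, keeping only elements the function is defined at. Assumes a live SparkContext named `sc`:

```scala
// Keep only the Ints, doubling them as we go; strings are dropped
// because the partial function is not defined at them.
val mixed = sc.parallelize(Seq[Any](1, "two", 3, "four", 5))
val doubledInts = mixed.collect { case n: Int => n * 2 }   // RDD[Int]
println(doubledInts.collect().mkString(", "))              // 2, 6, 10
```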
Stephen Haberman 8d57c78c83 Add PairRDDFunctions.keys and values. 2013-01-05 12:04:01 -06:00
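Taken together with the `tupleBy`-to-`keyBy` rename near the top of this page, these additions round out the pair-RDD basics. A hedged sketch, again assuming a SparkContext named `sc`:

```scala
import spark.SparkContext._  // implicit conversion to PairRDDFunctions

val words = sc.parallelize(Seq("apple", "fig", "cherry"))
val byLength = words.keyBy(_.length)  // RDD[(Int, String)], via the renamed keyBy
println(byLength.keys.collect().mkString(", "))    // 5, 3, 6
println(byLength.values.collect().mkString(", "))  // apple, fig, cherry
```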
Josh Rosen 33beba3965 Change PySpark RDD.take() to not call iterator(). 2013-01-03 14:52:21 -08:00
Patrick Wendell c438faeac4 Merge pull request #10 from radlab/datahandler-fix
Several code-quality improvements to DataHandler.
2013-01-02 17:07:12 -08:00
Patrick Wendell 2ef993d159 BufferingBlockCreator -> NetworkReceiver.BlockGenerator 2013-01-02 14:19:51 -08:00
Patrick Wendell 96a6ff0b09 Merge branch 'dev-merge' into datahandler-fix
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala
2013-01-02 14:08:15 -08:00
Patrick Wendell 493d65ce65 Several code-quality improvements to DataHandler.
- Changed to more accurate name: BufferingBlockCreator
- Docstring now correctly reflects the abstraction
  offered by the class
- Made internal methods private
- Fixed indentation problems
2013-01-02 13:39:18 -08:00
Josh Rosen ce9f1bbe20 Add pyspark script to replace the other scripts.
Expand the PySpark programming guide.
2013-01-01 21:25:49 -08:00
Tathagata Das 3dc87dd923 Fixed compilation bug in RDDSuite created during merge for mesos/master. 2013-01-01 16:38:04 -08:00
Tathagata Das d34dba25c2 Merge branch 'mesos' into dev-merge 2013-01-01 15:48:39 -08:00
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00
Tathagata Das 02497f0cd4 Updated Streaming Programming Guide. 2013-01-01 12:21:32 -08:00
Matei Zaharia 55809fbc6d Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Matei Zaharia c593f6329e Merge pull request #348 from JoshRosen/spark-597
Raise exception when hashing Java arrays (SPARK-597)
2013-01-01 08:20:06 -08:00
Charles Reiss 58072a7340 Remove some dead comments 2013-01-01 08:07:44 -08:00
Charles Reiss 21636ee4fa Test with exception while computing cached RDD. 2013-01-01 08:07:40 -08:00
Charles Reiss feadaf72f4 Mark key as not loading in CacheTracker even when compute() fails 2013-01-01 07:57:20 -08:00
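The shape of the fix is suggested by the branch name, cache-finally: clear the per-key loading flag in a `finally` block so a failed `compute()` cannot leave other threads waiting on the key forever. A hypothetical sketch of the pattern, not the actual CacheTracker code:

```scala
import scala.collection.mutable.HashSet

// Keys some thread is currently computing; waiters block on this set.
val loading = new HashSet[String]

def getOrCompute[T](key: String)(compute: => T): T = {
  loading.synchronized { loading += key }
  try {
    compute  // may throw; the finally below still runs
  } finally {
    loading.synchronized {
      loading -= key
      loading.notifyAll()  // wake any threads waiting for this key
    }
  }
}
```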
Josh Rosen f803953998 Raise exception when hashing Java arrays (SPARK-597) 2012-12-31 20:20:11 -08:00
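The hazard behind SPARK-597: Java (and therefore Scala) arrays hash by reference identity, not by contents, so using them as keys silently sends equal values to different partitions. Raising an exception surfaces the mistake early:

```scala
// Two arrays with equal contents still hash by identity, which is why
// hashing them is now rejected with an exception.
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)
println(a.sameElements(b))          // true: the contents are equal
println(a.hashCode == b.hashCode)   // almost certainly false: identity hash
```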
Josh Rosen 6f6a6b79c4 Launch with scala by default in run-pyspark 2012-12-31 14:57:18 -08:00
Tathagata Das 18b9b3b99f More classes made private[streaming] to hide from scala docs. 2012-12-30 20:00:42 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00
Tathagata Das 9e644402c1 Improved jekyll and scala docs. Made many classes and methods private to remove them from scala docs. 2012-12-29 18:31:51 -08:00
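The mechanism in these three commits is Scala's scoped access modifier: `private[streaming]` makes a class usable anywhere under the `spark.streaming` package while keeping it out of the public scaladoc. A minimal sketch, borrowing a class name from this log:

```scala
package spark.streaming.dstream

// Visible throughout spark.streaming, but hidden from external users
// and omitted from the generated scaladoc.
private[streaming] class BlockGenerator
```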
Josh Rosen 099898b439 Port LR example to PySpark using numpy.
This version of the example crashes after the first iteration with
"OverflowError: math range error" because Python's math.exp()
behaves differently than Scala's; see SPARK-646.
2012-12-29 18:00:28 -08:00
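The divergence behind SPARK-646, illustrated from the Scala side (the Python behavior is noted in comments):

```scala
// Scala (like Java) saturates on overflow rather than throwing.
println(math.exp(1000))  // prints Infinity
// Python's math.exp(1000) instead raises
// "OverflowError: math range error" -- the crash described above.
```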
Josh Rosen 39dd953fd8 Add test for pyspark.RDD.saveAsTextFile(). 2012-12-29 17:06:50 -08:00
Josh Rosen 59195c68ec Update PySpark for compatibility with TaskContext. 2012-12-29 16:01:03 -08:00
Josh Rosen c5cee53f20 Merge remote-tracking branch 'origin/master' into python-api
Conflicts:
	docs/quick-start.md
2012-12-29 16:00:51 -08:00
Josh Rosen 26186e2d25 Use batching in pyspark parallelize(); fix cartesian() 2012-12-29 15:34:57 -08:00
Matei Zaharia 3f74f729a1 Merge pull request #345 from JoshRosen/fix/add-file
Fix deletion of files in current working directory by clearFiles()
2012-12-29 15:01:33 -08:00
Josh Rosen 6ee1ff2663 Fix bug in pyspark.serializers.batch; add .gitignore. 2012-12-29 22:25:34 +00:00
Patrick Wendell 518111573f Merge pull request #8 from radlab/twitter-example
Adding a Twitter InputDStream with an example
2012-12-29 14:23:01 -08:00
Josh Rosen c2b105af34 Add documentation for Python API. 2012-12-28 22:51:28 -08:00
Josh Rosen 7ec3595de2 Fix bug (introduced by batching) in PySpark take() 2012-12-28 22:21:16 -08:00
Josh Rosen 397e67103c Change Utils.fetchFile() warning to SparkException. 2012-12-28 17:37:13 -08:00
Josh Rosen d64fa72d2e Add addFile() and addJar() to JavaSparkContext. 2012-12-28 17:00:57 -08:00
Josh Rosen bd237d4a9d Add synchronization to LocalScheduler.updateDependencies(). 2012-12-28 17:00:57 -08:00
Josh Rosen f1bf4f0385 Skip deletion of files in clearFiles().
This fixes an issue where Spark could delete
original files in the current working directory
that were added to the job using addFile().

There was also the potential for addFile() to
overwrite local files, which is addressed by
changing Utils.fetchFile() to log a warning
instead of overwriting a file with new contents.

This is a short-term fix; a better long-term
solution would be to remove the dependence on
storing files in the current working directory,
since we can't change the cwd from Java.
2012-12-28 17:00:57 -08:00
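A hypothetical sketch of the guard this commit message describes, not the actual Utils.fetchFile() code: if the target file already exists, log a warning and leave it alone, which also protects originals sitting in the current working directory:

```scala
import java.io.File

// Hypothetical: refuse to clobber an existing local file, since it may
// be the user's original copy in the current working directory.
def fetchFile(fileName: String, targetDir: File)(download: File => Unit): Unit = {
  val target = new File(targetDir, fileName)
  if (target.exists()) {
    System.err.println("WARNING: " + fileName + " already exists in " +
      targetDir + "; not overwriting")
  } else {
    download(target)  // fetch the remote contents into the fresh target
  }
}
```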
Josh Rosen fbadb1cda5 Mark api.python classes as private; echo Java output to stderr. 2012-12-28 09:06:11 -08:00
Josh Rosen 665466dfff Simplify PySpark installation.
- Bundle Py4J binaries, since it's hard to install
- Uses Spark's `run` script to launch the Py4J
  gateway, inheriting the settings in spark-env.sh

With these changes, (hopefully) nothing more than
running `sbt/sbt package` will be necessary to run
PySpark.
2012-12-27 22:47:37 -08:00
Josh Rosen ac32447cd3 Use addFile() to ship code to cluster in PySpark.
Add options to pyspark.SparkContext constructor.
2012-12-27 19:59:04 -08:00