ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	96a6ff0b09	Merge branch 'dev-merge' into datahandler-fix Conflicts: streaming/src/main/scala/spark/streaming/dstream/DataHandler.scala	2013-01-02 14:08:15 -08:00
Patrick Wendell	493d65ce65	Several code-quality improvements to DataHandler. - Changed to more accurate name: BufferingBlockCreator - Docstring now correctly reflects the abstraction offered by the class - Made internal methods private - Fixed indentation problems	2013-01-02 13:39:18 -08:00
Josh Rosen	ce9f1bbe20	Add `pyspark` script to replace the other scripts. Expand the PySpark programming guide.	2013-01-01 21:25:49 -08:00
Tathagata Das	3dc87dd923	Fixed compilation bug in RDDSuite created during merge for mesos/master.	2013-01-01 16:38:04 -08:00
Tathagata Das	d34dba25c2	Merge branch 'mesos' into dev-merge	2013-01-01 15:48:39 -08:00
Josh Rosen	b58340dbd9	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
Josh Rosen	170e451fbd	Minor documentation and style fixes for PySpark.	2013-01-01 13:52:14 -08:00
Tathagata Das	02497f0cd4	Updated Streaming Programming Guide.	2013-01-01 12:21:32 -08:00
Matei Zaharia	55809fbc6d	Merge pull request #349 from woggling/cache-finally Avoid stalls when computation of cached RDD throws exception	2013-01-01 08:21:33 -08:00
Matei Zaharia	c593f6329e	Merge pull request #348 from JoshRosen/spark-597 Raise exception when hashing Java arrays (SPARK-597)	2013-01-01 08:20:06 -08:00
Charles Reiss	58072a7340	Remove some dead comments	2013-01-01 08:07:44 -08:00
Charles Reiss	21636ee4fa	Test with exception while computing cached RDD.	2013-01-01 08:07:40 -08:00
Charles Reiss	feadaf72f4	Mark key as not loading in CacheTracker even when compute() fails	2013-01-01 07:57:20 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Josh Rosen	6f6a6b79c4	Launch with `scala` by default in run-pyspark	2012-12-31 14:57:18 -08:00
Tathagata Das	18b9b3b99f	More classes made private[streaming] to hide from scala docs.	2012-12-30 20:00:42 -08:00
Tathagata Das	7e0271b438	Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.	2012-12-30 15:19:55 -08:00
Tathagata Das	9e644402c1	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	2012-12-29 18:31:51 -08:00
Josh Rosen	099898b439	Port LR example to PySpark using numpy. This version of the example crashes after the first iteration with "OverflowError: math range error" because Python's math.exp() behaves differently than Scala's; see SPARK-646.	2012-12-29 18:00:28 -08:00
Josh Rosen	39dd953fd8	Add test for pyspark.RDD.saveAsTextFile().	2012-12-29 17:06:50 -08:00
Josh Rosen	59195c68ec	Update PySpark for compatibility with TaskContext.	2012-12-29 16:01:03 -08:00
Josh Rosen	c5cee53f20	Merge remote-tracking branch 'origin/master' into python-api Conflicts: docs/quick-start.md	2012-12-29 16:00:51 -08:00
Josh Rosen	26186e2d25	Use batching in pyspark parallelize(); fix cartesian()	2012-12-29 15:34:57 -08:00
Matei Zaharia	3f74f729a1	Merge pull request #345 from JoshRosen/fix/add-file Fix deletion of files in current working directory by clearFiles()	2012-12-29 15:01:33 -08:00
Josh Rosen	6ee1ff2663	Fix bug in pyspark.serializers.batch; add .gitignore.	2012-12-29 22:25:34 +00:00
Patrick Wendell	518111573f	Merge pull request #8 from radlab/twitter-example Adding a Twitter InputDStream with an example	2012-12-29 14:23:01 -08:00
Josh Rosen	c2b105af34	Add documentation for Python API.	2012-12-28 22:51:28 -08:00
Josh Rosen	7ec3595de2	Fix bug (introduced by batching) in PySpark take()	2012-12-28 22:21:16 -08:00
Josh Rosen	397e67103c	Change Utils.fetchFile() warning to SparkException.	2012-12-28 17:37:13 -08:00
Josh Rosen	d64fa72d2e	Add addFile() and addJar() to JavaSparkContext.	2012-12-28 17:00:57 -08:00
Josh Rosen	bd237d4a9d	Add synchronization to LocalScheduler.updateDependencies().	2012-12-28 17:00:57 -08:00
Josh Rosen	f1bf4f0385	Skip deletion of files in clearFiles(). This fixes an issue where Spark could delete original files in the current working directory that were added to the job using addFile(). There was also the potential for addFile() to overwrite local files, which is addressed by changing Utils.fetchFile() to log a warning instead of overwriting a file with new contents. This is a short-term fix; a better long-term solution would be to remove the dependence on storing files in the current working directory, since we can't change the cwd from Java.	2012-12-28 17:00:57 -08:00
Josh Rosen	fbadb1cda5	Mark api.python classes as private; echo Java output to stderr.	2012-12-28 09:06:11 -08:00
Josh Rosen	665466dfff	Simplify PySpark installation. - Bundle Py4J binaries, since it's hard to install - Uses Spark's `run` script to launch the Py4J gateway, inheriting the settings in spark-env.sh With these changes, (hopefully) nothing more than running `sbt/sbt package` will be necessary to run PySpark.	2012-12-27 22:47:37 -08:00
Josh Rosen	ac32447cd3	Use addFile() to ship code to cluster in PySpark. Add options to pyspark.SparkContext constructor.	2012-12-27 19:59:04 -08:00
Josh Rosen	85b8f2c64f	Add epydoc API documentation for PySpark.	2012-12-27 18:04:10 -08:00
Tathagata Das	0bc0a60d30	Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.	2012-12-27 15:37:33 -08:00
Josh Rosen	2d98fff065	Add IPython support to pyspark-shell. Suggested by / based on code from @MLnick	2012-12-27 10:17:36 -08:00
Tathagata Das	7c33f76291	Merge branch 'mesos' into dev-merge	2012-12-26 19:19:07 -08:00
Tathagata Das	836042bb9f	Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/scheduler/ResultTask.scala core/src/test/scala/spark/CheckpointSuite.scala	2012-12-26 19:09:01 -08:00
Josh Rosen	1dca0c5180	Remove debug output from PythonPartitioner.	2012-12-26 18:23:06 -08:00
Josh Rosen	e2dad15621	Add support for batched serialization of Python objects in PySpark.	2012-12-26 18:16:09 -08:00
Josh Rosen	4608902fb8	Use filesystem to collect RDDs in PySpark. Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.	2012-12-24 17:20:10 -08:00
Matei Zaharia	84587a9bf3	Merge pull request #343 from markhamstra/spark-601 lookup() needn't fail when there is no partitioner	2012-12-24 15:28:05 -08:00
Josh Rosen	ccd075cf96	Reduce object overhead in Pyspark shuffle and collect	2012-12-24 15:01:13 -08:00
Mark Hamstra	903f3518df	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	2012-12-24 13:18:45 -08:00
Matei Zaharia	b575cbe069	Merge pull request #342 from markhamstra/spark-645 Allow distinct() to be called without parentheses	2012-12-24 08:04:50 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Patrick Wendell	bce84ceabb	Minor changes after review and general cleanup. - Added filters to Twitter example - Removed un-used import - Some code clean-up	2012-12-21 20:57:46 -08:00
Patrick Wendell	9ac4cb1c5f	Adding a Twitter InputDStream with an example	2012-12-21 17:18:19 -08:00

11336 commits