spark-instrumented-optimizer

Patrick Wendell 55b7e2fdff Merge pull request #289 from tdas/filestream-fix		2013-12-31 10:12:51 -08:00

Bug fixes for file input stream and checkpointing
- Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread is deleting fails caused errors, etc.)
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration.
- Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.

..
mllib	Remove commented code in __init__.py.	2013-12-25 14:12:42 -05:00
__init__.py	Split the mllib bindings into a whole bunch of modules and rename some things.	2013-12-25 00:08:05 -05:00
accumulators.py	Add custom serializer support to PySpark.	2013-11-10 16:45:38 -08:00
broadcast.py	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
cloudpickle.py	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
context.py	Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.	2013-12-24 14:01:13 -08:00
daemon.py	Add Apache license headers and LICENSE and NOTICE files	2013-07-16 17:21:33 -07:00
files.py	Initial work to rename package to org.apache.spark	2013-09-01 14:13:13 -07:00
java_gateway.py	Python change for move of PythonMLLibAPI.	2013-12-24 16:49:03 -05:00
join.py	Change numSplits to numPartitions in PySpark.	2013-02-24 13:25:09 -08:00
rdd.py	Merge pull request #276 from shivaram/collectPartition	2013-12-19 13:35:09 -08:00
rddsampler.py	RDD sample() and takeSample() prototypes for PySpark	2013-08-28 16:46:13 -07:00
serializers.py	The rest of the Python side of those bindings.	2013-12-19 01:29:51 -05:00
shell.py	Typo: avaiable -> available	2013-12-24 17:25:04 -08:00
statcounter.py	Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark	2013-08-21 17:05:58 -07:00
storagelevel.py	Export StorageLevel and refactor	2013-09-07 14:41:31 -07:00
tests.py	Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.	2013-12-24 14:01:13 -08:00
worker.py	FramedSerializer: _dumps => dumps, _loads => loads.	2013-11-10 17:53:25 -08:00