spark-instrumented-optimizer/python
Patrick Wendell 55b7e2fdff Merge pull request #289 from tdas/filestream-fix
Bug fixes for file input stream and checkpointing

- Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (e.g., listing files while a background thread is deleting them caused errors).
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, so that checkpoints can be written to any HDFS-compatible store that requires special configuration.
- Changed the API of SparkContext.setCheckpointDir(): removed the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, which ensures that previous checkpoint files are not accidentally overwritten.
- Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
2013-12-31 10:12:51 -08:00
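To illustrate the revised setCheckpointDir() semantics described above (resolve a possibly relative path, then always create a fresh subdirectory inside it), here is a minimal local-filesystem sketch in Python. The helper `make_checkpoint_dir` is hypothetical, and naming the subdirectory with a random UUID is an assumption made for illustration; the actual change writes checkpoints through SparkContext.hadoopConfiguration so that any HDFS-compatible store can be used, which this sketch does not attempt.

```python
import os
import tempfile
import uuid


def make_checkpoint_dir(user_dir):
    """Create and return a fresh, uniquely named checkpoint subdirectory.

    Hypothetical helper: a local-filesystem sketch of the behaviour described
    for SparkContext.setCheckpointDir(), not Spark's actual code.
    """
    # Resolve a relative local path (e.g. "checkpoints") to an absolute one,
    # so the location does not depend on where it is later used from.
    base = os.path.abspath(user_dir)
    # Assumption: a random UUID names the subdirectory, so two calls with the
    # same user directory never share (and overwrite) checkpoint files.
    path = os.path.join(base, str(uuid.uuid4()))
    # exist_ok=False raises FileExistsError instead of reusing a directory;
    # parent directories are created if the user directory does not exist yet.
    os.makedirs(path, exist_ok=False)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first = make_checkpoint_dir(root)
        second = make_checkpoint_dir(root)
        print(first != second)  # True: each call gets its own subdirectory
```

Creating a new subdirectory on every call trades a little disk clutter for safety: a restarted job can never silently clobber checkpoint files left by an earlier run that pointed at the same user directory.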
examples Add banner to PySpark and make wordcount output nicer 2013-09-01 14:13:16 -07:00
lib Fix PySpark for assembly run and include it in dist 2013-08-29 21:19:06 -07:00
pyspark Merge pull request #289 from tdas/filestream-fix 2013-12-31 10:12:51 -08:00
test_support License headers 2013-12-09 16:41:01 -08:00
.gitignore Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
epydoc.conf Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00
run-tests Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00