ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
root	ec31e68d5d	Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr. Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala	2013-07-01 06:26:31 +00:00
Jey Kottalam	c75bed0eeb	Fix reporting of PySpark exceptions	2013-06-21 12:14:16 -04:00
Jey Kottalam	7c5ff733ee	PySpark daemon: fix deadlock, improve error handling	2013-06-21 12:14:16 -04:00
Jey Kottalam	62c4781400	Add tests and fixes for Python daemon shutdown	2013-06-21 12:14:16 -04:00
Jey Kottalam	c79a6078c3	Prefork Python worker processes	2013-06-21 12:14:16 -04:00
Jey Kottalam	40afe0d2a5	Add Python timing instrumentation	2013-06-21 12:14:16 -04:00
Jey Kottalam	9a731f5a6d	Fix Python saveAsTextFile doctest to not expect order to be preserved	2013-04-02 11:59:20 -07:00
Jey Kottalam	20604001e2	Fix argv handling in Python transitive closure example	2013-04-02 11:59:07 -07:00
Josh Rosen	2c966c98fb	Change numSplits to numPartitions in PySpark.	2013-02-24 13:25:09 -08:00
Mark Hamstra	b7a1fb5c5d	Add commutative requirement for 'reduce' to Python docstring.	2013-02-09 12:14:11 -08:00
Josh Rosen	e61729113d	Remove unnecessary doctest __main__ methods.	2013-02-03 21:29:40 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00
Josh Rosen	2415c18f48	Fix reporting of PySpark doctest failures.	2013-02-03 06:44:11 +00:00
Josh Rosen	e211f405bc	Use spark.local.dir for PySpark temp files (SPARK-580).	2013-02-01 11:50:27 -08:00
Josh Rosen	9cc6ff9c4e	Do not launch JavaGateways on workers (SPARK-674). The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in ~3-4x performance improvement when running the PySpark unit tests.	2013-02-01 11:13:10 -08:00
Josh Rosen	57b64d0d19	Fix stdout redirection in PySpark.	2013-02-01 00:25:19 -08:00
Patrick Wendell	3446d5c8d6	SPARK-673: Capture and re-throw Python exceptions. This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code.	2013-01-31 18:06:11 -08:00
Matei Zaharia	55327a283e	Merge pull request #430 from pwendell/pyspark-guide. Minor improvements to PySpark docs	2013-01-30 15:35:29 -08:00
Patrick Wendell	3f945e3b83	Make module help available in python shell. Also, adds a line in doc explaining how to use.	2013-01-30 15:04:06 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Matei Zaharia	a2f4891d1d	Merge pull request #396 from JoshRosen/spark-653. Make PySpark AccumulatorParam an abstract base class	2013-01-24 13:05:03 -08:00
Josh Rosen	b47d054cfc	Remove use of abc.ABCMeta due to cloudpickle issue. cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689 This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.	2013-01-23 11:47:27 -08:00
Josh Rosen	ae2ed2947d	Allow PySpark's SparkFiles to be used from driver. Fix minor documentation formatting issues.	2013-01-23 10:58:50 -08:00
Josh Rosen	35168d9c89	Fix sys.path bug in PySpark SparkContext.addPyFile	2013-01-22 17:54:11 -08:00
Josh Rosen	c75ae3622e	Make AccumulatorParam an abstract base class.	2013-01-21 22:32:57 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Matei Zaharia	c7b5e5f1ec	Merge pull request #389 from JoshRosen/python_rdd_checkpointing. Add checkpointing to the Python API	2013-01-20 17:10:44 -08:00
Josh Rosen	9f211dd3f0	Fix PythonPartitioner equality; see SPARK-654. PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.	2013-01-20 15:41:42 -08:00
Josh Rosen	00d70cd660	Clean up setup code in PySpark checkpointing tests	2013-01-20 15:38:11 -08:00
Josh Rosen	5b6ea9e9a0	Update checkpointing API docs in Python/Java.	2013-01-20 15:31:41 -08:00
Josh Rosen	d0ba80dc72	Add checkpointFile() and more tests to PySpark.	2013-01-20 13:59:45 -08:00
Josh Rosen	7ed1bf4b48	Add RDD checkpointing to Python API.	2013-01-20 13:19:19 -08:00
Josh Rosen	17035db159	Add __repr__ to Accumulator; fix bug in sc.accumulator	2013-01-20 11:58:57 -08:00
Josh Rosen	9f54d7e1f5	Merge pull request #387 from mateiz/python-accumulators. Add accumulators to PySpark	2013-01-20 11:00:36 -08:00
Matei Zaharia	2a8c2a6790	Minor formatting fixes	2013-01-20 10:24:53 -08:00
Matei Zaharia	a23ed25f3c	Add a class comment to Accumulator	2013-01-20 02:10:25 -08:00
Matei Zaharia	61b6382a35	Launch accumulator tests in run-tests	2013-01-20 01:59:07 -08:00
Matei Zaharia	8e7f098a2c	Added accumulators to PySpark	2013-01-20 01:57:44 -08:00
Nick Pentreath	b77f7390a5	Python ALS example	2013-01-15 09:04:32 +02:00
Josh Rosen	49c74ba2af	Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON.	2013-01-10 08:10:59 -08:00
Josh Rosen	d55f2b9882	Use take() instead of takeSample() in PySpark kmeans example. This is a temporary change until we port takeSample().	2013-01-09 21:21:23 -08:00
Josh Rosen	1a64432ba5	Indicate success/failure in PySpark test script.	2013-01-09 20:30:36 -08:00
Josh Rosen	b57dd0f160	Add mapPartitionsWithSplit() to PySpark.	2013-01-08 16:05:02 -08:00
Josh Rosen	33beba3965	Change PySpark RDD.take() to not call iterator().	2013-01-03 14:52:21 -08:00
Josh Rosen	ce9f1bbe20	Add `pyspark` script to replace the other scripts. Expand the PySpark programming guide.	2013-01-01 21:25:49 -08:00
Josh Rosen	b58340dbd9	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
Josh Rosen	9abdfa6633	Fix Python 2.6 compatibility in Python API.	2012-09-17 00:09:16 -07:00
Josh Rosen	886b39de55	Add Python API.	2012-08-18 22:33:51 -07:00

48 commits