
679 commits

Author SHA1 Message Date
Matei Zaharia fbb3fc4143 Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Tyson 1731f1fed4 Added an optional format parameter for individual job queries and optimized the jobId query 2013-01-11 15:01:43 -05:00
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 3548c9c0c8 Merge branch 'master' of github.com:mesos/spark 2013-01-10 19:06:40 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia 248995c535 Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
shane-huang 9930a95d21 Modified Patch according to comments 2013-01-10 20:09:55 +08:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Matei Zaharia 9cc764f523 Code style 2013-01-08 22:29:57 -08:00
Matei Zaharia 14972141f9 Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen b57dd0f160 Add mapPartitionsWithSplit() to PySpark. 2013-01-08 16:05:02 -08:00
Stephen Haberman 8ac0f35be4 Add JavaRDDLike.keyBy. 2013-01-08 09:57:45 -06:00
Stephen Haberman 4ee6b22775 Merge branch 'master' into tupleBy
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang e4cb72da8a Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections. 2013-01-08 22:40:58 +08:00
Mikhail Bautin 4725b0f643 Fixing if/else coding style for preferred hosts logging 2013-01-07 20:09:26 -08:00
Mikhail Bautin c41042c816 Log preferred hosts 2013-01-07 20:06:09 -08:00
Shivaram Venkataraman 77d751731c Remove unused BoundedMemoryCache file and associated test case. 2013-01-07 15:57:46 -08:00
Shivaram Venkataraman aed368a970 Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also
fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop
versions
2013-01-07 15:57:33 -08:00
Shivaram Venkataraman f8d579a0c0 Remove dependencies on sun jvm classes. Instead use reflection to infer
HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Matei Zaharia 1941d9602d Merge branch 'master' of github.com:mesos/spark 2013-01-07 16:50:39 -05:00
Matei Zaharia 9c32f300fb Add Accumulable.setValue for easier use in Java 2013-01-07 16:50:23 -05:00
Stephen Haberman 8dc06069fe Rename RDD.tupleBy to keyBy. 2013-01-06 15:21:45 -06:00
Matei Zaharia 8fd3a70c18 Add PairRDD.keys() and values() to Java API 2013-01-05 22:46:45 -05:00
Matei Zaharia b1663752c6 Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia 0982572519 Add methods called just 'accumulator' for int/double in Java API 2013-01-05 22:11:28 -05:00
Matei Zaharia 86af64b0a6 Fix Accumulators in Java, and add a test for them 2013-01-05 20:55:17 -05:00
Stephen Haberman 1fdb6946b5 Add RDD.tupleBy. 2013-01-05 13:07:59 -06:00
Stephen Haberman f4e6b9361f Add RDD.collect(PartialFunction). 2013-01-05 12:14:08 -06:00
Stephen Haberman 8d57c78c83 Add PairRDDFunctions.keys and values. 2013-01-05 12:04:01 -06:00
Josh Rosen 33beba3965 Change PySpark RDD.take() to not call iterator(). 2013-01-03 14:52:21 -08:00
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00
Matei Zaharia 55809fbc6d Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Charles Reiss 58072a7340 Remove some dead comments 2013-01-01 08:07:44 -08:00
Charles Reiss feadaf72f4 Mark key as not loading in CacheTracker even when compute() fails 2013-01-01 07:57:20 -08:00
Josh Rosen f803953998 Raise exception when hashing Java arrays (SPARK-597) 2012-12-31 20:20:11 -08:00
Josh Rosen 59195c68ec Update PySpark for compatibility with TaskContext. 2012-12-29 16:01:03 -08:00
Josh Rosen c5cee53f20 Merge remote-tracking branch 'origin/master' into python-api
Conflicts:
	docs/quick-start.md
2012-12-29 16:00:51 -08:00
Josh Rosen 7ec3595de2 Fix bug (introduced by batching) in PySpark take() 2012-12-28 22:21:16 -08:00
Josh Rosen 397e67103c Change Utils.fetchFile() warning to SparkException. 2012-12-28 17:37:13 -08:00
Josh Rosen d64fa72d2e Add addFile() and addJar() to JavaSparkContext. 2012-12-28 17:00:57 -08:00
Josh Rosen bd237d4a9d Add synchronization to LocalScheduler.updateDependencies(). 2012-12-28 17:00:57 -08:00
Josh Rosen f1bf4f0385 Skip deletion of files in clearFiles().
This fixes an issue where Spark could delete
original files in the current working directory
that were added to the job using addFile().

There was also the potential for addFile() to
overwrite local files, which is addressed by
changing Utils.fetchFile() to log a warning
instead of overwriting a file with new contents.

This is a short-term fix; a better long-term
solution would be to remove the dependence on
storing files in the current working directory,
since we can't change the cwd from Java.
2012-12-28 17:00:57 -08:00
Josh Rosen fbadb1cda5 Mark api.python classes as private; echo Java output to stderr. 2012-12-28 09:06:11 -08:00
Josh Rosen 1dca0c5180 Remove debug output from PythonPartitioner. 2012-12-26 18:23:06 -08:00
Josh Rosen 4608902fb8 Use filesystem to collect RDDs in PySpark.
Passing large volumes of data through Py4J seems
to be slow.  It appears to be faster to write the
data to the local filesystem and read it back from
Python.
2012-12-24 17:20:10 -08:00