Commit graph

2987 commits

Author SHA1 Message Date
root ec31e68d5d Fixed PySpark perf regression by not using socket.makefile(), and improved
debuggability by letting "print" statements show up in the executor's stderr

Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root 3296d132b6 Fix performance bug with new Python code not using buffered streams 2013-07-01 06:25:43 +00:00
Matei Zaharia 39ae073b5c Increase SLF4j version in Maven too 2013-06-30 17:11:14 -07:00
Matei Zaharia 5bbd0eec84 Update docs on SCALA_LIBRARY_PATH 2013-06-30 17:00:40 -07:00
Matei Zaharia 03d0b858c8 Made use of spark.executor.memory setting consistent and documented it
Conflicts:

	core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Matei Zaharia ccfe953a4d Merge pull request #577 from skumargithub/master
Example of cumulative counting using updateStateByKey
2013-06-29 17:57:53 -07:00
Matei Zaharia 5cfcd3c336 Remove Twitter4J specific repo since it's in Maven central 2013-06-29 15:37:27 -07:00
Matei Zaharia 4358acfe07 Initialize Twitter4J OAuth from system properties instead of prompting 2013-06-29 15:25:06 -07:00
Matei Zaharia 1667158544 Merge remote-tracking branch 'mrpotes/master' 2013-06-29 14:36:09 -07:00
Matei Zaharia 50ca17635a Merge pull request #664 from pwendell/test-fix
Removing incorrect test statement
2013-06-27 22:24:52 -07:00
Matei Zaharia 4974b658ed Look at JAVA_HOME before PATH to determine Java executable 2013-06-27 22:16:40 -07:00
Patrick Wendell c767e74370 Removing incorrect test statement 2013-06-27 21:48:58 -07:00
Matei Zaharia aea727f68d Simplify Python docs a little to do substring search 2013-06-26 21:15:09 -07:00
Matei Zaharia 03906f7f0a Fixes to compute-classpath on Windows 2013-06-26 17:40:22 -07:00
Matei Zaharia e49bc8ca8c Merge pull request #663 from stephenh/option_and_getenv
Be cute with Option and getenv.
2013-06-26 11:13:33 -07:00
Stephen Haberman d7011632d1 Wrap lines. 2013-06-26 12:35:57 -05:00
Stephen Haberman d11025dc6a Be cute with Option and getenv. 2013-06-26 09:53:35 -05:00
Matei Zaharia 32370da4e4 Don't use forward slash in exclusion for JAR signature files 2013-06-25 22:08:19 -04:00
Matei Zaharia 9f0d913295 Refactored tests to share SparkContexts in some of them
Creating these seems to take a while and clutters the output with Akka
stuff, so it would be nice to share them.
2013-06-25 19:18:30 -04:00
Matei Zaharia 2bd04c3513 Formatting 2013-06-25 18:37:14 -04:00
Matei Zaharia f2263350ed Added a local-cluster mode test to ReplSuite 2013-06-25 18:35:35 -04:00
Matei Zaharia 6c8d1b2ca6 Fix computation of classpath when we launch java directly
The previous version assumed that a CLASSPATH environment variable was
set by the "run" script when launching the process that starts the
ExecutorRunner, but unfortunately this is not true in tests. Instead, we
factor the classpath calculation into an external script and call that.

NOTE: This includes a Windows version but hasn't yet been tested there.
2013-06-25 18:21:00 -04:00
James Phillpotts 176193b1e8 Fix usage and parameter extraction 2013-06-25 23:06:15 +01:00
James Phillpotts 366572edca Include a default OAuth implementation, and update examples and JavaStreamingContext 2013-06-25 22:59:34 +01:00
Matei Zaharia 15b00914c5 Some fixes to the launch-java-directly change:
- Split SPARK_JAVA_OPTS into multiple command-line arguments if it
  contains spaces; this splitting follows quoting rules in bash
- Add the Scala JARs to the classpath if they're not in the CLASSPATH
  variable because the ExecutorRunner is launched with "scala" (this can
  happen when using local-cluster URLs in spark-shell)
2013-06-25 17:17:27 -04:00
Matei Zaharia 7680ce0bd6 Fixed deprecated use of expect in SizeEstimatorSuite 2013-06-25 16:11:44 -04:00
Matei Zaharia 7e0191c6ea Merge remote-tracking branch 'cgrothaus/SPARK-698'
Conflicts:
	run
2013-06-25 15:47:40 -04:00
Matei Zaharia f5e32ed13a Merge pull request #661 from mesos/streaming
Kafka fixes and DStream.count fix for master
2013-06-25 09:16:57 -07:00
Tathagata Das c89af0a7f9 Merge branch 'master' into streaming
Conflicts:
	.gitignore
2013-06-24 23:57:47 -07:00
Tathagata Das 48c7e373c6 Minor formatting fixes 2013-06-24 23:11:04 -07:00
Tathagata Das 1249e9153b Merge pull request #572 from Reinvigorate/sm-block-interval
Adding spark.streaming.blockInterval property
2013-06-24 21:46:33 -07:00
Tathagata Das cfcda95f86 Merge pull request #571 from Reinvigorate/sm-kafka-serializers
Surfacing decoders on KafkaInputDStream
2013-06-24 21:44:50 -07:00
Tathagata Das 575aff6b71 Merge pull request #567 from Reinvigorate/sm-count-fix
Fixing count() in Spark Streaming
2013-06-24 21:35:50 -07:00
James Phillpotts 8955787a59 Twitter API v1 is retired - username/password auth no longer possible 2013-06-24 09:15:17 +01:00
Matei Zaharia 78ffe164b3 Clone the zero value for each key in foldByKey
The old version reused the object within each task, leading to
overwriting of the object when a mutable type is used, which is expected
to be common in fold.

Conflicts:

	core/src/test/scala/spark/ShuffleSuite.scala
2013-06-23 10:26:53 -07:00
Matei Zaharia 0e0f9d3069 Fix search path for REPL class loader to really find added JARs 2013-06-22 17:44:04 -07:00
Matei Zaharia b5df1cd668 ADD_JARS environment variable for spark-shell 2013-06-22 17:14:44 -07:00
Matei Zaharia 3e61beff7b Merge pull request #648 from shivaram/netty-dbg
Shuffle fixes and cleanup
2013-06-22 16:22:47 -07:00
Matei Zaharia d92d3f7938 Fix resolution of example code with Maven builds 2013-06-22 10:24:19 -07:00
Matei Zaharia 1ef5d0d2c9 Merge pull request #644 from shimingfei/joblogger
add Joblogger to Spark (on new Spark code)
2013-06-22 09:35:57 -07:00
Matei Zaharia 7e4b266678 Merge pull request #563 from jey/python-optimization
Optimize PySpark worker invocation
2013-06-22 07:53:18 -07:00
Matei Zaharia b350f34703 Increase memory for tests to prevent a crash on JDK 7 2013-06-22 07:48:20 -07:00
Jey Kottalam c75bed0eeb Fix reporting of PySpark exceptions 2013-06-21 12:14:16 -04:00
Jey Kottalam 1ba3c17303 use parens when calling method with side-effects 2013-06-21 12:14:16 -04:00
Jey Kottalam 7c5ff733ee PySpark daemon: fix deadlock, improve error handling 2013-06-21 12:14:16 -04:00
Jey Kottalam edb18ca928 Rename PythonWorker to PythonWorkerFactory 2013-06-21 12:14:16 -04:00
Jey Kottalam 62c4781400 Add tests and fixes for Python daemon shutdown 2013-06-21 12:14:16 -04:00
Jey Kottalam c79a6078c3 Prefork Python worker processes 2013-06-21 12:14:16 -04:00
Jey Kottalam 40afe0d2a5 Add Python timing instrumentation 2013-06-21 12:14:16 -04:00
James Phillpotts 93a1643405 Allow other twitter authorizations than username/password 2013-06-21 14:21:52 +01:00