Fix 'IPYTHON=1 ./pyspark' throwing ValueError
This fixes an annoying issue where running ```IPYTHON=1 ./pyspark``` resulted in:
```
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 0.8.0
/_/
Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
Spark context avaiable as sc.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
202 else:
203 filename = fname
--> 204 __builtin__.execfile(filename, *where)
/Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/shell.py in <module>()
30 add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") != None else None
31
---> 32 sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
33
34 print """Welcome to
/Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/context.pyc in __init__(self, master, jobName, sparkHome, pyFiles, environment, batchSize)
70 with SparkContext._lock:
71 if SparkContext._active_spark_context:
---> 72 raise ValueError("Cannot run multiple SparkContexts at once")
73 else:
74 SparkContext._active_spark_context = self
ValueError: Cannot run multiple SparkContexts at once
```
The issue arises because IPython previously did not appear to respect ```$PYTHONSTARTUP```, but since at least version 1.0.0 it does. Technically this might break things for older versions of IPython, but most users should be able to upgrade IPython to at least 1.0.0 (and should be encouraged to do so :).
New behaviour:
```
Nicks-MacBook-Pro:incubator-spark-mlnick Nick$ IPYTHON=1 ./pyspark
Python 2.7.5 (default, Jun 20 2013, 11:06:30)
Type "copyright", "credits" or "license" for more information.
IPython 1.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/tools/target/scala-2.9.3/spark-tools-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13/12/12 13:08:15 WARN Utils: Your hostname, Nicks-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface en0)
13/12/12 13:08:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
13/12/12 13:08:15 INFO Slf4jEventHandler: Slf4jEventHandler started
13/12/12 13:08:15 INFO SparkEnv: Registering BlockManagerMaster
13/12/12 13:08:15 INFO DiskBlockManager: Created local directory at /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-local-20131212130815-0e76
13/12/12 13:08:15 INFO MemoryStore: MemoryStore started with capacity 326.7 MB.
13/12/12 13:08:15 INFO ConnectionManager: Bound socket to port 53732 with id = ConnectionManagerId(10.0.0.4,53732)
13/12/12 13:08:15 INFO BlockManagerMaster: Trying to register BlockManager
13/12/12 13:08:15 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.0.0.4:53732 with 326.7 MB RAM
13/12/12 13:08:15 INFO BlockManagerMaster: Registered BlockManager
13/12/12 13:08:15 INFO HttpBroadcast: Broadcast server started at http://10.0.0.4:53733
13/12/12 13:08:15 INFO SparkEnv: Registering MapOutputTracker
13/12/12 13:08:15 INFO HttpFileServer: HTTP File server directory is /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-8f40e897-8211-4628-a7a8-755562d5244c
13/12/12 13:08:16 INFO SparkUI: Started Spark Web UI at http://10.0.0.4:4040
2013-12-12 13:08:16.337 java[56801:4003] Unable to load realm info from SCDynamicStore
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 0.9.0-SNAPSHOT
/_/
Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
Spark context avaiable as sc.
```
Fix the --name option for Spark on YARN
It looks like the --name option was accidentally broken in one of the merges. Currently the Client hangs if the --name option is used.
Migration to Scala 2.10
== The description below was written by Prashant Sharma ==
This PR migrates Spark to Scala 2.10.
Summary of changes apart from the Scala 2.10 migration:
(Has no implications for users.)
1. Migrated Akka to 2.2.3.
Does not use remote death watch, because it has a bug where it tries to send messages to a dead node indefinitely.
Uses an indestructible ActorSystem, which tolerates errors only on executors.
(Might be useful for users.)
4. New configuration settings introduced:
System.getProperty("spark.akka.heartbeat.pauses", "600")
System.getProperty("spark.akka.failure-detector.threshold", "300.0")
System.getProperty("spark.akka.heartbeat.interval", "1000")
Defaults for these are fairly large, so that by default the failure detector that comes with Akka is effectively disabled. The reason for doing so is that we have our own failure-detector-like mechanism in place, so Akka's is just overhead on top of that, and it leads to a lot of false positives. But with these properties it is possible to enable it. A good use case for enabling it is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses/network lags and to quickly evict executors that experienced them. More information is included in configuration.md; a short sketch follows.
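As a rough illustration (not part of this PR's changes), and assuming the usual 0.9-era pattern of configuring Spark through JVM system properties set before the SparkContext is created, the Akka failure detector could be made more sensitive as below; the values used are placeholders, not recommendations:
```
// Illustrative sketch only: tighten Akka's failure detector so that executors hit by long
// GC pauses or network lags are evicted quickly. Property names come from the description
// above; the values are placeholders.
import org.apache.spark.SparkContext

object AkkaFailureDetectorSketch {
  def main(args: Array[String]): Unit = {
    // Must be set before the SparkContext (and hence the ActorSystem) is created.
    System.setProperty("spark.akka.heartbeat.pauses", "60")             // default "600"
    System.setProperty("spark.akka.failure-detector.threshold", "12.0") // default "300.0"
    System.setProperty("spark.akka.heartbeat.interval", "1000")         // default "1000"

    val sc = new SparkContext("local", "akka-failure-detector-sketch")
    // ... run jobs ...
    sc.stop()
  }
}
```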
Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.
This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
- Refactored Scheduler + JobManager into JobGenerator + JobScheduler and added JobSet for cleaner code. Moved scheduler-related code to the streaming.scheduler package.
- Added a StreamingListener trait (similar to SparkListener) to enable gathering streaming stats like processing times and delays; listeners are registered via StreamingContext.addListener() (see the sketch after this list).
- Deduped some code in streaming tests by modifying TestSuiteBase, and
added StreamingListenerSuite.
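As a hedged sketch of how the new listener hook might be used: StreamingListener, the streaming.scheduler package, and StreamingContext.addListener() are named in the change above, but the callback name and event type used here are assumptions for illustration only.
```
// Hedged sketch: the onBatchCompleted callback and StreamingListenerBatchCompleted event
// are assumed names; only StreamingListener and addListener() come from the description above.
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class BatchStatsListener extends StreamingListener {
  // Assumed hook: called once per completed batch with its timing information.
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    println("Batch completed: " + batchCompleted.batchInfo)
  }
}

// Registration (ssc is an existing StreamingContext):
//   ssc.addListener(new BatchStatsListener())
```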
- Made the file stream more robust to transient failures.
- Changed the Spark.setCheckpointDir API to drop the second 'useExisting' parameter. Spark will always create a unique directory for checkpointing underneath the directory provided to the function (see the sketch below).
- Fixed a bug with local relative paths used as the checkpoint directory.
- Made DStream and RDD checkpointing use SparkContext.hadoopConfiguration, so that more HDFS-compatible filesystems are supported for checkpointing.
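A minimal sketch of the revised checkpointing API described above, assuming the usual SparkContext/RDD entry points; the HDFS path is illustrative only:
```
// setCheckpointDir now takes only the target directory; Spark creates a unique
// subdirectory underneath it. The HDFS path below is a placeholder.
import org.apache.spark.SparkContext

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "checkpoint-sketch")
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints") // no 'useExisting' flag any more
    val doubled = sc.parallelize(1 to 100).map(_ * 2)
    doubled.checkpoint() // filesystem resolved via SparkContext.hadoopConfiguration, per the change above
    doubled.count()      // running a job materializes the checkpoint
    sc.stop()
  }
}
```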
README incorrectly suggests the build sources spark-env.sh
This is misleading because the build doesn't source that file. IMO it's better to force people to always specify build environment variables on the command line, like we do in every example, so I'm just removing this doc.