Commit graph

5093 commits

Author SHA1 Message Date
Patrick Wendell 7a8169be9a Merge pull request #268 from pwendell/shaded-protobuf
Add support for 2.2 to master (via shaded jars)

This patch does a few related things. NOTE: This may not compile correctly for ~24 hours until artifacts fully propagate to Maven Central.

1. Uses shaded versions of akka/protobuf. For more information on how these versions were prepared, see [1].

2. Brings the `new-yarn` project up-to-date with the changes for Akka 2.2.3.

3. Some clean-up of the build now that we don't have to switch akka groups for different YARN versions.

[1]
933a309ef8/shaded-protobuf
2013-12-16 22:42:21 -08:00
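For illustration, a minimal SBT-style sketch of what depending on the shaded Akka artifacts could look like. This is not taken from the project's actual build; the `akka-remote` coordinates are grounded in the Maven log quoted further down, while `akka-actor` is an assumed sibling artifact.

```
// Hypothetical build.sbt fragment; only illustrates the shaded coordinates, not Spark's real build.
libraryDependencies ++= Seq(
  "org.spark-project.akka" %% "akka-remote" % "2.2.3-shaded-protobuf",  // coordinates from the Maven log below
  "org.spark-project.akka" %% "akka-actor"  % "2.2.3-shaded-protobuf"   // assumed sibling artifact
)
```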
Patrick Wendell 10c0ffa1eb One other fix 2013-12-16 22:10:55 -08:00
Patrick Wendell c1c0f8099f Clean-up 2013-12-16 22:01:27 -08:00
Patrick Wendell c1fec89895 Cleanup 2013-12-16 21:56:21 -08:00
Patrick Wendell 24f8220dc8 Removing extra code in new yarn 2013-12-16 21:53:51 -08:00
Patrick Wendell ceb013f8b9 Remove trailing slashes from repository specifications.
The correct format is to not have a trailing slash.

For me this caused non-deterministic failures due to issues fetching
certain artifacts. The issue was that some of the maven caches would
fail to fetch the artifact (due to the way that the artifact
path was concatenated with the repository) and this short-circuited
the download process in a silent way. Here is what the log output
looked like:

    Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom
    [WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available

This was pretty brutal to debug since there was no error message
anywhere and the path *looks* correct as reported by the Maven log.
2013-12-16 21:53:51 -08:00
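The commit itself edits Maven POMs, but the same rule can be illustrated with a hypothetical SBT resolver: the repository URL must not end with a trailing slash, otherwise the concatenated artifact path can silently fail to resolve.

```
// Hypothetical SBT illustration of the rule; the resolver name is made up.
resolvers += "central-mirror" at "http://repo.maven.apache.org/maven2"     // correct: no trailing slash
// resolvers += "central-mirror" at "http://repo.maven.apache.org/maven2/" // avoid: trailing slash
```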
Patrick Wendell c6f95e603e Attempt with extra repositories 2013-12-16 21:53:51 -08:00
Tor Myklebust b2f0329511 Missed a spot; had an objectSer here too. 2013-12-17 00:18:46 -05:00
Tor Myklebust 25fa976580 Merge branch 'master' of git://github.com/apache/incubator-spark 2013-12-16 23:48:37 -05:00
Tor Myklebust 963d6f065a Incorporate pwendell's code review suggestions. 2013-12-16 23:14:52 -05:00
Patrick Wendell 964a3b6971 Merge pull request #270 from ewencp/really-force-ssh-pseudo-tty-master
Force pseudo-tty allocation in spark-ec2 script.

ssh commands need the -t argument repeated twice if there is no local
tty, e.g. if the process running spark-ec2 uses nohup and the parent
process exits.

Without this change, if you run the script this way (e.g. using nohup from a cron job), it will fail to set up the nodes because some of the ssh commands complain about missing ttys.

(This version is for the master branch. I've filed a separate request for the 0.8 branch since changes to the script caused the patches to be different.)
2013-12-16 15:23:51 -08:00
Reynold Xin 883e034aeb Merge pull request #245 from gregakespret/task-maxfailures-fix
Fix for spark.task.maxFailures not enforced correctly.

Docs at http://spark.incubator.apache.org/docs/latest/configuration.html say:

```
spark.task.maxFailures

Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
```

The previous implementation worked incorrectly: when, for example, `spark.task.maxFailures` was set to 1, the job was aborted only after the second task failure, not after the first one.
2013-12-16 14:16:02 -08:00
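A minimal sketch (not the real scheduler code) of the documented semantics, using a hypothetical helper name:

```
// Hypothetical sketch of the spark.task.maxFailures semantics described above.
object MaxFailuresSemantics {
  // A job should abort once a task has failed maxFailures times,
  // i.e. allowed retries = maxFailures - 1, so the check must be ">=" rather than ">".
  def shouldAbort(failureCount: Int, maxFailures: Int): Boolean =
    failureCount >= maxFailures

  def main(args: Array[String]): Unit = {
    val maxFailures = 1
    println(shouldAbort(failureCount = 1, maxFailures = maxFailures)) // true: abort after the first failure
  }
}
```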
Tor Myklebust 882d544856 UI to display serialisation time of a stage. 2013-12-16 13:27:03 -05:00
Tor Myklebust 8a397a959b Track task value serialisation time in TaskMetrics. 2013-12-16 12:07:39 -05:00
Ewen Cheslack-Postava d17c142615 Force pseudo-tty allocation in spark-ec2 script.
ssh commands need the -t argument repeated twice if there is no local
tty, e.g. if the process running spark-ec2 uses nohup and the parent
process exits.
2013-12-16 08:09:37 -08:00
wangda.tan 8ab8c6a526 Merge branch 'master' of git://github.com/apache/incubator-spark 2013-12-16 21:45:43 +08:00
Patrick Wendell a51f3404ad Merge pull request #265 from markhamstra/scala.binary.version
DRY out the POMs with scala.binary.version

...instead of hard-coding 2.10 repeatedly.

As long as it's not a `<project>`-level `<artifactId>`, I think that we are okay parameterizing these.
2013-12-15 22:02:30 -08:00
Josh Rosen f8ba89da21 Fix Cygwin support in several scripts.
This allows the spark-shell, spark-class, run-example, make-distribution.sh,
and ./bin/start-* scripts to work under Cygwin.  Note that this doesn't
support PySpark under Cygwin, since that requires many additional `cygpath`
calls from within Python and will be non-trivial to implement.

This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).
2013-12-15 18:51:31 -08:00
Josh Rosen d2ced6d58c Merge pull request #256 from MLnick/master
Fix 'IPYTHON=1 ./pyspark' throwing ValueError

This fixes an annoying issue where running ```IPYTHON=1 ./pyspark``` resulted in:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 0.8.0
      /_/

Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
Spark context avaiable as sc.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    202             else:
    203                 filename = fname
--> 204             __builtin__.execfile(filename, *where)

/Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/shell.py in <module>()
     30 add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") != None else None
     31
---> 32 sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
     33
     34 print """Welcome to

/Users/Nick/workspace/scala/spark-0.8.0-incubating-bin-hadoop1/python/pyspark/context.pyc in __init__(self, master, jobName, sparkHome, pyFiles, environment, batchSize)
     70         with SparkContext._lock:
     71             if SparkContext._active_spark_context:
---> 72                 raise ValueError("Cannot run multiple SparkContexts at once")
     73             else:
     74                 SparkContext._active_spark_context = self

ValueError: Cannot run multiple SparkContexts at once
```

The issue arises because previously IPython didn't seem to respect `$PYTHONSTARTUP`, but since at least 1.0.0 it does. Technically this might break for older versions of IPython, but most users should be able to upgrade to at least 1.0.0 (and should be encouraged to do so :).

New behaviour:
```
Nicks-MacBook-Pro:incubator-spark-mlnick Nick$ IPYTHON=1 ./pyspark
Python 2.7.5 (default, Jun 20 2013, 11:06:30)
Type "copyright", "credits" or "license" for more information.

IPython 1.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/tools/target/scala-2.9.3/spark-tools-assembly-0.9.0-incubating-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/Nick/workspace/scala/incubator-spark-mlnick/assembly/target/scala-2.9.3/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13/12/12 13:08:15 WARN Utils: Your hostname, Nicks-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface en0)
13/12/12 13:08:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
13/12/12 13:08:15 INFO Slf4jEventHandler: Slf4jEventHandler started
13/12/12 13:08:15 INFO SparkEnv: Registering BlockManagerMaster
13/12/12 13:08:15 INFO DiskBlockManager: Created local directory at /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-local-20131212130815-0e76
13/12/12 13:08:15 INFO MemoryStore: MemoryStore started with capacity 326.7 MB.
13/12/12 13:08:15 INFO ConnectionManager: Bound socket to port 53732 with id = ConnectionManagerId(10.0.0.4,53732)
13/12/12 13:08:15 INFO BlockManagerMaster: Trying to register BlockManager
13/12/12 13:08:15 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.0.0.4:53732 with 326.7 MB RAM
13/12/12 13:08:15 INFO BlockManagerMaster: Registered BlockManager
13/12/12 13:08:15 INFO HttpBroadcast: Broadcast server started at http://10.0.0.4:53733
13/12/12 13:08:15 INFO SparkEnv: Registering MapOutputTracker
13/12/12 13:08:15 INFO HttpFileServer: HTTP File server directory is /var/folders/_l/06wxljt13wqgm7r08jlc44_r0000gn/T/spark-8f40e897-8211-4628-a7a8-755562d5244c
13/12/12 13:08:16 INFO SparkUI: Started Spark Web UI at http://10.0.0.4:4040
2013-12-12 13:08:16.337 java[56801:4003] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 0.9.0-SNAPSHOT
      /_/

Using Python version 2.7.5 (default, Jun 20 2013 11:06:30)
Spark context avaiable as sc.
```
2013-12-15 14:11:34 -08:00
Reynold Xin c55e698559 Merge pull request #257 from tgravescs/sparkYarnFixName
Fix the --name option for Spark on Yarn

Looks like the --name option accidentally got broken in one of the merges.  The Client hangs if the --name option is used right now.
2013-12-15 12:49:02 -08:00
Reynold Xin ab85f88fd7 Merge pull request #264 from shivaram/spark-class-fix
Use CoarseGrainedExecutorBackend in spark-class
2013-12-15 12:48:32 -08:00
Mark Hamstra 09ed7ddfa0 Use scala.binary.version in POMs 2013-12-15 12:39:58 -08:00
Shivaram Venkataraman fc96ca9f62 Use CoarseGrainedExecutorBackend in spark-class 2013-12-15 11:53:44 -08:00
Nick Pentreath bb5277b10a Making IPython PySpark compatible across versions <1.0.0. Also cleaned up '-i' option and made IPYTHON_OPTS work 2013-12-15 09:39:45 +02:00
Nick Pentreath d36ee3b159 Merge remote-tracking branch 'upstream/master' 2013-12-15 08:34:05 +02:00
Reynold Xin 7db9165961 Merge pull request #251 from pwendell/master
Fix list rendering in YARN markdown docs.

This is some minor clean-up which makes the list render correctly.
2013-12-14 14:16:34 -08:00
Josh Rosen 2fd781d347 Merge pull request #249 from ngbinh/partitionInJavaSortByKey
Expose numPartitions parameter in JavaPairRDD.sortByKey()

This change makes the Java and Scala APIs for sortByKey() the same.
2013-12-14 12:59:37 -08:00
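A hedged Scala usage sketch of the parameter being exposed; the Scala `sortByKey(ascending, numPartitions)` overload is the existing API that this change mirrors on the Java side.

```
// Usage sketch assuming a local Spark 0.8/0.9-era setup.
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // brings sortByKey into scope for pair RDDs

object SortByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "SortByKeyExample")
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    val sorted = pairs.sortByKey(true, 4)   // ascending = true, numPartitions = 4
    println(sorted.collect().mkString(", "))
    println(sorted.partitions.length)       // expected: 4
    sc.stop()
  }
}
```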
Patrick Wendell 97ac060182 Merge pull request #259 from pwendell/scala-2.10
Migration to Scala 2.10

== The description below was written by Prashant Sharma ==

This PR migrates Spark to Scala 2.10.

Summary of changes apart from the Scala 2.10 migration:
(Has no implications for users.)
1. Migrated Akka to 2.2.3.

Does not use remote death watch because it has a bug where it keeps sending messages to a dead node indefinitely.

Uses an indestructible actor system which tolerates errors only on executors.

(Might be useful for users.)
4. New configuration settings introduced:

System.getProperty("spark.akka.heartbeat.pauses", "600")
System.getProperty("spark.akka.failure-detector.threshold", "300.0")
System.getProperty("spark.akka.heartbeat.interval", "1000")

Defaults for these are fairly large so that, in effect, they only disable the failure detector that comes with Akka. The reason for doing so is that we already have our own failure-detector-like mechanism in place, so Akka's is just extra overhead on top of that and leads to a lot of false positives. With these properties it is still possible to enable it; a good use case is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses or network lag and to quickly evict executors that experience them. More information is included in configuration.md.

Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.

This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
2013-12-14 00:22:45 -08:00
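A small hedged sketch of how these settings could be set as JVM system properties before the SparkContext is created, matching the getProperty calls quoted above; only the property names and default values are taken from the message.

```
// Hypothetical example of setting the Akka failure-detector properties named above.
import org.apache.spark.SparkContext

object AkkaFailureDetectorConfig {
  def main(args: Array[String]): Unit = {
    // These are the defaults quoted in the message; lowering them would make Spark
    // more sensitive to GC pauses and network lag, evicting stalled executors sooner.
    System.setProperty("spark.akka.heartbeat.pauses", "600")
    System.setProperty("spark.akka.failure-detector.threshold", "300.0")
    System.setProperty("spark.akka.heartbeat.interval", "1000")

    val sc = new SparkContext("local", "AkkaConfigExample")
    // ... run jobs ...
    sc.stop()
  }
}
```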
Patrick Wendell 7ac944fc27 Merge pull request #262 from pwendell/mvn-fix
Fix maven build issues in 2.10 branch

Found some issues when locally testing maven.
2013-12-13 23:22:08 -08:00
Patrick Wendell 6e8a96c7e7 Fix maven build issues in 2.10 branch 2013-12-13 23:14:08 -08:00
Reynold Xin 6defb061f0 Merge pull request #261 from ScrapCodes/scala-2.10
Added a comment about ActorRef and ActorSelection difference.
2013-12-13 21:18:57 -08:00
Prashant Sharma 1ae3c0fc5e Added a comment about ActorRef and ActorSelection difference. 2013-12-14 10:44:24 +05:30
Reynold Xin 76566b1fc9 Merge pull request #260 from ScrapCodes/scala-2.10
Review comments on the PR for scala 2.10 migration.
2013-12-13 10:11:02 -08:00
Prashant Sharma a854cc536d Review comments on the PR for scala 2.10 migration. 2013-12-13 15:19:51 +05:30
Patrick Wendell 0aeb182b0f Merge pull request #255 from ScrapCodes/scala-2.10
Disabled yarn 2.2 in sbt and mvn build and added a message in the sbt build.
2013-12-12 21:14:42 -08:00
Tathagata Das 097e120c0c Refactored streaming scheduler and added listener interface.
- Refactored Scheduler + JobManager to JobGenerator + JobScheduler and
  added JobSet for cleaner code. Moved scheduler related code to
  streaming.scheduler package.
- Added StreamingListener trait (similar to SparkListener) to enable
  gathering of streaming stats like processing times and delays.
  Use StreamingContext.addListener() to add listeners.
- Deduped some code in streaming tests by modifying TestSuiteBase, and
  added StreamingListenerSuite.
2013-12-12 20:48:02 -08:00
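A hedged sketch of how a listener might look under this design; only the `StreamingListener` trait and `StreamingContext.addListener()` are named in the message, so the callback and batch-info field names below are assumptions.

```
// Hypothetical listener sketch; exact callback and field names are assumed, not taken from the message.
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class ProcessingTimeListener extends StreamingListener {
  // Assumed callback invoked when a batch finishes; logs its processing time if known.
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println("Batch " + info.batchTime + " processed in " + info.processingDelay.getOrElse(-1L) + " ms")
  }
}

// Registration per the message, assuming ssc is an existing StreamingContext:
//   ssc.addListener(new ProcessingTimeListener())
```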
Thomas Graves 842eb55fb5 Fix the --name option for Spark on Yarn 2013-12-12 11:11:09 -06:00
Nick Pentreath 8cdfb08c47 Fix 'IPYTHON=1 ./pyspark' throwing 'ValueError: Cannot run multiple SparkContexts at once' 2013-12-12 13:08:59 +02:00
Prashant Sharma 589b83a18f Disabled yarn 2.2 and added a message in the sbt build 2013-12-12 16:25:30 +05:30
Patrick Wendell 2e89398e44 Merge pull request #254 from ScrapCodes/scala-2.10
Scala 2.10 migration

This PR migrates Spark to Scala 2.10.

Summary of changes apart from the Scala 2.10 migration:
(Has no implications for users.)
1. Migrated Akka to 2.2.3.

Does not use remote death watch because it has a bug where it keeps sending messages to a dead node indefinitely.

Uses an indestructible actor system which tolerates errors only on executors.

(Might be useful for users.)
4. New configuration settings introduced:

System.getProperty("spark.akka.heartbeat.pauses", "600")
System.getProperty("spark.akka.failure-detector.threshold", "300.0")
System.getProperty("spark.akka.heartbeat.interval", "1000")

Defaults for these are fairly large so that, in effect, they only disable the failure detector that comes with Akka. The reason for doing so is that we already have our own failure-detector-like mechanism in place, so Akka's is just extra overhead on top of that and leads to a lot of false positives. With these properties it is still possible to enable it; a good use case is when someone wants Spark to be sensitive (in a controllable manner, of course) to GC pauses or network lag and to quickly evict executors that experience them. More information is included in configuration.md.

Once SPARK-544 is merged, I would like to deprecate at least these Akka properties, and maybe others too.

This PR is a duplicate of #221 (where all the discussion happened); that one pointed to master, while this one points to the scala-2.10 branch.
2013-12-11 23:10:53 -08:00
Prashant Sharma d3090b79a5 A few corrections to documentation. 2013-12-12 10:12:06 +05:30
Tathagata Das 5e9ce83d68 Fixed multiple file stream and checkpointing bugs.
- Made file stream more robust to transient failures.
- Changed Spark.setCheckpointDir API to not have the second
  'useExisting' parameter. Spark will always create a unique directory
  for checkpointing underneath the directory provided to the function.
- Fixed bug wrt local relative paths as checkpoint directory.
- Made DStream and RDD checkpointing use
  SparkContext.hadoopConfiguration, so that more HDFS compatible
  filesystems are supported for checkpointing.
2013-12-11 14:01:36 -08:00
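A hedged usage sketch of the changed API: `setCheckpointDir` now takes only the directory, and Spark creates a unique subdirectory underneath it.

```
// Usage sketch; the local checkpoint path is made up for illustration.
import org.apache.spark.SparkContext

object CheckpointExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "CheckpointExample")
    // After this change there is no second 'useExisting' argument; Spark always
    // creates a unique directory underneath the one provided here.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.checkpoint()   // mark the RDD for checkpointing
    rdd.count()        // materializes the RDD and writes the checkpoint
    sc.stop()
  }
}
```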
Prashant Sharma f4c73df5c9 Merge branch 'akka-bug-fix' of github.com:ScrapCodes/incubator-spark into akka-bug-fix 2013-12-11 10:22:44 +05:30
Prashant Sharma 603af51bb5 Merge branch 'master' into akka-bug-fix
Conflicts:
	core/pom.xml
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	pom.xml
	project/SparkBuild.scala
	streaming/pom.xml
	yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
2013-12-11 10:21:53 +05:30
Hossein Falaki 49bf47e1b7 Removed superfluous abs call from test cases. 2013-12-10 19:50:50 -08:00
Prashant Sharma 0b82b5af1e added eclipse repository for spark streaming. 2013-12-11 08:17:02 +05:30
Patrick Wendell 1291dd4dce Fix list rendering in YARN markdown docs. 2013-12-10 16:38:33 -08:00
Patrick Wendell d2efe13574 Merge pull request #250 from pwendell/master
README incorrectly suggests the build sources spark-env.sh

This is misleading because the build doesn't source that file. IMO
it's better to force people to always specify build environment variables
on the command line, like we do in every example, so I'm
just removing this doc.
2013-12-10 13:01:26 -08:00
Patrick Wendell 153cad1293 README incorrectly suggests the build sources spark-env.sh
This is misleading because the build doesn't source that file. IMO
it's better to force people to always specify build environment variables
on the command line, like we do in every example.
2013-12-10 12:54:28 -08:00
Binh Nguyen 0b494f7db4 Hook directly to Scala API 2013-12-10 11:17:52 -08:00