ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	30b9db0abe	Merge pull request #285 from colorant/yarn-refactor Yarn refactor	2014-01-02 23:15:55 -08:00
Raymond Liu	f442afc22e	fix docs for yarn	2014-01-03 14:14:35 +08:00
Raymond Liu	18b3633e54	minor fix for loginfo	2014-01-03 12:14:38 +08:00
Raymond Liu	c59029402d	move duplicate pom config into parent pom	2014-01-03 12:14:38 +08:00
Raymond Liu	ebdfa6bb97	Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2	2014-01-03 12:14:38 +08:00
Raymond Liu	a47ebf7228	Add yarn/common/src/test dir in building script	2014-01-03 12:14:38 +08:00
Raymond Liu	ddc5054b35	Fix yarn/README.md	2014-01-03 12:14:38 +08:00
Raymond Liu	79b6b4ddc2	Clean up unused files for yarn	2014-01-03 12:14:38 +08:00
Raymond Liu	7c96faee74	Fix pom for build yarn/2.x with yarn/common into one jar	2014-01-03 12:14:38 +08:00
Raymond Liu	d1a6f7aabc	Use unmanaged source dir to include common yarn code	2014-01-03 12:14:37 +08:00
Raymond Liu	c5422e02b8	merge yarn/scheduler yarn/common code into one directory	2014-01-03 12:14:37 +08:00
Raymond Liu	ad60710010	Need to send dummy hello message to actually estabilish akka connection.	2014-01-03 12:14:37 +08:00
Raymond Liu	dd6d347f4f	A few clean up for yarn 2.0 code	2014-01-03 12:14:37 +08:00
Raymond Liu	7815a3ace9	Update maven build documentation	2014-01-03 12:12:38 +08:00
Raymond Liu	be343d2a56	Fix yarn/README.md and update docs/running-on-yarn.md	2014-01-03 12:12:38 +08:00
Raymond Liu	67cd752e74	Add README for yarn modules	2014-01-03 12:12:38 +08:00
Raymond Liu	e867e31145	some code clean up for Yarn 2.2	2014-01-03 12:12:37 +08:00
Raymond Liu	8818661721	Fix pom file for scala binary version	2014-01-03 12:12:37 +08:00
Raymond Liu	96e25e567c	Fix yarn/assemble pom file	2014-01-03 12:12:37 +08:00
Raymond Liu	aec96dd108	Change profile name new-yarn to hadoop2.2-yarn	2014-01-03 12:12:37 +08:00
Raymond Liu	d1528c7c8c	Fix pom for yarn code reorgnaize commit	2014-01-03 12:12:37 +08:00
Raymond Liu	3dc379ce5a	Reorganize yarn related codes into sub projects to remove duplicate files.	2014-01-03 12:12:37 +08:00
Patrick Wendell	498a5f0a1c	Merge pull request #323 from tgravescs/sparkconf_yarn_fix fix spark on yarn after the sparkConf changes This fixes it so that spark on yarn now compiles and works after the sparkConf changes. There are also other issues I discovered along the way that are broken: - mvn builds for yarn don't assemble correctly - unset SPARK_EXAMPLES_JAR isn't handled properly anymore - I'm pretty sure spark.conf doesn't actually work as its not distributed with yarn those things can be fixed in separate pr unless others disagree.	2014-01-02 19:06:40 -08:00
Reynold Xin	0475ca8f81	Merge pull request #320 from kayousterhout/erroneous_failed_msg Remove erroneous FAILED state for killed tasks. Currently, when tasks are killed, the Executor first sends a status update for the task with a "KILLED" state, and then sends a second status update with a "FAILED" state saying that the task failed due to an exception. The second FAILED state is misleading/unncessary, and occurs due to a NonLocalReturnControl Exception that gets thrown due to the way we kill tasks. This commit eliminates that problem. I'm not at all sure that this is the best way to fix this problem, so alternate suggestions welcome. @rxin guessing you're the right person to look at this.	2014-01-02 15:17:08 -08:00
Thomas Graves	fced7885cb	fix yarn-client	2014-01-02 17:11:16 -06:00
Thomas Graves	c6de982be6	Fix yarn build after sparkConf changes	2014-01-02 16:50:35 -06:00
Patrick Wendell	588a1695f4	Merge pull request #297 from tdas/window-improvement Improvements to DStream window ops and refactoring of Spark's CheckpointSuite - Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located. - Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads. - Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary. - Added mapSideCombine option to combineByKeyAndWindow.	2014-01-02 13:20:54 -08:00
Matei Zaharia	7bafb68d77	Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-spark	2014-01-02 15:57:28 -05:00
Reynold Xin	5e67cdc801	Merge pull request #319 from kayousterhout/remove_error_method Removed redundant TaskSetManager.error() function. This function was leftover from a while ago, and now just passes all calls through to the abort() function, so this commit deletes it.	2014-01-02 12:56:28 -08:00
Matei Zaharia	ca67909cd4	Merge pull request #311 from tmyklebu/master SPARK-991: Report information gleaned from a Python stacktrace in the UI Scala: - Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions mutate a LocalProperty called "externalCallSite." - Add a wrapper, getCallSite, that checks for an externalCallSite and, if none is found, calls the usual Utils.formatSparkCallSite. - Change everything that calls Utils.formatSparkCallSite to call getCallSite instead. Except getCallSite. - Add wrappers to setCallSite/clearCallSite wrappers to JavaSparkContext. Python: - Add a gruesome hack to rdd.py that inspects the traceback and guesses what you want to see in the UI. - Add a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as appropriate. - Wire said RAII wrapper up around three calls into the Scala code. I'm not sure that I hit all the spots with the RAII wrapper. I'm also not sure that my gruesome hack does exactly what we want. One could also approach this change by refactoring runJob/submitJob/runApproximateJob to take a call site, then threading that parameter through everything that needs to know it. One might object to the pointless-looking wrappers in JavaSparkContext. Unfortunately, I can't directly access the SparkContext from Python---or, if I can, I don't know how---so I need to wrap everything that matters in JavaSparkContext. Conflicts: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala	2014-01-02 15:54:54 -05:00
Kay Ousterhout	a1b438d94d	Remove erroneous FAILED state for killed tasks. Currently, when tasks are killed, the Executor first sends a status update for the task with a "KILLED" state, and then sends a second status update with a "FAILED" state saying that the task failed due to an exception. The second FAILED state is misleading/unncessary, and occurs due to a NonLocalReturnControl Exception that gets thrown due to the way we kill tasks. This commit eliminates that problem.	2014-01-02 12:34:46 -08:00
Kay Ousterhout	5a3c00c958	Removed redundant TaskSetManager.error() function. This function was leftover from a while ago, and now just passes all calls through to the abort() function, so this commit deletes it.	2014-01-02 11:13:58 -08:00
Patrick Wendell	3713f8129a	Merge pull request #309 from mateiz/conf2 SPARK-544. Migrate configuration to a SparkConf class This is still a work in progress based on Prashant and Evan's code. So far I've done the following: - Got rid of global SparkContext.globalConf - Passed SparkConf to serializers and compression codecs - Made SparkConf public instead of private[spark] - Improved API of SparkContext and SparkConf - Switched executor environment vars to be passed through SparkConf - Fixed some places that were still using system properties - Fixed some tests, though others are still failing This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation). But the API at least is hopefully ready for review. Unfortunately there was a lot of global stuff before due to a "SparkContext.globalConf" method that let you set a "default" configuration of sorts, which meant I had to make some pretty big changes.	2014-01-01 21:29:12 -08:00
Matei Zaharia	7e8d2e8a5c	Fix Python code after change of getOrElse	2014-01-01 23:21:34 -05:00
Matei Zaharia	0f6060733d	Fixed two uses of conf.get with no default value in Mesos	2014-01-01 22:09:42 -05:00
Matei Zaharia	e2c68642c6	Miscellaneous fixes from code review. Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.	2014-01-01 22:03:39 -05:00
Matei Zaharia	45ff8f413d	Merge remote-tracking branch 'apache/master' into conf2 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala	2014-01-01 21:25:00 -05:00
Patrick Wendell	c1d928a897	Merge pull request #312 from pwendell/log4j-fix-2 SPARK-1008: Logging improvments 1. Adds a default log4j file that gets loaded if users haven't specified a log4j file. 2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings after building with SBT (and I've seen similar warnings on the mailing list).	2014-01-01 17:03:48 -08:00
Patrick Wendell	f8d245bdfc	Merge remote-tracking branch 'apache-github/master' into log4j-fix-2 Conflicts: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2014-01-01 16:10:51 -08:00
Matei Zaharia	0e5b2adb5c	Merge remote-tracking branch 'apache/master' into conf2 Conflicts: project/SparkBuild.scala	2014-01-01 13:28:54 -05:00
Reynold Xin	9a0ff721c9	Merge pull request #314 from witgo/master restore core/pom.xml file modification	2013-12-31 21:50:24 -08:00
liguoqiang	b5d0b3b0f7	restore core/pom.xml file modification	2014-01-01 11:30:08 +08:00
Reynold Xin	8b8e70ebde	Merge pull request #73 from falaki/ApproximateDistinctCount Approximate distinct count Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.	2013-12-31 17:48:24 -08:00
Patrick Wendell	37c43c9dd1	Adding outer checkout when initializing logging	2013-12-31 17:36:56 -08:00
Hossein Falaki	bee445c927	Made the code more compact and readable	2013-12-31 16:58:18 -08:00
Hossein Falaki	acb0323053	minor improvements	2013-12-31 15:34:26 -08:00
Matei Zaharia	42bcfb2bb2	Fix two compile errors introduced in merge	2013-12-31 18:26:23 -05:00
Matei Zaharia	ba9338f104	Merge remote-tracking branch 'apache/master' into conf2 Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2013-12-31 18:23:14 -05:00
Patrick Wendell	63b411dd86	Merge pull request #238 from ngbinh/upgradeNetty upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy	2013-12-31 14:31:28 -08:00
Patrick Wendell	55b7e2fdff	Merge pull request #289 from tdas/filestream-fix Bug fixes for file input stream and checkpointing - Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread it deleting fails caused errors, etc.) - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration. - Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten. - Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.	2013-12-31 10:12:51 -08:00

1 2 3 4 5 ...

5110 commits