ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Prashant Sharma	7be75682b9	Merge branch 'master' into wip-merge-master Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml In scala 2.10, a shorter representation is used for naming artifacts so changed to shorter scala version for artifacts and made it a property in pom.	2013-10-08 11:29:40 +05:30
Reynold Xin	ea34c52102	Merge pull request #42 from pwendell/shuffle-read-perf Fix inconsistent and incorrect log messages in shuffle read path The user-facing messages generated by the CacheManager are currently wrong and somewhat misleading. This patch makes the messages more accurate. It also uses a consistent representation of the partition being fetched (`rdd_xx_yy`) so that it's easier for users to trace what is going on when reading logs.	2013-10-07 20:45:58 -07:00
Patrick Wendell	8b377718b8	Responses to review	2013-10-07 20:03:35 -07:00
Patrick Wendell	391133f66a	Fix inconsistent and incorrect log messages in shuffle read path	2013-10-07 17:24:18 -07:00
Patrick Wendell	02f37ee853	Merge pull request #39 from pwendell/master Adding Shark 0.7.1 to EC2 scripts This adds a newer version of Shark to the ec2 scripts. I've tested this for both Hadoop1 and Hadoop2 clusters.	2013-10-07 15:48:52 -07:00
Patrick Wendell	3745a1827f	Adding Shark 0.7.1 to EC2 scripts	2013-10-07 15:03:42 -07:00
Reynold Xin	213b70a2db	Merge pull request #31 from sundeepn/branch-0.8 Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies. (cherry picked from commit `023e3fdf00`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-10-07 10:54:22 -07:00
Patrick Wendell	d585613ee2	Merge pull request #37 from pwendell/merge-0.8 merge in remaining changes from `branch-0.8` This merges in the following changes from `branch-0.8`: - The scala version is included in the published maven artifact names - A unit tests which had non-deterministic failures is ignored (see SPARK-908) - A minor documentation change shows the short version instead of the full version - Moving the kafka jar to be "provided" - Changing the default spark ec2 version. - Some spacing changes caused by Maven's release plugin Note that I've squashed this into a single commit rather than pull in the branch-0.8 history. There are a bunch of release/revert commits there that make the history super ugly.	2013-10-05 22:57:05 -07:00
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Matei Zaharia	4a25b116d4	Merge pull request #20 from harveyfeng/hadoop-config-cache Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads. Note: originally from https://github.com/mesos/spark/pull/942 Currently motivated by Shark queries on Hive-partitioned tables, where there's a JobConf broadcast for every Hive-partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path - the Hadoop Configuration that the JobConfs are constructed from remain the same. This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little bit for high-frequency FileInputDStreams (which only uses the new API right now). As a small bonus, added InputFormats caching, to avoid reflection calls for every RDD#compute(). Few other notes: Added a general soft-reference hashmap in SparkHadoopUtil because I wanted to avoid adding another class to SparkEnv. SparkContext default hadoopConfiguration isn't cached. There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.	2013-10-05 19:28:55 -07:00
Harvey Feng	6a2bbec5e3	Some comments regarding JobConf and InputFormat caching for HadoopRDDs.	2013-10-05 17:53:58 -07:00
Reynold Xin	8fc68d04bd	Merge pull request #36 from pwendell/versions Bumping EC2 default version in master to `0.8.0`. This change was already made on `branch-0.8`. This PR ports the change up to master.	2013-10-05 17:24:35 -07:00
Harvey Feng	96929f28bb	Make HadoopRDD object Spark private.	2013-10-05 17:14:19 -07:00
Patrick Wendell	2484b84678	Bumping EC2 default version in master to `0.8.0`.	2013-10-05 16:59:11 -07:00
Harvey Feng	b5e93c1227	Fix API changes; lines > 100 chars.	2013-10-05 16:57:08 -07:00
Matei Zaharia	100222b048	Merge pull request #27 from davidmccauley/master SPARK-920/921 - JSON endpoint updates 920 - Removal of duplicate scheme part of Spark URI, it was appearing as spark://spark//host:port in the JSON field. JSON now delivered as: url:spark://127.0.0.1:7077 921 - Adding the URL of the Main Application UI will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.	2013-10-05 13:38:59 -07:00
Matei Zaharia	08641932bd	Merge pull request #33 from AndreSchumacher/pyspark_partition_key_change Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.	2013-10-05 13:25:18 -07:00
Prashant Sharma	3e41495288	Fixed tests, changed property akka.remote.netty.x to akka.remote.netty.tcp.x	2013-10-05 16:39:25 +05:30
Prashant Sharma	c810ee0690	Merge branch 'master' into scala-2.10 Conflicts: core/src/test/scala/org/apache/spark/DistributedSuite.scala project/SparkBuild.scala	2013-10-05 15:52:57 +05:30
Andre Schumacher	c84946fe21	Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.	2013-10-04 11:56:47 -07:00
Matei Zaharia	232765f7b2	Merge pull request #26 from Du-Li/master fixed a wildcard bug in make-distribution.sh; ask sbt to check local maven repo in project/SparkBuild.scala (1) fixed a wildcard bug in make-distribution.sh: with the wildcard * in quotes, this cp command failed. it worked after moving the wildcard out quotes. (2) ask sbt to check local maven repo in SparkBuild.scala: To build Spark (0.9.0-SNAPSHOT) with the HEAD of mesos (0.15.0), I must do "make maven-install" under mesos/build, which publishes the java .jar file under ~/.m2. However, when building Spark (after pointing mesos to version 0.15.0), sbt uses ivy which by default only checks ~/.ivy2. This change is to tell sbt to also check ~/.m2.	2013-10-03 12:00:48 -07:00
Matei Zaharia	405e69bb20	Merge pull request #25 from CruncherBigData/master Update README: updated the link	2013-10-03 10:52:41 -07:00
Matei Zaharia	49dbfccf6b	Merge pull request #28 from tgravescs/sparYarnAppName Allow users to set the application name for Spark on Yarn	2013-10-03 10:52:06 -07:00
tgravescs	c021b8c202	Add default value to usage statement	2013-10-03 08:07:19 -05:00
Matei Zaharia	e597ea34a6	Merge pull request #10 from kayousterhout/results_through-bm Send Task results through the block manager when larger than Akka frame size (fixes SPARK-669). This change requires adding an extra failure mode: tasks can complete successfully, but the result gets lost or flushed from the block manager before it's been fetched. This change also moves the deserialization of tasks into a separate thread, so it's no longer part of the DAG scheduler's tight loop. This should improve scheduler throughput, particularly when tasks are sending back large results. Thanks Josh for writing the original version of this patch! This is duplicated from the mesos/spark repo: https://github.com/mesos/spark/pull/835	2013-10-02 21:14:24 -07:00
tgravescs	bc3b20abdc	Allow users to set the application name for Spark on Yarn	2013-10-02 12:54:17 -05:00
David McCauley	1577b373a9	SPARK-921 - Add Application UI URL to ApplicationInfo Json output	2013-10-02 15:03:41 +01:00
David McCauley	351da54676	SPARK-920 - JSON endpoint URI scheme part (spark://) duplicated	2013-10-02 13:23:38 +01:00
Du Li	9fd6bba60d	ask ivy/sbt to check local maven repo under ~/.m2	2013-10-01 15:46:51 -07:00
Du Li	0d19f00e9e	fixed a bug of using wildcard in quotes	2013-10-01 15:42:06 -07:00
CruncherBigData	c85f720588	Update README	2013-10-01 09:05:03 -07:00
Prashant Sharma	5829692885	Merge branch 'master' into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala docs/_config.yml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala	2013-10-01 11:57:24 +05:30
Kay Ousterhout	0dcad2edcb	Added additional unit test for repeated task failures	2013-09-30 23:26:15 -07:00
Kay Ousterhout	dea4677c88	Fixed compilation errors and broken test.	2013-09-30 22:07:01 -07:00
Kay Ousterhout	8deda427bc	Merge remote-tracking branch 'upstream/master' into results_through-bm Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala	2013-09-30 10:16:58 -07:00
Kay Ousterhout	58b764b7c6	Addressed Matei's code review comments	2013-09-30 10:11:59 -07:00
Prashant Sharma	9865fd6aa0	Fixed non termination of Executor backend, when sc.stop is not called.	2013-09-30 18:09:12 +05:30
Harvey Feng	7d06bdde1d	Merge HadoopDatasetRDD into HadoopRDD.	2013-09-29 20:08:03 -07:00
Harvey Feng	417085716a	Merge remote-tracking branch 'oldsparkme/hadoopRDD-broadcast-change' into hadoop-config-cache	2013-09-26 15:49:42 -07:00
Reynold Xin	714fdabd99	Merge pull request #17 from rxin/optimize Remove -optimize flag	2013-09-26 14:28:55 -07:00
Reynold Xin	13eced723f	Merge pull request #16 from pwendell/master Bug fix in master build	2013-09-26 14:18:19 -07:00
Reynold Xin	70a0b993d4	Merge pull request #14 from kayousterhout/untangle_scheduler Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand! There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package. The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse grained mode and Yarn, and the backend that is just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.	2013-09-26 14:11:54 -07:00
Reynold Xin	76677b8fa1	Merge pull request #670 from jey/ec2-ssh-improvements EC2 SSH improvements	2013-09-26 14:03:46 -07:00
Reynold Xin	3f283278b0	Removed scala -optimize flag.	2013-09-26 13:58:10 -07:00
Reynold Xin	c514cd1587	Merge pull request #930 from holdenk/master Add mapPartitionsWithIndex	2013-09-26 13:48:20 -07:00
Patrick Wendell	e2ff59af72	Bug fix in master build	2013-09-26 13:06:51 -07:00
Reynold Xin	560ee5c9bb	Merge pull request #7 from wannabeast/memorystore-fixes some minor fixes to MemoryStore This is a repeat of #5, moved to its own branch in my repo. This makes all updates to on ; it skips on synchronizing the reads where it can get away with it.	2013-09-26 11:27:34 -07:00
Patrick Wendell	6566a19b38	Merge pull request #9 from rxin/limit Smarter take/limit implementation.	2013-09-26 08:01:04 -07:00
Prashant Sharma	42f30b5590	Fixed UISuite, for case when port 4040 is already bound on machine running the test.	2013-09-26 14:38:42 +05:30
Prashant Sharma	604dc40996	Sync with master and some build fixes	2013-09-26 11:40:02 +05:30

4209 commits