ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	f8d245bdfc	Merge remote-tracking branch 'apache-github/master' into log4j-fix-2 Conflicts: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2014-01-01 16:10:51 -08:00
Tathagata Das	fcd17a1e8e	Fixed comments and long lines based on comments on PR 289.	2013-12-31 02:01:45 -08:00
Patrick Wendell	18181e6c41	Removing initLogging entirely	2013-12-30 23:39:47 -08:00
Tathagata Das	271e3237f3	Minor changes in comments and strings to address comments in PR 289.	2013-12-27 12:26:57 -08:00
Tathagata Das	3618d70b2a	Added warning if filestream adds files with no data in them (file RDDs have 0 partitions).	2013-12-26 12:45:40 -08:00
Tathagata Das	be64719138	Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).	2013-12-26 12:33:12 -08:00
Tathagata Das	bacc65cf28	Removed slack time in file stream and added better handling of exceptions due to failures due FileNotFound exceptions.	2013-12-26 10:18:46 +00:00
Tathagata Das	d4dfab503a	Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.	2013-12-24 14:01:13 -08:00
Tathagata Das	0af7f84c8e	Minor formatting fixes.	2013-12-23 17:47:16 -08:00
Tathagata Das	8ca14a1e51	Updated testsuites to work with the slack time of file stream.	2013-12-23 16:27:00 -08:00
Tathagata Das	b31e91f927	Merge branch 'scheduler-update' into filestream-fix	2013-12-23 15:59:15 -08:00
Tathagata Das	6eaa050549	Minor change for PR 277.	2013-12-23 15:55:45 -08:00
Tathagata Das	19d1d58b67	Fixed bug in file stream that prevented some files from being read correctly.	2013-12-23 23:48:43 +00:00
Tathagata Das	f9771690a6	Minor formatting fixes.	2013-12-23 11:32:26 -08:00
Tathagata Das	dc3ee6b612	Added comments to BatchInfo and JobSet, based on Patrick's comment on PR 277.	2013-12-23 11:30:42 -08:00
Tathagata Das	e7b62cbfbf	Updated CheckpointWriter and FileInputDStream to be robust against failed FileSystem objects. Refactored JobGenerator to use actor so that all updating of DStream's metadata is single threaded.	2013-12-22 18:49:36 -08:00
Tathagata Das	d91ec6f8ea	Merge branch 'scheduler-update' into filestream-fix	2013-12-22 15:23:35 -08:00
Tathagata Das	3ddbdbfbc7	Minor updated based on comments on PR 277.	2013-12-20 19:51:37 -08:00
Tathagata Das	984c582487	Merge branch 'scheduler-update' into filestream-fix Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala	2013-12-19 11:20:48 -08:00
Tathagata Das	ec71b445ad	Minor changes.	2013-12-18 23:39:28 -08:00
Tathagata Das	e93b391d75	Merge branch 'apache-master' into scheduler-update Conflicts: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala	2013-12-18 17:51:14 -08:00
Tathagata Das	b80ec05635	Added StatsReportListener to generate processing time statistics across multiple batches.	2013-12-18 15:35:24 -08:00
Mark Hamstra	09ed7ddfa0	Use scala.binary.version in POMs	2013-12-15 12:39:58 -08:00
Tathagata Das	097e120c0c	Refactored streaming scheduler and added listener interface. - Refactored Scheduler + JobManager to JobGenerator + JobScheduler and added JobSet for cleaner code. Moved scheduler related code to streaming.scheduler package. - Added StreamingListener trait (similar to SparkListener) to enable gathering to streaming stats like processing times and delays. StreamingContext.addListener() to added listeners. - Deduped some code in streaming tests by modifying TestSuiteBase, and added StreamingListenerSuite.	2013-12-12 20:48:02 -08:00
Tathagata Das	5e9ce83d68	Fixed multiple file stream and checkpointing bugs. - Made file stream more robust to transient failures. - Changed Spark.setCheckpointDir API to not have the second 'useExisting' parameter. Spark will always create a unique directory for checkpointing underneath the directory provide to the funtion. - Fixed bug wrt local relative paths as checkpoint directory. - Made DStream and RDD checkpointing use SparkContext.hadoopConfiguration, so that more HDFS compatible filesystems are supported for checkpointing.	2013-12-11 14:01:36 -08:00
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Prashant Sharma	17db6a9041	Style fixes and addressed review comments at #221	2013-12-10 11:47:16 +05:30
Prashant Sharma	7ad6921ae0	Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution atleast a bit faster.	2013-12-07 12:45:57 +05:30
Raymond Liu	4738818dd6	Fix pom.xml for maven build	2013-12-03 16:36:05 +08:00
Prashant Sharma	95d8dbce91	Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp Conflicts: core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala	2013-11-21 12:34:46 +05:30
Prashant Sharma	199e9cf02d	Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/deploy/client/Client.scala core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala	2013-11-21 11:55:48 +05:30
Henry Saputra	10be58f251	Another set of changes to remove unnecessary semicolon (;) from Scala code. Passed the sbt/sbt compile and test	2013-11-19 16:56:23 -08:00
Henry Saputra	9c934b640f	Remove the semicolons at the end of Scala code to make it more pure Scala code. Also remove unused imports as I found them along the way. Remove return statements when returning value in the Scala code. Passing compile and tests.	2013-11-19 10:19:03 -08:00
Aaron Davidson	f629ba95b6	Various merge corrections I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.	2013-11-14 22:13:09 -08:00
Raymond Liu	a60620b76a	Merge branch 'master' into scala-2.10	2013-11-14 12:44:19 +08:00
Raymond Liu	0f2e3c6e31	Merge branch 'master' into scala-2.10	2013-11-13 16:55:11 +08:00
Tathagata Das	7ccbbdacb9	Made block generator thread safe to fix Kafka bug.	2013-11-12 00:10:45 -08:00
Prashant Sharma	6860b79f6e	Remove deprecated actorFor and use actorSelection everywhere.	2013-11-12 12:43:53 +05:30
Tathagata Das	dc9570782a	Merge branch 'apache-master' into transform	2013-10-25 14:22:23 -07:00
Patrick Wendell	af4a529f6e	Exclude jopt from kafka dependency. Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.	2013-10-25 09:20:30 -07:00
Patrick Wendell	ad5f579cbf	Style fixes	2013-10-24 22:18:53 -07:00
Patrick Wendell	e5f6d5697b	Spacing fix	2013-10-24 22:08:06 -07:00
Patrick Wendell	a351fd4aed	Small spacing fix	2013-10-24 21:16:30 -07:00
Patrick Wendell	31e92b72e3	Adding Java versions and associated tests	2013-10-24 21:14:56 -07:00
Patrick Wendell	39f6f75588	Some clean-up of tests	2013-10-24 16:43:33 -07:00
Tathagata Das	e962a6e6ee	Fixed accidental bug.	2013-10-24 15:17:26 -07:00
Patrick Wendell	9423532fab	Removing Java for now	2013-10-24 14:31:34 -07:00
Patrick Wendell	05ac9940ee	Adding tests	2013-10-24 14:31:34 -07:00
Patrick Wendell	08c1a42d7d	Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 14:31:33 -07:00
Tathagata Das	0400aba1c0	Merge branch 'apache-master' into transform	2013-10-24 11:05:00 -07:00

1 2 3 4 5 ...

420 commits