Commit graph

2263 commits

Author SHA1 Message Date
Andrew xia 7d2eada451 Add metrics source of DAGScheduler and blockManager
Conflicts:

	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
2013-07-24 14:57:47 +08:00
jerryshao e9ac88754d Remove twice add Source bug and code clean 2013-07-24 14:57:47 +08:00
jerryshao e080588f73 Add metrics system unit test 2013-07-24 14:57:47 +08:00
jerryshao 5ce5dc9fcd Add default properties to deal with no configure file situation 2013-07-24 14:57:47 +08:00
jerryshao 871bc1687e Add Executor instrumentation 2013-07-24 14:57:46 +08:00
jerryshao 7fb574bf66 Code clean and remarshal 2013-07-24 14:57:46 +08:00
Andrew xia 4d6dd67fa1 refactor metrics system
1.change source abstract class to support MetricRegistry
2.change master/work/jvm source class
2013-07-24 14:57:46 +08:00
jerryshao 03f9871116 MetricsSystem refactor 2013-07-24 14:57:46 +08:00
jerryshao c3daad3f65 Update metric source support for instrumentation 2013-07-24 14:57:46 +08:00
jerryshao 9dec8c73e6 Add Master and Worker instrumentation support 2013-07-24 14:57:46 +08:00
jerryshao 503acd3a37 Build metrics system framework 2013-07-24 14:57:46 +08:00
Matei Zaharia b011329040 Merge pull request #727 from rxin/scheduler
Scheduler code style cleanup.
2013-07-23 22:50:09 -07:00
Matei Zaharia 876125b997 Merge pull request #726 from rxin/spark-826
SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure
2013-07-23 22:28:21 -07:00
Reynold Xin 3dae1df66f Moved non-serializable closure catching exception from submitStage to submitMissingTasks 2013-07-23 20:29:07 -07:00
Reynold Xin d33b8a2a0f Added comments on task closure serialization. 2013-07-23 20:28:39 -07:00
Reynold Xin 85ab8114bc Moved non-serializable closure catching exception from submitStage to submitMissingTasks 2013-07-23 20:25:58 -07:00
Matei Zaharia 6a31b7191d Small bug fix 2013-07-23 16:20:24 -07:00
Matei Zaharia 2f1736c396 Merge pull request #725 from karenfeng/task-start
Creates task start events
2013-07-23 15:53:30 -07:00
Karen Feng abc78cd331 Modifies instead of copies HashSets, fixes comment style 2013-07-23 15:47:16 -07:00
Karen Feng 383684daaa Replaces Seq with HashSet, removes redundant import 2013-07-23 15:33:27 -07:00
Reynold Xin f2422d4f29 SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure. 2013-07-23 15:30:20 -07:00
Reynold Xin 5ed38b4d1d Scheduler code style cleanup. 2013-07-23 15:28:59 -07:00
Reynold Xin 101b8cc78a SPARK-829: scheduler shouldn't hang if a task contains unserializable objects in its closure. 2013-07-23 15:28:20 -07:00
Dmitriy Lyubimov 72bac09c42 Leaking spark context in the test 2013-07-23 15:19:07 -07:00
Karen Feng 9f2dbb2a7c Adds/removes active tasks only once 2013-07-23 15:10:09 -07:00
Dmitriy Lyubimov ef82ff8564 Merge branch 'master' into SPARK-826
Conflicts:
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-07-23 13:43:00 -07:00
Karen Feng 0200801a55 Tracks task start events and shows number of active tasks on Executor UI 2013-07-23 13:35:43 -07:00
Dmitriy Lyubimov 310e73d566 style 2013-07-23 13:23:25 -07:00
Matei Zaharia f369e0e51b Merge pull request #720 from ooyala/2013-07/persistent-rdds-api
Add a public method getCachedRdds to SparkContext
2013-07-23 13:22:27 -07:00
Dmitriy Lyubimov ac60d06381 Re-working in terms of changes to TaskSetManager. Verified with Standalone and Local mode. 2013-07-23 13:13:19 -07:00
Evan Chan efd6418c1b Move getPersistentRDDs testing to a new Suite 2013-07-23 10:40:41 -07:00
Evan Chan 4830e22562 Rename method per rxin feedback 2013-07-23 09:50:13 -07:00
Evan Chan 2c2bfbe294 Add toMap method to TimeStampedHashMap and use it 2013-07-23 01:36:44 -07:00
Matei Zaharia 401aac8b18 Merge pull request #719 from karenfeng/ui-808
Creates Executors tab for Jobs UI
2013-07-22 16:57:16 -07:00
Karen Feng 872c97ad82 Split task columns, memory columns sort by numeric value 2013-07-22 16:54:37 -07:00
Matei Zaharia ea1cfabfdd Merge branch 'master' of github.com:mesos/spark 2013-07-22 16:22:02 -07:00
Matei Zaharia 8e38e77232 Fix a test that was using an outdated config setting 2013-07-22 16:05:32 -07:00
Karen Feng 2eea974795 Executors UI now calls executor ID from TaskInfo instead of TaskMetrics 2013-07-22 15:15:54 -07:00
Dmitriy Lyubimov 8ca0c31944 removing non-pertinent comment 2013-07-22 14:48:46 -07:00
Dmitriy Lyubimov b4b230e606 Fixing for LocalScheduler with test, that much works .. 2013-07-22 14:42:47 -07:00
Karen Feng 85c4d7bf3b Shows number of complete/total/failed tasks (bug: failed tasks assigned to null executor) 2013-07-22 14:35:47 -07:00
Josh Rosen f649dabb4a Fix bug: DoubleRDDFunctions.sampleStdev() computed non-sample stdev().
Update JavaDoubleRDD to add new methods and docs.

Fixes SPARK-825.
2013-07-22 13:21:48 -07:00
Karen Feng 8901f379c9 Fixed memory used/remaining/total bug 2013-07-22 09:58:03 -07:00
Karen Feng 636b19f833 Merge branch 'master' of https://github.com/mesos/spark into ui-808 2013-07-22 09:53:26 -07:00
Evan Chan 0337d88321 Add a public method getCachedRdds to SparkContext 2013-07-21 18:26:14 -07:00
Karen Feng 865dc63bac Changed table format for executors 2013-07-19 15:57:01 -07:00
Karen Feng 81bb5dc640 Creates Executors tab for application with RDD block and memory/disk used, solves SPARK-808 2013-07-19 14:08:30 -07:00
Konstantin Boudnik cfce9a6a36 Regression: default webui-port can't be set via command line "--webui-port" anymore 2013-07-19 14:00:58 -07:00
Liang-Chi Hsieh 4530e8a9bf fix typo. 2013-07-20 00:04:25 +08:00
Liang-Chi Hsieh aa6f83289b A better fix for giving local jars under Yarn mode. 2013-07-19 22:25:28 +08:00
Liang-Chi Hsieh a613628c50 Do not copy local jars given to SparkContext in yarn mode since the Context is not running on local. This bug causes failure when jars can not be found. Example codes (such as spark.examples.SparkPi) can not work without this fix under yarn mode. 2013-07-19 16:59:12 +08:00
Matei Zaharia af3c9d5042 Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
Matei Zaharia b1f9f64743 Merge branch 'master' of github.com:mesos/spark 2013-07-16 11:01:53 -07:00
Matei Zaharia 5c388808a8 SPARK-814: Result stages should be named after action 2013-07-16 11:01:14 -07:00
Matei Zaharia f347cc3f65 Fix deprecation warning and style issues 2013-07-16 10:53:30 -07:00
Reynold Xin 69316603d6 Throw a more meaningful message when runJob is called to launch tasks on non-existent partitions. 2013-07-15 22:50:11 -07:00
Karen Feng 6dc7c9bfb1 Removed job UI column, linked description to job UI 2013-07-15 16:33:50 -07:00
Karen Feng fbf5aa761e Removed log message, added field in master UI to link to log UI 2013-07-15 15:50:03 -07:00
Karen Feng eac381a957 Merge branch 'ui-802' of https://github.com/karenfeng/spark into ui-802 2013-07-15 15:48:44 -07:00
Karen Feng 3955711250 Added field to master UI with link to job UI 2013-07-15 15:47:21 -07:00
Karen Feng 0d78b6d9cd Links to job UI from standalone deploy cluster web UI: fixes SPARK-802 2013-07-15 13:47:38 -07:00
Karen Feng b2aaa1199e Adds app name in HTML page titles on job web UI: fixes SPARK-806 2013-07-15 11:44:42 -07:00
Matei Zaharia d47c16f78d Add an option to disable reference tracking in Kryo 2013-07-15 01:55:54 +00:00
Matei Zaharia c7877d5e16 Merge pull request #689 from BlackNiuza/application_status
Bug fix: SPARK-796
2013-07-14 12:58:13 -07:00
Matei Zaharia 10c05937bd Merge pull request #699 from pwendell/ui-env
Add `Environment` tab to SparkUI.
2013-07-14 11:45:18 -07:00
Patrick Wendell 4883586838 Responding to Matei's review 2013-07-14 10:37:26 -07:00
BlackNiuza 00556a94c9 add spaces before curly braces and after for if conditions 2013-07-14 17:04:53 +08:00
Matei Zaharia b91a218cea Cosmetic fixes to web UI 2013-07-14 07:31:33 +00:00
Matei Zaharia a44a7b1238 Determine Spark core classes better in getCallSite 2013-07-14 07:23:09 +00:00
root e271fde10b Fixed a delay scheduling bug in the YARN branch, found by Patrick 2013-07-14 06:24:29 +00:00
Patrick Wendell ddb97f0fdf Add Environment tab to SparkUI.
This adds a tab which displays system property and classpath information. This
can be useful in debugging various types of issues such as:

1. Extra/incorrect Hadoop jars being included in the classpath
2. Spark launching with a different JRE version than intended
3. Spark system properties not being set to intended values
4. User added jars that conflict with Spark jars
2013-07-13 16:14:40 -07:00
Matei Zaharia 77c69ae5a0 Merge pull request #697 from pwendell/block-locations
Show block locations in Web UI.
2013-07-12 23:05:21 -07:00
Matei Zaharia 5a7835c152 Merge pull request #691 from karenfeng/logpaging
Create log pages
2013-07-12 20:28:21 -07:00
Matei Zaharia 71ccca0cc1 Merge pull request #696 from woggle/executor-env
Pass executor env vars (e.g. SPARK_CLASSPATH) to compute-classpath.sh
2013-07-12 20:25:06 -07:00
Matei Zaharia 90fc3f30cd Merge pull request #692 from Reinvigorate/takeOrdered
adding takeOrdered() to RDD
2013-07-12 20:23:36 -07:00
Patrick Wendell 08150f19ab Minor style fix 2013-07-12 19:32:35 -07:00
Patrick Wendell 6855338e14 Show block locations in Web UI.
This fixes SPARK-769. Support is added for enumerating the locations of blocks
in the UI. There is also some minor cleanup in StorageUtils.
2013-07-12 19:30:32 -07:00
Karen Feng 73984b96a8 Removed unit test of nonexistent function Utils.lastNBytes 2013-07-12 14:26:56 -07:00
Charles Reiss 531a7e5574 Pass executor env vars (e.g. SPARK_CLASSPATH) to compute-classpath. 2013-07-12 12:58:25 -07:00
seanm a1662326e9 comment adjustment to takeOrdered 2013-07-12 08:38:19 -07:00
Andrew xia 2080e25006 Enhance job ui in spark ui system with adding pool information 2013-07-12 14:25:18 +08:00
seanm a2c915fba8 giving order to top and making tests more clear 2013-07-11 18:55:00 -07:00
Karen Feng 5c67ca0278 Remove "Bytes" in lieu of String notation 2013-07-11 17:31:59 -07:00
Karen Feng 6d054487bf Replace default buffer value to 100 GB, changed buttons to use String notation, removed default buffer parameter in UI URLs 2013-07-11 17:12:17 -07:00
Karen Feng a32784109d Fixed links for "Back to Master" 2013-07-11 16:57:55 -07:00
Karen Feng ece2388585 Removed logPageLength from logPage 2013-07-11 16:35:56 -07:00
Karen Feng 9ed036ccdb Replaced logPageLength with byteLength to prevent buffer shrink bug 2013-07-11 16:33:53 -07:00
Karen Feng fdc226a14c Clarified start and end byte variable names 2013-07-11 15:36:43 -07:00
Karen Feng 5d5dbc39f6 getByteRange moved to WorkerWebUI, takes converted parameters, returns only start/end offset 2013-07-11 15:22:45 -07:00
Karen Feng 15fd11d657 Removed redundant calls to request by logPage 2013-07-11 15:01:50 -07:00
Karen Feng 11872888ca Created getByteRange function for logs and log pages, removed lastNBytes function 2013-07-11 14:56:37 -07:00
Matei Zaharia 018d04c64e Merge pull request #684 from woggle/mesos-classloader
Explicitly set class loader for MesosSchedulerDriver callbacks.
2013-07-11 12:48:37 -07:00
Karen Feng e3a3fcf61b Scrollbar on log pages appear automatically 2013-07-11 12:16:38 -07:00
Karen Feng 044d4577ec Fixed capitalization of log page 2013-07-11 12:02:15 -07:00
Karen Feng 0ecc33f0c8 Added byte range, page title with log name, previous/next bytes buttons, initialization to end of log, large default buffer, buggy back to master link 2013-07-11 11:25:58 -07:00
Karen Feng 74bd3fc680 Added byte range on log pages 2013-07-10 15:44:28 -07:00
Karen Feng 24196c91f0 Changed buffer to 10,000 bytes, created scrollbar for fixed-height log 2013-07-10 15:27:52 -07:00
Karen Feng f5f3b272f8 Fixed mixup of start/end, moved more import files 2013-07-10 14:52:29 -07:00
Karen Feng dbe948d9a2 Moved appropriate import files from UISuite to UtilsSuite 2013-07-10 14:15:41 -07:00
Karen Feng 5f8a20b4a8 Moved unit tests for Utils from UISuite to UtilsSuite 2013-07-10 13:53:39 -07:00
Karen Feng 0d4580360b Fixed docstring of offsetBytes to match params and wrapped for 100+ character lines 2013-07-10 13:24:26 -07:00
Karen Feng 04263e4d46 Made some minor style changes 2013-07-10 13:15:42 -07:00
Karen Feng cfb6447ac4 Fixed for nonexistent bytes, added unit tests, changed stdout-page to stdout 2013-07-10 11:47:57 -07:00
seanm ee4ce2fc51 adding takeOrdered to java API 2013-07-10 10:46:04 -07:00
seanm 24705d0f46 adding takeOrdered() to RDD 2013-07-10 10:33:11 -07:00
Karen Feng 620a6974c6 Allows for larger files, refactors lastNBytes, removes old Log column, fixes imports, uses map 2013-07-10 10:20:53 -07:00
BlackNiuza ce18b50d5f set SUCCEEDED for all master in shutdown hook 2013-07-10 19:11:43 +08:00
Karen Feng b6072b58bf Fixes style, makes "std__-page" consistent, reads only parts of files 2013-07-09 17:25:10 -07:00
Karen Feng 13fc6f248c Clean commit of log paging 2013-07-09 14:17:15 -07:00
BlackNiuza aaa7b081df according to mridulm's comments to adjust the code 2013-07-09 20:03:01 +08:00
Charles Reiss e47253e0cc Reset ClassLoader in MesosSchedulerBackend, too. (per review comments).
Also set ClassLoader for all mesos callbacks, not just statusUpdate,
registered.
2013-07-09 01:23:23 -07:00
BlackNiuza c1d44be805 Bug fix: SPARK-796 2013-07-09 15:18:28 +08:00
Matei Zaharia 7dcda9ae74 Merge pull request #688 from markhamstra/scalaDependencies
Fixed SPARK-795 with explicit dependencies
2013-07-08 23:24:23 -07:00
Mark Hamstra 0b39d66f3f pom cleanup 2013-07-08 16:07:09 -07:00
Mark Hamstra afdaf430bd Explicit dependencies for scala-library and scalap to prevent 2.9.2 vs. 2.9.3 problems 2013-07-08 15:40:50 -07:00
Charles Reiss 8c1d1c98e0 Explicitly set class loader for MesosSchedulerDriver callbacks. 2013-07-08 12:25:46 -07:00
Shivaram Venkataraman 4af0d63cb1 Remove akka LogLevel fix as we no longer use spray 2013-07-07 10:42:43 -07:00
Shivaram Venkataraman d362d0f411 Ignore stderr when calling cat on a non-existing file 2013-07-07 04:09:46 -07:00
Shivaram Venkataraman 7d6d9e6ab2 Set DriverSuite log level to WARN 2013-07-07 04:09:15 -07:00
Shivaram Venkataraman a948f06725 Suppress log messages in sbt test with two changes:
1. Set akka log level to ERROR before shutting down the actorSystem.
This avoids akka log messages (like Spray) from falling back to INFO
on the Stdout logger
2. Initialize netty to use SLF4J in LocalSparkContext. This ensures that
stack trace thrown during shutdown is handled by SLF4J instead of stdout
2013-07-07 04:09:08 -07:00
Patrick Wendell 32b9d21a97 Fix occasional failure in UI listener.
If a task fails before the metrics are initialized, it remains possible
that the metrics field will be `None`. This patch accounts for that possibility
by keeping metrics as an `Option` at all times.
2013-07-06 16:40:02 -07:00
Matei Zaharia 1ffadb2d9e Merge remote-tracking branch 'pwendell/ui-updates'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/util/AkkaUtils.scala
	pom.xml
2013-07-06 15:51:41 -07:00
Matei Zaharia 94871e4703 Merge pull request #655 from tgravescs/master
Add support for running Spark on Yarn on a secure Hadoop Cluster
2013-07-06 15:26:19 -07:00
Matei Zaharia 3f918b33f8 Merge pull request #672 from holdenk/master
s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning
2013-07-06 12:45:18 -07:00
Matei Zaharia 2a36e5449b Merge pull request #673 from xiajunluan/master
Add config template file for fair scheduler feature
2013-07-06 12:43:21 -07:00
Matei Zaharia 7ba7fa110b Merge pull request #674 from liancheng/master
Bug fix: SPARK-789
2013-07-06 11:45:08 -07:00
BlackNiuza 44a2440039 Remove active job from idToActiveJob when job finished or aborted 2013-07-07 01:33:09 +08:00
Patrick Wendell 37abe84212 Tracking some task metrics even during failures. 2013-07-06 09:19:59 -07:00
Patrick Wendell 84b7fc54e6 Enforcing correct sort order for formatted strings 2013-07-05 17:21:08 -07:00
Matei Zaharia 399bd65ef5 Fixed compile error due to merge 2013-07-05 11:27:06 -07:00
Matei Zaharia 652ea0f1d8 Allow RDD.takeSample to give samples bigger than the RDD
Before, when withReplacement was set to true, we would not get a sample
bigger than the RDD's count().

Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/test/scala/spark/RDDSuite.scala
2013-07-05 11:15:13 -07:00
Matei Zaharia 6586c5e28b Added a SparkContext accessor to RDD 2013-07-05 11:13:46 -07:00
jerryshao e4ff544a8d Clean StageToInfos periodically when spark.cleaner.ttl is enabled 2013-07-05 10:34:45 +08:00
Lian Cheng c0c3155c3c Bug fix: SPARK-789
https://spark-project.atlassian.net/browse/SPARK-789
2013-07-05 00:54:10 +08:00
Andrew xia 6ccfb73ca9 Add fair scheduler config template file 2013-07-04 19:19:44 +08:00
Holden Karau 0f06d6217d s/ActorSystemImpl/ExtendedActorSystem/ as ActorSystemImpl results in a warning 2013-07-04 01:05:39 -07:00
Gavin Li 94238aae57 fix dependencies 2013-07-03 18:08:38 +00:00
Gavin Li 96130c30d9 add compression codec trait and snappy compression 2013-07-03 05:49:04 +00:00
Y.CORP.YAHOO.COM\tgraves 923cf92900 Rework from pull request. Removed --user option from Spark on Yarn Client, made the use of JAVA_HOME environment
variable conditional on whether it is set, and created addCredentials in each of the SparkHadoopUtil classes
to only add the credentials when the profile is hadoop2-yarn.
2013-07-02 21:18:59 -05:00
Patrick Wendell 39e2325675 Removing dead code 2013-07-02 16:28:40 -07:00
Patrick Wendell 8ca1cc1786 Adding truncation for log files 2013-07-02 16:10:50 -07:00
Patrick Wendell 9a42d04efa Throw exception for missing resource 2013-07-01 14:43:13 -07:00
Patrick Wendell 1025d7d1ef Package refactoring 2013-07-01 14:40:53 -07:00
Patrick Wendell 30b9034241 Fixing bug where logs aren't shown 2013-07-01 13:48:01 -07:00
Patrick Wendell 8688689387 Various formatting changes 2013-07-01 13:40:12 -07:00
Patrick Wendell 735c951a09 Adding test script 2013-07-01 09:33:22 -07:00
Patrick Wendell 5de326db7d Print exception message 2013-07-01 09:19:45 -07:00
root ec31e68d5d Fixed PySpark perf regression by not using socket.makefile(), and improved
debuggability by letting "print" statements show up in the executor's stderr

Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-07-01 06:26:31 +00:00
root 3296d132b6 Fix performance bug with new Python code not using buffered streams 2013-07-01 06:25:43 +00:00
Matei Zaharia 03d0b858c8 Made use of spark.executor.memory setting consistent and documented it
Conflicts:

	core/src/main/scala/spark/SparkContext.scala
2013-06-30 15:46:46 -07:00
Patrick Wendell e721ff7e5a Allowing details for failed stages 2013-06-29 11:26:30 -07:00
Patrick Wendell 473961d82e Styling for progress bar 2013-06-29 08:38:04 -07:00
Patrick Wendell 249f0e54ba Minor changes from Matei's review 2013-06-28 13:25:26 -07:00
Matei Zaharia 50ca17635a Merge pull request #664 from pwendell/test-fix
Removing incorrect test statement
2013-06-27 22:24:52 -07:00
Patrick Wendell c537e869f3 Missing logo file 2013-06-27 22:02:03 -07:00
Patrick Wendell c767e74370 Removing incorrect test statement 2013-06-27 21:48:58 -07:00
Patrick Wendell 62c2c6b856 Forcing Jetty to run as daemon 2013-06-27 21:47:22 -07:00
Patrick Wendell a55190d314 Adding better tabs for UI headers. 2013-06-27 19:14:51 -07:00
Patrick Wendell 362d996c81 Handful of changes based on matei's review
- Avoid exception when no tasks have finished for a stage
- Adding DOCTYPE so css renders properly
- Adding progress slider
2013-06-27 19:14:28 -07:00
Patrick Wendell 92a4c2a5f6 Fixing bug in local scheduler time recording 2013-06-27 12:33:06 -07:00
Stephen Haberman d7011632d1 Wrap lines. 2013-06-26 12:35:57 -05:00
Patrick Wendell ee692482a6 One more private class 2013-06-26 09:07:32 -07:00
Patrick Wendell a59c15a37e Adding config option for retained stages 2013-06-26 08:54:57 -07:00
Patrick Wendell 274193664a Bumping timeouts 2013-06-26 08:51:28 -07:00
Patrick Wendell b14ad509ba Moving static ui package 2013-06-26 08:46:51 -07:00
Patrick Wendell 2cbaa0734b Making all new classes package private 2013-06-26 08:44:55 -07:00
Stephen Haberman d11025dc6a Be cute with Option and getenv. 2013-06-26 09:53:35 -05:00
Matei Zaharia 9f0d913295 Refactored tests to share SparkContexts in some of them
Creating these seems to take a while and clutters the output with Akka
stuff, so it would be nice to share them.
2013-06-25 19:18:30 -04:00
Matei Zaharia 6c8d1b2ca6 Fix computation of classpath when we launch java directly
The previous version assumed that a CLASSPATH environment variable was
set by the "run" script when launching the process that starts the
ExecutorRunner, but unfortunately this is not true in tests. Instead, we
factor the classpath calculation into an external script and call that.

NOTE: This includes a Windows version but hasn't yet been tested there.
2013-06-25 18:21:00 -04:00
Matei Zaharia 15b00914c5 Some fixes to the launch-java-directly change:
- Split SPARK_JAVA_OPTS into multiple command-line arguments if it
  contains spaces; this splitting follows quoting rules in bash
- Add the Scala JARs to the classpath if they're not in the CLASSPATH
  variable because the ExecutorRunner is launched with "scala" (this can
  happen when using local-cluster URLs in spark-shell)
2013-06-25 17:17:27 -04:00
Matei Zaharia 7680ce0bd6 Fixed deprecated use of expect in SizeEstimatorSuite 2013-06-25 16:11:44 -04:00
Matei Zaharia 7e0191c6ea Merge remote-tracking branch 'cgrothaus/SPARK-698'
Conflicts:
	run
2013-06-25 15:47:40 -04:00
Patrick Wendell d66bd6f885 Adding another unit test to Web UI suite 2013-06-24 17:12:55 -07:00
Patrick Wendell f7389330c3 Allowing for requested port on construction 2013-06-24 16:51:52 -07:00
Patrick Wendell 42157027f2 A few bug fixes and a unit test 2013-06-24 16:25:05 -07:00
Patrick Wendell a4248138b4 Minor style cleanup 2013-06-24 14:22:28 -07:00
Patrick Wendell b5e6e8bcc8 Cleaning up some code for Job Progress 2013-06-24 14:13:24 -07:00
Patrick Wendell 93e8ed85aa Workaround for initialization issue 2013-06-24 13:11:18 -07:00
Patrick Wendell f6e64b5cd6 Updating based on changes to JobLogger (and one small change to JobLogger) 2013-06-24 12:40:41 -07:00
Matei Zaharia 78ffe164b3 Clone the zero value for each key in foldByKey
The old version reused the object within each task, leading to
overwriting of the object when a mutable type is used, which is expected
to be common in fold.

Conflicts:

	core/src/test/scala/spark/ShuffleSuite.scala
2013-06-23 10:26:53 -07:00
Matei Zaharia 0e0f9d3069 Fix search path for REPL class loader to really find added JARs 2013-06-22 17:44:04 -07:00
Matei Zaharia 3e61beff7b Merge pull request #648 from shivaram/netty-dbg
Shuffle fixes and cleanup
2013-06-22 16:22:47 -07:00
Patrick Wendell 7e9f1ed0de Some cleanup of styling 2013-06-22 10:31:37 -07:00
Patrick Wendell 3b7ebdeeb8 Handling entirely failed stages 2013-06-22 10:31:37 -07:00
Patrick Wendell be6107ce44 Some tweaking with shared page header 2013-06-22 10:31:37 -07:00
Patrick Wendell 9a24d1a2d0 Using scala in XML imports 2013-06-22 10:31:37 -07:00
Patrick Wendell f91e1c4822 Linking RDD information when available in stages 2013-06-22 10:31:37 -07:00
Patrick Wendell a86bb459e2 Showing shuffle status and purging old stages 2013-06-22 10:31:37 -07:00
Patrick Wendell 3485e73376 Style cleanup 2013-06-22 10:31:37 -07:00
Patrick Wendell dd696f3a3d Some renaming and comments 2013-06-22 10:31:37 -07:00
Patrick Wendell 5c872e9ef5 Documentation and some refactoring 2013-06-22 10:31:37 -07:00
Patrick Wendell 17776323a6 More work on percentile data: 2013-06-22 10:31:37 -07:00
Patrick Wendell dcf6a68177 Refactoring into different modules 2013-06-22 10:31:36 -07:00
Patrick Wendell ce81c320ac Adding helper function to make listing tables 2013-06-22 10:31:36 -07:00
Patrick Wendell 9fd5dc3ea9 Initial steps towards job progress UI 2013-06-22 10:31:36 -07:00
Patrick Wendell bc4a811c57 Stash 2013-06-22 10:31:36 -07:00
Patrick Wendell 77c53f7868 Refactoring UI packages 2013-06-22 10:31:36 -07:00
Patrick Wendell 8b5c7e71c4 Import cleanup 2013-06-22 10:31:36 -07:00
Patrick Wendell 32a45d01b1 Removing twirl files 2013-06-22 10:31:36 -07:00
Patrick Wendell 17f145f3bc Updating Maven build 2013-06-22 10:31:36 -07:00
Patrick Wendell 4e1f202481 Removing dead code 2013-06-22 10:31:36 -07:00
Patrick Wendell d6fde4ffe4 Some JSON cleanup 2013-06-22 10:31:36 -07:00
Patrick Wendell 91ec5a1a04 Changing JSON protocol and removing spray code 2013-06-22 10:31:36 -07:00
Patrick Wendell fc94576ece Adding worker version of UI 2013-06-22 10:31:36 -07:00
Patrick Wendell ee73c09ac9 Some comments 2013-06-22 10:31:36 -07:00
Patrick Wendell 9161db5478 Cleaning up master web UI 2013-06-22 10:31:36 -07:00
Patrick Wendell e55cf0245f Adding WebUI file 2013-06-22 10:31:35 -07:00
Patrick Wendell f85fd7a793 Commenting unfinished part 2013-06-22 10:31:35 -07:00
Patrick Wendell 2c36a514aa Spray refactoring for master web UI 2013-06-22 10:31:35 -07:00
Patrick Wendell 7e6977b6c5 Fix in storage status page 2013-06-22 10:31:35 -07:00
Patrick Wendell 950f83535a Adding deterministic port 2013-06-22 10:31:35 -07:00
Patrick Wendell 7cd70dc2c1 Minor cleanup 2013-06-22 10:31:35 -07:00
Patrick Wendell e66f570194 Completely hacked version of block manager UI in jetty 2013-06-22 10:31:35 -07:00
Patrick Wendell 60fbf7e461 Partially working checkpoint 2013-06-22 10:31:35 -07:00
Matei Zaharia 1ef5d0d2c9 Merge pull request #644 from shimingfei/joblogger
add Joblogger to Spark (on new Spark code)
2013-06-22 09:35:57 -07:00
Jey Kottalam 1ba3c17303 use parens when calling method with side-effects 2013-06-21 12:14:16 -04:00
Jey Kottalam edb18ca928 Rename PythonWorker to PythonWorkerFactory 2013-06-21 12:14:16 -04:00
Jey Kottalam 62c4781400 Add tests and fixes for Python daemon shutdown 2013-06-21 12:14:16 -04:00
Jey Kottalam c79a6078c3 Prefork Python worker processes 2013-06-21 12:14:16 -04:00
Jey Kottalam 40afe0d2a5 Add Python timing instrumentation 2013-06-21 12:14:16 -04:00
Mingfei 2fc794a6c7 small modify in DAGScheduler 2013-06-21 18:21:35 +08:00
Mingfei 4b9862ac9c small format modification 2013-06-21 17:55:32 +08:00
Mingfei aa7aa587be some format modification 2013-06-21 17:48:41 +08:00
Mingfei 5240795154 edit according to comments 2013-06-21 17:38:23 +08:00
Matei Zaharia 71030ba3eb Merge pull request #654 from lyogavin/enhance_pipe
fix typo and coding style in #638
2013-06-19 15:21:03 -07:00
Thomas Graves bad51c7cb4 upmerge with latest mesos/spark master and fix hbase compile with hadoop2-yarn profile 2013-06-19 14:39:13 -05:00
Thomas Graves 75d78c7ac9 Add support for Spark on Yarn on a secure Hadoop cluster 2013-06-19 11:18:42 -05:00
Matei Zaharia 7902baddc7 Update ASM to version 4.0 2013-06-19 13:34:30 +02:00
Gavin Li 0a2a9bce1e fix typo and coding style 2013-06-18 21:30:13 +00:00
jerryshao 1e9269c3ee reduce ZippedPartitionsRDD's getPreferredLocations complexity 2013-06-18 09:49:06 +08:00
Matei Zaharia db42451a52 Merge pull request #643 from adatao/master
Bug fix: Zero-length partitions result in NaN for overall mean & variance
2013-06-17 15:26:36 -07:00
Matei Zaharia e82a2ffcc9 Merge pull request #653 from rxin/logging
SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory."
2013-06-17 15:13:15 -07:00
Matei Zaharia ec193c7d89 Merge remote-tracking branch 'xiajunluan/xiajunluan'
Conflicts:
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-06-18 00:11:50 +02:00
Reynold Xin be3c406edf Fixed the typo pointed out by Matei. 2013-06-17 17:07:51 -04:00
Reynold Xin 1450296797 SPARK-781: Log the temp directory path when Spark says "Failed to create
temp directory".
2013-06-17 16:58:23 -04:00
Gavin Li 4508089fc3 refine comments and add sc.clean 2013-06-17 05:23:46 +00:00
Gavin Li e6ae049283 Merge remote-tracking branch 'upstream1/master' into enhance_pipe 2013-06-16 22:53:39 +00:00
Gavin Li fb6d733fa8 update according to comments 2013-06-16 22:32:55 +00:00
Christopher Nguyen f91195cc15 Import just scala.math.abs rather than scala.math._ 2013-06-16 01:29:53 -07:00
Christopher Nguyen 5c886194e4 Move zero-length partition testing from JavaAPISuite.java to PartitioningSuite.scala 2013-06-16 01:23:48 -07:00
Christopher Nguyen 479442a9b9 Add zeroLengthPartitions() test to make sure, e.g., StatCounter.scala can handle empty partitions without incorrectly returning NaN 2013-06-15 17:35:55 -07:00
Matei Zaharia f961aac8b2 Merge pull request #649 from ryanlecompte/master
Add top K method to RDD using a bounded priority queue
2013-06-15 00:53:41 -07:00
ryanlecompte e8801d4490 use delegation for BoundedPriorityQueue, add Java API 2013-06-14 23:39:05 -07:00
Andrew xia 53add598f2 Update LocalSchedulerSuite to avoid using sleep for task launch 2013-06-15 01:46:13 +08:00
Reynold Xin 2cc188fd54 SPARK-774: cogroup should also disable map side combine by default 2013-06-14 00:10:54 -07:00
Reynold Xin 6738178d0d SPARK-772: groupByKey should disable map side combine. 2013-06-13 23:59:42 -07:00
ryanlecompte 93b3f5e535 drop unneeded ClassManifest implicit 2013-06-13 16:26:35 -07:00
ryanlecompte 44b8dbaede use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs 2013-06-13 16:23:15 -07:00
Shivaram Venkataraman 1d9f0df065 Fix some comments and style 2013-06-13 14:46:25 -07:00
Mingfei 967a6a699d modify sparklister function interface according to comments 2013-06-13 14:36:07 +08:00
Shivaram Venkataraman 5da4287b1d Merge branch 'netty-dbg' of github.com:shivaram/spark into netty-dbg 2013-06-12 16:38:37 -07:00
Shivaram Venkataraman 5e9a9317c5 Merge branch 'master' of git://github.com/mesos/spark into netty-dbg 2013-06-12 16:38:01 -07:00
ryanlecompte db5bca08ff add a new top K method to RDD using a bounded priority queue 2013-06-12 10:54:16 -07:00
Patrick Wendell fd6148c8b2 Removing print statement 2013-06-10 10:27:25 -07:00
Andrew xia 190ec61799 change code style and debug info 2013-06-10 15:27:02 +08:00
Patrick Wendell ef14dc2e77 Adding Java-API version of compression codec 2013-06-09 18:09:46 -07:00
Patrick Wendell df592192e7 Monads FTW 2013-06-09 18:09:24 -07:00
Patrick Wendell 083a3485ab Clean extra whitespace 2013-06-09 11:49:33 -07:00
Patrick Wendell d1bbcebae5 Adding compression to Hadoop save functions 2013-06-09 11:39:35 -07:00
Mingfei ade822011d not check return value of eventQueue.take 2013-06-08 16:26:45 +08:00
Mingfei 4fd86e0e10 delete test code for joblogger in SparkContext 2013-06-08 15:45:47 +08:00
Mingfei 362f0f93ac Merge branch 'master' of https://github.com/mesos/spark 2013-06-08 15:20:13 +08:00
Mingfei 1a4d93c025 modify to pass job annotation by localProperties and use daemon thread to do joblogger's work 2013-06-08 14:23:39 +08:00
Matei Zaharia b58a29295b Small formatting and style fixes 2013-06-07 22:51:28 -07:00
Matei Zaharia c8fc423bc2 Merge pull request #631 from jerryshao/master
Fix block manager UI display issue when enable spark.cleaner.ttl
2013-06-07 22:43:18 -07:00
Matei Zaharia c9ca0a4a58 Small code style fix to SchedulingAlgorithm.scala 2013-06-07 22:40:44 -07:00
Matei Zaharia 1ae60bcb36 Merge pull request #634 from xiajunluan/master
[Spark-753] Fix ClusterSchedulSuite unit test failed
2013-06-07 22:39:06 -07:00
Shivaram Venkataraman ac480fd977 Clean up variables and counters in BlockFetcherIterator 2013-06-06 16:34:27 -07:00
Gavin Li e179ff8a32 update according to comments 2013-06-05 22:41:05 +00:00
Shivaram Venkataraman cb2f5046ee Pass in bufferSize to BufferedOutputStream 2013-06-05 15:09:02 -07:00
Shivaram Venkataraman c851957fe4 Don't write zero block files with java serializer 2013-06-05 14:28:38 -07:00
Christopher Nguyen 9d35904357 In the current code, when both partitions happen to have zero-length, the return mean will be NaN.
Consequently, the result of mean after reducing over all partitions will also be NaN,
which is not correct if there are partitions with non-zero length. This patch fixes this issue.
2013-06-04 22:12:47 -07:00
Matei Zaharia fff3728552 Merge pull request #640 from pwendell/timeout-update
Fixing bug in BlockManager timeout
2013-06-04 16:09:50 -07:00
Patrick Wendell 061fd3ae36 Fixing bug in BlockManager timeout 2013-06-04 19:02:44 -04:00
Matei Zaharia f420d4f228 Merge pull request #639 from pwendell/timeout-update
Bump akka and blockmanager timeouts to 60 seconds
2013-06-04 15:25:58 -07:00
Patrick Wendell 8bd4e12104 Bump akka and blockmanager timeouts to 60 seconds 2013-06-04 18:14:24 -04:00
Shivaram Venkataraman 96943a1cc0 var to val 2013-06-03 12:29:38 -07:00
Shivaram Venkataraman cd347f547a Reuse the file object as it is valid after delete 2013-06-03 12:27:51 -07:00
Shivaram Venkataraman a058b0acf3 Delete a file for a block if it already exists. 2013-06-03 12:10:00 -07:00
Andrew xia 606bb1b450 Fix schedulingAlgorithm bugs for unit test 2013-06-03 10:29:23 +08:00
Gavin Li 4a9913d66a add ut for pipe enhancement 2013-06-02 23:21:09 +00:00
Shivaram Venkataraman 038cfc1a9a Make connect timeout configurable 2013-05-31 23:32:18 -07:00
Shivaram Venkataraman 91aca92249 Another round of Netty fixes.
1. Avoid race condition between stop and copier completion
2. Handle socket exceptions by reporting them and filling in a failed
FetchResult
2013-05-31 23:21:38 -07:00
Gavin Li 9f84315c05 enhance pipe to support what we can do in hadoop streaming 2013-06-01 00:26:10 +00:00
Reynold Xin de1167bf2c Incorporated Charles' feedback to put rdd metadata removal in
BlockManagerMasterActor.
2013-05-31 15:54:57 -07:00
Reynold Xin ba5e544461 More block manager cleanup.
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needed to send B akka
messages, where B = number of blocks. Now unpersist only needs to send W
akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
jerryshao 926f41cc52 fix block manager UI display issue when enable spark.cleaner.ttl 2013-05-31 09:32:52 +08:00
Reynold Xin f6ad3781b1 Fixed the flaky unpersist test in RDDSuite. 2013-05-30 16:28:08 -07:00
Reynold Xin bed1b08169 Do not create symlink for local add file. Instead, copy the file.
This prevents Spark from changing the original file's permission, and
also allows add file to work on non-POSIX operating systems.
2013-05-30 16:21:49 -07:00
Shivaram Venkataraman 3b0cd17343 Merge branch 'master' of git://github.com/mesos/spark
Conflicts:
	core/src/test/scala/spark/ShuffleSuite.scala
2013-05-30 14:36:24 -07:00
Andrew xia c3db3ea554 1. Add unit test for local scheduler
2. Move localTaskSetManager to a new file
2013-05-30 20:49:40 +08:00
Andrew xia ecceb101d3 implement FIFO and fair scheduler for spark local mode 2013-05-30 10:43:01 +08:00
Shivaram Venkataraman 19fd6d54c0 Also flush serializer in revertPartialWrites 2013-05-29 17:29:34 -07:00
Shivaram Venkataraman 618c8cae1e Skip fetching zero-sized blocks in OIO.
Also unify splitLocalRemoteBlocks for netty/nio and add a test case
2013-05-29 13:18:54 -07:00
Matei Zaharia 6ed71390d9 Merge pull request #626 from stephenh/remove-add-if-no-port
Remove unused addIfNoPort.
2013-05-29 10:14:22 -07:00
Shivaram Venkataraman b79b10a6d6 Flush serializer to fix zero-size kryo blocks bug.
Also convert the local-cluster test case to check for non-zero block sizes
2013-05-29 00:52:55 -07:00
Matei Zaharia 41d230ccb0 Merge pull request #611 from squito/classloader
Use default classloaders for akka & deserializing task results
2013-05-28 23:35:24 -07:00
Shivaram Venkataraman fbc1ab3468 Couple of Netty fixes
a. Fix the port number by reading it from the bound channel
b. Fix the shutdown sequence to make sure we actually block on the channel
c. Fix the unit test to use two JVMs.
2013-05-28 16:27:16 -07:00
Stephen Haberman 4fe1fbdd51 Remove unused addIfNoPort. 2013-05-28 16:26:32 -05:00
Matei Zaharia 3db1e17baa Merge pull request #620 from jerryshao/master
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia e8d4b6c296 Merge pull request #529 from xiajunluan/master
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin 6bbbe01287 Fixed a stupid mistake that NonJavaSerializableClass was made Java
serializable.
2013-05-24 16:51:45 -07:00
Reynold Xin 26962c9340 Automatically configure Netty port. This makes unit tests using
local-cluster pass. Previously they were failing because Netty was
trying to bind to the same port for all processes.

Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin 6ea085169d Fixed the bug that shuffle serializer is ignored by the new shuffle
block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao bd3ea8f2a6 fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException 2013-05-24 14:26:19 +08:00
Matei Zaharia a2b0a7975c Merge pull request #619 from woggling/adjust-sampling
Use ARRAY_SAMPLE_SIZE constant instead of hard-coded 100.0 in SizeEstimator
2013-05-21 18:16:20 -07:00
Charles Reiss f350f14084 Use ARRAY_SAMPLE_SIZE constant instead of 100.0 2013-05-21 18:11:33 -07:00
Charles Reiss 786c97b87c DistributedSuite: remove dead test code 2013-05-21 11:35:49 -07:00
Andrew xia ecd6d75c6a fix bug of unit tests 2013-05-21 06:49:23 +08:00
Reynold Xin 5912cc4967 Merge pull request #610 from JoshRosen/spark-747
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin 8d78c5f89f Changed the logging level from info to warning when addJar(null) is
called.
2013-05-17 18:51:35 -07:00
Reynold Xin 6729c2ead8 Merge branch 'master' of github.com:mesos/spark 2013-05-17 17:58:06 -07:00
Andrew xia 3d4672eaa9 Merge branch 'master' into xiajunluan
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia d19753b9c7 expose TaskSetManager type to resourceOffer function in ClusterScheduler 2013-05-18 06:45:19 +08:00
Reynold Xin 61cf176238 Added dependency on netty-all in Maven. 2013-05-16 14:31:26 -07:00
Andrew xia c6e2770bfe Fix ClusterScheduler bug to avoid allocating tasks to same slave 2013-05-17 05:10:38 +08:00
Mridul Muralidharan f0881f8d48 Hope this does not turn into a bike shed change 2013-05-17 01:58:50 +05:30
Mridul Muralidharan feddd2530d Filter out nulls - prevent NPE 2013-05-16 17:49:14 +05:30
Josh Rosen b8e46b6074 Abort job if result exceeds Akka frame size; add test. 2013-05-16 01:57:57 -07:00
Matei Zaharia 2f576aba8f Merge pull request #602 from rxin/shufflemerge
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin 203d7b7c14 Merge pull request #593 from squito/driver_ui_link
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin f3491cb89b Merge branch 'master' of github.com:mesos/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/test/scala/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin f9d40a5848 Added a comment in JdbcRDD for example usage. 2013-05-14 23:29:57 -07:00
Reynold Xin 404f9ff617 Added derby dependency to Maven pom files for the JDBC Java test. 2013-05-14 23:28:34 -07:00
Reynold Xin 81ad2fa331 Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
	project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid 38d4b97c6d use thread's classloader when deserializing task results; ClassNotFoundException includes classloader 2013-05-14 22:32:14 -07:00
Imran Rashid d7d1da79d3 when Akka starts, use Akka's default classloader (current thread) 2013-05-14 22:32:09 -07:00
Cody Koeninger b16c4896f6 add test for JdbcRDD using embedded derby, per rxin suggestion 2013-05-14 23:44:04 -05:00
Matei Zaharia 016ac86830 Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia 4b354e0a08 Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell 7f0833647b Capturing class name 2013-05-12 07:54:03 -07:00
Patrick Wendell 72b9c4cb6e Small fix 2013-05-11 23:53:50 -07:00
Patrick Wendell 1c15b85051 Removing import 2013-05-11 23:52:53 -07:00
Patrick Wendell 059ab88754 Changing technique to use same code path in all cases 2013-05-11 23:50:54 -07:00
Cody Koeninger 3da2305ed0 code cleanup per rxin comments 2013-05-11 23:59:07 -05:00
Josh Rosen 440719109e Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell a5c28bb888 Removing unnecessary map 2013-05-11 14:20:39 -07:00
Patrick Wendell 0345954530 SPARK-738: Spark should detect and squash nonserializable exceptions 2013-05-11 14:17:09 -07:00
Mark Hamstra 6e6b3e0d7e Actually use the cleaned closure in foreachPartition 2013-05-10 13:02:34 -07:00
Mridul Muralidharan b05c9d22d7 Remove explicit hardcoding of yarn-standalone as args(0) if it is missing. 2013-05-09 18:49:12 +05:30
Imran Rashid 0ab818d508 fix linebreak 2013-05-09 00:38:59 -07:00
Reynold Xin 9cafacf32d Added test for Netty suite. 2013-05-07 22:42:37 -07:00
Reynold Xin 5d70ee4663 Cleaned up connection manager (moved many classes to their own files). 2013-05-07 22:42:15 -07:00
Reynold Xin 8388e8dd7a Minor style fix in DiskStore... 2013-05-07 18:40:35 -07:00
Reynold Xin 547dcbe494 Cleaned up Scala files in network/netty from Shane's PR. 2013-05-07 18:39:33 -07:00
Reynold Xin 9e64396ca4 Cleaned up the Java files from Shane's PR. 2013-05-07 18:30:54 -07:00
Reynold Xin 0e5cc30868 Cleaned up BlockManager and BlockFetcherIterator from Shane's PR. 2013-05-07 18:18:24 -07:00
Reynold Xin 8b79485171 Moved BlockFetcherIterator to its own file. 2013-05-07 17:02:32 -07:00
Reynold Xin 90577ada69 Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/DiskStore.scala
	project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Jey Kottalam aacca1b8a8 Update Maven build to Scala 2.9.3 2013-05-07 14:39:44 -07:00
Reynold Xin 64d4d2b036 Added tests for joins, cogroups, and unions for EmptyRDD. 2013-05-06 16:30:46 -07:00
Reynold Xin 0fd84965f6 Added EmptyRDD. 2013-05-06 15:40:34 -07:00
Imran Rashid 22a5063ae4 switch from separating appUI host & port to combining into just appUiUrl 2013-05-05 12:19:11 -07:00
Matei Zaharia 7af92f248b Merge pull request #597 from JoshRosen/webui-fixes
Two minor bug fixes for Spark Web UI
2013-05-04 22:29:17 -07:00
Reynold Xin 0a2bed356b Fixed flaky unpersist test in DistributedSuite. 2013-05-04 21:50:08 -07:00
Reynold Xin 62a077cd08 Merge branch 'unpersist-test' of github.com:shivaram/spark into blockmanager 2013-05-04 21:49:50 -07:00
Josh Rosen 42b1953c53 Fix SPARK-630: app details page shows finished executors as running. 2013-05-04 18:34:47 -07:00
Josh Rosen c0688451a6 Fix wrong closing tags in web UI HTML. 2013-05-04 18:34:46 -07:00
Josh Rosen d48e9fde01 Fix SPARK-629: weird number of cores in job details page. 2013-05-04 18:34:45 -07:00
Mridul Muralidharan 25198d7e9e Merge branch 'master' of github.com:mridulm/spark 2013-05-04 20:45:56 +05:30
Mridul Muralidharan 5b011d18d7 Merge from master 2013-05-04 20:41:27 +05:30
Mridul Muralidharan edb57c8331 Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach 2013-05-04 19:47:45 +05:30
Matei Zaharia 3bf2c868c3 Merge pull request #594 from shivaram/master
Add zip partitions to Java API
2013-05-03 18:27:30 -07:00
Shivaram Venkataraman 2274ad0786 Fix flaky test by changing catch and adding sleep 2013-05-03 16:35:35 -07:00
Shivaram Venkataraman bb8a434f9d Add zipPartitions to Java API. 2013-05-03 15:14:02 -07:00
Imran Rashid 6fae936088 applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui 2013-05-03 12:59:10 -07:00
Mridul Muralidharan ea2a6f91d3 pull from master 2013-05-04 00:35:59 +05:30
Reynold Xin 93091f6936 Merge branch 'master' of github.com:mesos/spark into blockmanager 2013-05-03 01:02:32 -07:00
Reynold Xin 2bc895a829 Updated according to Matei's code review comment. 2013-05-03 01:02:16 -07:00
Mridul Muralidharan 11589c39d9 Fix ZippedRDD as per Matei's suggestion 2013-05-03 12:23:30 +05:30
Matei Zaharia 6fe9d4e61e Merge pull request #592 from woggling/localdir-fix
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia 538ee755b4 Merge pull request #581 from jerryshao/master
fix [SPARK-740] block manager UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss c847dd3da2 Don't accept generated temp directory names that can't be created successfully. 2013-05-01 23:19:10 -07:00
Reynold Xin 4a31877408 Added the unpersist api to JavaRDD. 2013-05-01 20:31:54 -07:00
Reynold Xin 98df9d2853 Added removeRdd function in BlockManager. 2013-05-01 20:17:09 -07:00
Mridul Muralidharan dfde9ce9dd comment out debug versions of checkHost, etc from Utils - which were used to test 2013-05-02 07:41:33 +05:30
Mridul Muralidharan 1b5aaeadc7 Integrate review comments 2 2013-05-02 07:30:06 +05:30
jerryshao c047f0e3ad filter out Spark streaming block RDD and sort RDDInfo with id 2013-05-02 09:48:32 +08:00
Mridul Muralidharan 609a817f52 Integrate review comments on pull request 2013-05-02 06:44:33 +05:30
Reynold Xin 204eb32e14 Changed the type of the persistentRdds hashmap back to
TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin 34637b97ec Added SparkContext.cleanup back. Not sure why it was removed before ... 2013-05-01 16:12:37 -07:00
Reynold Xin 3227ec8edd Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars 8481562731 Merged Ram's commit on removing RDDs.
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan 27764a00f4 Fix some npe introduced accidentally 2013-05-01 20:56:05 +05:30
Mridul Muralidharan d960e7e0f8 a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
b) Add some fixes to test code to ensure it passes (and fixes some other issues).

c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia aa8fe1a209 Merge pull request #586 from mridulm/master
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin dd7bef3147 Two minor fixes according to Ryan LeCompte's review. 2013-04-30 15:02:32 -07:00
Reynold Xin cea6174573 Merge branch 'master' of github.com:mesos/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan 60cabb35cb Add additional catch block for exception too 2013-05-01 01:17:14 +05:30
Mridul Muralidharan 3b748ced22 Be more aggressive and defensive in all uses of SelectionKey in select loop 2013-05-01 00:30:30 +05:30
Mridul Muralidharan 0f45477be1 Change indentation 2013-05-01 00:10:02 +05:30
Mridul Muralidharan 538614acfe Be more aggressive and defensive in select also 2013-05-01 00:05:32 +05:30
Mridul Muralidharan 48854e1dbf If key is not valid, close connection 2013-04-30 23:59:33 +05:30
Matei Zaharia f708dda81e Merge pull request #585 from pwendell/listener-perf
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan e46d547ccd Fix issues reported by Reynold 2013-04-30 16:15:56 +05:30
Reynold Xin 1055785a83 Allow specifying the shuffle write file buffer size. The default buffer
size is 8KB in FastBufferedOutputStream, which is too small and would
cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin 7007201201 Added a shuffle block manager so it is easier in the future to
consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin d3586ef438 Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell 016ce1fa9c Using full package name for util 2013-04-29 12:02:27 -07:00
Patrick Wendell 540be6b154 Modified version of the fix which just removes all per-record tracking. 2013-04-29 11:32:07 -07:00
Patrick Wendell 224fbac061 Spark-742: TaskMetrics should not employ per-record timing.
This patch does three things:

1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like
   the trait doesn't really reduce complexity in any way.

In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
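The first two points of the commit above are a trait with a cheap default implementation and an opt-in timed one. Below is a minimal Scala sketch of that split. Only the `TimedIterator` name comes from the commit. The class names `NoOpTimedIterator` and `SystemTimedIterator` and the `fetchWaitTimeNanos` member are hypothetical.

```scala
// Hypothetical shape of a TimedIterator: an iterator that reports how long
// callers spent waiting inside next().
trait TimedIterator[T] extends Iterator[T] {
  def fetchWaitTimeNanos: Long
}

// Default: plain delegation, so no System.nanoTime() calls per record.
final class NoOpTimedIterator[T](underlying: Iterator[T]) extends TimedIterator[T] {
  def hasNext: Boolean = underlying.hasNext
  def next(): T = underlying.next()
  def fetchWaitTimeNanos: Long = 0L
}

// Opt-in: time every next() call and accumulate the total.
final class SystemTimedIterator[T](underlying: Iterator[T]) extends TimedIterator[T] {
  private var waited = 0L
  def hasNext: Boolean = underlying.hasNext
  def next(): T = {
    val start = System.nanoTime()
    val item = underlying.next()
    waited += System.nanoTime() - start
    item
  }
  def fetchWaitTimeNanos: Long = waited
}
```

With the no-op wrapper as the default, the per-record cost is one extra delegating call rather than two clock reads. A sampling implementation, which the commit mentions as a future option, could sit behind the same trait.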
Matei Zaharia 0f45347c7b More unit test fixes 2013-04-28 22:29:27 -07:00
Matei Zaharia bce4089f22 Fix BlockManagerSuite to deal with clearing spark.hostPort 2013-04-28 22:23:48 -07:00
Matei Zaharia 68c07ea198 Merge pull request #582 from shivaram/master
Add zip partitions interface
2013-04-28 20:19:33 -07:00
Shivaram Venkataraman 604d3bf56c Rename partition class and add scala doc 2013-04-28 16:31:07 -07:00
Shivaram Venkataraman 15acd49f07 Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman 6e84635ab9 Rename classes from MapZipped* to Zipped* 2013-04-28 15:58:40 -07:00
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Shivaram Venkataraman 0cc6642b7c Rename to zipPartitions and style changes 2013-04-28 05:11:03 -07:00
Shivaram Venkataraman c9c4954d99 Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia 6e6b5204ea Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin ba6ffa6a5f Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin aa618ed2a2 Allow changing the serializer on a per shuffle basis. 2013-04-24 14:52:49 -07:00
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
Reynold Xin 31ce6c66d6 Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
Mridul Muralidharan 8faf5c51c3 Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch 2013-04-24 02:31:57 +05:30
koeninger dfac0aa5c2 prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement. 2013-04-22 21:12:52 -05:00
Mridul Muralidharan 7acab3ab45 Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
koeninger b2a3f24dde first attempt at an RDD to pull data from JDBC sources 2013-04-21 00:29:37 -05:00
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Andrew xia 8436bd5d4a remove TaskSetQueueManager and update code style 2013-04-19 02:17:22 +08:00
Andrew xia e0603d7e8b refactor the Schedulable interface and add unit test for SchedulingAlgorithm 2013-04-18 13:13:54 +08:00
Mridul Muralidharan 5ee2f5c483 Cache pattern, add (commented out) alternatives for check* apis 2013-04-17 23:13:34 +05:30
Mridul Muralidharan f07961060d Add a small note on spark.tasks.schedule.aggression 2013-04-17 23:13:02 +05:30
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though the default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan a402b23bcd Fudge order of classpath - so that our jars take precedence over what is in CLASSPATH variable. Sounds logical, hope there is no issue cos of it 2013-04-17 05:52:00 +05:30
Mridul Muralidharan bcdde331c3 Move from master to driver 2013-04-17 04:12:18 +05:30
Mridul Muralidharan ad80f68eb5 remove spurious debug statements 2013-04-16 22:15:34 +05:30
Mridul Muralidharan f7969f72ee Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example) 2013-04-16 21:51:38 +05:30
Mridul Muralidharan 323ab8ff3b Scala does not prevent variable shadowing ! Sick error due to it ... 2013-04-16 17:05:10 +05:30
shane-huang b493f55a4f fix a bug in netty Block Fetcher
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-16 10:01:01 +08:00
Mridul Muralidharan 59c380d69a Fix npe 2013-04-16 03:29:38 +05:30
Mridul Muralidharan dd2b64ec97 Fix bug with atomic update 2013-04-16 03:19:24 +05:30
Mridul Muralidharan 5540ab8243 Use hostname instead of hostport for executor, fix creation of workdir 2013-04-16 02:57:43 +05:30
Mridul Muralidharan eb7e95e833 Commit job to persist files 2013-04-16 02:56:36 +05:30
Matei Zaharia a64c107449 Make ShuffledRDD.prev transient 2013-04-15 16:41:51 -04:00
Mridul Muralidharan 19652a44be Fix issue with FileSuite failing 2013-04-15 19:16:36 +05:30
Mridul Muralidharan 54b3d45b81 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:26:50 +05:30
Mridul Muralidharan d90d2af103 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:12:11 +05:30
Matei Zaharia c35d530bcf Fix compile error 2013-04-13 12:43:12 -04:00
Andrew Ash 29d3440efb Add details when BlockManager heartbeats time out
Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs

Before:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats

After:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms
2013-04-11 01:54:02 -03:00
Andrew xia 2f883c515f Continue to update codes for scala code style
1.refactor braces for "class" "if" "while" "for" "match"
2.make code lines less than 100
3.refactor class parameter and extends definition
2013-04-09 13:02:50 +08:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia 054feb6448 Fixed a bug with zip 2013-04-07 21:15:21 -04:00
Matei Zaharia b5900d47b1 Fix compile warning 2013-04-07 20:55:42 -04:00
Matei Zaharia 6962d40b44 Fix deprecated warning 2013-04-07 20:27:33 -04:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
shane-huang df47b40b76 Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package

Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Andrew xia 2b373dd07a add properties default value null to fix sbt/sbt test errors 2013-04-02 12:11:14 +08:00
Mark Hamstra e215f67923 Correct sense of 'filter out' in comment. 2013-03-31 08:00:13 -07:00
Mark Hamstra 8bcdc64005 Fixed broken filter in getWritableClass[T] 2013-03-30 22:09:52 -07:00
Matei Zaharia 9831bc1a09 Merge pull request #539 from cgrothaus/fix-webui-workdirpath
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-29 22:16:22 -07:00
Matei Zaharia 3cc8ab6e29 Merge pull request #541 from stephenh/shufflecoalesce
Add a shuffle parameter to coalesce.
2013-03-29 22:14:07 -07:00
Andrew xia 1a28f92711 fix some typos and some spacing 2013-03-29 08:34:28 +08:00
Andrew xia def3d1c84a 1.remove redundant spacing in source code
2.replace get/set functions with val and var definitions
2013-03-29 08:20:35 +08:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Holden Karau f5df729b12 Explicitly catch all throwables (warning in 2.10) 2013-03-24 16:15:32 -07:00
Stephen Haberman dd854d5b9f Use Boolean in the Java API, and != for assert. 2013-03-23 11:49:45 -05:00
Stephen Haberman 4ca273edc4 Merge branch 'master' into shufflecoalesce
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Matei Zaharia b8949cab88 Merge pull request #505 from stephenh/volatile
Make Executor fields volatile since they're read from the thread pool.
2013-03-23 07:19:34 -07:00
Matei Zaharia fd53f2fc7b Merge pull request #510 from markhamstra/WithThing
mapWith, flatMapWith and filterWith
2013-03-23 07:13:21 -07:00
Andrew xia d1d9bdaabe Just update typo and comments 2013-03-23 07:25:30 +08:00
Stephen Haberman 00170eb0b9 Fix are/our typo. 2013-03-22 12:59:08 -05:00
Stephen Haberman 1c67c7dfd1 Add a shuffle parameter to coalesce.
This is useful for when you want just 1 output file (part-00000) but
still want the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Christoph Grothaus 445f387ef4 Bugfix: WorkerWebUI must respect workDirPath from Worker 2013-03-22 11:08:40 +01:00
Matei Zaharia 35588490cb Merge pull request #538 from rxin/cogroup
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 19:27:47 -07:00
Stephen Haberman 4f4215311a Merge branch 'master' into volatile 2013-03-20 15:37:10 -05:00
Matei Zaharia b812e6b7bb Merge pull request #526 from markhamstra/foldByKey
Add foldByKey
2013-03-20 11:21:02 -07:00
Reynold Xin d48ee7e55e Merge branch 'master' of github.com:mesos/spark into cogroup 2013-03-20 14:00:28 +08:00
Reynold Xin 00a11304fd Added mapSideCombine flag to CoGroupedRDD. Added unit test for
CoGroupedRDD.
2013-03-20 13:49:51 +08:00
Matei Zaharia 945d1e720e Merge pull request #536 from sasurfer/master
CoalescedRDD for many partitions
2013-03-19 21:59:06 -07:00
Matei Zaharia 1cbbe94ac1 Merge pull request #534 from stephenh/removetrycatch
Remove try/catch block that can't be hit.
2013-03-19 21:34:34 -07:00
Andrey Kouznetsov bd167f83b0 call setConf from input format if it is Configurable 2013-03-19 17:15:15 +04:00
Giovanni Delussu aceae029f7 CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD.
The limitation was in the range that Scala.Int can represent.
2013-03-19 11:25:45 +01:00
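One way to avoid the `Int` range limitation described above is to do the index arithmetic in `Long`. With 100,000 partitions on each side, `i * prevPartitions` can reach about 10^10, well past `Int.MaxValue` (2,147,483,647). Below is a minimal Scala sketch with a hypothetical `coalesceRanges` helper. The commit itself may compute its groups differently.

```scala
object CoalesceSketch {
  // Hypothetical helper: split `prevPartitions` parent partitions into at most
  // `numPartitions` contiguous groups of nearly equal size.
  def coalesceRanges(prevPartitions: Int, numPartitions: Int): IndexedSeq[Range] = {
    val n = math.min(numPartitions, prevPartitions)
    (0 until n).map { i =>
      // Multiply in Long: i * prevPartitions can exceed Int.MaxValue.
      val start = (i.toLong * prevPartitions / n).toInt
      val end = ((i + 1).toLong * prevPartitions / n).toInt
      start until end
    }
  }

  def main(args: Array[String]): Unit = {
    println(coalesceRanges(10, 3).map(_.size).mkString(", ")) // 3, 3, 4
  }
}
```

Each bound is at most `prevPartitions`, so converting back with `.toInt` is safe once the division has been done.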
Stephen Haberman fb34967815 Remove try/catch block that can't be hit. 2013-03-18 01:55:50 -05:00
Mark Hamstra ab33e27cc9 constructorOfA -> constructA in doc comments 2013-03-16 15:29:15 -07:00
Mark Hamstra 9784fc1fcd fix wayward comma in doc comment 2013-03-16 15:25:02 -07:00
Mark Hamstra 32979b5e7d whitespace 2013-03-16 13:36:46 -07:00
Mark Hamstra ca9f81e8fc refactor foldByKey to use combineByKey 2013-03-16 13:31:01 -07:00
Mark Hamstra 1fb192ef40 Merge branch 'master' of https://github.com/mesos/spark into foldByKey 2013-03-16 12:17:13 -07:00
Mark Hamstra 80fc8c82ed _With[Matei] 2013-03-16 12:16:29 -07:00
Mark Hamstra 38454c4aed Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-16 11:54:44 -07:00
Matei Zaharia c1e9cdc49f Merge pull request #525 from stephenh/subtractByKey
Add PairRDDFunctions.subtractByKey.
2013-03-16 11:47:45 -07:00
Mark Hamstra ef75be3bf7 Merge branch 'master' of https://github.com/mesos/spark into foldByKey 2013-03-15 21:41:24 -07:00
Andrew xia 5892393140 refactor fair scheduler implementation
1.Change "pool" properties to be a member of ActiveJob
2.Abstract the Schedulable of Pool and TaskSetManager
3.Abstract the FIFO and FS comparator algorithm
4.Miscellaneous changing of class define and construction
2013-03-16 11:13:38 +08:00
Matei Zaharia cdbfd1e196 Merge pull request #516 from squito/fix_local_metrics
Fix local metrics
2013-03-15 15:13:28 -07:00
Mikhail Bautin 7fd2708eda Add a log4j compile dependency to fix build in IntelliJ
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
Mark Hamstra 1a4070477d whitespace cleanup 2013-03-15 11:28:28 -07:00
Mark Hamstra 857010392b Fuller implementation of foldByKey 2013-03-15 10:56:05 -07:00
Mark Hamstra 16a4ca4537 restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test 2013-03-14 13:58:37 -07:00
Mark Hamstra b1422cbdd5 added foldByKey 2013-03-14 12:59:58 -07:00
Stephen Haberman 7786881f47 Fix tabs that snuck in. 2013-03-14 14:57:12 -05:00
Stephen Haberman 7d8bb4df3a Allow subtractByKey's other argument to have a different value type. 2013-03-14 14:44:15 -05:00
Stephen Haberman 4632c45af1 Finished subtractByKeys. 2013-03-14 10:35:34 -05:00
Matei Zaharia 4032beba49 Merge pull request #521 from stephenh/earlyclose
Close the reader in HadoopRDD as soon as iteration ends.
2013-03-13 19:29:46 -07:00
Stephen Haberman 63fe225587 Simplify SubtractedRDD in preparation for subtractByKey. 2013-03-13 17:17:34 -05:00
Mark Hamstra cd5b947cf6 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-13 13:16:14 -07:00
Stephen Haberman e7f1a69c6b Add a test for NextIterator. 2013-03-13 10:46:33 -05:00
Stephen Haberman 1a175d13b9 Add NextIterator.closeIfNeeded. 2013-03-13 10:17:39 -05:00
Stephen Haberman 8f00d23598 Remove NextIterator.close default implementation. 2013-03-12 12:30:10 -05:00
Harold Lim 0b64e5f1ac Removed some commented code 2013-03-12 13:31:27 +08:00
Harold Lim f5b1fecb9f Cleaned up the code 2013-03-12 13:31:27 +08:00
Harold Lim b5325182a3 Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies 2013-03-12 13:31:27 +08:00
Harold Lim 54ed7c4af4 Changed the name of the system property to set the allocation xml 2013-03-12 13:31:27 +08:00
Harold Lim c07087364b Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler 2013-03-12 13:31:27 +08:00
Stephen Haberman 9e68f48625 More quickly call close in HadoopRDD.
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
2013-03-11 23:59:17 -05:00
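The shared "gotNext" utility mentioned above can be sketched as an abstract iterator. Subclasses supply `getNext()` and `close()`, and the iterator closes its resource as soon as it sees the end of input. Below is a minimal Scala sketch. `close` and `closeIfNeeded` are named in later commits (listed above). `getNext`, `finished`, and the `CountingIterator` subclass are assumptions for illustration only.

```scala
// Sketch of the "gotNext" pattern: subclasses implement getNext() and set
// `finished = true` once input is exhausted. close() then runs exactly once,
// as soon as the end is reached.
abstract class NextIterator[U] extends Iterator[U] {
  private var gotNext = false
  private var nextValue: U = _
  private var closed = false
  protected var finished = false

  // Return the next value, or set `finished = true` (the return value is then ignored).
  protected def getNext(): U

  // Release the underlying resource (reader, stream, etc.).
  protected def close(): Unit

  // Idempotent wrapper so callers can also close early, e.g. on task completion.
  def closeIfNeeded(): Unit = {
    if (!closed) {
      closed = true
      close()
    }
  }

  override def hasNext: Boolean = {
    if (!finished && !gotNext) {
      nextValue = getNext()
      if (finished) closeIfNeeded()
      gotNext = true
    }
    !finished
  }

  override def next(): U = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    gotNext = false
    nextValue
  }
}

// Illustrative subclass: yields 1..n and counts how often close() runs.
class CountingIterator(n: Int) extends NextIterator[Int] {
  private var current = 0
  var closeCalls = 0

  protected def getNext(): Int = {
    if (current >= n) {
      finished = true
      0
    } else {
      current += 1
      current
    }
  }

  protected def close(): Unit = closeCalls += 1
}

object NextIteratorDemo {
  def main(args: Array[String]): Unit = {
    val it = new CountingIterator(3)
    println(it.toList)     // List(1, 2, 3)
    println(it.closeCalls) // 1
  }
}
```

Closing inside `hasNext` releases the reader when iteration ends rather than when the task completes. `closeIfNeeded` also remains safe to call a second time from a completion hook.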
Charles Reiss 769d399674 Send block sizes as longs. 2013-03-11 14:17:05 -07:00
Mark Hamstra 562893bea3 deleted excess curly braces 2013-03-10 22:43:08 -07:00
Imran Rashid 8a11ac3dc7 increase sleep time 2013-03-10 22:31:44 -07:00
Imran Rashid 9f97f2f9d8 add a small wait to one task to make sure some task runtime really is non-zero 2013-03-10 22:30:18 -07:00
Mark Hamstra 1289e7176b refactored _With API and added foreachPartition 2013-03-10 22:27:13 -07:00
Mark Hamstra b57df1f5e3 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-10 16:56:31 -07:00
Matei Zaharia 2e1bbc4e7e Merge remote-tracking branch 'woggling/dag-sched-driver-port'
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 16:52:54 -07:00
Matei Zaharia 91a9d093bd Merge pull request #512 from patelh/fix-kryo-serializer
Fix reference bug in Kryo serializer, add test, update version
2013-03-10 15:48:23 -07:00
Matei Zaharia 557cfd0f4d Merge pull request #515 from woggling/deploy-app-death
Notify standalone deploy client of application death.
2013-03-10 15:44:57 -07:00
Matei Zaharia a59cc6060f Merge remote-tracking branch 'stephenh/nomocks'
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 13:39:10 -07:00
Imran Rashid 20f01a0a1b enable task metrics in local mode, add tests 2013-03-09 21:17:31 -08:00
Imran Rashid ec30188a2a rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches 2013-03-09 21:16:53 -08:00
Charles Reiss b0983c5762 Notify standalone deploy client of application death.
Usually, this isn't necessary since the application will be removed
as a result of the deploy client disconnecting, but occasionally, the
standalone deploy master removes an application otherwise.

Also mark applications as FAILED instead of FINISHED when they are
killed as a result of their executors failing too many times.
2013-03-09 11:29:45 -08:00
Charles Reiss d0216cb38b Prevent DAGSchedulerSuite from corrupting driver.port.
Use the LocalSparkContext abstraction to properly manage clearing
spark.driver.port.
2013-03-09 10:49:02 -08:00
Hiral Patel 664e5fd24b Fix reference bug in Kryo serializer, add test, update version 2013-03-07 22:16:11 -08:00
Mark Hamstra 5ff0810b11 refactor mapWith, flatMapWith and filterWith to each use two parameter lists 2013-03-05 12:25:44 -08:00
Mark Hamstra d046d8ad32 whitespace formatting 2013-03-05 00:48:13 -08:00
Mark Hamstra 9148b968cf mapWith, flatMapWith and filterWith 2013-03-04 15:48:47 -08:00
Matei Zaharia 9f0dc829cb Fix TaskMetrics not being serializable 2013-03-04 12:08:31 -08:00
Matei Zaharia 04fb81ffe5 Merge pull request #506 from rxin/spark-706
Fixed SPARK-706: Failures in block manager put leads to read task hanging.
2013-03-03 17:20:07 -08:00
Imran Rashid 0bd1d00c2a minor cleanup based on feedback in review request 2013-03-03 16:46:45 -08:00
Imran Rashid f1006b99ff change CleanupIterator to CompletionIterator 2013-03-03 16:39:05 -08:00
Imran Rashid 8fef5b9c5f refactoring of TaskMetrics 2013-03-03 16:34:04 -08:00
Imran Rashid d36abdb053 Merge branch 'master' into stageInfo 2013-03-03 15:20:46 -08:00
Matei Zaharia 6bfc7cad6b Merge pull request #504 from mosharaf/master
Worker address was getting removed when removing an app.
2013-03-02 22:14:49 -08:00
Mark Hamstra 8b06b359da bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building. 2013-02-28 23:34:34 -08:00
Reynold Xin 44134e12bb Fixed SPARK-706: Failures in block manager put leads to read task
hanging.
2013-02-28 15:14:59 -08:00
Stephen Haberman 6415c2bb60 Don't create the Executor until we have everything it needs. 2013-02-28 12:38:09 -06:00
Stephen Haberman 80eecd2cb1 Make Executor fields volatile since they're read from the thread pool. 2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury 4ab387bcdb Fixed master datastructure updates after removing an application; and a typo. 2013-02-27 13:52:44 -08:00
Matei Zaharia ece3edfffa Fix a problem with no hosts being counted as alive in the first job 2013-02-26 12:11:03 -08:00
Matei Zaharia 73697e2891 Fix overly large thread names in PySpark 2013-02-26 12:07:59 -08:00
Stephen Haberman db957e5bd7 Fix MapOutputTrackerSuite. 2013-02-26 01:38:50 -06:00
Stephen Haberman a65aa549ff Override DAGScheduler.runLocally so we can remove the Thread.sleep. 2013-02-25 23:49:32 -06:00
Stephen Haberman a4adeb255c Merge branch 'master' into nomocks
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-02-25 23:48:52 -06:00
Tathagata Das c02e064938 Fixed replication bug in BlockManager 2013-02-25 17:27:46 -08:00
Matei Zaharia 490f056cdd Allow passing sparkHome and JARs to StreamingContext constructor
Also warns if spark.cleaner.ttl is not set in the version where you pass
your own SparkContext.
2013-02-25 15:13:30 -08:00
Matei Zaharia 568bdaf8ae Set spark.deploy.spreadOut to true by default in 0.7 (improves locality) 2013-02-25 14:34:55 -08:00
Matei Zaharia 1ef58dadcc Add a config property for Akka lifecycle event logging 2013-02-25 14:01:24 -08:00
Matei Zaharia ceaec4a675 Merge pull request #498 from pwendell/shutup-akka
Disable remote lifecycle logging from Akka.
2013-02-25 12:31:24 -08:00
Patrick Wendell 85a85646d9 Disable remote lifecycle logging from Akka.
This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.
2013-02-25 12:25:43 -08:00
Imran Rashid 8f17387d97 remove bogus comment 2013-02-25 10:31:06 -08:00
Matei Zaharia 6ae9a22c3e Get spark.default.parallelism on each call to defaultPartitioner,
instead of only once, in case the user changes it across Spark uses
2013-02-25 10:28:08 -08:00
Matei Zaharia d6e6abece3 Merge pull request #459 from stephenh/bettersplits
Change defaultPartitioner to use upstream split size.
2013-02-25 09:22:04 -08:00
Stephen Haberman c44ccf2862 Use default parallelism if it's set. 2013-02-24 23:54:03 -06:00
Stephen Haberman 44032bc476 Merge branch 'master' into bettersplits
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/test/scala/spark/ShuffleSuite.scala
2013-02-24 22:08:14 -06:00
Christoph Grothaus f39f2b7636 Incorporate feedback from mateiz:
- we do not need getEnvOrEmpty
- Instead of saving SPARK_NONDAEMON_JAVA_OPTS, it would be better to modify the scripts to use a different variable name for the JAVA_OPTS they do eventually use
2013-02-24 21:24:30 +01:00
Tathagata Das dff53d1b94 Merge branch 'mesos-master' into streaming 2013-02-24 12:17:22 -08:00
Matei Zaharia 3b9f929467 Merge pull request #468 from haitaoyao/master
support customized java options for Master, Worker, Executor, and Repl
2013-02-23 23:38:15 -08:00
Stephen Haberman 37c7a71f9c Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD. 2013-02-24 00:27:53 -06:00
Stephen Haberman f442e7d83c Update for split->partition rename. 2013-02-24 00:27:14 -06:00
Stephen Haberman cec87a0653 Merge branch 'master' into subtract 2013-02-23 23:27:55 -06:00
Tathagata Das d853aa9658 Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs. 2013-02-23 17:42:26 -08:00
Patrick Wendell 931f439be9 Responding to code review 2013-02-23 15:40:41 -08:00
Patrick Wendell f51b0f93f2 Adding Java-accessible methods to Vector.scala
This is needed for the Strata machine learning tutorial (and
also is generally helpful).
2013-02-23 13:26:59 -08:00
Matei Zaharia d942d39072 Handle exceptions in RecordReader.close() better (suggested by Jim
Donahue)
2013-02-23 11:19:07 -08:00
Matei Zaharia c89824046a Merge pull request #490 from woggling/conn-death
Detect when SendingConnections disconnect even if we aren't sending to them
2013-02-22 22:58:19 -08:00
Charles Reiss 50cf8c8b79 Add fault tolerance test that uses replicated RDDs. 2013-02-22 16:11:53 -08:00
Charles Reiss c8a7886921 Detect when SendingConnections drop by trying to read them.
Comment fix
2013-02-22 16:11:52 -08:00
Matei Zaharia d4d7993bf5 Several fixes to the work to log when no resources can be used by a job.
Fixed some of the messages as well as code style.
2013-02-22 15:51:37 -08:00
Matei Zaharia f33662c133 Merge remote-tracking branch 'pwendell/starvation-check'
Also fixed a bug where master was offering executors on dead workers

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-02-22 15:27:41 -08:00
Matei Zaharia 7341de0d48 Merge pull request #475 from JoshRosen/spark-668
Remove hack workaround for SPARK-668
2013-02-22 14:56:18 -08:00
Patrick Wendell f8c3a03d55 SPARK-702: Replace Function --> JFunction in JavaAPI Suite.
In a few places the Scala (rather than Java) function class is used.
2013-02-22 12:54:15 -08:00
Imran Rashid 0f37b43b40 make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD 2013-02-21 16:56:28 -08:00
Imran Rashid 9230617f23 add cleanup iterator 2013-02-21 16:55:14 -08:00
Imran Rashid 81bd07da26 sparkListeners should be a val 2013-02-21 15:21:45 -08:00
Imran Rashid 796e934d31 add some docs & some cleanup 2013-02-21 15:19:34 -08:00
Imran Rashid 394d3acc3e store taskInfo & metrics together in a tuple 2013-02-21 15:19:34 -08:00
Imran Rashid 7960927cf4 get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo 2013-02-21 15:19:34 -08:00
Imran Rashid d0bfac3eed taskInfo tracks if a task is run on a preferred host 2013-02-21 15:19:34 -08:00
Imran Rashid 6f62a57858 add runtime breakdowns 2013-02-21 15:19:34 -08:00
Imran Rashid 176cb20703 add task result size; better formatting for time interval distributions; cleanup distribution formatting 2013-02-21 15:19:33 -08:00
Imran Rashid f2fcabf2ea add timing around parts of executor & track result size 2013-02-21 15:19:33 -08:00
Imran Rashid ff127cfcd3 Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/storage/BlockManager.scala
2013-02-21 15:16:21 -08:00
Imran Rashid 69f9a7035f fully revert change to addOnCompleteCallback -- missed this in e9f53ec 2013-02-21 15:07:46 -08:00
Imran Rashid baab23abdf TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task 2013-02-21 14:13:01 -08:00
haitao.yao 8215b95547 Merge branch 'mesos' 2013-02-21 10:07:24 +08:00
Christoph Grothaus 85a35c6840 Fix SPARK-698. From ExecutorRunner, launch java directly instead of via the run scripts. 2013-02-20 21:42:11 +01:00
Tathagata Das 334ab92441 Fixed bug in CheckpointSuite 2013-02-20 10:26:36 -08:00
Tathagata Das 1cb725e417 Merge branch 'mesos-master' into streaming 2013-02-20 09:55:35 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Matei Zaharia 05bc02e80b Merge pull request #482 from woggling/shutdown-exceptions
Don't call System.exit over uncaught exceptions from shutdown hooks
2013-02-19 20:56:15 -08:00
haitao.yao 6a3d44c673 Merge branch 'mesos' 2013-02-20 10:23:58 +08:00
Charles Reiss 092c631fa8 Pull detection of being in a shutdown hook into utility function. 2013-02-19 17:49:55 -08:00
Reynold Xin 130f704baf Added a method to create PartitionPruningRDD. 2013-02-19 16:03:52 -08:00
Charles Reiss d0588bd6d7 Catch/log errors deleting temp dirs 2013-02-19 13:04:06 -08:00
Charles Reiss 687581c3ec Paranoid uncaught exception handling for exceptions during shutdown 2013-02-19 13:03:02 -08:00
haitao.yao 7c129388fb Merge branch 'mesos' 2013-02-19 11:22:24 +08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Matei Zaharia 06e5e6627f Renamed "splits" to "partitions" 2013-02-17 22:13:26 -08:00
Matei Zaharia 340cc54e47 Merge pull request #471 from stephenh/parallelrdd
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia 3260b6120e Merge pull request #470 from stephenh/morek
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman 924f47dd11 Add RDD.subtract.
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman e7713adb99 Move ParallelCollection into spark.rdd package. 2013-02-16 13:20:48 -06:00