1947 commits

Author SHA1 Message Date
Matei Zaharia e89ffc7b3c Merge pull request #839 from jegonzal/zip_partitions
Currying RDD.zipPartitions
2013-08-16 14:02:34 -07:00
Joseph E. Gonzalez 53b2639a1e Reversing the argument order in zipPartitions to enable stronger type inference. 2013-08-16 12:38:59 -07:00
Andre Schumacher c7e348faec Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path 2013-08-16 11:58:20 -07:00
Reynold Xin c961c19b7b Use the JSON formatter from Scala library and removed dependency on lift-json.
It made the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
2013-08-15 18:23:01 -07:00
Reynold Xin eddbf43b54 Revert "Merge pull request #834 from Daemoen/master"
This reverts commit 230ab2722e, reversing
changes made to 659553b21d.
2013-08-15 17:49:37 -07:00
Reynold Xin 230ab2722e Merge pull request #834 from Daemoen/master
Updated json output to allow for display of worker state
2013-08-15 17:45:17 -07:00
Patrick Wendell 659553b21d Merge pull request #836 from pwendell/rename
Rename `memoryBytesToString` and `memoryMegabytesToString`
2013-08-15 16:56:31 -07:00
Jey Kottalam a06a9d5c5f Rename HadoopWriter to SparkHadoopWriter since it's outside of our package 2013-08-15 16:50:37 -07:00
Jey Kottalam 8f979edef5 Fix newTaskAttemptID to work under YARN 2013-08-15 16:50:37 -07:00
Jey Kottalam e2d7656ca3 re-enable YARN support 2013-08-15 16:50:37 -07:00
Jey Kottalam bd0bab47c9 SparkEnv isn't available this early, and not needed anyway 2013-08-15 16:50:37 -07:00
Jey Kottalam 4f43fd791a make SparkHadoopUtil a member of SparkEnv 2013-08-15 16:50:37 -07:00
Jey Kottalam 43ebcb8484 rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil 2013-08-15 16:50:37 -07:00
Jey Kottalam 8b1c1520fc add comment 2013-08-15 16:50:37 -07:00
Jey Kottalam 69c3bbf688 dynamically detect hadoop version 2013-08-15 16:50:37 -07:00
Jey Kottalam f67b94ad4f remove core/src/hadoop{1,2} dirs 2013-08-15 16:50:36 -07:00
Jey Kottalam b877e20a33 move yarn to its own directory 2013-08-15 16:50:36 -07:00
Patrick Wendell 4c6ade1ad5 Rename memoryBytesToString and memoryMegabytesToString
These are used all over the place now and they are not specific to memory at all.

memoryBytesToString --> bytesToString
memoryMegabytesToString --> megabytesToString
2013-08-15 15:58:07 -07:00
Reynold Xin 1a51deae8a More minor UI changes including code review feedback. 2013-08-15 14:34:07 -07:00
Daemoen ad2e8b5126 Updated json output to allow for display of worker state
Ops teams need to ensure that the cluster is functional and performant.  Having to scrape the html source for worker state won't work reliably, and will be slow.  By exposing the state in the json output, ops teams are able to ensure a fully functional environment by querying for the json output and parsing for dead nodes.
2013-08-15 12:19:14 -07:00
Reynold Xin 2d2a556bdf Various UI improvements. 2013-08-14 23:23:09 -07:00
Reynold Xin 290e3e6e65 Renamed setCurrentJobDescription to setJobDescription. 2013-08-14 18:40:53 -07:00
Reynold Xin 3886b54933 A few small scheduler / job description changes.
1. Renamed SparkContext.addLocalProperty to setLocalProperty. And allow this function to unset a property.

2. Renamed SparkContext.setDescription to setCurrentJobDescription.

3. Throw an exception if the fair scheduler allocation file is invalid.
2013-08-14 17:19:42 -07:00
Matei Zaharia 839f2d4f3f Merge pull request #822 from pwendell/ui-features
Adding GC Stats to TaskMetrics (and three small fixes)
2013-08-14 16:17:23 -07:00
Patrick Wendell 04ad78b09d Style cleanup based on Matei feedback 2013-08-14 14:57:21 -07:00
Kay Ousterhout a88aa5e6ed Fixed 2 bugs in executor UI.
1) UI crashed if the executor UI was loaded before any tasks started.
2) The total task count was incorrectly reported due to using string (rather
than int) arithmetic.
2013-08-13 23:44:58 -07:00
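The second bug in a88aa5e6ed is a common class of error. The sketch below is only an illustration of how string arithmetic can misreport a total; the function and parameter names are hypothetical and are not taken from the Spark executor UI code.

```python
def total_tasks_buggy(active, failed, completed):
    # Hypothetical: the counts arrive as text, so "+" concatenates them.
    return active + failed + completed


def total_tasks_fixed(active, failed, completed):
    # Convert each count to an int before adding.
    return int(active) + int(failed) + int(completed)
```

With the text inputs "1", "2" and "3", the first function yields the string "123" while the second yields the integer 6.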
Patrick Wendell c223176388 Small style clean-up 2013-08-13 16:56:37 -07:00
Patrick Wendell fab5cee111 Correcting terminology in RDD page 2013-08-13 16:25:55 -07:00
Patrick Wendell 024e5c5ce1 Correct sorting order for stages 2013-08-13 16:25:55 -07:00
Patrick Wendell 4e9f0c2df6 Capturing GC details in TaskMetrics 2013-08-13 16:25:55 -07:00
Patrick Wendell f0382007dc Bug fix for display of shuffle read/write metrics.
This fixes an error where empty cells are missing if a given task
has no shuffle read/write.
2013-08-13 16:25:55 -07:00
Matei Zaharia d316af9c84 Merge pull request #821 from pwendell/print-launch-command
Print run command to stderr rather than stdout
2013-08-13 15:31:01 -07:00
Patrick Wendell a7feb69ae8 Print run command to stderr rather than stdout 2013-08-13 15:07:03 -07:00
Kay Ousterhout 1beb843a6f Reuse the set of failed states rather than creating a new object each time 2013-08-13 14:27:40 -07:00
Kay Ousterhout c92dd627ca Properly account for killed tasks.
The TaskState class's isFinished() method didn't return true for
KILLED tasks, which means some resources are never reclaimed
for tasks that are killed. This also made it inconsistent with the
isFinished() method used by CoarseMesosSchedulerBackend.
2013-08-13 12:40:15 -07:00
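The fix described in c92dd627ca can be pictured with a small sketch. This is not the Spark source: the enum members and the is_finished helper are assumptions made for illustration. The point is that KILLED belongs in the set of terminal states, so resources held by killed tasks can be reclaimed.

```python
from enum import Enum, auto


class TaskState(Enum):
    # Illustrative state names; assumed, not copied from Spark.
    LAUNCHING = auto()
    RUNNING = auto()
    FINISHED = auto()
    FAILED = auto()
    KILLED = auto()
    LOST = auto()


# Built once at import time and shared by every call.
FINISHED_STATES = frozenset(
    {TaskState.FINISHED, TaskState.FAILED, TaskState.KILLED, TaskState.LOST}
)


def is_finished(state):
    # A task is finished once it reaches any terminal state, including KILLED.
    return state in FINISHED_STATES
```

Keeping the terminal states in one shared frozenset also avoids allocating a new collection on each check.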
Patrick Wendell ed6a1646e6 Slight change to pr-784 2013-08-13 09:29:40 -07:00
Patrick Wendell a0133bfbad Merge pull request #784 from jerryshao/dev-metrics-servlet
Add MetricsServlet for Spark metrics system
2013-08-13 09:28:18 -07:00
Matei Zaharia 65d0d91fba Merge pull request #807 from JoshRosen/guava-optional
Change scala.Option to Guava Optional in Java APIs
2013-08-12 19:00:57 -07:00
Josh Rosen cf08bb7a3e Fix import organization. 2013-08-12 18:55:02 -07:00
jerryshao 09c7179e81 MetricsServlet code refactor according to comments 2013-08-12 13:23:23 +08:00
jerryshao 320e87e7ab Add MetricsServlet for Spark metrics system 2013-08-12 13:23:23 +08:00
Reynold Xin e5b9ed2833 Merge pull request #808 from pwendell/ui_compressed_bytes
Report compressed bytes read when calculating TaskMetrics
2013-08-11 17:22:47 -07:00
Patrick Wendell 3d8f281604 Report compressed bytes read when calculating TaskMetrics 2013-08-11 16:25:57 -07:00
Matei Zaharia 379648630b Merge pull request #805 from woggle/hadoop-rdd-jobconf
Use new Configuration() instead of slower new JobConf() in SerializableWritable
2013-08-11 14:51:47 -07:00
Josh Rosen d7f78b443b Change scala.Option to Guava Optional in Java APIs. 2013-08-11 12:05:09 -07:00
Charles Reiss 6402b539d0 Use new Configuration() instead of new JobConf() for ObjectWritable.
JobConf's constructor loads default config files in some versions of
Hadoop, which is quite slow, and we only need the Configuration object
to pass the correct ClassLoader.
2013-08-10 21:31:05 -07:00
Matei Zaharia 71c63de22f Merge pull request #795 from mridulm/master
Fix bug reported in PR 791 : a race condition in ConnectionManager and Connection
2013-08-10 10:21:20 -07:00
Matei Zaharia d3277a0daf Merge remote-tracking branch 'origin/pr/792'
Conflicts:
	core/src/main/scala/spark/ui/jobs/IndexPage.scala
	core/src/main/scala/spark/ui/jobs/StagePage.scala
2013-08-10 10:18:50 -07:00
Patrick Wendell d17eeb997d Merge pull request #785 from anfeng/master
expose HDFS file system stats via Executor metrics
2013-08-10 09:02:27 -07:00
Kay Ousterhout 14d14f451a Shortened names, as per Matei's suggestion 2013-08-10 07:50:27 -07:00
Matei Zaharia cd247ba5bb Merge pull request #786 from shivaram/mllib-java
Java fixes, tests and examples for ALS, KMeans
2013-08-09 20:41:13 -07:00
Kay Ousterhout 7810a76512 Only print event queue full error message once 2013-08-09 18:20:48 -07:00
Kay Ousterhout 44ca8629d8 Style fix: removing unnecessary return type 2013-08-09 17:22:50 -07:00
Kay Ousterhout 29b79714f9 Style fixes based on code review 2013-08-09 16:46:34 -07:00
Kay Ousterhout 81e1d4a7d1 Refactored SparkListener to process all events asynchronously.
This commit fixes issues where SparkListeners that take a while to
process events slow the DAGScheduler.

This commit also fixes a bug in the UI where if a user goes to a
web page of a stage that does not exist, they can create a memory
leak (granted, this is not an issue at small scale -- probably only
an issue if someone actively tried to DOS the UI).
2013-08-09 13:27:41 -07:00
Matei Zaharia b09d4b79e8 Merge pull request #799 from woggle/sync-fix
Remove extra synchronization in ResultTask
2013-08-09 13:17:08 -07:00
Patrick Wendell cc6b92e80e Merge pull request #775 from pwendell/print-launch-command
Log the launch command for Spark daemons
2013-08-09 13:00:33 -07:00
Patrick Wendell 3970b580c2 Using quotes when printing out command 2013-08-09 11:53:32 -07:00
Charles Reiss 9dfc280f74 Remove extra synchronization in ResultTask 2013-08-09 11:09:02 -07:00
Matei Zaharia f94fc75c3f Merge pull request #788 from shane-huang/sparkjavaopts
For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as ...
2013-08-09 10:04:03 -07:00
Matei Zaharia d1e1c1b24d Add test for Kryo with WrappedArray (which was failing in Chill 0.3.0) 2013-08-08 13:34:11 -07:00
Mridul Muralidharan c230ca3b4e Change line size 2013-08-08 22:28:40 +05:30
Mridul Muralidharan dc47084f4e Attempt to fix bug reported in PR 791 : a race condition in ConnectionManager and Connection 2013-08-08 22:19:27 +05:30
Kay Ousterhout 88049a214d Fixed 3 bugs that caused UI to crash (including SPARK-810).
One bug caused the UI to crash if you try to look at a job's status
before any of the tasks have finished.

The second bug was a concurrency issue where two different threads
(the scheduling thread and a UI thread) could be reading/updating
the data structures in JobProgressListener concurrently.

The third bug mis-used an Option, also causing the UI to crash
under certain conditions.
2013-08-07 23:09:25 -07:00
Patrick Wendell b4321edf68 Reverting bootstrap change 2013-08-07 22:18:18 -07:00
Patrick Wendell 21392f2a73 Change I forgot to merge in 2013-08-07 21:45:32 -07:00
Patrick Wendell 706394b370 Bumping font size to 14px and fixing style issue in progress bars 2013-08-07 21:27:04 -07:00
Patrick Wendell 8c0d668468 Merge branch 'master' into bootstrap-design
Conflicts:
	core/src/main/scala/spark/ui/UIUtils.scala
	core/src/main/scala/spark/ui/jobs/IndexPage.scala
	core/src/main/scala/spark/ui/storage/RDDPage.scala
2013-08-07 21:06:03 -07:00
Kay Ousterhout b88e26248e Fixed issue in UI that limited scheduler throughput.
Removal of items from ArrayBuffers in the UI code was slow and
significantly impacted scheduler throughput. This commit
improves scheduler throughput by 5x.
2013-08-07 14:42:05 -07:00
shane-huang cbc5107e36 For standalone mode, add worker local env setting of SPARK_JAVA_OPTS as default and let application env override default options if applicable
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-08-07 14:36:48 +08:00
Matei Zaharia 6b043a6f11 Merge pull request #724 from dlyubimov/SPARK-826
SPARK-826: fold(), reduce(), collect() always attempt to use java serialization
2013-08-06 22:31:02 -07:00
Matei Zaharia 7c4b7a53b1 Merge remote-tracking branch 'origin/pr/781'
Conflicts:
	core/src/main/resources/spark/ui/static/webui.css
2013-08-06 17:19:49 -07:00
Karen Feng 908032e79b Used saturated colors for progress bars 2013-08-06 16:52:21 -07:00
Karen Feng 8bc497fa10 Lightened color of progress bars 2013-08-06 16:33:05 -07:00
Karen Feng ca1903ea63 Overlays progress text on top of bar 2013-08-06 15:45:42 -07:00
Matei Zaharia df4d10d630 Merge pull request #779 from adatao/adatao-global-SparkEnv
[HOTFIX] Extend thread safety for SparkEnv.get()
2013-08-06 15:44:05 -07:00
Shivaram Venkataraman 471fbadd0c Java examples, tests for KMeans and ALS
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
  easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
  called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
  examples project.

Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Work around a bug where using double[] from Java leads to a class cast exception in
  KMeans init
2013-08-06 15:43:46 -07:00
anfeng dda2ac8b5d reformat registerFileSystemStat() 2013-08-06 15:22:25 -07:00
Karen Feng 099528b6c4 Pre-sorts stage/env tables, changes text/link of stage summaries 2013-08-06 14:52:12 -07:00
Karen Feng 254a930730 Reverse sorts StageTable by submitted time 2013-08-06 14:18:38 -07:00
Karen Feng 5ed5b73026 Sorts first column of env tables 2013-08-06 13:59:53 -07:00
anfeng 0748c60817 expose HDFS file system stats via Executor metrics 2013-08-06 11:47:06 -07:00
Reynold Xin d031f73679 Merge pull request #782 from WANdisco/master
SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD
2013-08-05 22:33:00 -07:00
Matei Zaharia 1b63dea816 Merge pull request #769 from markhamstra/NegativeCores
SPARK-847 + SPARK-845: Zombie workers and negative cores
2013-08-05 22:21:26 -07:00
Alexander Pivovarov a30866438b SHARK-94 Log the files computed by HadoopRDD and NewHadoopRDD 2013-08-05 21:48:43 -07:00
Matei Zaharia 8b277892c9 Merge pull request #774 from pwendell/job-description
Show user-defined job name in UI
2013-08-05 19:14:52 -07:00
Christopher Nguyen b1bbbe699c [HOTFIX] Mark lastSetSparkEnv @volatile in case it gets HotSpot-cached
On branch adatao-global-SparkEnv
Changes to be committed:

	modified:   core/src/main/scala/spark/SparkEnv.scala
2013-08-05 17:22:27 -07:00
Mark Hamstra 35d8f5ee52 Moved handling of timed out workers within the Master actor 2013-08-05 13:13:56 -07:00
Mark Hamstra 37ccf9301a milliseconds -> seconds in timeOutDeadWorkers logging 2013-08-05 13:13:56 -07:00
Mark Hamstra cdd1af562e Timeout zombie workers 2013-08-05 13:13:56 -07:00
Mikhail Bautin e8bec8365f Only reduce the number of cores once when removing an executor 2013-08-05 13:13:56 -07:00
Karen Feng 95025afdec Made most small fixes for SPARK-849 except for table sort, task progress overlay 2013-08-05 13:04:56 -07:00
Bill Zhao 87134b3648 SPARK-850: give better console message 2013-08-05 11:55:35 -07:00
Christopher Nguyen 39e4fda76f [HOTFIX] Extend thread safety for SparkEnv.get()
A ThreadLocal SparkEnv.env is facing various situations leading to
NullPointerExceptions, where SparkEnv.env set in one thread is not
gettable in another thread, but often assumed to be available.

See, e.g., https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A

This hotfixes SparkEnv.env to return either (a) the ThreadLocal
value if non-null, or (b) the previously set value in any thread.

This approach preserves SparkEnv.set() thread safety needed by
RDD.compute() and possibly other places. A refactoring that
parameterizes SparkEnv should be addressed subsequently.

On branch adatao-global-SparkEnv
Changes to be committed:

	modified:   core/src/main/scala/spark/SparkEnv.scala
2013-08-05 02:09:54 -07:00
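The fallback described in 39e4fda76f (return the thread-local value if non-null, otherwise the value most recently set in any thread) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Scala code in SparkEnv.scala: the EnvHolder class and the read_from_new_thread helper are hypothetical names.

```python
import threading


class EnvHolder:
    """Hypothetical stand-in for the SparkEnv holder; not Spark's code."""

    _local = threading.local()  # per-thread value
    _last_set = None            # most recent value set by any thread
    _lock = threading.Lock()

    @classmethod
    def set(cls, env):
        # Record the value for this thread and as the global fallback.
        cls._local.env = env
        with cls._lock:
            cls._last_set = env

    @classmethod
    def get(cls):
        # (a) Prefer the thread-local value when this thread has one.
        env = getattr(cls._local, "env", None)
        if env is not None:
            return env
        # (b) Otherwise fall back to the value last set in any thread.
        return cls._last_set


def read_from_new_thread():
    # Read the environment from a thread that never called set().
    result = []
    worker = threading.Thread(target=lambda: result.append(EnvHolder.get()))
    worker.start()
    worker.join()
    return result[0]
```

After one thread calls EnvHolder.set(...), a thread that never called set() still receives that value from get() instead of None, which is the failure mode the commit message attributes to a purely thread-local value.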
Patrick Wendell f3660d5ab8 Make output formatting consistent between bash/scala 2013-08-03 21:30:15 -07:00
Patrick Wendell ad94fbb322 Log the launch command for Spark executors 2013-08-03 09:19:46 -07:00
Matei Zaharia 22abbc10d6 Merge pull request #772 from karenfeng/ui-843
Show app duration
2013-08-02 16:37:59 -07:00
Patrick Wendell 5b3784a79c Show user-defined job name in UI 2013-08-02 15:47:41 -07:00
Karen Feng b3ae5b25d5 Shows time the app has been running 2013-08-02 13:25:14 -07:00
Patrick Wendell 9d7dfd2d5a Merge pull request #743 from pwendell/app-metrics
Add application metrics to standalone master
2013-08-01 17:41:58 -07:00