Commit graph

1732 commits

Author SHA1 Message Date
Jeff Zhang 99fe8993f5 [SPARK-12994][CORE] It is not necessary to create ExecutorAllocationM…
…anager in local mode

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10914 from zjffdu/SPARK-12994.
2016-02-29 12:08:37 +00:00
Reynold Xin 391755dc6e [SPARK-13465] Add a task failure listener to TaskContext
## What changes were proposed in this pull request?

TaskContext supports task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there is an error. This patch adds a new listener that gets called when a task fails.

## How was the this patch tested?
New unit test case and integration test case covering the code path

Author: Reynold Xin <rxin@databricks.com>

Closes #11340 from rxin/SPARK-13465.
2016-02-26 12:49:16 -08:00
Josh Rosen 633d63a48a [SPARK-12757] Add block-level read/write locks to BlockManager
## Motivation

As a pre-requisite to off-heap caching of blocks, we need a mechanism to prevent pages / blocks from being evicted while they are being read. With on-heap objects, evicting a block while it is being read merely leads to memory-accounting problems (because we assume that an evicted block is a candidate for garbage-collection, which will not be true during a read), but with off-heap memory this will lead to either data corruption or segmentation faults.

## Changes

### BlockInfoManager and reader/writer locks

This patch adds block-level read/write locks to the BlockManager. It introduces a new `BlockInfoManager` component, which is contained within the `BlockManager`, holds the `BlockInfo` objects that the `BlockManager` uses for tracking block metadata, and exposes APIs for locking blocks in either shared read or exclusive write modes.

`BlockManager`'s `get*()` and `put*()` methods now implicitly acquire the necessary locks. After a `get()` call successfully retrieves a block, that block is locked in a shared read mode. A `put()` call will block until it acquires an exclusive write lock. If the write succeeds, the write lock will be downgraded to a shared read lock before returning to the caller. This `put()` locking behavior allows us store a block and then immediately turn around and read it without having to worry about it having been evicted between the write and the read, which will allow us to significantly simplify `CacheManager` in the future (see #10748).

See `BlockInfoManagerSuite`'s test cases for a more detailed specification of the locking semantics.

### Auto-release of locks at the end of tasks

Our locking APIs support explicit release of locks (by calling `unlock()`), but it's not always possible to guarantee that locks will be released prior to the end of the task. One reason for this is our iterator interface: since our iterators don't support an explicit `close()` operator to signal that no more records will be consumed, operations like `take()` or `limit()` don't have a good means to release locks on their input iterators' blocks. Another example is broadcast variables, whose block locks can only be released at the end of the task.

To address this, `BlockInfoManager` uses a pair of maps to track the set of locks acquired by each task. Lock acquisitions automatically record the current task attempt id by obtaining it from `TaskContext`. When a task finishes, code in `Executor` calls `BlockInfoManager.unlockAllLocksForTask(taskAttemptId)` to free locks.

### Locking and the MemoryStore

In order to prevent in-memory blocks from being evicted while they are being read, the `MemoryStore`'s `evictBlocksToFreeSpace()` method acquires write locks on blocks which it is considering as candidates for eviction. These lock acquisitions are non-blocking, so a block which is being read will not be evicted. By holding write locks until the eviction is performed or skipped (in case evicting the blocks would not free enough memory), we avoid a race where a new reader starts to read a block after the block has been marked as an eviction candidate but before it has been removed.

### Locking and remote block transfer

This patch makes small changes to to block transfer and network layer code so that locks acquired by the BlockTransferService are released as soon as block transfer messages are consumed and released by Netty. This builds on top of #11193, a bug fix related to freeing of network layer ManagedBuffers.

## FAQ

- **Why not use Java's built-in [`ReadWriteLock`](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html)?**

  Our locks operate on a per-task rather than per-thread level. Under certain circumstances a task may consist of multiple threads, so using `ReadWriteLock` would mean that we might call `unlock()` from a thread which didn't hold the lock in question, an operation which has undefined semantics. If we could rely on Java 8 classes, we might be able to use [`StampedLock`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/StampedLock.html) to work around this issue.

- **Why not detect "leaked" locks in tests?**:

  See above notes about `take()` and `limit`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10705 from JoshRosen/pin-pages.
2016-02-25 17:17:56 -08:00
Shixiong Zhu 46f6e79316 Revert "[SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0"
This reverts commit 2e44031faf.
2016-02-25 11:39:26 -08:00
Devaraj K 2e44031faf [SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0
Fixed the HTTP Server Host Name/IP issue i.e. HTTP Server to take the
configured host name/IP and not '0.0.0.0' always.

Author: Devaraj K <devaraj@apache.org>

Closes #11133 from devaraj-kavali/SPARK-13117.
2016-02-25 12:18:43 +00:00
Daniel Jalova bcfd55fa98 [SPARK-12759][Core][Spark should fail fast if --executor-memory is too small for spark to start]
Added an exception to be thrown in UnifiedMemoryManager.scala if the configuration given for executor memory is too low. Also modified the exception message thrown when driver memory is too low.

This patch was tested manually by passing in config options to Spark shell. I also added a test in UnifiedMemoryManagerSuite.scala

Author: Daniel Jalova <djalova@us.ibm.com>

Closes #11255 from djalova/SPARK-12759.
2016-02-24 12:15:11 +00:00
Lianhui Wang 9f4263392e [SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab
andrewor14 squito Dead Executors should also be displayed on Executor Tab.
as following:
![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)

Author: Lianhui Wang <lianhuiwang09@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #10058 from lianhuiwang/SPARK-7729.
2016-02-23 11:08:39 -08:00
jerryshao e99d017098 [SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode
Author: jerryshao <sshao@hortonworks.com>

Closes #11229 from jerryshao/SPARK-13220.
2016-02-23 12:30:57 +00:00
Timothy Chen 00461bb911 [SPARK-10749][MESOS] Support multiple roles with mesos cluster mode.
Currently the Mesos cluster dispatcher is not using offers from multiple roles correctly, as it simply aggregates all the offers resource values into one, but doesn't apply them correctly before calling the driver as Mesos needs the resources from the offers to be specified which role it originally belongs to. Multiple roles is already supported with fine/coarse grain scheduler, so porting that logic here to the cluster scheduler.

https://issues.apache.org/jira/browse/SPARK-10749

Author: Timothy Chen <tnachen@gmail.com>

Closes #8872 from tnachen/cluster_multi_roles.
2016-02-22 11:11:33 -08:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
jerryshao 39ff154570 [SPARK-13426][CORE] Remove the support of SIMR
## What changes were proposed in this pull request?

This PR removes the support of SIMR, since SIMR is not actively used and maintained for a long time, also is not supported from `SparkSubmit`, so here propose to remove it.

## How was the this patch tested?

This patch is tested locally by running unit tests.

Author: jerryshao <sshao@hortonworks.com>

Closes #11296 from jerryshao/SPARK-13426.
2016-02-22 00:57:10 -08:00
Shixiong Zhu dfb2ae2f14 [SPARK-13408] [CORE] Ignore errors when it's already reported in JobWaiter
## What changes were proposed in this pull request?

`JobWaiter.taskSucceeded` will be called for each task. When `resultHandler` throws an exception, `taskSucceeded` will also throw it for each task. DAGScheduler just catches it and reports it like this:
```Scala
                  try {
                    job.listener.taskSucceeded(rt.outputId, event.result)
                  } catch {
                    case e: Exception =>
                      // TODO: Perhaps we want to mark the resultStage as failed?
                      job.listener.jobFailed(new SparkDriverExecutionException(e))
                  }
```
Therefore `JobWaiter.jobFailed` may be called multiple times.

So `JobWaiter.jobFailed` should use `Promise.tryFailure` instead of `Promise.failure` because the latter one doesn't support calling multiple times.

## How was the this patch tested?

Jenkins tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11280 from zsxwing/SPARK-13408.
2016-02-19 23:00:08 -08:00
Josh Rosen 983fa2d620 [SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates
`TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11276 from JoshRosen/accum-updates-fix.
2016-02-19 15:57:23 -08:00
Sean Owen 78562535fe [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly.
## What changes were proposed in this pull request?

Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager)

## How was the this patch tested?

Running Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #11253 from srowen/SPARK-13371.
2016-02-18 12:14:30 -08:00
Andrew Or 9451fed52c [SPARK-13344][TEST] Fix harmless accumulator not found exceptions
See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835.

Author: Andrew Or <andrew@databricks.com>

Closes #11222 from andrewor14/fix-test-accum-exceptions.
2016-02-17 16:17:20 -08:00
Claes Redestad 22e9723d62 [SPARK-13278][CORE] Launcher fails to start with JDK 9 EA
See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme.

Author: Claes Redestad <claes.redestad@gmail.com>

Closes #11160 from cl4es/master.
2016-02-14 11:49:37 +00:00
Sean Owen 388cd9ea8d [SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace it is deprecated
Replace `getStackTraceString` with `Utils.exceptionString`

Author: Sean Owen <sowen@cloudera.com>

Closes #11182 from srowen/SPARK-13172.
2016-02-13 21:05:48 -08:00
Michael Gummelt 62b1c07e7e [SPARK-5095] remove flaky test
Overrode the start() method, which was previously starting a thread causing a race condition. I believe this should fix the flaky test.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11164 from mgummelt/fix_mesos_tests.
2016-02-12 15:00:39 -08:00
Michael Gummelt 38bc6018e9 [SPARK-5095] Fix style in mesos coarse grained scheduler code
andrewor14 This addressed your style comments from #10993

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11187 from mgummelt/fix_mesos_style.
2016-02-12 14:57:31 -08:00
Sanket 894921d813 [SPARK-6166] Limit number of in flight outbound requests
This JIRA is related to
https://github.com/apache/spark/pull/5852
Had to do some minor rework and test to make sure it
works with current version of spark.

Author: Sanket <schintap@untilservice-lm>

Closes #10838 from redsanket/limit-outbound-connections.
2016-02-11 22:40:00 -08:00
Steve Loughran a2c7dcf61f [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps
When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available.  It does this by checking if a version of the app has been loaded with a larger *filesize*.  If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #11118 from squito/SPARK-7889-alternate.
2016-02-11 21:37:53 -06:00
Junyang f9ae99fee1 [SPARK-13074][CORE] Add JavaSparkContext. getPersistentRDDs method
The "getPersistentRDDs()" is a useful API of SparkContext to get cached RDDs. However, the JavaSparkContext does not have this API.

Add a simple getPersistentRDDs() to get java.util.Map<Integer, JavaRDD> for Java users.

Author: Junyang <fly.shenjy@gmail.com>

Closes #10978 from flyjy/master.
2016-02-11 09:33:11 +00:00
Sean Owen 29c547303f [SPARK-12414][CORE] Remove closure serializer
Remove spark.closure.serializer option and use JavaSerializer always

CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.

Author: Sean Owen <sowen@cloudera.com>

Closes #11150 from srowen/SPARK-12414.
2016-02-10 13:34:53 -08:00
Michael Gummelt 80cb963ad9 [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode.
This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027

In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone.  This PR implements that resolution.

This PR implements two high-level features.  These two features are co-dependent, so they're implemented both here:
- Mesos support for spark.executor.cores
- Multiple executors per slave

We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #10993 from mgummelt/executor_sizing.
2016-02-10 10:53:33 -08:00
Davies Liu 0e5ebac3c1 [SPARK-12950] [SQL] Improve lookup of BytesToBytesMap in aggregate
This PR improve the lookup of BytesToBytesMap by:

1. Generate code for calculate the hash code of grouping keys.

2. Do not use MemoryLocation, fetch the baseObject and offset for key and value directly (remove the indirection).

Author: Davies Liu <davies@databricks.com>

Closes #11010 from davies/gen_map.
2016-02-09 16:41:21 -08:00
Andrew Or eeaf45b926 [SPARK-10620][SPARK-13054] Minor addendum to #10835
Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility.

Author: Andrew Or <andrew@databricks.com>

Closes #10958 from andrewor14/task-metrics-to-accums-followups.
2016-02-08 17:23:33 -08:00
Davies Liu 37bc203c8d [SPARK-13210][SQL] catch OOM when allocate memory and expand array
There is a bug when we try to grow the buffer, OOM is ignore wrongly (the assert also skipped by JVM), then we try grow the array again, this one will trigger spilling free the current page, the current record we inserted will be invalid.

The root cause is that JVM has less free memory than MemoryManager thought, it will OOM when allocate a page without trigger spilling. We should catch the OOM, and acquire memory again to trigger spilling.

And also, we could not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing).

Author: Davies Liu <davies@databricks.com>

Closes #11095 from davies/fix_expand.
2016-02-08 12:09:20 -08:00
Jakob Odersky 6883a5120c [SPARK-13171][CORE] Replace future calls with Future
Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11.
Also works with 2.10

Author: Jakob Odersky <jakob@odersky.com>

Closes #11085 from jodersky/SPARK-13171.
2016-02-05 19:00:12 -08:00
Andrew Or 15205da817 [SPARK-13053][TEST] Unignore tests in InternalAccumulatorSuite
These were ignored because they are incorrectly written; they don't actually trigger stage retries, which is what the tests are testing. These tests are now rewritten to induce stage retries through fetch failures.

Note: there were 2 tests before and now there's only 1. What happened? It turns out that the case where we only resubmit a subset of of the original missing partitions is very difficult to simulate in tests without potentially introducing flakiness. This is because the `DAGScheduler` removes all map outputs associated with a given executor when this happens, and we will need multiple executors to trigger this case, and sometimes the scheduler still removes map outputs from all executors.

Author: Andrew Or <andrew@databricks.com>

Closes #10969 from andrewor14/unignore-accum-test.
2016-02-04 10:34:43 -08:00
Andrew Or 4120bcbaff [SPARK-13162] Standalone mode does not respect initial executors
Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode.

Author: Andrew Or <andrew@databricks.com>

Closes #11054 from andrewor14/standalone-da-initial.
2016-02-04 10:32:32 -08:00
Holden Karau 62a7c28388 [SPARK-13164][CORE] Replace deprecated synchronized buffer in core
Building with scala 2.11 results in the warning trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative. Investigation shows we are already using ConcurrentLinkedQueue in other locations so switch our uses of SynchronizedBuffer to ConcurrentLinkedQueue.

Author: Holden Karau <holden@us.ibm.com>

Closes #11059 from holdenk/SPARK-13164-replace-deprecated-synchronized-buffer-in-core.
2016-02-04 10:29:38 -08:00
Alex Bozarth 3221eddb8f [SPARK-3611][WEB UI] Show number of cores for each executor in application web UI
Added a Cores column in the Executors UI

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #11039 from ajbozarth/spark3611.
2016-02-03 15:53:10 -08:00
felixcheung 0df3cfb8ab [SPARK-12790][CORE] Remove HistoryServer old multiple files format
Removed isLegacyLogDirectory code path and updated tests
andrewor14

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10860 from felixcheung/historyserverformat.
2016-02-01 16:55:21 -08:00
Shixiong Zhu 6075573a93 [SPARK-6847][CORE][STREAMING] Fix stack overflow issue when updateStateByKey is followed by a checkpointed dstream
Add a local property to indicate if checkpointing all RDDs that are marked with the checkpoint flag, and enable it in Streaming

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10934 from zsxwing/recursive-checkpoint.
2016-02-01 11:02:17 -08:00
Andrew Or e6ceac49a3 [SPARK-13096][TEST] Fix flaky verifyPeakExecutionMemorySet
Previously we would assert things before all events are guaranteed to have been processed. To fix this, just block until all events are actually processed, i.e. until the listener queue is empty.

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/79/testReport/junit/org.apache.spark.util.collection/ExternalAppendOnlyMapSuite/spilling/

Author: Andrew Or <andrew@databricks.com>

Closes #10990 from andrewor14/accum-suite-less-flaky.
2016-01-29 17:59:41 -08:00
Andrew Or e38b0baa38 [SPARK-13055] SQLHistoryListener throws ClassCastException
This is an existing issue uncovered recently by #10835. The reason for the exception was because the `SQLHistoryListener` gets all sorts of accumulators, not just the ones that represent SQL metrics. For example, the listener gets the `internal.metrics.shuffleRead.remoteBlocksFetched`, which is an Int, then it proceeds to cast the Int to a Long, which fails.

The fix is to mark accumulators representing SQL metrics using some internal metadata. Then we can identify which ones are SQL metrics and only process those in the `SQLHistoryListener`.

Author: Andrew Or <andrew@databricks.com>

Closes #10971 from andrewor14/fix-sql-history.
2016-01-29 13:45:03 -08:00
zhuol e4c1162b6b [SPARK-10873] Support column sort and search for History Server.
[SPARK-10873] Support column sort and search for History Server using jQuery DataTable and REST API. Before this commit, the history server was generated hard-coded html and can not support search, also, the sorting was disabled if there is any application that has more than one attempt. Supporting search and sort (over all applications rather than the 20 entries in the current page) in any case will greatly improve user experience.

1. Create the historypage-template.html for displaying application information in datables.
2. historypage.js uses jQuery to access the data from /api/v1/applications REST API, and use DataTable to display each application's information. For application that has more than one attempt, the RowsGroup is used to merge such entries while at the same time supporting sort and search.
3. "duration" and "lastUpdated" rest API are added to application's "attempts".
4. External javascirpt and css files for datatables, RowsGroup and jquery plugins are added with licenses clarified.

Snapshots for how it looks like now:

History page view:
![historypage](https://cloud.githubusercontent.com/assets/11683054/12184383/89bad774-b55a-11e5-84e4-b0276172976f.png)

Search:
![search](https://cloud.githubusercontent.com/assets/11683054/12184385/8d3b94b0-b55a-11e5-869a-cc0ef0a4242a.png)

Sort by started time:
![sort-by-started-time](https://cloud.githubusercontent.com/assets/11683054/12184387/8f757c3c-b55a-11e5-98c8-577936366566.png)

Author: zhuol <zhuol@yahoo-inc.com>

Closes #10648 from zhuoliu/10873.
2016-01-29 11:54:58 -06:00
Josh Rosen 32f741115b [SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract
Spark's `Partition` and `RDD.partitions` APIs have a contract which requires custom implementations of `RDD.partitions` to ensure that for all `x`, `rdd.partitions(x).index == x`; in other words, the `index` reported by a repartition needs to match its position in the partitions array.

If a custom RDD implementation violates this contract, then Spark has the potential to become stuck in an infinite recomputation loop when recomputing a subset of an RDD's partitions, since the tasks that are actually run will not correspond to the missing output partitions that triggered the recomputation. Here's a link to a notebook which demonstrates this problem: 5e8a5aa8d2/Violating%2520RDD.partitions%2520contract.html

In order to guard against this infinite loop behavior, this patch modifies Spark so that it fails fast and refuses to compute RDDs' whose `partitions` violate the API contract.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10932 from JoshRosen/SPARK-13021.
2016-01-27 13:27:32 -08:00
Andrew Or 87abcf7df9 [SPARK-12895][SPARK-12896] Migrate TaskMetrics to accumulators
The high level idea is that instead of having the executors send both accumulator updates and TaskMetrics, we should have them send only accumulator updates. This eliminates the need to maintain both code paths since one can be implemented in terms of the other. This effort is split into two parts:

**SPARK-12895: Implement TaskMetrics using accumulators.** TaskMetrics is basically just a bunch of accumulable fields. This patch makes TaskMetrics a syntactic wrapper around a collection of accumulators so we don't need to send TaskMetrics from the executors to the driver.

**SPARK-12896: Send only accumulator updates to the driver.** Now that TaskMetrics are expressed in terms of accumulators, we can capture all TaskMetrics values if we just send accumulator updates from the executors to the driver. This completes the parent issue SPARK-10620.

While an effort has been made to preserve as much of the public API as possible, there were a few known breaking DeveloperApi changes that would be very awkward to maintain. I will gather the full list shortly and post it here.

Note: This was once part of #10717. This patch is split out into its own patch from there to make it easier for others to review. Other smaller pieces of already been merged into master.

Author: Andrew Or <andrew@databricks.com>

Closes #10835 from andrewor14/task-metrics-use-accums.
2016-01-27 11:15:48 -08:00
Sean Owen 649e9d0f5b [SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator
Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable.

CC rxin pwendell for API change; tdas since it also touches streaming.

Author: Sean Owen <sowen@cloudera.com>

Closes #10413 from srowen/SPARK-3369.
2016-01-26 11:55:28 +00:00
Alex Bozarth c037d25482 [SPARK-12149][WEB UI] Executor UI improvement suggestions - Color UI
Added color coding to the Executors page for Active Tasks, Failed Tasks, Completed Tasks and Task Time.

Active Tasks is shaded blue with it's range based on percentage of total cores used.
Failed Tasks is shaded red ranging over the first 10% of total tasks failed
Completed Tasks is shaded green ranging over 10% of total tasks including failed and active tasks, but only when there are active or failed tasks on that executor.
Task Time is shaded red when GC Time goes over 10% of total time with it's range directly corresponding to the percent of total time.

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #10154 from ajbozarth/spark12149.
2016-01-25 14:42:44 -06:00
Shixiong Zhu bc1babd63d [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult`  depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.
2016-01-22 21:20:04 -08:00
scwf 43f1d59e17 [SPARK-2750][WEB UI] Add https support to the Web UI
Author: scwf <wangfei1@huawei.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: WangTaoTheTonic <wangtao111@huawei.com>
Author: w00228970 <wangfei1@huawei.com>

Closes #10238 from vanzin/SPARK-2750.
2016-01-19 14:49:55 -08:00
Andrew Or b122c861cd [SPARK-12887] Do not expose var's in TaskMetrics
This is a step in implementing SPARK-10620, which migrates TaskMetrics to accumulators.

TaskMetrics has a bunch of var's, some are fully public, some are `private[spark]`. This is bad coding style that makes it easy to accidentally overwrite previously set metrics. This has happened a few times in the past and caused bugs that were difficult to debug.

Instead, we should have get-or-create semantics, which are more readily understandable. This makes sense in the case of TaskMetrics because these are just aggregated metrics that we want to collect throughout the task, so it doesn't matter who's incrementing them.

Parent PR: #10717

Author: Andrew Or <andrew@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: andrewor14 <andrew@databricks.com>

Closes #10815 from andrewor14/get-or-create-metrics.
2016-01-19 10:58:51 -08:00
Andrew Or 2b5d11f34d [SPARK-12885][MINOR] Rename 3 fields in ShuffleWriteMetrics
This is a small step in implementing SPARK-10620, which migrates TaskMetrics to accumulators. This patch is strictly a cleanup patch and introduces no change in functionality. It literally just renames 3 fields for consistency. Today we have:

```
inputMetrics.recordsRead
outputMetrics.bytesWritten
shuffleReadMetrics.localBlocksFetched
...
shuffleWriteMetrics.shuffleRecordsWritten
shuffleWriteMetrics.shuffleBytesWritten
shuffleWriteMetrics.shuffleWriteTime
```

The shuffle write ones are kind of redundant. We can drop the `shuffle` part in the method names. I added backward compatible (but deprecated) methods with the old names.

Parent PR: #10717

Author: Andrew Or <andrew@databricks.com>

Closes #10811 from andrewor14/rename-things.
2016-01-18 19:22:29 -08:00
Josh Rosen b8cb548a43 [SPARK-10985][CORE] Avoid passing evicted blocks throughout BlockManager
This patch refactors portions of the BlockManager and CacheManager in order to avoid having to pass `evictedBlocks` lists throughout the code. It appears that these lists were only consumed by `TaskContext.taskMetrics`, so the new code now directly updates the metrics from the lower-level BlockManager methods.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10776 from JoshRosen/SPARK-10985.
2016-01-18 13:34:12 -08:00
Reynold Xin ad1503f92e [SPARK-12667] Remove block manager's internal "external block store" API
This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.

There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.

Author: Reynold Xin <rxin@databricks.com>

Closes #10752 from rxin/remove-offheap.
2016-01-15 12:03:28 -08:00
Koyo Yoshida 32cca93354 [SPARK-12708][UI] Sorting task error in Stages Page when yarn mode.
If sort column contains slash(e.g. "Executor ID / Host") when yarn mode,sort fail with following message.

![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png)

It's similar to SPARK-4313 .

Author: root <root@R520T1.(none)>
Author: Koyo Yoshida <koyo0615@gmail.com>

Closes #10663 from yoshidakuy/SPARK-12708.
2016-01-15 13:32:47 +09:00
Josh Rosen 25782981cf [SPARK-12174] Speed up BlockManagerSuite getRemoteBytes() test
This patch significantly speeds up the BlockManagerSuite's "SPARK-9591: getRemoteBytes from another location when Exception throw" test, reducing the test time from 45s to ~250ms. The key change was to set `spark.shuffle.io.maxRetries` to 0 (the code previously set `spark.network.timeout` to `2s`, but this didn't make a difference because the slowdown was not due to this timeout).

Along the way, I also cleaned up the way that we handle SparkConf in BlockManagerSuite: previously, each test would mutate a shared SparkConf instance, while now each test gets a fresh SparkConf.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10759 from JoshRosen/SPARK-12174.
2016-01-14 17:37:27 -08:00
Bryan Cutler 56cdbd654d [SPARK-9844][CORE] File appender race condition during shutdown
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed.  Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop.  This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception.  Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exception that do occur.  Added unit tests to verify a IOException is thrown and logged if FileAppender is not flagged to stop, and that no IOException when the flag is set.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.
2016-01-14 10:59:02 +00:00