Commit graph

5201 commits

Author SHA1 Message Date
Shixiong Zhu ad615291fe [SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning executor's state
## What changes were proposed in this pull request?

When the driver removes an executor's state, the connection between the driver and the executor may be still alive so that the executor cannot exit automatically (E.g., Master will send RemoveExecutor when a work is lost but the executor is still alive), so the driver should try to tell the executor to stop itself. Otherwise, we will leak an executor.

This PR modified the driver to send `StopExecutor` to the executor when it's removed.

## How was this patch tested?

manual test: increase the worker heartbeat interval to force it's always timeout and the leak executors are gone.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11399 from zsxwing/SPARK-13519.
2016-02-26 15:11:57 -08:00
Reynold Xin 391755dc6e [SPARK-13465] Add a task failure listener to TaskContext
## What changes were proposed in this pull request?

TaskContext supports task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there is an error. This patch adds a new listener that gets called when a task fails.

## How was the this patch tested?
New unit test case and integration test case covering the code path

Author: Reynold Xin <rxin@databricks.com>

Closes #11340 from rxin/SPARK-13465.
2016-02-26 12:49:16 -08:00
Josh Rosen 633d63a48a [SPARK-12757] Add block-level read/write locks to BlockManager
## Motivation

As a pre-requisite to off-heap caching of blocks, we need a mechanism to prevent pages / blocks from being evicted while they are being read. With on-heap objects, evicting a block while it is being read merely leads to memory-accounting problems (because we assume that an evicted block is a candidate for garbage-collection, which will not be true during a read), but with off-heap memory this will lead to either data corruption or segmentation faults.

## Changes

### BlockInfoManager and reader/writer locks

This patch adds block-level read/write locks to the BlockManager. It introduces a new `BlockInfoManager` component, which is contained within the `BlockManager`, holds the `BlockInfo` objects that the `BlockManager` uses for tracking block metadata, and exposes APIs for locking blocks in either shared read or exclusive write modes.

`BlockManager`'s `get*()` and `put*()` methods now implicitly acquire the necessary locks. After a `get()` call successfully retrieves a block, that block is locked in a shared read mode. A `put()` call will block until it acquires an exclusive write lock. If the write succeeds, the write lock will be downgraded to a shared read lock before returning to the caller. This `put()` locking behavior allows us store a block and then immediately turn around and read it without having to worry about it having been evicted between the write and the read, which will allow us to significantly simplify `CacheManager` in the future (see #10748).

See `BlockInfoManagerSuite`'s test cases for a more detailed specification of the locking semantics.

### Auto-release of locks at the end of tasks

Our locking APIs support explicit release of locks (by calling `unlock()`), but it's not always possible to guarantee that locks will be released prior to the end of the task. One reason for this is our iterator interface: since our iterators don't support an explicit `close()` operator to signal that no more records will be consumed, operations like `take()` or `limit()` don't have a good means to release locks on their input iterators' blocks. Another example is broadcast variables, whose block locks can only be released at the end of the task.

To address this, `BlockInfoManager` uses a pair of maps to track the set of locks acquired by each task. Lock acquisitions automatically record the current task attempt id by obtaining it from `TaskContext`. When a task finishes, code in `Executor` calls `BlockInfoManager.unlockAllLocksForTask(taskAttemptId)` to free locks.

### Locking and the MemoryStore

In order to prevent in-memory blocks from being evicted while they are being read, the `MemoryStore`'s `evictBlocksToFreeSpace()` method acquires write locks on blocks which it is considering as candidates for eviction. These lock acquisitions are non-blocking, so a block which is being read will not be evicted. By holding write locks until the eviction is performed or skipped (in case evicting the blocks would not free enough memory), we avoid a race where a new reader starts to read a block after the block has been marked as an eviction candidate but before it has been removed.

### Locking and remote block transfer

This patch makes small changes to to block transfer and network layer code so that locks acquired by the BlockTransferService are released as soon as block transfer messages are consumed and released by Netty. This builds on top of #11193, a bug fix related to freeing of network layer ManagedBuffers.

## FAQ

- **Why not use Java's built-in [`ReadWriteLock`](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html)?**

  Our locks operate on a per-task rather than per-thread level. Under certain circumstances a task may consist of multiple threads, so using `ReadWriteLock` would mean that we might call `unlock()` from a thread which didn't hold the lock in question, an operation which has undefined semantics. If we could rely on Java 8 classes, we might be able to use [`StampedLock`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/StampedLock.html) to work around this issue.

- **Why not detect "leaked" locks in tests?**:

  See above notes about `take()` and `limit`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10705 from JoshRosen/pin-pages.
2016-02-25 17:17:56 -08:00
Josh Rosen f2cfafdfe0 [SPARK-13501] Remove use of Guava Stopwatch
Our nightly doc snapshot builds are failing due to some issue involving the Guava Stopwatch constructor:

```
[error] /home/jenkins/workspace/spark-master-docs/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:496: constructor Stopwatch in class Stopwatch cannot be accessed in class CoarseMesosSchedulerBackend
[error]     val stopwatch = new Stopwatch()
[error]                     ^
```

This Stopwatch constructor was deprecated in newer versions of Guava (fd0cbc2c5c) and it's possible that some classpath issues affecting Unidoc could be causing this to trigger compilation failures.

In order to work around this issue, this patch removes this use of Stopwatch since we don't use it anywhere else in the Spark codebase.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11376 from JoshRosen/remove-stopwatch.
2016-02-25 17:04:43 -08:00
Liwei Lin dc6c5ea4c9 [SPARK-13468][WEB UI] Fix a corner case where the Stage UI page should show DAG but it doesn't show
When uses clicks more than one time on any stage in the DAG graph on the *Job* web UI page, many new *Stage* web UI pages are opened, but only half of their DAG graphs are expanded.

After this PR's fix, every newly opened *Stage* page's DAG graph is expanded.

Before:
![](https://cloud.githubusercontent.com/assets/15843379/13279144/74808e86-db10-11e5-8514-cecf31af8908.png)

After:
![](https://cloud.githubusercontent.com/assets/15843379/13279145/77ca5dec-db10-11e5-9457-8e1985461328.png)

## What changes were proposed in this pull request?

- Removed the `expandDagViz` parameter for _Stage_ page and related codes
- Added a `onclick` function setting `expandDagVizArrowKey(false)` as `true`

## How was this patch tested?

Manual tests (with this fix) to verified this fix work:
- clicked many times on _Job_ Page's DAG Graph → each newly opened Stage page's DAG graph is expanded

Manual tests (with this fix) to verified this fix do not break features we already had:
- refreshed many times for a same _Stage_ page (whose DAG already expanded) → DAG remained expanded upon every refresh
- refreshed many times for a same _Stage_ page (whose DAG unexpanded) → DAG remained unexpanded upon every refresh
- refreshed many times for a same _Job_ page (whose DAG already expanded) → DAG remained expanded upon every refresh
- refreshed many times for a same _Job_ page (whose DAG unexpanded) → DAG remained unexpanded upon every refresh

Author: Liwei Lin <proflin.me@gmail.com>

Closes #11368 from proflin/SPARK-13468.
2016-02-25 15:36:25 -08:00
Shixiong Zhu 46f6e79316 Revert "[SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0"
This reverts commit 2e44031faf.
2016-02-25 11:39:26 -08:00
Devaraj K 2e44031faf [SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0
Fixed the HTTP Server Host Name/IP issue i.e. HTTP Server to take the
configured host name/IP and not '0.0.0.0' always.

Author: Devaraj K <devaraj@apache.org>

Closes #11133 from devaraj-kavali/SPARK-13117.
2016-02-25 12:18:43 +00:00
Wenchen Fan a60f91284c [SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code
## What changes were proposed in this pull request?

When we pass a Python function to JVM side, we also need to send its context, e.g. `envVars`, `pythonIncludes`, `pythonExec`, etc. However, it's annoying to pass around so many parameters at many places. This PR abstract python function along with its context, to simplify some pyspark code and make the logic more clear.

## How was the this patch tested?

by existing unit tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #11342 from cloud-fan/python-clean.
2016-02-24 12:44:54 -08:00
Daniel Jalova bcfd55fa98 [SPARK-12759][Core][Spark should fail fast if --executor-memory is too small for spark to start]
Added an exception to be thrown in UnifiedMemoryManager.scala if the configuration given for executor memory is too low. Also modified the exception message thrown when driver memory is too low.

This patch was tested manually by passing in config options to Spark shell. I also added a test in UnifiedMemoryManagerSuite.scala

Author: Daniel Jalova <djalova@us.ibm.com>

Closes #11255 from djalova/SPARK-12759.
2016-02-24 12:15:11 +00:00
Davies Liu 9cdd867da9 [SPARK-13373] [SQL] generate sort merge join
## What changes were proposed in this pull request?

Generates code for SortMergeJoin.

## How was the this patch tested?

Unit tests and manually tested with TPCDS Q72, which showed 70% performance improvements (from 42s to 25s), but micro benchmark only show minor improvements, it may depends the distribution of data and number of columns.

Author: Davies Liu <davies@databricks.com>

Closes #11248 from davies/gen_smj.
2016-02-23 15:00:10 -08:00
Lianhui Wang 9f4263392e [SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab
andrewor14 squito Dead Executors should also be displayed on Executor Tab.
as following:
![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)

Author: Lianhui Wang <lianhuiwang09@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #10058 from lianhuiwang/SPARK-7729.
2016-02-23 11:08:39 -08:00
zhuol 4d1e5f92e1 [SPARK-13364] Sort appId as num rather than str in history page.
## What changes were proposed in this pull request?

History page now sorts the appID as a string, which can lead to unexpected order for the case "application_11111_9" and "application_11111_20".
Add a new sort type called appId-numeric can fix it.

## How was the this patch tested?
This patch was manually tested with UI. See the screenshot below:
![sortappidbetter](https://cloud.githubusercontent.com/assets/11683054/13185564/7f941a16-d707-11e5-8fb7-0316368d3030.png)

Author: zhuol <zhuol@yahoo-inc.com>

Closes #11259 from zhuoliu/13364.
2016-02-23 11:16:42 -06:00
Liang-Chi Hsieh 87d7f8904a [SPARK-13358] [SQL] Retrieve grep path when do benchmark
JIRA: https://issues.apache.org/jira/browse/SPARK-13358

When trying to run a benchmark, I found that on my Ubuntu linux grep is not in /usr/bin/ but /bin/. So wondering if it is better to use which to retrieve grep path.

cc davies

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #11231 from viirya/benchmark-grep-path.
2016-02-23 07:56:08 -08:00
jerryshao e99d017098 [SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode
Author: jerryshao <sshao@hortonworks.com>

Closes #11229 from jerryshao/SPARK-13220.
2016-02-23 12:30:57 +00:00
Shixiong Zhu a11b399519 [SPARK-13298][CORE][UI] Escape "label" to avoid DAG being broken by some special character
## What changes were proposed in this pull request?

When there are some special characters (e.g., `"`, `\`) in `label`, DAG will be broken. This patch just escapes `label` to avoid DAG being broken by some special characters

## How was the this patch tested?

Jenkins tests

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11309 from zsxwing/SPARK-13298.
2016-02-22 17:42:30 -08:00
Reynold Xin 4a91806a45 [SPARK-13413] Remove SparkContext.metricsSystem
## What changes were proposed in this pull request?

This patch removes SparkContext.metricsSystem. SparkContext.metricsSystem returns MetricsSystem, which is a private class. I think it was added by accident.

In addition, I also removed an unused private[spark] method schedulerBackend setter.

## How was the this patch tested?

N/A.

Author: Reynold Xin <rxin@databricks.com>

This patch had conflicts when merged, resolved by
Committer: Josh Rosen <joshrosen@databricks.com>

Closes #11282 from rxin/SPARK-13413.
2016-02-22 14:01:35 -08:00
Timothy Chen 00461bb911 [SPARK-10749][MESOS] Support multiple roles with mesos cluster mode.
Currently the Mesos cluster dispatcher is not using offers from multiple roles correctly, as it simply aggregates all the offers resource values into one, but doesn't apply them correctly before calling the driver as Mesos needs the resources from the offers to be specified which role it originally belongs to. Multiple roles is already supported with fine/coarse grain scheduler, so porting that logic here to the cluster scheduler.

https://issues.apache.org/jira/browse/SPARK-10749

Author: Timothy Chen <tnachen@gmail.com>

Closes #8872 from tnachen/cluster_multi_roles.
2016-02-22 11:11:33 -08:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
jerryshao 39ff154570 [SPARK-13426][CORE] Remove the support of SIMR
## What changes were proposed in this pull request?

This PR removes the support of SIMR, since SIMR is not actively used and maintained for a long time, also is not supported from `SparkSubmit`, so here propose to remove it.

## How was the this patch tested?

This patch is tested locally by running unit tests.

Author: jerryshao <sshao@hortonworks.com>

Closes #11296 from jerryshao/SPARK-13426.
2016-02-22 00:57:10 -08:00
Shixiong Zhu dfb2ae2f14 [SPARK-13408] [CORE] Ignore errors when it's already reported in JobWaiter
## What changes were proposed in this pull request?

`JobWaiter.taskSucceeded` will be called for each task. When `resultHandler` throws an exception, `taskSucceeded` will also throw it for each task. DAGScheduler just catches it and reports it like this:
```Scala
                  try {
                    job.listener.taskSucceeded(rt.outputId, event.result)
                  } catch {
                    case e: Exception =>
                      // TODO: Perhaps we want to mark the resultStage as failed?
                      job.listener.jobFailed(new SparkDriverExecutionException(e))
                  }
```
Therefore `JobWaiter.jobFailed` may be called multiple times.

So `JobWaiter.jobFailed` should use `Promise.tryFailure` instead of `Promise.failure` because the latter one doesn't support calling multiple times.

## How was the this patch tested?

Jenkins tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11280 from zsxwing/SPARK-13408.
2016-02-19 23:00:08 -08:00
Josh Rosen 983fa2d620 [SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates
`TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11276 from JoshRosen/accum-updates-fix.
2016-02-19 15:57:23 -08:00
Sean Owen fb7e21797e [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold
Clarify that reduce functions need to be commutative, and fold functions do not

See https://github.com/apache/spark/pull/11091

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339.
2016-02-19 10:26:38 +00:00
Sean Owen 78562535fe [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly.
## What changes were proposed in this pull request?

Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager)

## How was the this patch tested?

Running Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #11253 from srowen/SPARK-13371.
2016-02-18 12:14:30 -08:00
Andrew Or 9451fed52c [SPARK-13344][TEST] Fix harmless accumulator not found exceptions
See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835.

Author: Andrew Or <andrew@databricks.com>

Closes #11222 from andrewor14/fix-test-accum-exceptions.
2016-02-17 16:17:20 -08:00
Sital Kedia 1e1e31e03d [SPARK-13279] Remove O(n^2) operation from scheduler.
This commit removes an unnecessary duplicate check in addPendingTask that meant
that scheduling a task set took time proportional to (# tasks)^2.

Author: Sital Kedia <skedia@fb.com>

Closes #11175 from sitalkedia/fix_stuck_driver.
2016-02-16 22:27:39 -08:00
Claes Redestad 22e9723d62 [SPARK-13278][CORE] Launcher fails to start with JDK 9 EA
See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme.

Author: Claes Redestad <claes.redestad@gmail.com>

Closes #11160 from cl4es/master.
2016-02-14 11:49:37 +00:00
Sean Owen 388cd9ea8d [SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace it is deprecated
Replace `getStackTraceString` with `Utils.exceptionString`

Author: Sean Owen <sowen@cloudera.com>

Closes #11182 from srowen/SPARK-13172.
2016-02-13 21:05:48 -08:00
markpavey 374c4b2869 [SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows
Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK.

Is it worth considering also including this fix in any future 1.5.x releases (if any)?

I confirm this is my own original work and license it to the Spark project under its open source license.

Author: markpavey <mark.pavey@thefilter.com>

Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
2016-02-13 08:39:43 +00:00
Michael Gummelt 62b1c07e7e [SPARK-5095] remove flaky test
Overrode the start() method, which was previously starting a thread causing a race condition. I believe this should fix the flaky test.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11164 from mgummelt/fix_mesos_tests.
2016-02-12 15:00:39 -08:00
Michael Gummelt 38bc6018e9 [SPARK-5095] Fix style in mesos coarse grained scheduler code
andrewor14 This addressed your style comments from #10993

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #11187 from mgummelt/fix_mesos_style.
2016-02-12 14:57:31 -08:00
Sanket 894921d813 [SPARK-6166] Limit number of in flight outbound requests
This JIRA is related to
https://github.com/apache/spark/pull/5852
Had to do some minor rework and test to make sure it
works with current version of spark.

Author: Sanket <schintap@untilservice-lm>

Closes #10838 from redsanket/limit-outbound-connections.
2016-02-11 22:40:00 -08:00
Steve Loughran a2c7dcf61f [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps
When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available.  It does this by checking if a version of the app has been loaded with a larger *filesize*.  If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #11118 from squito/SPARK-7889-alternate.
2016-02-11 21:37:53 -06:00
Reynold Xin c86009ceb9 Revert "[SPARK-13279] Remove O(n^2) operation from scheduler."
This reverts commit 50fa6fd1b3.
2016-02-11 13:31:13 -08:00
Sital Kedia 50fa6fd1b3 [SPARK-13279] Remove O(n^2) operation from scheduler.
This commit removes an unnecessary duplicate check in addPendingTask that meant
that scheduling a task set took time proportional to (# tasks)^2.

Author: Sital Kedia <skedia@fb.com>

Closes #11167 from sitalkedia/fix_stuck_driver and squashes the following commits:

3fe1af8 [Sital Kedia] [SPARK-13279] Remove unnecessary duplicate check in addPendingTask function
2016-02-11 13:28:14 -08:00
Alex Bozarth 13c17cbb05 [SPARK-13124][WEB UI] Fixed CSS and JS issues caused by addition of JQuery DataTables
Made sure the old tables continue to use the old css and the new DataTables use the new css. Also fixed it so the Safari Web Inspector doesn't throw errors when on the new DataTables pages.

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #11038 from ajbozarth/spark13124.
2016-02-11 08:50:27 -06:00
Junyang f9ae99fee1 [SPARK-13074][CORE] Add JavaSparkContext. getPersistentRDDs method
The "getPersistentRDDs()" is a useful API of SparkContext to get cached RDDs. However, the JavaSparkContext does not have this API.

Add a simple getPersistentRDDs() to get java.util.Map<Integer, JavaRDD> for Java users.

Author: Junyang <fly.shenjy@gmail.com>

Closes #10978 from flyjy/master.
2016-02-11 09:33:11 +00:00
Sean Owen 29c547303f [SPARK-12414][CORE] Remove closure serializer
Remove spark.closure.serializer option and use JavaSerializer always

CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.

Author: Sean Owen <sowen@cloudera.com>

Closes #11150 from srowen/SPARK-12414.
2016-02-10 13:34:53 -08:00
zhuol 4b80026f07 [SPARK-13126] fix the right margin of history page.
The right margin of the history page is little bit off. A simple fix for that issue.

Author: zhuol <zhuol@yahoo-inc.com>

Closes #11029 from zhuoliu/13126.
2016-02-10 14:23:41 -06:00
Alex Bozarth 39cc620e9c [SPARK-13163][WEB UI] Column width on new History Server DataTables not getting set correctly
The column width for the new DataTables now adjusts for the current page rather than being hard-coded for the entire table's data.

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #11057 from ajbozarth/spark13163.
2016-02-10 14:07:50 -06:00
Michael Gummelt 80cb963ad9 [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode.
This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027

In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone.  This PR implements that resolution.

This PR implements two high-level features.  These two features are co-dependent, so they're implemented both here:
- Mesos support for spark.executor.cores
- Multiple executors per slave

We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #10993 from mgummelt/executor_sizing.
2016-02-10 10:53:33 -08:00
Sean Owen c0b71e0b8f [SPARK-9307][CORE][SPARK] Logging: Make it either stable or private
Make Logging private[spark]. Pretty much all there is to it.

Author: Sean Owen <sowen@cloudera.com>

Closes #11103 from srowen/SPARK-9307.
2016-02-10 11:02:00 +00:00
Davies Liu 0e5ebac3c1 [SPARK-12950] [SQL] Improve lookup of BytesToBytesMap in aggregate
This PR improve the lookup of BytesToBytesMap by:

1. Generate code for calculate the hash code of grouping keys.

2. Do not use MemoryLocation, fetch the baseObject and offset for key and value directly (remove the indirection).

Author: Davies Liu <davies@databricks.com>

Closes #11010 from davies/gen_map.
2016-02-09 16:41:21 -08:00
Shixiong Zhu fae830d158 [SPARK-13245][CORE] Call shuffleMetrics methods only in one thread for ShuffleBlockFetcherIterator
Call shuffleMetrics's incRemoteBytesRead and incRemoteBlocksFetched when polling FetchResult from `results` so as to always use shuffleMetrics in one thread.

Also fix a race condition that could cause memory leak.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11138 from zsxwing/SPARK-13245.
2016-02-09 16:31:00 -08:00
Wenchen Fan 7fe4fe630a [SPARK-12888] [SQL] [FOLLOW-UP] benchmark the new hash expression
Adds the benchmark results as comments.

The codegen version is slower than the interpreted version for `simple` case becasue of 3 reasons:

1. codegen version use a more complex hash algorithm than interpreted version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153).
2. codegen version will write the hash value to a row first and then read it out. I tried to create a `GenerateHasher` that can generate code to return hash value directly and got about 60% speed up for the `simple` case, does it worth?
3. the row in `simple` case only has one int field, so the runtime reflection may be removed because of branch prediction, which makes the interpreted version faster.

The `array` case is also slow for similar reasons, e.g. array elements are of same type, so interpreted version can probably get rid of runtime reflection by branch prediction.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10917 from cloud-fan/hash-benchmark.
2016-02-09 13:06:36 -08:00
Jakob Odersky f9307d8fc5 [SPARK-13176][CORE] Use native file linking instead of external process ln
Since Spark requires at least JRE 1.7, it is safe to use built-in java.nio.Files.

Author: Jakob Odersky <jakob@odersky.com>

Closes #11098 from jodersky/SPARK-13176.
2016-02-09 08:43:46 +00:00
Andrew Or eeaf45b926 [SPARK-10620][SPARK-13054] Minor addendum to #10835
Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility.

Author: Andrew Or <andrew@databricks.com>

Closes #10958 from andrewor14/task-metrics-to-accums-followups.
2016-02-08 17:23:33 -08:00
Davies Liu 37bc203c8d [SPARK-13210][SQL] catch OOM when allocate memory and expand array
There is a bug when we try to grow the buffer, OOM is ignore wrongly (the assert also skipped by JVM), then we try grow the array again, this one will trigger spilling free the current page, the current record we inserted will be invalid.

The root cause is that JVM has less free memory than MemoryManager thought, it will OOM when allocate a page without trigger spilling. We should catch the OOM, and acquire memory again to trigger spilling.

And also, we could not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing).

Author: Davies Liu <davies@databricks.com>

Closes #11095 from davies/fix_expand.
2016-02-08 12:09:20 -08:00
Tommy YU 81da3bee66 [SPARK-5865][API DOC] Add doc warnings for methods that return local data structures
rxin srowen
I work out note message for rdd.take function, please help to review.

If it's fine, I can apply to all other function later.

Author: Tommy YU <tummyyu@163.com>

Closes #10874 from Wenpei/spark-5865-add-warning-for-localdatastructure.
2016-02-06 17:29:09 +00:00
Davies Liu 4f28291f85 [HOTFIX] fix float part of avgRate 2016-02-05 22:40:40 -08:00
Jakob Odersky 6883a5120c [SPARK-13171][CORE] Replace future calls with Future
Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11.
Also works with 2.10

Author: Jakob Odersky <jakob@odersky.com>

Closes #11085 from jodersky/SPARK-13171.
2016-02-05 19:00:12 -08:00