Commit graph

4002 commits

Author SHA1 Message Date
zsxwing 440ea31b76 [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop
Author: zsxwing <zsxwing@gmail.com>

Closes #5280 from zsxwing/SPARK-6621 and squashes the following commits:

521125e [zsxwing] Fix the bug that calling EventLoop.stop in EventLoop.onReceive and EventLoop.onError doesn't call onStop
2015-04-02 22:54:30 -07:00
KaiXinXiaoLei 8a0aa81ca3 [CORE] The descriptionof jobHistory config should be spark.history.fs.logDirectory
The config option  is spark.history.fs.logDirectory, not spark.fs.history.logDirectory. So the descriptionof  should be changed. Thanks.

Author: KaiXinXiaoLei <huleilei1@huawei.com>

Closes #5332 from KaiXinXiaoLei/historyConfig and squashes the following commits:

5ffbfb5 [KaiXinXiaoLei] the describe of jobHistory config is error
2015-04-02 20:24:31 -07:00
Marcelo Vanzin 45134ec920 [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops.
This fixes the thread leak. I also changed the unit test to keep track
of allocated contexts and make sure they're closed after tests are
run; this is needed since some tests use this pattern:

    val sc = createContext()
    doSomethingThatMayThrow()
    sc.stop()

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5311 from vanzin/SPARK-6650 and squashes the following commits:

652c73b [Marcelo Vanzin] Nits.
5711512 [Marcelo Vanzin] More exception safety.
cc5a744 [Marcelo Vanzin] Stop alloc manager before scheduler.
9886f69 [Marcelo Vanzin] [SPARK-6650] [core] Stop ExecutorAllocationManager when context stops.
2015-04-02 19:48:55 -07:00
Hung Lin e3202aa2e9 SPARK-6414: Spark driver failed with NPE on job cancelation
Use Option for ActiveJob.properties to avoid NPE bug

Author: Hung Lin <hung.lin@gmail.com>

Closes #5124 from hunglin/SPARK-6414 and squashes the following commits:

2290b6b [Hung Lin] [SPARK-6414][core] Fix NPE in SparkContext.cancelJobGroup()
2015-04-02 14:01:43 -07:00
Davies Liu 0cce5451ad [SPARK-6667] [PySpark] remove setReuseAddress
The reused address on server side had caused the server can not acknowledge the connected connections, remove it.

This PR will retry once after timeout, it also add a timeout at client side.

Author: Davies Liu <davies@databricks.com>

Closes #5324 from davies/collect_hang and squashes the following commits:

e5a51a2 [Davies Liu] remove setReuseAddress
7977c2f [Davies Liu] do retry on client side
b838f35 [Davies Liu] retry after timeout
2015-04-02 12:18:33 -07:00
Patrick Wendell 6562787b96 [SPARK-6627] Some clean-up in shuffle code.
Before diving into review #4450 I did a look through the existing shuffle
code to learn how it works. Unfortunately, there are some very
confusing things in this code. This patch makes a few small changes
to simplify things. It is not easily to concisely describe the changes
because of how convoluted the issues were, but they are fairly small
logically:

1. There is a trait named `ShuffleBlockManager` that only deals with
   one logical function which is retrieving shuffle block data given shuffle
   block coordinates. This trait has two implementors FileShuffleBlockManager
   and IndexShuffleBlockManager. Confusingly the vast majority of those
   implementations have nothing to do with this particular functionality.
   So I've renamed the trait to ShuffleBlockResolver and documented it.
2. The aforementioned trait had two almost identical methods, for no good
   reason. I removed one method (getBytes) and modified callers to use the
   other one. I think the behavior is preserved in all cases.
3. The sort shuffle code uses an identifier "0" in the reduce slot of a
   BlockID as a placeholder. I made it into a constant since it needs to
   be consistent across multiple places.

I think for (3) there is actually a better solution that would avoid the
need to do this type of workaround/hack in the first place, but it's more
complex so I'm punting it for now.

Author: Patrick Wendell <patrick@databricks.com>

Closes #5286 from pwendell/cleanup and squashes the following commits:

c71fbc7 [Patrick Wendell] Open interface back up for testing
f36edd5 [Patrick Wendell] Code review feedback
d1c0494 [Patrick Wendell] Style fix
a406079 [Patrick Wendell] [HOTFIX] Some clean-up in shuffle code.
2015-04-01 23:42:09 -07:00
Kousuke Saruta d824c11c9f [SPARK-6597][Minor] Replace input:checkbox with input[type="checkbox"] in additional-metrics.js
In additional-metrics.js, there are some selector notation like `input:checkbox` but JQuery's official document says `input[type="checkbox"]` is better.

https://api.jquery.com/checkbox-selector/

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #5254 from sarutak/SPARK-6597 and squashes the following commits:

a253bc4 [Kousuke Saruta] Replaced input:checkbox with input[type="checkbox"]
2015-04-01 11:11:56 +01:00
Ilya Ganelin ff1915e12e [SPARK-4655][Core] Split Stage into ShuffleMapStage and ResultStage subclasses
Hi all - this patch changes the Stage class to an abstract class and introduces two new classes that extend it: ShuffleMapStage and ResultStage - with the goal of increasing readability of the DAGScheduler class. Their usage is updated within DAGScheduler.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Author: Ilya Ganelin <ilganeli@gmail.com>

Closes #4708 from ilganeli/SPARK-4655 and squashes the following commits:

c248924 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
d930385 [Ilya Ganelin] Fixed merge conflict from
a9a765f [Ilya Ganelin] Update DAGScheduler.scala
c03563c [Ilya Ganelin] Minor fixeS
c39e971 [Ilya Ganelin] Added return typing for public methods
845bc87 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
e8031d8 [Ilya Ganelin] Minor string fixes
4ec53ac [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
c004f62 [Ilya Ganelin] Update DAGScheduler.scala
a2cb03f [Ilya Ganelin] [SPARK-4655] Replaced usages of Nil and eliminated some code reuse
3d5cf20 [Ilya Ganelin] [SPARK-4655] Moved mima exclude to 1.4
6912c55 [Ilya Ganelin] Resolved merge conflict
4bff208 [Ilya Ganelin] Minor stylistic fixes
c6fffbb [Ilya Ganelin] newline
41402ad [Ilya Ganelin] Style fixes
02c6981 [Ilya Ganelin] Merge branch 'SPARK-4655' of github.com:ilganeli/spark into SPARK-4655
c755a09 [Ilya Ganelin] Some more stylistic updates and minor refactoring
b6257a0 [Ilya Ganelin] Update MimaExcludes.scala
0f0c624 [Ilya Ganelin] Fixed merge conflict
2eba262 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
6b43d7b [Ilya Ganelin] Got rid of some spaces
6f1a5db [Ilya Ganelin] Revert "More minor formatting and refactoring"
1b3471b [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4655
c9288e2 [Ilya Ganelin] More minor formatting and refactoring
d548caf [Ilya Ganelin] Formatting fix
c3ae5c2 [Ilya Ganelin] Explicit typing
0dacaf3 [Ilya Ganelin] Got rid of stale import
6da3a71 [Ilya Ganelin] Trailing whitespace
b85c5fe [Ilya Ganelin] Added minor fixes
a57dfcd [Ilya Ganelin] Added MiMA exclusion to get around binary compatibility check
83ed849 [Ilya Ganelin] moved braces for consistency
96dd161 [Ilya Ganelin] Fixed minor style error
cfd6f10 [Ilya Ganelin] Updated DAGScheduler to use new ResultStage and ShuffleMapStage classes
83494e9 [Ilya Ganelin] Added new Stage classes
2015-04-01 11:09:00 +01:00
Josh Rosen 37326079d8 [SPARK-6614] OutputCommitCoordinator should clear authorized committer only after authorized committer fails, not after any failure
In OutputCommitCoordinator, there is some logic to clear the authorized committer's lock on committing in case that task fails.  However, it looks like the current code also clears this lock if other non-authorized tasks fail, which is an obvious bug.

In theory, it's possible that this could allow a new committer to start, run to completion, and commit output before the authorized committer finished, but it's unlikely that this race occurs often in practice due to the complex combination of failure and timing conditions that would be required to expose it.

This patch addresses this issue and adds a regression test.

Thanks to aarondav for spotting this issue.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5276 from JoshRosen/SPARK-6614 and squashes the following commits:

d532ba7 [Josh Rosen] Check whether failed task was authorized committer
cbb3784 [Josh Rosen] Add regression test for SPARK-6614
2015-03-31 16:18:39 -07:00
zsxwing 56775571cb [SPARK-5124][Core] Move StopCoordinator to the receive method since it does not require a reply
Hotfix for #4588

cc rxin

Author: zsxwing <zsxwing@gmail.com>

Closes #5283 from zsxwing/hotfix and squashes the following commits:

cf3e5a7 [zsxwing] Move StopCoordinator to the receive method since it does not require a reply
2015-03-30 22:10:49 -07:00
Cheng Lian fde6945417 [SPARK-6369] [SQL] Uses commit coordinator to help committing Hive and Parquet tables
This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables.

This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side.

TODO

- [ ] Add tests

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5139)
<!-- Reviewable:end -->

Author: Cheng Lian <lian@databricks.com>

Closes #5139 from liancheng/spark-6369 and squashes the following commits:

72eb628 [Cheng Lian] Fixes typo in javadoc
9a4b82b [Cheng Lian] Adds javadoc and addresses @aarondav's comments
dfdf3ef [Cheng Lian] Uses commit coordinator to help committing Hive and Parquet tables
2015-03-31 07:48:37 +08:00
Li Zhihui 01dc9f50d1 Fix string interpolator error in HeartbeatReceiver
Error log before fixed
<code>15/03/29 10:07:25 ERROR YarnScheduler: Lost an executor 24 (already removed): Executor heartbeat timed out after ${now - lastSeenMs} ms</code>

Author: Li Zhihui <zhihui.li@intel.com>

Closes #5255 from li-zhihui/fixstringinterpolator and squashes the following commits:

c93f2b7 [Li Zhihui] Fix string interpolator error in HeartbeatReceiver
2015-03-29 21:30:37 -07:00
zsxwing a8d53afb4e [SPARK-5124][Core] A standard RPC interface and an Akka implementation
This PR added a standard internal RPC interface for Spark and an Akka implementation. See [the design document](https://issues.apache.org/jira/secure/attachment/12698710/Pluggable%20RPC%20-%20draft%202.pdf) for more details.

I will split the whole work into multiple PRs to make it easier for code review. This is the first PR and avoid to touch too many files.

Author: zsxwing <zsxwing@gmail.com>

Closes #4588 from zsxwing/rpc-part1 and squashes the following commits:

fe3df4c [zsxwing] Move registerEndpoint and use actorSystem.dispatcher in asyncSetupEndpointRefByURI
f6f3287 [zsxwing] Remove RpcEndpointRef.toURI
8bd1097 [zsxwing] Fix docs and the code style
f459380 [zsxwing] Add RpcAddress.fromURI and rename urls to uris
b221398 [zsxwing] Move send methods above ask methods
15cfd7b [zsxwing] Merge branch 'master' into rpc-part1
9ffa997 [zsxwing] Fix MiMa tests
78a1733 [zsxwing] Merge remote-tracking branch 'origin/master' into rpc-part1
385b9c3 [zsxwing] Fix the code style and add docs
2cc3f78 [zsxwing] Add an asynchronous version of setupEndpointRefByUrl
e8dfec3 [zsxwing] Remove 'sendWithReply(message: Any, sender: RpcEndpointRef): Unit'
08564ae [zsxwing] Add RpcEnvFactory to create RpcEnv
e5df4ca [zsxwing] Handle AkkaFailure(e) in Actor
ec7c5b0 [zsxwing] Fix docs
7fc95e1 [zsxwing] Implement askWithReply in RpcEndpointRef
9288406 [zsxwing] Document thread-safety for setupThreadSafeEndpoint
3007c09 [zsxwing] Move setupDriverEndpointRef to RpcUtils and rename to makeDriverRef
c425022 [zsxwing] Fix the code style
5f87700 [zsxwing] Move the logical of processing message to a private function
3e56123 [zsxwing] Use lazy to eliminate CountDownLatch
07f128f [zsxwing] Remove ActionScheduler.scala
4d34191 [zsxwing] Remove scheduler from RpcEnv
7cdd95e [zsxwing] Add docs for RpcEnv
51e6667 [zsxwing] Add 'sender' to RpcCallContext and rename the parameter of receiveAndReply to 'context'
ffc1280 [zsxwing] Rename 'fail' to 'sendFailure' and other minor code style changes
28e6d0f [zsxwing] Add onXXX for network events and remove the companion objects of network events
3751c97 [zsxwing] Rename RpcResponse to RpcCallContext
fe7d1ff [zsxwing] Add explicit reply in rpc
7b9e0c9 [zsxwing] Fix the indentation
04a106e [zsxwing] Remove NopCancellable and add a const NOP in object SettableCancellable
2a579f4 [zsxwing] Remove RpcEnv.systemName
155b987 [zsxwing] Change newURI to uriOf and add some comments
45b2317 [zsxwing] A standard RPC interface and An Akka implementation
2015-03-29 21:25:09 -07:00
Thomas Graves 52ece26b8f [SPARK-6558] Utils.getCurrentUserName returns the full principal name instead of login name
Utils.getCurrentUserName returns UserGroupInformation.getCurrentUser().getUserName() when SPARK_USER isn't set. It should return UserGroupInformation.getCurrentUser().getShortUserName()
getUserName() returns the users full principal name (ie user1CORP.COM). getShortUserName() returns just the users login name (user1).

This just happens to work on YARN because the Client code sets:
env("SPARK_USER") = UserGroupInformation.getCurrentUser().getShortUserName()

Author: Thomas Graves <tgraves@apache.org>

Closes #5229 from tgravescs/SPARK-6558 and squashes the following commits:

24830bf [Thomas Graves] Utils.getCurrentUserName returns the full principal name instead of login name
2015-03-29 12:43:30 +01:00
zsxwing da546b7ba0 [SPARK-6556][Core] Fix wrong parsing logic of executorTimeoutMs and checkTimeoutIntervalMs in HeartbeatReceiver
The current reading logic of `executorTimeoutMs` is:
```Scala
private val executorTimeoutMs = sc.conf.getLong("spark.network.timeout",
    sc.conf.getLong("spark.storage.blockManagerSlaveTimeoutMs", 120)) * 1000
```
So if `spark.storage.blockManagerSlaveTimeoutMs` is 10000 and `spark.network.timeout` is not set, executorTimeoutMs will be 10000 * 1000. But the correct value should have been 10000.

`checkTimeoutIntervalMs` has the same issue.

This PR fixes them.

Author: zsxwing <zsxwing@gmail.com>

Closes #5209 from zsxwing/SPARK-6556 and squashes the following commits:

6a0a411 [zsxwing] Fix docs
c7d5422 [zsxwing] Add comments for executorTimeoutMs and checkTimeoutIntervalMs
ccd5147 [zsxwing] Fix wrong parsing logic of executorTimeoutMs and checkTimeoutIntervalMs in HeartbeatReceiver
2015-03-27 12:31:06 +00:00
mcheah 49d2ec63ec [SPARK-6405] Limiting the maximum Kryo buffer size to be 2GB.
Kryo buffers are backed by byte arrays, but primitive arrays can only be
up to 2GB in size. It is misleading to allow users to set buffers past
this size.

Author: mcheah <mcheah@palantir.com>

Closes #5218 from mccheah/feature/limit-kryo-buffer and squashes the following commits:

1d6d1be [mcheah] Fixing numeric typo
e2e30ce [mcheah] Removing explicit int and double type to match style
09fd80b [mcheah] Should be >= not >. Slightly more consistent error message.
60634f9 [mcheah] [SPARK-6405] Limiting the maximum Kryo buffer size to be 2GB.
2015-03-26 22:48:42 -07:00
Sean Owen fe15ea9760 SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases
Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly

Author: Sean Owen <sowen@cloudera.com>

Closes #5148 from srowen/SPARK-6480 and squashes the following commits:

974a0a0 [Sean Owen] Additional test of huge ranges, and a few more comments (and comment fixes)
23ec01e [Sean Owen] Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly
2015-03-26 15:00:23 +00:00
zsxwing 0c88ce5416 [SPARK-6468][Block Manager] Fix the race condition of subDirs in DiskBlockManager
There are two race conditions of `subDirs` in `DiskBlockManager`:

1. `getAllFiles` does not use correct locks to read the contents in `subDirs`. Although it's designed for testing, it's still worth to add correct locks to eliminate the race condition.
2. The double-check has a race condition in `getFile(filename: String)`. If a thread finds `subDirs(dirId)(subDirId)` is not null out of the `synchronized` block, it may not be able to see the correct content of the File instance pointed by `subDirs(dirId)(subDirId)` according to the Java memory model (there is no volatile variable here).

This PR fixed the above race conditions.

Author: zsxwing <zsxwing@gmail.com>

Closes #5136 from zsxwing/SPARK-6468 and squashes the following commits:

cbb872b [zsxwing] Fix the race condition of subDirs in DiskBlockManager
2015-03-26 12:54:48 +00:00
Josh Rosen d44a3362ed [SPARK-6079] Use index to speed up StatusTracker.getJobIdsForGroup()
`StatusTracker.getJobIdsForGroup()` is implemented via a linear scan over a HashMap rather than using an index, which might be an expensive operation if there are many (e.g. thousands) of retained jobs.

This patch adds a new map to `JobProgressListener` in order to speed up these lookups.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4830 from JoshRosen/statustracker-job-group-indexing and squashes the following commits:

e39c5c7 [Josh Rosen] Address review feedback
6709fb2 [Josh Rosen] Merge remote-tracking branch 'origin/master' into statustracker-job-group-indexing
2c49614 [Josh Rosen] getOrElse
97275a7 [Josh Rosen] Add jobGroup to jobId index to JobProgressListener
2015-03-25 17:40:00 -07:00
Andrew Or c1b74df604 [SPARK-5771] Master UI inconsistently displays application cores
If the user calls `sc.stop()`, then the number of cores under "Completed Applications" will be 0. If the user does not call `sc.stop()`, then the number of cores will be however many cores were being used before the application exited. This PR makes both cases have the behavior of the latter.

Note that there have been a series of PR that attempted to fix this. For the full discussion, please refer to #4841. The unregister event is necessary because of a subtle race condition explained in that PR.

Tested this locally with and without calling `sc.stop()`.

Author: Andrew Or <andrew@databricks.com>

Closes #5177 from andrewor14/master-ui-cores and squashes the following commits:

62449d1 [Andrew Or] Freeze application state before finishing it
2015-03-25 13:28:32 -07:00
Kousuke Saruta acef51defb [SPARK-6537] UIWorkloadGenerator: The main thread should not stop SparkContext until all jobs finish
The main thread of UIWorkloadGenerator spawn sub threads to launch jobs but the main thread stop SparkContext without waiting for finishing those threads.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #5187 from sarutak/SPARK-6537 and squashes the following commits:

4e9307a [Kousuke Saruta] Fixed UIWorkloadGenerator so that the main thread stop SparkContext after all jobs finish
2015-03-25 13:27:15 -07:00
zsxwing 883b7e9030 [SPARK-6076][Block Manager] Fix a potential OOM issue when StorageLevel is MEMORY_AND_DISK_SER
In dcd1e42d6b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala (L538) , when StorageLevel is `MEMORY_AND_DISK_SER`, it will copy the content from file into memory, then put it into MemoryStore.
```scala
              val copyForMemory = ByteBuffer.allocate(bytes.limit)
              copyForMemory.put(bytes)
              memoryStore.putBytes(blockId, copyForMemory, level)
              bytes.rewind()
```
However, if the file is bigger than the free memory, OOM will happen. A better approach is testing if there is enough memory. If not, copyForMemory should not be created, since this is an optional operation.

Author: zsxwing <zsxwing@gmail.com>

Closes #4827 from zsxwing/SPARK-6076 and squashes the following commits:

7d25545 [zsxwing] Add alias for tryToPut and dropFromMemory
1100a54 [zsxwing] Replace call-by-name with () => T
0cc0257 [zsxwing] Fix a potential OOM issue when StorageLevel is MEMORY_AND_DISK_SER
2015-03-25 12:17:18 -07:00
Xiangrui Meng c14ddd97ed [SPARK-6515] update OpenHashSet impl
Though I don't see any bug in the existing code, the update in this PR makes it read better. rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #5176 from mengxr/SPARK-6515 and squashes the following commits:

134494d [Xiangrui Meng] update OpenHashSet impl
2015-03-24 18:58:27 -07:00
Xiangrui Meng 6930e965e2 [SPARK-6512] add contains to OpenHashMap
Add `contains` to test whether a key exists in an OpenHashMap. rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #5171 from mengxr/openhashmap-contains and squashes the following commits:

d6e6f1f [Xiangrui Meng] add contains to primitivekeyopenhashmap
748a69b [Xiangrui Meng] add contains to OpenHashMap
2015-03-24 17:06:22 -07:00
Andrew Or dd907d1a9d Revert "[SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called"
This reverts commit dd077abf2e.

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala
	core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
2015-03-24 16:49:27 -07:00
Andrew Or f7c3668ee6 Revert "[SPARK-5771][UI][hotfix] Change Requested Cores into * if default cores is not set"
This reverts commit 12135e9054.
2015-03-24 16:41:31 -07:00
Kay Ousterhout d8ccf655f3 [SPARK-3570] Include time to open files in shuffle write time.
Opening shuffle files can be very significant when the disk is
contended, especially when using ext3. While writing data to
a file can avoid hitting disk (and instead hit the buffer
cache), opening a file always involves writing some metadata
about the file to disk, so the open time can be a very significant
portion of the shuffle write time. In one job I ran recently, the time to
write shuffle data to the file was only 4ms for each task, but
the time to open the file was about 100x as long (~400ms).

When we add metrics about spilled data (#2504), we should ensure
that the file open time is also included there.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4550 from kayousterhout/SPARK-3570 and squashes the following commits:

ea3a4ae [Kay Ousterhout] Added comment about excluded open time
fdc5185 [Kay Ousterhout] Improved comment
42b7e43 [Kay Ousterhout] Fixed parens for nanotime
2423555 [Kay Ousterhout] [SPARK-3570] Include time to open files in shuffle write time.
2015-03-24 16:29:40 -07:00
Kay Ousterhout 6948ab6f8b [SPARK-6088] Correct how tasks that get remote results are shown in UI.
It would be great to fix this for 1.3. since the fix is surgical and it helps understandability for users.

cc shivaram pwendell

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4839 from kayousterhout/SPARK-6088 and squashes the following commits:

3ab012c [Kay Ousterhout] Update getting result time incrementally, correctly set GET_RESULT status
f346b49 [Kay Ousterhout] Typos
748ea6b [Kay Ousterhout] Fixed build failure
84d617c [Kay Ousterhout] [SPARK-6088] Correct how tasks that get remote results are shown in the UI.
2015-03-24 16:26:43 -07:00
Kousuke Saruta 85cf063682 [SPARK-5559] [Streaming] [Test] Remove oppotunity we met flakiness when running FlumeStreamSuite
When we run FlumeStreamSuite on Jenkins, sometimes we get error like as follows.

    sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456.
	    at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
	    at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
	   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116)
           at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	    at org.scalatest.Transformer.apply(Transformer.scala:22)
	    at org.scalatest.Transformer.apply(Transformer.scala:20)
    	    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
	    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
	    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)

This error is caused by check-then-act logic  when it find free-port .

      /** Find a free port */
      private def findFreePort(): Int = {
        Utils.startServiceOnPort(23456, (trialPort: Int) => {
          val socket = new ServerSocket(trialPort)
          socket.close()
          (null, trialPort)
        }, conf)._2
      }

Removing the check-then-act is not easy but we can reduce the chance of having the error by choosing random value for initial port instead of 23456.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4337 from sarutak/SPARK-5559 and squashes the following commits:

16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort
c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559
1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port
a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
2015-03-24 16:20:52 +00:00
Reynold Xin 4ce2782a61 [SPARK-6428] Added explicit types for all public methods in core.
Author: Reynold Xin <rxin@databricks.com>

Closes #5125 from rxin/core-explicit-type and squashes the following commits:

f471415 [Reynold Xin] Revert style checker changes.
81b66e4 [Reynold Xin] Code review feedback.
a7533e3 [Reynold Xin] Mima excludes.
1d795f5 [Reynold Xin] [SPARK-6428] Added explicit types for all public methods in core.
2015-03-23 23:41:06 -07:00
Patrick Wendell 6cd7058b36 Revert "[SPARK-6122][Core] Upgrade Tachyon client version to 0.6.1."
This reverts commit a41b9c6004.
2015-03-23 15:08:39 -07:00
Calvin Jia a41b9c6004 [SPARK-6122][Core] Upgrade Tachyon client version to 0.6.1.
Changes the Tachyon client version from 0.5 to 0.6 in spark core and distribution script.

New dependencies in Tachyon 0.6.0 include

commons-codec:commons-codec:jar:1.5:compile
io.netty:netty-all:jar:4.0.23.Final:compile

These are already in spark core.

Author: Calvin Jia <jia.calvin@gmail.com>

Closes #4867 from calvinjia/upgrade_tachyon_0.6.0 and squashes the following commits:

eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
2015-03-22 11:11:29 -07:00
Ryan Williams b9fe504b49 [SPARK-6448] Make history server log parse exceptions
This helped me to debug a parse error that was due to the event log format changing recently.

Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes #5122 from ryan-williams/histerror and squashes the following commits:

5831656 [Ryan Williams] line length
c3742ae [Ryan Williams] Make history server log parse exceptions
2015-03-22 11:54:23 +00:00
Jongyoul Lee 49a01c7ea2 [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set
- Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead
- Added testCase

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #5099 from jongyoul/SPARK-6423 and squashes the following commits:

6747fce [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Changed a description of spark.mesos.executor.memoryOverhead
475a7c8 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fit the import rules
453c5a2 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead - Added testCase
2015-03-20 19:14:35 +00:00
Marcelo Vanzin a74564591f [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:

63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
178ba71 [Marcelo Vanzin] Oops.
75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
a45a62c [Marcelo Vanzin] Work around MIMA warning.
1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
cef4603 [Marcelo Vanzin] Indentation.
296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
2015-03-20 18:43:57 +00:00
mbonaci 28bcb9e9e8 [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample
The docs for the `sample` method were insufficient, now less so.

Author: mbonaci <mbonaci@gmail.com>

Closes #5097 from mbonaci/master and squashes the following commits:

a6a9d97 [mbonaci] [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
2015-03-20 18:33:53 +00:00
Sean Owen 6f80c3e888 SPARK-6338 [CORE] Use standard temp dir mechanisms in tests to avoid orphaned temp files
Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify

Author: Sean Owen <sowen@cloudera.com>

Closes #5029 from srowen/SPARK-6338 and squashes the following commits:

27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
4a212fa [Sean Owen] Standardize a bit more temp dir management
9004081 [Sean Owen] Revert some added recursive-delete calls
57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
2015-03-20 14:16:21 +00:00
Jongyoul Lee 116c553fd6 [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR
- Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
- This is related #5000

Author: Jongyoul Lee <jongyoul@gmail.com>

Closes #5088 from jongyoul/SPARK-6286-1 and squashes the following commits:

4f2362f [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Fixed scalastyle
ac4336a [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
2015-03-20 12:24:34 +00:00
Reynold Xin 0745a305fa Tighten up field/method visibility in Executor and made some code more clear to read.
I was reading Executor just now and found that some latest changes introduced some weird code path with too much monadic chaining and unnecessary fields. I cleaned it up a bit, and also tightened up the visibility of various fields/methods. Also added some inline documentation to help understand this code better.

Author: Reynold Xin <rxin@databricks.com>

Closes #4850 from rxin/executor and squashes the following commits:

866fc60 [Reynold Xin] Code review feedback.
020efbb [Reynold Xin] Tighten up field/method visibility in Executor and made some code more clear to read.
2015-03-19 22:12:01 -04:00
Wenchen Fan 3b5aaa6a5f [Core][minor] remove unused visitedStages in DAGScheduler.stageDependsOn
We define and update `visitedStages` in `DAGScheduler.stageDependsOn`, but never read it. So we can safely remove it.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #5086 from cloud-fan/minor and squashes the following commits:

24663ea [Wenchen Fan] remove un-used variable
2015-03-19 15:25:32 -04:00
mcheah 3c4e486b9c [SPARK-5843] [API] Allowing map-side combine to be specified in Java.
Specifically, when calling JavaPairRDD.combineByKey(), there is a new
six-parameter method that exposes the map-side-combine boolean as the
fifth parameter and the serializer as the sixth parameter.

Author: mcheah <mcheah@palantir.com>

Closes #4634 from mccheah/pair-rdd-map-side-combine and squashes the following commits:

5c58319 [mcheah] Fixing compiler errors.
3ce7deb [mcheah] Addressing style and documentation comments.
7455c7a [mcheah] Allowing Java combineByKey to specify Serializer as well.
6ddd729 [mcheah] [SPARK-5843] Allowing map-side combine to be specified in Java.
2015-03-19 08:51:49 -04:00
CodingCat 2c3f83c34b [SPARK-4012] stop SparkContext when the exception is thrown from an infinite loop
https://issues.apache.org/jira/browse/SPARK-4012

This patch is a resubmission for https://github.com/apache/spark/pull/2864

What I am proposing in this patch is that ***when the exception is thrown from an infinite loop, we should stop the SparkContext, instead of let JVM throws exception forever***

So, in the infinite loops where we originally wrapped with a ` logUncaughtExceptions`, I changed to `tryOrStopSparkContext`, so that the Spark component is stopped

Early stopped JVM process is helpful for HA scheme design, for example,

The user has a script checking the existence of the pid of the Spark Streaming driver for monitoring the availability; with the code before this patch, the JVM process is still available but not functional when the exceptions are thrown

andrewor14, srowen , mind taking further consideration about the change?

Author: CodingCat <zhunansjtu@gmail.com>

Closes #5004 from CodingCat/SPARK-4012-1 and squashes the following commits:

589276a [CodingCat] throw fatal error again
3c72cd8 [CodingCat] address the comments
6087864 [CodingCat] revise comments
6ad3eb0 [CodingCat] stop SparkContext instead of quit the JVM process
6322959 [CodingCat] exit JVM process when the exception is thrown from an infinite loop
2015-03-18 23:48:45 -07:00
Wenchen Fan 540b2a4eab [SPARK-6394][Core] cleanup BlockManager companion object and improve the getCacheLocs method in DAGScheduler
The current implementation include searching a HashMap many times, we can avoid this.
Actually if you look into `BlockManager.blockIdsToBlockManagers`, the core function call is [this](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1258), so we can call `blockManagerMaster.getLocations` directly and avoid building a HashMap.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #5043 from cloud-fan/small and squashes the following commits:

e959d12 [Wenchen Fan] fix style
203c493 [Wenchen Fan] some cleanup in BlockManager companion object
d409099 [Wenchen Fan] address rxin's comment
faec999 [Wenchen Fan] add regression test
2fb57aa [Wenchen Fan] imporve the getCacheLocs method
2015-03-18 19:43:04 -07:00
Marcelo Vanzin 981fbafa2a [SPARK-6325] [core,yarn] Do not change target executor count when killing executors.
The dynamic execution code has two ways to reduce the number of executors: one
where it reduces the total number of executors it wants, by asking for an absolute
number of executors that is lower than the previous one. The second is by
explicitly killing idle executors.

YarnAllocator was mixing those up and lowering the target number of executors
when a kill was issued. Instead, trust the frontend knows what it's doing, and kill
executors without messing with other accounting. That means that if the frontend
kills an executor without lowering the target, it will get a new executor shortly.

The one situation where both actions (lower the target and kill executor) need to
happen together is when user code explicitly calls `SparkContext.killExecutors`.
In that case, issue two calls to the backend to achieve the goal.

I also did some minor cleanup in related code:
- avoid sending a request for executors when target is unchanged, to avoid log
  spam in the AM
- avoid printing misleading log messages in the AM when there are no requests
  to cancel
- fix a slow memory leak plus misleading error message on the driver caused by
  failing to completely unregister the executor.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5018 from vanzin/SPARK-6325 and squashes the following commits:

2e782a3 [Marcelo Vanzin] Avoid redundant logging on the AM side.
a3567cd [Marcelo Vanzin] Add parentheses.
a363926 [Marcelo Vanzin] Update logic.
a158101 [Marcelo Vanzin] [SPARK-6325] [core,yarn] Disallow reducing executor count past running count.
2015-03-18 09:18:28 -04:00
Iulian Dragos 9d112a958e [SPARK-6286][minor] Handle missing Mesos case TASK_ERROR.
Author: Iulian Dragos <jaguarul@gmail.com>

Closes #5000 from dragos/issue/task-error-case and squashes the following commits:

e063627 [Iulian Dragos] Handle TASK_ERROR in Mesos scheduler backends.
ac17cf0 [Iulian Dragos] Handle missing Mesos case TASK_ERROR.
2015-03-18 09:15:33 -04:00
nemccarthy 4cca3917dc [SPARK-6313] Add config option to disable file locks/fetchFile cache to ...
...support NFS mounts.

This is a work around for now with the goal to find a more permanent solution.
https://issues.apache.org/jira/browse/SPARK-6313

Author: nemccarthy <nathan@nemccarthy.me>

Closes #5036 from nemccarthy/master and squashes the following commits:

2eaaf42 [nemccarthy] [SPARK-6313] Update config wording doc for spark.files.useFetchCache
5de7eb4 [nemccarthy] [SPARK-6313] Add config option to disable file locks/fetchFile cache to support NFS mounts
2015-03-17 09:33:11 -07:00
Josh Rosen 0f673c21f6 [SPARK-3266] Use intermediate abstract classes to fix type erasure issues in Java APIs
This PR addresses a Scala compiler bug ([SI-8905](https://issues.scala-lang.org/browse/SI-8905)) that was breaking some of the Spark Java APIs.  In a nutshell, it seems that methods whose implementations are inherited from generic traits sometimes have their type parameters erased to Object.  This was causing methods like `DoubleRDD.min()` to throw confusing NoSuchMethodErrors at runtime.

The fix implemented here is to introduce an intermediate layer of abstract classes and inherit from those instead of directly extends the `Java*Like` traits.  This should not break binary compatibility.

I also improved the test coverage of the Java API, adding several new tests for methods that failed at runtime due to this bug.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5050 from JoshRosen/javardd-si-8905-fix and squashes the following commits:

2feb068 [Josh Rosen] Use intermediate abstract classes to work around SPARK-3266
d5f3e5d [Josh Rosen] Add failing regression tests for SPARK-3266
2015-03-17 09:18:57 -07:00
mcheah 005d1c5f29 [SPARK-6269] [CORE] Use ScalaRunTime's array methods instead of java.lang.reflect.Array in size estimation
This patch switches the usage of java.lang.reflect.Array in Size estimation to using scala's RunTime array-getter methods. The notes on https://bugs.openjdk.java.net/browse/JDK-8051447 tipped me off to the fact that using java.lang.reflect.Array was not ideal. At first, I used the code from that ticket, but it turns out that ScalaRunTime's array-related methods avoid the bottleneck of invoking native code anyways, so that was sufficient to boost performance in size estimation.

The idea is to use pure Java code in implementing the methods there, as opposed to relying on native C code which ends up being ill-performing. This improves the performance of estimating the size of arrays when we are checking for spilling in Spark.

Here's the benchmark discussion from the ticket:

I did two tests. The first, less convincing, take-with-a-block-of-salt test I did was do a simple groupByKey operation to collect objects in a 4.0 GB text file RDD into 30,000 buckets. I ran 1 Master and 4 Spark Worker JVMs on my mac, fetching the RDD from a text file simply stored on disk, and saving it out to another file located on local disk. The wall clock times I got back before and after the change were:

Before: 352.195s, 343.871s, 359.080s
After (using code directly from the JDK ticket, not the scala code in this PR): 342.929583s, 329.456623s, 326.151481s

So, there is a bit of an improvement after the change. I also did some YourKit profiling of the executors to get an idea of how much time was spent in size estimation before and after the change. I roughly saw that size estimation took up less of the time after my change, but YourKit's profiling can be inconsistent and who knows if I was profiling the executors that had the same data between runs?

The more convincing test I did was to run the size-estimation logic itself in an isolated unit test. I ran the following code:
```
val bigArray = Array.fill(1000)(Array.fill(1000)(java.util.UUID.randomUUID().toString()))
test("String arrays only perf testing") {
  val startTime = System.currentTimeMillis()
  for (i <- 1 to 50000) {
    SizeEstimator.estimate(bigArray)
  }
  println("Runtime: " + (System.currentTimeMillis() - startTime) / 1000.0000)
}
```
I wanted to use a 2D array specifically because I wanted to measure the performance of repeatedly calling Array.getLength. I used UUID-Strings to ensure that the strings were randomized (so String object re-use doesn't happen), but that they would all be the same size. The results were as follows:

Before PR: 222.681 s, 218.34 s, 211.739s
After latest change: 170.715 s, 176.775 s, 180.298 s
.

Author: mcheah <mcheah@palantir.com>
Author: Justin Uang <justin.uang@gmail.com>

Closes #4972 from mccheah/feature/spark-6269-reflect-array and squashes the following commits:

8527852 [mcheah] Respect CamelCase for numElementsDrawn
18d4b50 [mcheah] Addressing style comments - while loops instead of for loops
16ce534 [mcheah] Organizing imports properly
db890ea [mcheah] Removing CastedArray and just using ScalaRunTime.
cb67ce2 [mcheah] Fixing a scalastyle error - line too long
5d53c4c [mcheah] Removing unused parameter in visitArray.
6467759 [mcheah] Including primitive size information inside CastedArray.
93f4b05 [mcheah] Using Scala instead of Java for the array-reflection implementation.
a557ab8 [mcheah] Using a wrapper around arrays to do casting only once
ca063fc [mcheah] Fixing a compiler error made while refactoring style
1fe09de [Justin Uang] [SPARK-6269] Use a different implementation of java.lang.reflect.Array
2015-03-17 11:20:20 +00:00
CodingCat 25f35806e3 [SPARK-4011] tighten the visibility of the members in Master/Worker class
https://issues.apache.org/jira/browse/SPARK-4011

Currently, most of the members in Master/Worker are with public accessibility. We might wish to tighten the accessibility of them

a bit more discussion is here:

https://github.com/apache/spark/pull/2828

Author: CodingCat <zhunansjtu@gmail.com>

Closes #4844 from CodingCat/SPARK-4011 and squashes the following commits:

1a64175 [CodingCat] fix compilation issue
e7fd375 [CodingCat] Sean is right....
f5034a4 [CodingCat] fix rebase mistake
8d5b0c0 [CodingCat] loose more fields
0072f96 [CodingCat] lose some restrictions based on the possible design intention
de77286 [CodingCat] tighten accessibility of deploy package
12b4fd3 [CodingCat] tighten accessibility of deploy.worker
1243bc7 [CodingCat] tighten accessibility of deploy.rest
c5f622c [CodingCat] tighten the accessibility of deploy.history
d441e20 [CodingCat] tighten accessibility of deploy.client
4e0ce4a [CodingCat] tighten the accessibility of the members of classes in master
23cddbb [CodingCat] stylistic fix
9a3a340 [CodingCat] tighten the access of worker class
67a0559 [CodingCat] tighten the access permission in Master
2015-03-17 11:18:27 +00:00
Sean Owen b2d8c02224 SPARK-6044 [CORE] RDD.aggregate() should not use the closure serializer on the zero value
Use configured serializer in RDD.aggregate, to match PairRDDFunctions.aggregateByKey, instead of closure serializer.

Compare with e60ad2f4c4/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala (L127)

Author: Sean Owen <sowen@cloudera.com>

Closes #5028 from srowen/SPARK-6044 and squashes the following commits:

a4040a7 [Sean Owen] Use configured serializer in RDD.aggregate, to match PairRDDFunctions.aggregateByKey, instead of closure serializer
2015-03-16 23:58:52 -07:00
Kevin (Sangwoo) Kim f0edeae7f9 [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running groupByKey with class defined in REPL
```
case class ClassA(value: String)
val rdd = sc.parallelize(List(("k1", ClassA("v1")), ("k1", ClassA("v2")) ))
rdd.groupByKey.collect
```
This code used to be throw exception in spark-shell, because while shuffling ```JavaSerializer```uses ```defaultClassLoader``` which was defined like ```env.serializer.setDefaultClassLoader(urlClassLoader)```.

It should be ```env.serializer.setDefaultClassLoader(replClassLoader)```, like
```
    override def run() {
      val deserializeStartTime = System.currentTimeMillis()
      Thread.currentThread.setContextClassLoader(replClassLoader)
```
in TaskRunner.

When ```replClassLoader``` cannot be defined, it's identical with ```urlClassLoader```

Author: Kevin (Sangwoo) Kim <sangwookim.me@gmail.com>

Closes #5046 from swkimme/master and squashes the following commits:

fa2b9ee [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
6e9620b [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
d23e4e2 [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
a4a3c8a [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to ReplSuite
bd00da5 [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to ReplSuite
c1b1fc7 [Kevin (Sangwoo) Kim] use REPL class loader for executor's serializer
2015-03-16 23:49:23 -07:00
lisurprise f149b8b5e5 [SPARK-6077] Remove streaming tab while stopping StreamingContext
Currently we would create a new streaming tab for each streamingContext even if there's already one on the same sparkContext which would cause duplicate StreamingTab created and none of them is taking effect.
snapshot: https://www.dropbox.com/s/t4gd6hqyqo0nivz/bad%20multiple%20streamings.png?dl=0
How to reproduce:
1)
import org.apache.spark.SparkConf
import org.apache.spark.streaming.
{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
.....
2)
ssc.stop(false)
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()

Author: lisurprise <zhichao.li@intel.com>

Closes #4828 from zhichao-li/master and squashes the following commits:

c329806 [lisurprise] add test for attaching/detaching streaming tab
51e6c7f [lisurprise] move detach method into StreamingTab
31a44fa [lisurprise] add unit test for attaching and detaching new tab
db25ed2 [lisurprise] clean code
8281bcb [lisurprise] clean code
193c542 [lisurprise] remove streaming tab while closing streaming context
2015-03-16 13:10:32 -07:00
DoingDone9 00e730b94c [SPARK-6300][Spark Core] sc.addFile(path) does not support the relative path.
when i run cmd like that sc.addFile("../test.txt"), it did not work and throwed an exception:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../test.txt
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
........
.......
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:../test.txt
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)

Author: DoingDone9 <799203320@qq.com>

Closes #4993 from DoingDone9/relativePath and squashes the following commits:

ee375cd [DoingDone9] Update SparkContextSuite.scala
d594e16 [DoingDone9] Update SparkContext.scala
0ff3fa8 [DoingDone9] test for add file
dced8eb [DoingDone9] Update SparkContext.scala
e4a13fe [DoingDone9] getCanonicalPath
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
2015-03-16 12:27:15 +00:00
Wenchen Fan ea3d2eed9b [CORE][minor] remove unnecessary ClassTag in DAGScheduler
This existed at the very beginning, but became unnecessary after [this commit](37d8f37a8e (diff-6a9ff7fb74fd490a50462d45db2d5e11L272)). I think we should remove it if we don't plan to use it in the future.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #4992 from cloud-fan/small and squashes the following commits:

e857f2e [Wenchen Fan] remove unnecessary ClassTag
2015-03-13 14:08:56 +00:00
Zhang, Liye 9048e8102e [SPARK-6197][CORE] handle json exception when hisotry file not finished writing
For details, please refer to [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197)

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #4927 from liyezhang556520/jsonParseError and squashes the following commits:

5cbdc82 [Zhang, Liye] without unnecessary wrap
2b48831 [Zhang, Liye] small changes with sean owen's comments
2973024 [Zhang, Liye] handle json exception when file not finished writing
2015-03-13 14:00:45 +00:00
Ilya Ganelin 0af9ea74a0 [SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time
I've added a timer in the right place to fix this inaccuracy.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #4965 from ilganeli/SPARK-5845 and squashes the following commits:

bfabf88 [Ilya Ganelin] Changed to using a foreach vs. getorelse
3e059b0 [Ilya Ganelin] Switched to using getorelse
b946d08 [Ilya Ganelin] Fixed error with option
9434b50 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5845
db8647e [Ilya Ganelin] Added update for shuffleWriteTime around spilled file cleanup in ExternalSorter
2015-03-13 13:21:04 +00:00
Davies Liu 712679a7b4 [SPARK-6294] fix hang when call take() in JVM on PythonRDD
The Thread.interrupt() can not terminate the thread in some cases, so we should not wait for the writerThread of PythonRDD.

This PR also ignore some exception during clean up.

cc JoshRosen mengxr

Author: Davies Liu <davies@databricks.com>

Closes #4987 from davies/fix_take and squashes the following commits:

4488f1a [Davies Liu] fix hang when call take() in JVM on PythonRDD
2015-03-12 01:34:38 -07:00
Sean Owen 6e94c4eadf SPARK-6225 [CORE] [SQL] [STREAMING] Resolve most build warnings, 1.3.0 edition
Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.

Author: Sean Owen <sowen@cloudera.com>

Closes #4950 from srowen/SPARK-6225 and squashes the following commits:

3080972 [Sean Owen] Ordered imports: Java, Scala, 3rd party, Spark
c67985b [Sean Owen] Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.
2015-03-11 13:15:19 +00:00
Marcelo Vanzin 517975d89d [SPARK-4924] Add a library for launching Spark jobs programmatically.
This change encapsulates all the logic involved in launching a Spark job
into a small Java library that can be easily embedded into other applications.

The overall goal of this change is twofold, as described in the bug:

- Provide a public API for launching Spark processes. This is a common request
  from users and currently there's no good answer for it.

- Remove a lot of the duplicated code and other coupling that exists in the
  different parts of Spark that deal with launching processes.

A lot of the duplication was due to different code needed to build an
application's classpath (and the bootstrapper needed to run the driver in
certain situations), and also different code needed to parse spark-submit
command line options in different contexts. The change centralizes those
as much as possible so that all code paths can rely on the library for
handling those appropriately.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:

18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
2ce741f [Marcelo Vanzin] Add lots of quotes.
3b28a75 [Marcelo Vanzin] Update new pom.
a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
897141f [Marcelo Vanzin] Review feedback.
e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
28cd35e [Marcelo Vanzin] Remove stale comment.
b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
5f4ddcc [Marcelo Vanzin] Better usage messages.
92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
6184c07 [Marcelo Vanzin] Rename field.
4c19196 [Marcelo Vanzin] Update comment.
7e66c18 [Marcelo Vanzin] Fix pyspark tests.
0031a8e [Marcelo Vanzin] Review feedback.
c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
28b1434 [Marcelo Vanzin] Add a comment.
304333a [Marcelo Vanzin] Fix propagation of properties file arg.
bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
8ec0243 [Marcelo Vanzin] Add missing newline.
95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
de81da2 [Marcelo Vanzin] Fix CommandUtils.
86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
7cff919 [Marcelo Vanzin] Javadoc updates.
eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
7ed8859 [Marcelo Vanzin] Some more feedback.
54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
61919df [Marcelo Vanzin] Clean leftover debug statement.
aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
e584fc3 [Marcelo Vanzin] Rework command building a little bit.
525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
8ac4e92 [Marcelo Vanzin] Minor test cleanup.
e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
c617539 [Marcelo Vanzin] Review feedback round 1.
fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
a7936ef [Marcelo Vanzin] Fix pyspark tests.
656374e [Marcelo Vanzin] Mima fixes.
4d511e7 [Marcelo Vanzin] Fix tools search code.
7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
2015-03-11 01:03:01 -07:00
Lev Khomich c4c4b07bf6 [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough
A simple try-catch wrapping KryoException to be more informative.

Author: Lev Khomich <levkhomich@gmail.com>

Closes #4947 from levkhomich/master and squashes the following commits:

0f7a947 [Lev Khomich] [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough
2015-03-10 10:55:42 +00:00
Davies Liu 8767565cef [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()
Because circular reference between JavaObject and JavaMember, an Java object can not be released until Python GC kick in, then it will cause memory leak in collect(), which may consume lots of memory in JVM.

This PR change the way we sending collected data back into Python from local file to socket, which could avoid any disk IO during collect, also avoid any referrers of Java object in Python.

cc JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes #4923 from davies/fix_collect and squashes the following commits:

d730286 [Davies Liu] address comments
24c92a4 [Davies Liu] fix style
ba54614 [Davies Liu] use socket to transfer data from JVM
9517c8f [Davies Liu] fix memory leak in collect()
2015-03-09 16:24:06 -07:00
GuoQiang Li 05cb6b34d8 [Minor] Resolve sbt warnings: postfix operator second should be enabled
Resolve sbt warnings:

```
[warn] spark/streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogManager.scala:155: postfix operator second should be enabled
[warn] by making the implicit value scala.language.postfixOps visible.
[warn] This can be achieved by adding the import clause 'import scala.language.postfixOps'
[warn] or by setting the compiler option -language:postfixOps.
[warn] See the Scala docs for value scala.language.postfixOps for a discussion
[warn] why the feature should be explicitly enabled.
[warn]         Await.ready(f, 1 second)
[warn]                          ^
```

Author: GuoQiang Li <witgo@qq.com>

Closes #4908 from witgo/sbt_warnings and squashes the following commits:

0629af4 [GuoQiang Li] Resolve sbt warnings: postfix operator second should be enabled
2015-03-06 13:20:20 +00:00
Josh Rosen 424a86a1ed [SPARK-6175] Fix standalone executor log links when ephemeral ports or SPARK_PUBLIC_DNS are used
This patch fixes two issues with the executor log viewing links added in Spark 1.3.  In standalone mode, the log URLs might include a port value of 0 rather than the actual bound port of the UI, which broke the ability to view logs from workers whose web UIs had been configured to bind to ephemeral ports.  In addition, the URLs used workers' local hostnames instead of respecting SPARK_PUBLIC_DNS, which prevented this feature from working properly on Spark EC2 clusters because the links would point to internal DNS names instead of external ones.

I included tests for both of these bugs:

- We now browse to the URLs and verify that they point to the expected pages.
- To test SPARK_PUBLIC_DNS, I changed the code that reads the environment variable to do so via `SparkConf.getenv`, then used a custom SparkConf subclass to mock the environment variable (this pattern is used elsewhere in Spark's tests).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4903 from JoshRosen/SPARK-6175 and squashes the following commits:

5577f41 [Josh Rosen] Remove println
cfec135 [Josh Rosen] Use webUi.boundPort and publicAddress in log links
27918c7 [Josh Rosen] Add failing unit tests for standalone log URL viewing
c250fbe [Josh Rosen] Respect SparkConf in local-cluster Workers.
422a2ef [Josh Rosen] Use conf.getenv to read SPARK_PUBLIC_DNS
2015-03-05 12:04:00 -08:00
Marcelo Vanzin 3a35a0dfe9 [SPARK-6144] [core] Fix addFile when source files are on "hdfs:"
The code failed in two modes: it complained when it tried to re-create a directory that already existed, and it was placing some files in the wrong parent directory. The patch fixes both issues.

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: trystanleftwich <trystan@atscale.com>

Closes #4894 from vanzin/SPARK-6144 and squashes the following commits:

100b3a1 [Marcelo Vanzin] Style fix.
58266aa [Marcelo Vanzin] Fix fetchHcfs file for directories.
91733b7 [trystanleftwich] [SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
2015-03-04 12:58:39 -08:00
Zhang, Liye f6773edce0 [SPARK-6107][CORE] Display inprogress application information for event log history for standalone mode
when application is finished running abnormally (Ctrl + c for example), the history event log file is still ends with `.inprogress` suffix. And the application state can not be showed on webUI, User can only see "*Application history not foud xxxx, Application xxx is still in progress*".

For application that not finished normally, the history will show:
![image](https://cloud.githubusercontent.com/assets/4716022/6437137/184f9fc0-c0f5-11e4-88cc-a2eb087e4561.png)

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #4848 from liyezhang556520/showLogInprogress and squashes the following commits:

03589ac [Zhang, Liye] change inprogress to in progress
b55f19f [Zhang, Liye] scala modify after rebase
8aa66a2 [Zhang, Liye] use softer wording
b030bd4 [Zhang, Liye] clean code
79c8cb1 [Zhang, Liye] fix some mistakes
11cdb68 [Zhang, Liye] add a missing space
c29205b [Zhang, Liye] refine code according to sean owen's comments
e9952a7 [Zhang, Liye] scala style fix again
150502d [Zhang, Liye] scala style fix
f11a5da [Zhang, Liye] small fix for file path
22e878b [Zhang, Liye] enable in progress eventlog file
2015-03-04 12:28:27 +00:00
tedyu 8d3e2414d4 SPARK-6085 Increase default value for memory overhead
Author: tedyu <yuzhihong@gmail.com>

Closes #4836 from tedyu/master and squashes the following commits:

d65b495 [tedyu] SPARK-6085 Increase default value for memory overhead
1fdd4df [tedyu] SPARK-6085 Increase default value for memory overhead
2015-03-04 11:00:52 +00:00
Andrew Or d334bfbcf3 [SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet
If the cleaner is stopped, we shouldn't print a huge stack trace when the cleaner thread is interrupted because we purposefully did this.

Author: Andrew Or <andrew@databricks.com>

Closes #4882 from andrewor14/cleaner-interrupt and squashes the following commits:

8652120 [Andrew Or] Just a hot fix
2015-03-03 20:49:45 -08:00
Imran Rashid 1f1fccc5ce [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo
https://issues.apache.org/jira/browse/SPARK-5949

Author: Imran Rashid <irashid@cloudera.com>

Closes #4877 from squito/SPARK-5949_register_roaring_bitmap and squashes the following commits:

7e13316 [Imran Rashid] style style style
5f6bb6d [Imran Rashid] more style
709bfe0 [Imran Rashid] style
a5cb744 [Imran Rashid] update tests to cover both types of RoaringBitmapContainers
09610c6 [Imran Rashid] formatting
f9a0b7c [Imran Rashid] put primitive array registrations together
97beaf8 [Imran Rashid] SPARK-5949 HighlyCompressedMapStatus needs more classes registered w/ kryo
2015-03-03 15:33:19 -08:00
Andrew Or 6c20f35290 [SPARK-6133] Make sc.stop() idempotent
Before we would get the following (benign) error if we called `sc.stop()` twice. This is because the listener bus would try to post the end event again even after it has already stopped. This happens occasionally when flaky tests fail, usually as a result of other sources of error. Either way we shouldn't be logging this error when it is not the cause of the failure.
```
ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerApplicationEnd(1425348445682)
```

Author: Andrew Or <andrew@databricks.com>

Closes #4871 from andrewor14/sc-stop and squashes the following commits:

a14afc5 [Andrew Or] Move code after code
915db16 [Andrew Or] Move code into code
2015-03-03 15:09:57 -08:00
Andrew Or fe63e82291 [SPARK-6132] ContextCleaner race condition across SparkContexts
The problem is that `ContextCleaner` may clean variables that belong to a different `SparkContext`. This can happen if the `SparkContext` to which the cleaner belongs stops, and a new one is started immediately afterwards in the same JVM. In this case, if the cleaner is in the middle of cleaning a broadcast, for instance, it will do so through `SparkEnv.get.blockManager`, which could be one that belongs to a different `SparkContext`.

JoshRosen and I suspect that this is the cause of many flaky tests, most notably the `JavaAPISuite`. We were able to reproduce the failure locally (though it is not deterministic and very hard to reproduce).

Author: Andrew Or <andrew@databricks.com>

Closes #4869 from andrewor14/cleaner-masquerade and squashes the following commits:

29168c0 [Andrew Or] Synchronize ContextCleaner stop
2015-03-03 13:44:05 -08:00
Andrew Or 9af001749a Revert "[SPARK-5423][Core] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file"
This reverts commit 90095bf3ce.
2015-03-03 13:03:52 -08:00
Wenchen Fan e359794cec [SPARK-6138][CORE][minor] enhance the toArray method in SizeTrackingVector
Use array copy instead of `Iterator#toArray` to make it more efficient.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #4825 from cloud-fan/minor and squashes the following commits:

c933ee5 [Wenchen Fan] make toArray method just in parent
946a35b [Wenchen Fan] minor enhance
2015-03-03 12:12:23 +00:00
hushan[胡珊] e3a88d1104 [SPARK-4777][CORE] Some block memory after unrollSafely not count into used memory(memoryStore.entrys or unrollMemory)
Some memory not count into memory used by memoryStore or unrollMemory.
Thread A after unrollsafely memory, it will release 40MB unrollMemory(40MB will used by other threads). then ThreadA wait get accountingLock to tryToPut blockA(30MB). before Thread A get accountingLock, blockA memory size is not counting into unrollMemory or memoryStore.currentMemory.
IIUC, freeMemory should minus that block memory

So, put this release memory into pending, and release it in tryToPut before ensureSpace

Author: hushan[胡珊] <hushan@xiaomi.com>

Closes #3629 from suyanNone/unroll-memory and squashes the following commits:

809cc41 [hushan[胡珊]] Refine
407b2c9 [hushan[胡珊]] Refine according comments
39960d0 [hushan[胡珊]] Refine comments
0fd0213 [hushan[胡珊]] add comments
0fc2bec [hushan[胡珊]] Release pending unroll memory after put block in memoryStore
3a3f2c8 [hushan[胡珊]] Refine blockManagerSuite unroll test
3323c45 [hushan[胡珊]] Refine getOrElse
f664317 [hushan[胡珊]] Make sure not add pending in every releaseUnrollMemory call
08b32ba [hushan[胡珊]] Pending unroll memory for this block untill tryToPut
2015-03-02 16:53:54 -08:00
Andrew Or 258d154c9f [SPARK-6048] SparkConf should not translate deprecated configs on set
There are multiple issues with translating on set outlined in the JIRA.

This PR reverts the translation logic added to `SparkConf`. In the future, after the 1.3.0 release we will figure out a way to reorganize the internal structure more elegantly. For now, let's preserve the existing semantics of `SparkConf` since it's a public interface. Unfortunately this means duplicating some code for now, but this is all internal and we can always clean it up later.

Author: Andrew Or <andrew@databricks.com>

Closes #4799 from andrewor14/conf-set-translate and squashes the following commits:

11c525b [Andrew Or] Move warning to driver
10e77b5 [Andrew Or] Add documentation for deprecation precedence
a369cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into conf-set-translate
c26a9e3 [Andrew Or] Revert all translate logic in SparkConf
fef6c9c [Andrew Or] Restore deprecation logic for spark.executor.userClassPathFirst
94b4dfa [Andrew Or] Translate on get, not set
2015-03-02 16:36:42 -08:00
Andrew Or 6776cb33ea [SPARK-6066] Make event log format easier to parse
Some users have reported difficulty in parsing the new event log format. Since we embed the metadata in the beginning of the file, when we compress the event log we need to skip the metadata because we need that information to parse the log later. This means we'll end up with a partially compressed file if event logging compression is turned on. The old format looks like:
```
sparkVersion = 1.3.0
compressionCodec = org.apache.spark.io.LZFCompressionCodec
=== LOG_HEADER_END ===
// actual events, could be compressed bytes
```
The new format in this patch puts the compression codec in the log file name instead. It also removes the metadata header altogether along with the Spark version, which was not needed. The new file name looks something like:
```
app_without_compression
app_123.lzf
app_456.snappy
```

I tested this with and without compression, using different compression codecs and event logging directories. I verified that both the `Master` and the `HistoryServer` can render both compressed and uncompressed logs as before.

Author: Andrew Or <andrew@databricks.com>

Closes #4821 from andrewor14/event-log-format and squashes the following commits:

8511141 [Andrew Or] Fix test
654883d [Andrew Or] Add back metadata with Spark version
7f537cd [Andrew Or] Address review feedback
7d6aa61 [Andrew Or] Make codec an extension
59abee9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format
27c9a6c [Andrew Or] Address review feedback
519e51a [Andrew Or] Address review feedback
ef69276 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format
88a091d [Andrew Or] Add tests for new format and file name
f32d8d2 [Andrew Or] Fix tests
8db5a06 [Andrew Or] Embed metadata in the event log file name instead
2015-03-02 16:34:32 -08:00
guliangliang 26c1c56dea [SPARK-5522] Accelerate the Histroty Server start
When starting the history server, all the log files will be fetched and parsed in order to get the applications' meta data e.g. App Name, Start Time, Duration, etc. In our production cluster, there exist 2600 log files (160G) in HDFS and it costs 3 hours to restart the history server, which is a little bit too long for us.

It would be better, if the history server can show logs with missing information during start-up and fill the missing information after fetching and parsing a log file.

Author: guliangliang <guliangliang@qiyi.com>

Closes #4525 from marsishandsome/Spark5522 and squashes the following commits:

a865c11 [guliangliang] fix bug2
4340c2b [guliangliang] fix bug
af92a5a [guliangliang] [SPARK-5522] Accelerate the Histroty Server start
2015-03-02 15:33:23 -08:00
Sean Owen 948c2390ab SPARK-3357 [CORE] Internal log messages should be set at DEBUG level instead of INFO
Demote some 'noisy' log messages to debug level. I added a few more, to include everything that gets logged in stanzas like this:

```
15/03/01 00:03:54 INFO BlockManager: Removing broadcast 0
15/03/01 00:03:54 INFO BlockManager: Removing block broadcast_0_piece0
15/03/01 00:03:54 INFO MemoryStore: Block broadcast_0_piece0 of size 839 dropped from memory (free 277976091)
15/03/01 00:03:54 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:49524 in memory (size: 839.0 B, free: 265.1 MB)
15/03/01 00:03:54 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/01 00:03:54 INFO BlockManager: Removing block broadcast_0
15/03/01 00:03:54 INFO MemoryStore: Block broadcast_0 of size 1088 dropped from memory (free 277977179)
15/03/01 00:03:54 INFO ContextCleaner: Cleaned broadcast 0
```

as well as regular messages like

```
15/03/01 00:02:33 INFO MemoryStore: ensureFreeSpace(2640) called with curMem=47322, maxMem=278019440
```

WDYT? good or should some be left alone?

CC mengxr who suggested some of this.

Author: Sean Owen <sowen@cloudera.com>

Closes #4838 from srowen/SPARK-3357 and squashes the following commits:

dce75c1 [Sean Owen] Back out some debug level changes
d9b784d [Sean Owen] Demote some 'noisy' log messages to debug level
2015-03-02 08:51:03 +00:00
Josh Rosen 2df5f1f006 [SPARK-6075] Fix bug in that caused lost accumulator updates: do not store WeakReferences in localAccums map
This fixes a non-deterministic bug introduced in #4021 that could cause tasks' accumulator updates to be lost.  The problem is that `localAccums` should not hold weak references: after the task finishes running there won't be any strong references to these local accumulators, so they can get garbage-collected before the executor reads the `localAccums` map.  We don't need weak references here anyways, since this map is cleared at the end of each task.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4835 from JoshRosen/SPARK-6075 and squashes the following commits:

4f4b5b2 [Josh Rosen] Remove defensive assertions that caused test failures in code unrelated to this change
120c7b0 [Josh Rosen] [SPARK-6075] Do not store WeakReferences in localAccums map
2015-02-28 22:51:01 -08:00
Evan Yu 643300a6e2 SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException
Fix TimSort bug which causes a ArrayOutOfBoundsException.

Using the proposed fix here
http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

Author: Evan Yu <ehotou@gmail.com>

Closes #4804 from hotou/SPARK-5984 and squashes the following commits:

3421b6c [Evan Yu] SPARK-5984: Add info to LICENSE
e61c6b8 [Evan Yu] SPARK-5984: Fix license and document
6ccc280 [Evan Yu] SPARK-5984: Add License header to file
e06c0d2 [Evan Yu] SPARK-5984: Add License header to file
4d95f75 [Evan Yu] SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException
479a106 [Evan Yu] SPARK-5984: Fix TimSort bug causes ArrayOutOfBoundsException
2015-02-28 18:55:34 -08:00
Sean Owen 86fcdaef62 SPARK-1965 [WEBUI] Spark UI throws NPE on trying to load the app page for non-existent app
Don't throw NPE if appId is unknown. kayousterhout is this a decent enough band-aid for avoiding a full-blown NPE? it should just render empty content instead

Author: Sean Owen <sowen@cloudera.com>

Closes #4777 from srowen/SPARK-1965 and squashes the following commits:

7e16590 [Sean Owen] Update app not found message
cb878d6 [Sean Owen] Return basic "not found" page for unknown appId
d8270da [Sean Owen] Don't throw NPE if appId is unknown
2015-02-28 15:34:08 +00:00
Sean Owen f91298e2c5 SPARK-5983 [WEBUI] Don't respond to HTTP TRACE in HTTP-based UIs
Disallow TRACE HTTP method in servlets

Author: Sean Owen <sowen@cloudera.com>

Closes #4765 from srowen/SPARK-5983 and squashes the following commits:

421b25b [Sean Owen] Disallow TRACE HTTP method in servlets
2015-02-28 15:23:59 +00:00
Burak Yavuz 6d8e5fbc0d [SPARK-5979][SPARK-6032] Smaller safer --packages fix
pwendell tdas
This is the safer parts of PR #4754:
 - SPARK-5979: All dependencies with the groupId `org.apache.spark` passed through `--packages`, were being excluded from the dependency tree on the assumption that they would be in the assembly jar. This is not the case, therefore the exclusion rules had to be defined more explicitly.
 - SPARK-6032: Ivy prints a whole lot of logs while retrieving dependencies. These were printed to `System.out`. Moved the logging to `System.err`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #4802 from brkyvz/simple-streaming-fix and squashes the following commits:

e0f38cb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into simple-streaming-fix
bad921c [Burak Yavuz] [SPARK-5979][SPARK-6032] Smaller safer fix
2015-02-27 22:59:35 -08:00
Zhang, Liye 8cd1692c90 [SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system
For detail description, pls refer to [SPARK-6036](https://issues.apache.org/jira/browse/SPARK-6036).

Author: Zhang, Liye <liye.zhang@intel.com>

Closes #4785 from liyezhang556520/EventLogInProcess and squashes the following commits:

8b0b0a6 [Zhang, Liye] stop listener after DAGScheduler
79b15b3 [Zhang, Liye] SPARK-6036 avoid race condition between eventlogListener and akka actor system
2015-02-26 23:11:43 -08:00
Andrew Or 7c99a014fb [SPARK-6046] Privatize SparkConf.translateConfKey
The warning of deprecated configs is actually done when the configs are set, not when they are get. As a result we don't need to explicitly call `translateConfKey` outside of `SparkConf` just to print the warning again in vain.

Author: Andrew Or <andrew@databricks.com>

Closes #4797 from andrewor14/warn-deprecated-config and squashes the following commits:

8fb43e6 [Andrew Or] Privatize SparkConf.translateConfKey
2015-02-26 22:39:46 -08:00
jerryshao 67595eb8fb [SPARK-5495][UI] Add app and driver kill function in master web UI
Add application kill function in master web UI for standalone mode. Details can be seen in [SPARK-5495](https://issues.apache.org/jira/browse/SPARK-5495).

The snapshot of UI shows as below:
![snapshot](https://dl.dropboxusercontent.com/u/19230832/master_ui.png)

Please help to review, thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #4288 from jerryshao/SPARK-5495 and squashes the following commits:

fa3e486 [jerryshao] Add some conditions
9a7be93 [jerryshao] Add kill Driver function
a239776 [jerryshao] Change the code format
ff5195d [jerryshao] Add app kill function in master web UI
2015-02-26 22:36:48 -08:00
jerryshao 12135e9054 [SPARK-5771][UI][hotfix] Change Requested Cores into * if default cores is not set
cc andrewor14, srowen.

Author: jerryshao <saisai.shao@intel.com>

Closes #4800 from jerryshao/SPARK-5771 and squashes the following commits:

a2483c2 [jerryshao] Change the UI of Requested Cores into * if default cores is not set
2015-02-26 22:35:43 -08:00
Hong Shen 18f2098433 [SPARK-5529][CORE]Add expireDeadHosts in HeartbeatReceiver
If a blockManager has not send heartBeat more than 120s, BlockManagerMasterActor will remove it. But coarseGrainedSchedulerBackend can only remove executor after an DisassociatedEvent.  We should expireDeadHosts at HeartbeatReceiver.

Author: Hong Shen <hongshen@tencent.com>

Closes #4363 from shenh062326/my_change3 and squashes the following commits:

2c9a46a [Hong Shen] Change some code style.
1a042ff [Hong Shen] Change some code style.
2dc456e [Hong Shen] Change some code style.
d221493 [Hong Shen] Fix test failed
7448ac6 [Hong Shen] A minor change in sparkContext and heartbeatReceiver
b904aed [Hong Shen] Fix failed test
52725af [Hong Shen] Remove assert in SparkContext.killExecutors
5bedcb8 [Hong Shen] Remove assert in SparkContext.killExecutors
a858fb5 [Hong Shen] A minor change in HeartbeatReceiver
3e221d9 [Hong Shen] A minor change in HeartbeatReceiver
6bab7aa [Hong Shen] Change a code style.
07952f3 [Hong Shen] Change configs name and code style.
ce9257e [Hong Shen] Fix test failed
bccd515 [Hong Shen] Fix test failed
8e77408 [Hong Shen] Fix test failed
c1dfda1 [Hong Shen] Fix test failed
e197e20 [Hong Shen] Fix test failed
fb5df97 [Hong Shen] Remove ExpireDeadHosts in BlockManagerMessages
b5c0441 [Hong Shen] Remove expireDeadHosts in BlockManagerMasterActor
c922cb0 [Hong Shen] Add expireDeadHosts in HeartbeatReceiver
2015-02-26 18:43:23 -08:00
Sean Owen fbc469473d SPARK-4579 [WEBUI] Scheduling Delay appears negative
Ensure scheduler delay handles unfinished task case, and ensure delay is never negative even due to rounding

Author: Sean Owen <sowen@cloudera.com>

Closes #4796 from srowen/SPARK-4579 and squashes the following commits:

ad6713c [Sean Owen] Ensure scheduler delay handles unfinished task case, and ensure delay is never negative even due to rounding
2015-02-26 17:35:09 -08:00
tedyu e60ad2f4c4 SPARK-6045 RecordWriter should be checked against null in PairRDDFunctio...
...ns#saveAsNewAPIHadoopDataset

Author: tedyu <yuzhihong@gmail.com>

Closes #4794 from tedyu/master and squashes the following commits:

2632a57 [tedyu] SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
2d8d4b1 [tedyu] SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
2015-02-26 23:27:09 +00:00
Sean Owen 3fb53c0298 SPARK-4300 [CORE] Race condition during SparkWorker shutdown
Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream.
(This also removes a redundant `waitFor()` although it was harmless)

CC tdas since I think you wrote this method.

Author: Sean Owen <sowen@cloudera.com>

Closes #4787 from srowen/SPARK-4300 and squashes the following commits:

e0cdabf [Sean Owen] Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream
2015-02-26 14:08:56 -08:00
xukun 00228947 8942b522d8 [SPARK-3562]Periodic cleanup event logs
Author: xukun 00228947 <xukun.xu@huawei.com>

Closes #4214 from viper-kun/cleaneventlog and squashes the following commits:

7a5b9c5 [xukun 00228947] fix issue
31674ee [xukun 00228947] fix issue
6e3d06b [xukun 00228947] fix issue
373f3b9 [xukun 00228947] fix issue
71782b5 [xukun 00228947] fix issue
5b45035 [xukun 00228947] fix issue
70c28d6 [xukun 00228947] fix issues
adcfe86 [xukun 00228947] Periodic cleanup event logs
2015-02-26 13:24:00 -08:00
Sean Owen cd5c8d7bbd SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output
Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting

CC andrewor14 since I believe he created this section of code

Author: Sean Owen <sowen@cloudera.com>

Closes #4788 from srowen/SPARK-4704 and squashes the following commits:

ad7114e [Sean Owen] Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting
2015-02-26 12:56:54 -08:00
Davies Liu 7fa960e653 [SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe
Removing elements from a mutable HashSet while iterating over it can cause the
iteration to incorrectly skip over entries that were not removed. If this
happened, PythonRDD would write fewer broadcast variables than the Python
worker was expecting to read, which would cause the Python worker to hang
indefinitely.

Author: Davies Liu <davies@databricks.com>

Closes #4776 from davies/fix_hang and squashes the following commits:

a4384a5 [Davies Liu] fix bug: remvoe() inside iterator is not safe
2015-02-26 11:54:27 -08:00
Marcelo Vanzin df3d559b32 [SPARK-5801] [core] Avoid creating nested directories.
Cache the value of the local root dirs to use for storing local data,
so that the same directories are reused.

Also, to avoid an extra level of nesting, use a different env variable
to propagate the local dirs from the Worker to the executors. And make
the executor directory use a different name.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #4747 from vanzin/SPARK-5801 and squashes the following commits:

e0114e1 [Marcelo Vanzin] Update unit test.
18ee0a7 [Marcelo Vanzin] [SPARK-5801] [core] Avoid creating nested directories.
2015-02-26 17:35:03 +00:00
Judy Nash 51a6f9097b [SPARK-5914] to run spark-submit requiring only user perm on windows
Because windows on-default does not grant read permission to jars except to admin, spark-submit would fail with "ClassNotFound" exception if user runs slave service with only user permission.
This fix is to add read permission to owner of the jar (which would be the slave service account in windows )

Author: Judy Nash <judynash@microsoft.com>

Closes #4742 from judynash/SPARK-5914 and squashes the following commits:

e288e56 [Judy Nash] Fix spacing and refactor code
1de3c0e [Judy Nash] [SPARK-5914] Enable spark-submit to run requiring only user permission on windows
2015-02-26 11:15:27 +00:00
Kay Ousterhout 838a48036c [SPARK-5982] Remove incorrect Local Read Time Metric
This metric is incomplete, because the files are memory mapped, so much of the read from disk occurs later as tasks actually read the file's data.

This should be merged into 1.3, so that we never expose this incorrect metric to users.

CC pwendell ksakellis sryza

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:

9737b5e [Kay Ousterhout] More fixes
a1eb300 [Kay Ousterhout] Removed one more use of local read time
cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrectwq Local Read Time Metric
2015-02-25 14:55:24 -08:00
Milan Straka a777c65da9 [SPARK-5970][core] Register directory created in getOrCreateLocalRootDirs for automatic deletion.
As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.

Author: Milan Straka <fox@ucw.cz>

Closes #4759 from foxik/remove-tmp-dirs and squashes the following commits:

280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
2015-02-25 21:33:34 +00:00
guliangliang dd077abf2e [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
In Standalone mode, the number of cores in Completed Applications of the Master Web Page will always be zero, if sc.stop() is called.
But the number will always be right, if sc.stop() is not called.
The reason maybe:
after sc.stop() is called, the function removeExecutor of class ApplicationInfo will be called, thus reduce the variable coresGranted to zero. The variable coresGranted is used to display the number of Cores on the Web Page.

Author: guliangliang <guliangliang@qiyi.com>

Closes #4567 from marsishandsome/Spark5771 and squashes the following commits:

694796e [guliangliang] remove duplicate code
a20e390 [guliangliang] change to Cores Using & Requested
0c19c95 [guliangliang] change Cores to Cores (max)
cfbd97d [guliangliang] [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
2015-02-25 14:48:02 +00:00
Judy c5ba975ee8 [Spark-5708] Add Slf4jSink to Spark Metrics
Add Slf4jSink to Spark Metrics using Coda Hale's SlfjReporter.
This sends metrics to log4j, allowing spark users to reuse log4j pipeline for metrics collection.

Reviewed existing unit tests and didn't see any sink-related tests. Please advise on if tests should be added.

Author: Judy <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>

Closes #4644 from judynash/master and squashes the following commits:

57ef214 [judynash] doc clarification and indent fixes
a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
2015-02-24 20:50:16 +00:00
Andrew Or 6d2caa576f [SPARK-5965] Standalone Worker UI displays {{USER_JAR}}
For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
This was caused by 20a6013106.

Author: Andrew Or <andrew@databricks.com>

Closes #4739 from andrewor14/user-jar-blocker and squashes the following commits:

23c4a9e [Andrew Or] Use right argument
2015-02-24 11:08:07 -08:00