Commit graph

4706 commits

Author SHA1 Message Date
jerryshao 9407baa2a7 [SPARK-9877] [CORE] Fix StandaloneRestServer NPE when submitting application
A detailed exception log can be seen in [SPARK-9877](https://issues.apache.org/jira/browse/SPARK-9877); the problem is that when creating `StandaloneRestServer`, `self` (`masterEndpoint`) is null. So this fix creates the `StandaloneRestServer` only once `self` is available.
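A minimal, self-contained sketch of the bug pattern and fix (toy names, not Spark's actual `Master` code): defer construction to `onStart()`, which the RPC framework calls only after `self` is valid.

```
trait Endpoint {
  def self: EndpointRef        // valid only after the framework registers the endpoint
  def onStart(): Unit = ()
}
class EndpointRef
class RestServer(master: EndpointRef) {
  def start(): Unit = println(s"REST server bound to master $master")
}

class MasterLike extends Endpoint {
  private var ref: EndpointRef = _
  override def self: EndpointRef = ref
  def register(): Unit = { ref = new EndpointRef; onStart() }

  // Fix: build the server in onStart(), where self is guaranteed non-null,
  // not in the constructor, where self would still be null.
  override def onStart(): Unit = new RestServer(self).start()
}
```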

Author: jerryshao <sshao@hortonworks.com>

Closes #8127 from jerryshao/SPARK-9877.
2015-08-14 13:44:38 -07:00
Andrew Or 6518ef6303 [SPARK-9948] Fix flaky AccumulatorSuite - internal accumulators
In these tests, we use a custom listener and we assert on fields in the stage / task completion events. However, these events are posted in a separate thread so they're not guaranteed to be posted in time. This commit fixes this flakiness through a job end registration callback.
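A hedged sketch of the registration-callback idea (not the exact test code; assumes an active SparkContext `sc`): block on a job-end signal before asserting, since listener events for the job are posted on the same bus and are therefore all delivered by then.

```
import java.util.concurrent.{CountDownLatch, TimeUnit}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

// Wait until the job-end event has been posted before asserting on the
// stage/task completion events collected by the custom listener.
val jobDone = new CountDownLatch(1)
sc.addSparkListener(new SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = jobDone.countDown()
})
sc.parallelize(1 to 100).count()
assert(jobDone.await(10, TimeUnit.SECONDS))  // events for this job are now posted
```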

Author: Andrew Or <andrew@databricks.com>

Closes #8176 from andrewor14/fix-accumulator-suite.
2015-08-14 13:42:53 -07:00
Carson Wang 33bae585d4 [SPARK-9809] Task crashes because the internal accumulators are not properly initialized
When a stage failed and another stage was resubmitted with only part of its partitions to compute, all the tasks failed with the error message: java.util.NoSuchElementException: key not found: peakExecutionMemory.
This is because the internal accumulators are not properly initialized for this stage, while other code assumes the internal accumulators always exist.

Author: Carson Wang <carson.wang@intel.com>

Closes #8090 from carsonwang/SPARK-9809.
2015-08-14 13:38:25 -07:00
Neelesh Srinivas Salian 57c2d08800 [SPARK-9923] [CORE] ShuffleMapStage.numAvailableOutputs should be an Int instead of Long
Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int

Author: Neelesh Srinivas Salian <nsalian@cloudera.com>

Closes #8183 from nssalian/SPARK-9923.
2015-08-14 20:03:50 +01:00
Davies Liu bd35385d53 [SPARK-9945] [SQL] pageSize should be calculated from executor.memory
Currently, pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.

Also, in the worst case, the safeFactor could be 4 (because of rounding), so increase it to 16.
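For illustration, the knobs involved (a sketch; `spark.buffer.pageSize` is, to our understanding, the explicit page-size override in this era of Spark, and the values here are examples only):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")   // after this fix, the basis for the page-size calculation
  .set("spark.buffer.pageSize", "16m")  // or bypass the calculation with an explicit value
```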

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #8175 from davies/page_size.
2015-08-13 21:12:59 -07:00
Andrew Or 8815ba2f67 [SPARK-9649] Fix MasterSuite, third time's a charm
This particular test did not load the default configurations, so
it continued to start the REST server, which caused port bind
exceptions.
2015-08-13 11:31:10 -07:00
Davies Liu a8ab2634c1 [SPARK-9832] [SQL] add a thread-safe lookup for BytesToBytesMap
This patch adds a thread-safe lookup for BytesToBytesMap, and uses it in the broadcasted HashedRelation.
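Conceptually (a simplified Scala toy model; the real class is Java and the signatures differ): the plain lookup reuses one shared result holder, which breaks when a broadcast relation is probed from many tasks at once, while the safe variant writes into a caller-owned holder.

```
// Toy model of the problem, not the actual BytesToBytesMap API.
final class Location(var found: Boolean = false, var value: Long = 0L)

final class ToyMap(data: Map[Long, Long]) {
  private val shared = new Location                               // reused: NOT thread-safe
  def lookup(key: Long): Location = { fill(key, shared); shared }
  def safeLookup(key: Long, loc: Location): Unit = fill(key, loc) // caller-owned result
  private def fill(key: Long, loc: Location): Unit = data.get(key) match {
    case Some(v) => loc.found = true; loc.value = v
    case None    => loc.found = false
  }
}
```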

Author: Davies Liu <davies@databricks.com>

Closes #8151 from davies/safeLookup.
2015-08-12 21:26:00 -07:00
Josh Rosen 7b13ed27c1 [SPARK-9870] Disable driver UI and Master REST server in SparkSubmitSuite
I think that we should pass additional configuration flags to disable the driver UI and Master REST server in SparkSubmitSuite and HiveSparkSubmitSuite. This might cut down on port-contention-related flakiness in Jenkins.
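The configuration flags in question (both are real Spark config keys; how the suites actually pass them is simplified here):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.ui.enabled", "false")           // no driver UI, so no UI port to contend for
  .set("spark.master.rest.enabled", "false")  // no standalone Master REST server port either
```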

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8124 from JoshRosen/disable-ui-in-sparksubmitsuite.
2015-08-12 18:52:11 -07:00
Rohit Agarwal 0d1d146c22 [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
Author: Rohit Agarwal <rohita@qubole.com>

Closes #8014 from mindprince/SPARK-9724 and squashes the following commits:

a7af5ff [Rohit Agarwal] [SPARK-9724] [WEB UI] Inline attachPrefix and attachPrefixForRedirect. Fix logic of attachPrefix
8a977cd [Rohit Agarwal] [SPARK-9724] [WEB UI] Address review comments: Remove unneeded code, update scaladoc.
b257844 [Rohit Agarwal] [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
2015-08-12 17:48:43 -07:00
Michel Lemay ab7e721cfe [SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties
Refactor Utils class and create ShutdownHookManager.

NOTE: Wasn't able to run /dev/run-tests on a Windows machine.
Manual tests were conducted locally using a custom log4j.properties file with a Redis appender and logstash formatter (bundled in the fat jar submitted to Spark)

ex:
```
log4j.rootCategory=WARN,console,redis
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.graphx.Pregel=INFO

log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
log4j.appender.redis.endpoints=hostname:port
log4j.appender.redis.key=mykey
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
```

Author: michellemay <mlemay@gmail.com>

Closes #8109 from michellemay/SPARK-9826.
2015-08-12 16:41:35 -07:00
Niranjan Padmanabhan 738f353988 [SPARK-9092] Fixed incompatibility when both num-executors and dynamic...
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.
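The conflicting settings, for illustration (both keys are real; the values are examples):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "10")           // what --num-executors translates to
  .set("spark.dynamicAllocation.enabled", "true")  // now forced to false when the above is set
```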

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.
2015-08-12 16:10:21 -07:00
Xiangrui Meng 6f60298b1d [SPARK-8967] [DOC] add Since annotation
Add `Since` as a Scala annotation. The benefit is that we can use it without having explicit JavaDoc. This is useful for inherited methods. The limitation is that it doesn't show up in the generated Java API documentation. This might be fixed by modifying genjavadoc; I think we could leave it as a TODO.
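Typical usage (version strings are examples):

```
import org.apache.spark.annotation.Since

/** Example only: the annotation replaces the `@since` Scaladoc tag on public API. */
@Since("1.5.0")
class ExampleModel {
  @Since("1.5.0")
  def copy(): ExampleModel = new ExampleModel  // annotation is inherited into docs, unlike the tag
}
```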

This is how the generated Scala doc looks:

`since` JavaDoc tag:

![screen shot 2015-08-11 at 10 00 37 pm](https://cloud.githubusercontent.com/assets/829644/9230761/fa72865c-40d8-11e5-807e-0f3c815c5acd.png)

`Since` annotation:

![screen shot 2015-08-11 at 10 00 28 pm](https://cloud.githubusercontent.com/assets/829644/9230764/0041d7f4-40d9-11e5-8124-c3f3e5d5b31f.png)

rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #8131 from mengxr/SPARK-8967.
2015-08-12 14:28:23 -07:00
Andrew Or e0110792ef [SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation
This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <andrew@databricks.com>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.
2015-08-12 10:08:35 -07:00
Andrew Or be5d191207 [SPARK-9795] Dynamic allocation: avoid double counting when killing same executor twice
This is based on KaiXinXiaoLei's changes in #7716.

The issue is that when someone calls `sc.killExecutor("1")` on the same executor twice quickly, then the executor target will be adjusted downwards by 2 instead of 1 even though we're only actually killing one executor. In certain cases where we don't adjust the target back upwards quickly, we'll end up with jobs hanging.

This is a common danger because there are many places where this is called:
- `HeartbeatReceiver` kills an executor that has not been sending heartbeats
- `ExecutorAllocationManager` kills an executor that has been idle
- The user code might call this, which may interfere with the previous callers

While it's not clear whether this fixes SPARK-9745, fixing this potential race condition seems like a strict improvement. I've added a regression test to illustrate the issue.
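A hedged sketch of the guard (the field name echoes `CoarseGrainedSchedulerBackend`, but this is simplified):

```
import scala.collection.mutable

class KillTracker {
  // Only adjust the executor target the FIRST time a given id is killed.
  private val executorsPendingToRemove = mutable.HashSet.empty[String]
  var targetNumExecutors = 10

  def killExecutor(id: String): Unit = synchronized {
    if (executorsPendingToRemove.add(id)) { // false on a repeated kill of the same id
      targetNumExecutors -= 1               // so two quick kills of "1" subtract 1, not 2
    }
  }
}
```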

Author: Andrew Or <andrew@databricks.com>

Closes #8078 from andrewor14/da-double-kill.
2015-08-12 09:24:50 -07:00
Tom White 2e680668f7 [SPARK-8625] [CORE] Propagate user exceptions in tasks back to driver
This allows clients to retrieve the original exception from the
cause field of the SparkException that is thrown by the driver.
If the original exception is not in fact Serializable then it will
not be returned, but the message and stacktrace will be. (All Java
Throwables implement the Serializable interface, but this is no
guarantee that a particular implementation can actually be
serialized.)
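What this enables on the client side, sketched (the exception class is illustrative):

```
import org.apache.spark.{SparkContext, SparkException}

class MyAppException(msg: String) extends Exception(msg) // serializable, so it survives the wire

def run(sc: SparkContext): Unit =
  try {
    sc.parallelize(1 to 10).foreach(_ => throw new MyAppException("bad record"))
  } catch {
    case e: SparkException =>
      e.getCause match {
        case original: MyAppException => println(s"job failed: ${original.getMessage}")
        case _ => // non-serializable cause: only its message and stacktrace are preserved
      }
  }
```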

Author: Tom White <tom@cloudera.com>

Closes #7014 from tomwhite/propagate-user-exceptions.
2015-08-12 10:07:11 -05:00
Timothy Chen 5c99d8bf98 [SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos
Some users like to download additional files into their sandbox that they can refer to from their Spark program, or even later mount these files to another directory.

Author: Timothy Chen <tnachen@gmail.com>

Closes #7195 from tnachen/mesos_files.
2015-08-11 23:26:33 -07:00
Carson Wang bab8923285 [SPARK-9426] [WEBUI] Job page DAG visualization is not shown
To reproduce the issue, go to the stage page and click DAG Visualization once, then go to the job page to show the job DAG visualization. You will only see the first stage of the job.
Root cause: the JavaScript uses local storage to remember your selection. Once you click the stage DAG visualization, the local storage sets `expand-dag-viz-arrow-stage` to true. When you go to the job page, the JS checks `expand-dag-viz-arrow-stage` in the local storage first and will try to show the stage DAG visualization on the job page.
To fix this, I set an id on the DAG span to distinguish the job page from the stage page. In the JS code, we check the id and local storage together to make sure we show the correct DAG visualization.

Author: Carson Wang <carson.wang@intel.com>

Closes #8104 from carsonwang/SPARK-9426.
2015-08-11 23:25:02 -07:00
zsxwing 4e3f4b934f [SPARK-9829] [WEBUI] Display the update value for peak execution memory
The peak execution memory is not correct because it shows the sum of finished tasks' values when a task finishes.

This PR fixes it by using the update value rather than the accumulator value.

Author: zsxwing <zsxwing@gmail.com>

Closes #8121 from zsxwing/SPARK-9829.
2015-08-11 23:23:17 -07:00
Rohit Agarwal a807fcbe50 [SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple applications
Author: Rohit Agarwal <rohita@qubole.com>

Closes #8088 from mindprince/SPARK-9806.
2015-08-11 23:20:39 -07:00
xutingjun b85f9a242a [SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks
Author: xutingjun <xutingjun@huawei.com>
Author: meiyoula <1039320815@qq.com>

Closes #6817 from XuTingjun/SPARK-8366.
2015-08-11 23:19:35 -07:00
zsxwing f16bc68dfb [SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference
`InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.

This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.
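The shape of the fix, as a hedged sketch (`registerAccumulatorForCleanup` is the real `ContextCleaner` hook; the surrounding code is simplified, and `sc.cleaner` is `private[spark]` in real code):

```
import org.apache.spark.SparkContext

// Simplified: internal accumulators are created differently in the real code,
// but the essential fix is the cleanup registration on the middle line.
def createInternal(sc: SparkContext, name: String) = {
  val acc = sc.accumulator(0L, name)
  sc.cleaner.foreach(_.registerAccumulatorForCleanup(acc)) // WeakReference gets dropped later
  acc
}
```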

Author: zsxwing <zsxwing@gmail.com>

Closes #8108 from zsxwing/internal-accumulators-leak.
2015-08-11 14:06:23 -07:00
Jeff Zhang bce72797f3 Fix comment error
API is updated but its doc comment is not updated.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #8097 from zjffdu/dev.
2015-08-11 10:42:17 -07:00
Reynold Xin d378396f86 [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.
PlatformDependent.UNSAFE is way too verbose.

Author: Reynold Xin <rxin@databricks.com>

Closes #8094 from rxin/SPARK-9815 and squashes the following commits:

229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.
2015-08-11 08:41:06 -07:00
Marcelo Vanzin 0f3366a4c7 [SPARK-9710] [TEST] Fix RPackageUtilsSuite when R is not available.
RUtils.isRInstalled throws an exception if R is not installed,
instead of returning false. Fix that.
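A sketch of the corrected guard (the real method is in `RUtils`; the body here is an assumption):

```
import scala.sys.process._

def isRInstalled: Boolean =
  try {
    Seq("R", "--version").! == 0  // exit code 0 means R is on the PATH
  } catch {
    case _: Exception => false    // previously the exception escaped instead of returning false
  }
```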

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8008 from vanzin/SPARK-9710 and squashes the following commits:

df72d8c [Marcelo Vanzin] [SPARK-9710] [test] Fix RPackageUtilsSuite when R is not available.
2015-08-10 10:10:40 -07:00
Shivaram Venkataraman 46025616b4 [CORE] [SPARK-9760] Use Option instead of Some for Ivy repos
This was introduced in #7599
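The idiom, in miniature (variable names are illustrative):

```
// Some(x) happily wraps null; Option(x) maps null to None.
val userRepo: String = null
val wrong: Option[String] = Some(userRepo)   // Some(null): looks present, blows up later
val right: Option[String] = Option(userRepo) // None: absence handled correctly
```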

cc rxin brkyvz

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8055 from shivaram/spark-packages-repo-fix and squashes the following commits:

890f306 [Shivaram Venkataraman] Remove test case
51d69ee [Shivaram Venkataraman] Add test case for --packages without --repository
c02e0b4 [Shivaram Venkataraman] Use Option instead of Some for Ivy repos
2015-08-09 14:30:30 -07:00
Reynold Xin e9c36938ba [SPARK-9752][SQL] Support UnsafeRow in Sample operator.
In order for this to work, I had to disable gap sampling.

Author: Reynold Xin <rxin@databricks.com>

Closes #8040 from rxin/SPARK-9752 and squashes the following commits:

f9e248c [Reynold Xin] Fix the test case for real this time.
adbccb3 [Reynold Xin] Fixed test case.
589fb23 [Reynold Xin] Merge branch 'SPARK-9752' of github.com:rxin/spark into SPARK-9752
55ccddc [Reynold Xin] Fixed core test.
78fa895 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.
c9e7112 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.
2015-08-09 10:58:36 -07:00
Carson Wang ef062c1599 [SPARK-9731] Standalone scheduling incorrect cores if spark.executor.cores is not set
The issue only happens if `spark.executor.cores` is not set and executor memory is set to a high value.
For example, if we have a worker with 4G and 10 cores and we set `spark.executor.memory` to 3G, then only 1 core is assigned to the executor. The correct number should be 10 cores.
I've added a unit test to illustrate the issue.

Author: Carson Wang <carson.wang@intel.com>

Closes #8017 from carsonwang/SPARK-9731 and squashes the following commits:

d09ec48 [Carson Wang] Fix code style
86b651f [Carson Wang] Simplify the code
943cc4c [Carson Wang] fix scheduling correct cores to executors
2015-08-07 23:36:26 -07:00
Andrew Or 881548ab20 [SPARK-9674] Re-enable ignored test in SQLQuerySuite
The original code that this test tests is removed in 9270bd06fd. It was ignored shortly before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass.

JoshRosen yhuai

Author: Andrew Or <andrew@databricks.com>

Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits:

225eac2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into SPARK-9674
8c24209 [Andrew Or] Fix NPE
e541d64 [Andrew Or] Track aggregation memory for both sort and hash
0be3a42 [Andrew Or] Fix test
2015-08-07 14:20:13 -07:00
zsxwing ebfd91c542 [SPARK-9467][SQL]Add SQLMetric to specialize accumulators to avoid boxing
This PR adds SQLMetric/SQLMetricParam/SQLMetricValue to specialize accumulators to avoid boxing. All SQL metrics should use these classes rather than `Accumulator`.
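The specialization idea, sketched (class and method names simplified from the PR):

```
// A metric whose update path takes a primitive Long, so per-row updates on the
// hot path allocate no boxed java.lang.Long, unlike a generic Accumulator[Long].
class LongSQLMetric(private var _value: Long = 0L) {
  def +=(v: Long): Unit = _value += v
  def value: Long = _value
}

val numRows = new LongSQLMetric()
(1 to 1000).foreach(_ => numRows += 1L)  // no boxing per update
```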

Author: zsxwing <zsxwing@gmail.com>

Closes #7996 from zsxwing/sql-accu and squashes the following commits:

14a5f0a [zsxwing] Address comments
367ca23 [zsxwing] Use localValue directly to avoid changing Accumulable
42f50c3 [zsxwing] Add SQLMetric to specialize accumulators to avoid boxing
2015-08-07 00:09:58 -07:00
Davies Liu 15bd6f338d [SPARK-9453] [SQL] support records larger than page size in UnsafeShuffleExternalSorter
This patch follows exactly #7891 (except testing)

Author: Davies Liu <davies@databricks.com>

Closes #8005 from davies/larger_record and squashes the following commits:

f9c4aff [Davies Liu] address comments
9de5c72 [Davies Liu] support records larger than page size in UnsafeShuffleExternalSorter
2015-08-06 23:40:38 -07:00
Reynold Xin 4309262ec9 [SPARK-9700] Pick default page size more intelligently.
Previously, we use 64MB as the default page size, which was way too big for a lot of Spark applications (especially for single node).

This patch changes it so that the default page size, if unset by the user, is determined by the number of cores available and the total execution memory available.
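A hedged sketch of such a heuristic (the constants and clamps are assumptions, not the patch's exact formula):

```
// Scale page size with execution memory per core, round down to a power of two,
// and clamp to a sane range.
def defaultPageSize(cores: Int, executionMemoryBytes: Long): Long = {
  val safetyFactor = 16L
  val perCore = executionMemoryBytes / math.max(cores, 1)
  val size = java.lang.Long.highestOneBit(math.max(perCore / safetyFactor, 1L))
  math.min(math.max(size, 1L << 20), 64L << 20)  // between 1 MB and 64 MB
}
```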

Author: Reynold Xin <rxin@databricks.com>

Closes #8012 from rxin/pagesize and squashes the following commits:

16f4756 [Reynold Xin] Fixed failing test.
5afd570 [Reynold Xin] private...
0d5fb98 [Reynold Xin] Update default value.
674a6cd [Reynold Xin] Address review feedback.
dc00e05 [Reynold Xin] Merge with master.
73ebdb6 [Reynold Xin] [SPARK-9700] Pick default page size more intelligently.
2015-08-06 23:18:29 -07:00
zsxwing 672f467668 [SPARK-8057][Core]Call TaskAttemptContext.getTaskAttemptID using Reflection
Someone may use the Spark core jar from the Maven repo with Hadoop 1. SPARK-2075 already resolved the compatibility issue to support this, but `SparkHadoopMapRedUtil.commitTask` recently broke it.

This PR uses reflection to call `TaskAttemptContext.getTaskAttemptID`, fixing the compatibility issue.
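The reflection idiom, sketched (the real helper lives in `SparkHadoopMapRedUtil`):

```
import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

// TaskAttemptContext is a class in Hadoop 1 but an interface in Hadoop 2, so a
// direct call compiled against one breaks binary compatibility with the other;
// resolving the method reflectively keeps one jar working against both.
def taskAttemptId(context: TaskAttemptContext): TaskAttemptID =
  context.getClass.getMethod("getTaskAttemptID")
    .invoke(context).asInstanceOf[TaskAttemptID]
```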

Author: zsxwing <zsxwing@gmail.com>

Closes #6599 from zsxwing/SPARK-8057 and squashes the following commits:

f7a343c [zsxwing] Remove the redundant import
6b7f1af [zsxwing] Call TaskAttemptContext.getTaskAttemptID using Reflection
2015-08-06 21:42:42 -07:00
Andrew Or 014a9f9d8c [SPARK-9709] [SQL] Avoid starving unsafe operators that use sort
The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.
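A hedged sketch of that special RDD (simplified; the real class is `private[spark]`, and `firstParent` is not accessible outside Spark's own packages):

```
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

class MapPartitionsWithPreparationRDD[U: ClassTag, T: ClassTag, M](
    prev: RDD[T],
    preparePartition: () => M,                          // e.g. reserve a page in this task's sorter
    executePartition: (Iterator[T], M) => Iterator[U])
  extends RDD[U](prev) {

  override def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] = {
    val prepared = preparePartition()                   // runs BEFORE the parent partition is computed
    executePartition(firstParent[T].iterator(split, context), prepared)
  }
}
```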

Author: Andrew Or <andrew@databricks.com>

Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits:

35b69a4 [Andrew Or] Simplify test
0b07782 [Andrew Or] Minor: update comments
5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
254032e [Andrew Or] Add tests
234acbd [Andrew Or] Reserve a page in sorter when preparing each partition
b889e08 [Andrew Or] MapPartitionsWithPreparationRDD
2015-08-06 19:04:57 -07:00
Reynold Xin b87825310a [SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator.
A small performance optimization – we don't need to generate a Tuple2 and then immediately discard the key. We also don't need an extra wrapper from InterruptibleIterator.

Author: Reynold Xin <rxin@databricks.com>

Closes #8000 from rxin/SPARK-9692 and squashes the following commits:

1d4d0b3 [Reynold Xin] [SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator.
2015-08-06 18:25:38 -07:00
Marcelo Vanzin e234ea1b49 [SPARK-9645] [YARN] [CORE] Allow shuffle service to read shuffle files.
Spark should not mess with the permissions of directories created
by the cluster manager. Here, by setting the block manager dir
permissions to 700, the shuffle service (running as the YARN user)
wouldn't be able to serve shuffle files created by applications.

Also, the code to protect the local app dir was missing in standalone's
Worker; that has been now added. Since all processes run as the same
user in standalone, `chmod 700` should not cause problems.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #7966 from vanzin/SPARK-9645 and squashes the following commits:

6e07b31 [Marcelo Vanzin] Protect the app dir in standalone mode.
384ba6a [Marcelo Vanzin] [SPARK-9645] [yarn] [core] Allow shuffle service to read shuffle files.
2015-08-06 15:30:27 -07:00
Tathagata Das 0a078303d0 [SPARK-9556] [SPARK-9619] [SPARK-9624] [STREAMING] Make BlockGenerator more robust and make all BlockGenerators subscribe to rate limit updates
In some receivers, instead of using the default `BlockGenerator` in `ReceiverSupervisorImpl`, custom generators with their own custom listeners are used for reliability (see [`ReliableKafkaReceiver`](https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L99) and [updated `KinesisReceiver`](https://github.com/apache/spark/pull/7825/files)). These custom generators do not receive rate updates. This PR modifies the code to allow custom `BlockGenerator`s to be created through the `ReceiverSupervisorImpl` so that they can be kept track of and rate updates can be applied.

In the process, I did some simplification and de-flaki-fication of some rate-controller-related tests. In particular:
- Renamed `Receiver.executor` to `Receiver.supervisor` (to match `ReceiverSupervisor`)
- Made `RateControllerSuite` faster (by increasing batch interval) and less flaky
- Changed a few internal APIs to return the current rate of block generators as Long instead of Option\[Long\] (it was inconsistent in places).
- Updated existing `ReceiverTrackerSuite` to test that custom block generators get rate updates as well.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #7913 from tdas/SPARK-9556 and squashes the following commits:

41d4461 [Tathagata Das] fix scala style
eb9fd59 [Tathagata Das] Updated kinesis receiver
d24994d [Tathagata Das] Updated BlockGeneratorSuite to use manual clock in BlockGenerator
d70608b [Tathagata Das] Updated BlockGenerator with states and proper synchronization
f6bd47e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9556
31da173 [Tathagata Das] Fix bug
12116df [Tathagata Das] Add BlockGeneratorSuite
74bd069 [Tathagata Das] Fix style
989bb5c [Tathagata Das] Made BlockGenerator fail is used after stop, and added better unit tests for it
3ff618c [Tathagata Das] Fix test
b40eff8 [Tathagata Das] slight refactoring
f0df0f1 [Tathagata Das] Scala style fixes
51759cb [Tathagata Das] Refactored rate controller tests and added the ability to update rate of any custom block generator
2015-08-06 14:35:30 -07:00
Liang-Chi Hsieh 21fdfd7d6f [SPARK-9548][SQL] Add a destructive iterator for BytesToBytesMap
This pull request adds a destructive iterator to BytesToBytesMap. When used, the iterator frees pages as it traverses them. This is part of the effort to avoid starvation when more than one operator can exhaust memory.

This is based on #7924, but fixes a bug there (Don't use destructive iterator in UnsafeKVExternalSorter).
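The destructive-iteration idea, as a simplified Scala sketch (the real code is Java inside `BytesToBytesMap`):

```
final class Page(val records: Array[Long]) { var freed = false }

// Hand out each page's records, then free the page before moving to the next,
// releasing memory while the traversal is still in progress.
def destructiveIterator(pages: Seq[Page], free: Page => Unit): Iterator[Long] =
  pages.iterator.flatMap { page =>
    val out = page.records.clone()  // copy out before the page is released
    free(page)
    out.iterator
  }
```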

Closes #7924.

Author: Liang-Chi Hsieh <viirya@appier.com>
Author: Reynold Xin <rxin@databricks.com>

Closes #8003 from rxin/map-destructive-iterator and squashes the following commits:

6b618c3 [Reynold Xin] Don't use destructive iterator in UnsafeKVExternalSorter.
a7bd8ec [Reynold Xin] Merge remote-tracking branch 'viirya/destructive_iter' into map-destructive-iterator
7652083 [Liang-Chi Hsieh] For comments: add destructiveIterator(), modify unit test, remove code block.
4a3e9de [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into destructive_iter
581e9e3 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into destructive_iter
f0ff783 [Liang-Chi Hsieh] No need to free last page.
9e9d2a3 [Liang-Chi Hsieh] Add a destructive iterator for BytesToBytesMap.
2015-08-06 14:33:29 -07:00
Yin Huai 4581badbc8 [SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap
This PR has the following three small fixes.

1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty.
2. We will not spill an InMemorySorter if it is empty.
3. We will not add a SpillReader to a SpillMerger if this SpillReader is empty.

JIRA: https://issues.apache.org/jira/browse/SPARK-9611

Author: Yin Huai <yhuai@databricks.com>

Closes #7948 from yhuai/unsafeEmptyMap and squashes the following commits:

9727abe [Yin Huai] Address Josh's comments.
34b6f76 [Yin Huai] 1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. Do not spill a InMemorySorter if it is empty. 3. Do not add spill to SpillMerger if this spill is empty.
2015-08-05 19:19:09 -07:00
Marcelo Vanzin 4399b7b090 [SPARK-9651] Fix UnsafeExternalSorterSuite.
First, it's probably a bad idea to call generated Scala methods
from Java. In this case, the method being called wasn't actually
`Utils.createTempDir()` but the synthetic method that returns the
first default argument to createTempDir, which is just the location
of java.io.tmpdir; this meant that all tests in the class were using
the same temp dir and thus affecting each other.
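The pitfall in miniature (a sketch, not the actual Utils code):

```
object Utils {
  def createTempDir(root: String = System.getProperty("java.io.tmpdir")): java.io.File = {
    val dir = new java.io.File(root, s"spark-${java.util.UUID.randomUUID()}")
    dir.mkdirs(); dir
  }
}
// Scala compiles the default value into a synthetic method, roughly:
//   Utils.createTempDir$default$1()  // returns just java.io.tmpdir
// Calling THAT from Java instead of createTempDir() hands every test the same
// shared directory rather than a fresh temp dir.
```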

Second, spillingOccursInResponseToMemoryPressure was not writing
enough records to actually cause a spill.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #7970 from vanzin/SPARK-9651 and squashes the following commits:

74d357f [Marcelo Vanzin] Clean up temp dir on test tear down.
a64f36a [Marcelo Vanzin] [SPARK-9651] Fix UnsafeExternalSorterSuite.
2015-08-05 17:58:36 -07:00
Andrew Or 5f0fb6466f [SPARK-9649] Fix flaky test MasterSuite - randomize ports
```
Error Message

Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries!
Stacktrace

      java.net.BindException: Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries!
      at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
      at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
      at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
      at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
      at scala.util.Try$.apply(Try.scala:161)
```

Author: Andrew Or <andrew@databricks.com>

Closes #7968 from andrewor14/fix-master-flaky-test and squashes the following commits:

fcc42ef [Andrew Or] Randomize port
2015-08-05 14:12:22 -07:00
Josh Rosen 26b06f1c46 [HOTFIX] Add static import to fix build break from #7676. 2015-08-05 02:40:50 -07:00
zsxwing 1b0317f64c [SPARK-8861][SPARK-8862][SQL] Add basic instrumentation to each SparkPlan operator and add a new SQL tab
This PR includes the following changes:

### SPARK-8861: Add basic instrumentation to each SparkPlan operator

A SparkPlan can override `def accumulators: Map[String, Accumulator[_]]` to expose its metrics that can be displayed in UI. The UI will use them to track the updates and show them in the web page in real-time.

### SparkSQLExecution and SQLSparkListener

`SparkSQLExecution.withNewExecutionId` will set `spark.sql.execution.id` to the local properties so that we can use it to track all jobs that belong to the same query.

SQLSparkListener is a listener to track all accumulator updates of all tasks for a query. It receives them from heartbeats, and the UI can query them in real-time.

When running a query, `SQLSparkListener.onExecutionStart` will be called. When a query is finished,  `SQLSparkListener.onExecutionEnd` will be called. And the Spark jobs with the same execution id will be tracked and stored with this query.

`SQLSparkListener` has to store all accumulator updates for tasks separately. When a task fails and starts to retry, we need to drop the old accumulator updates. Because we cannot revert our changes to an accumulator, we have to maintain these accumulator updates ourselves in order to drop the updates of a failed task.

### SPARK-8862: A new SQL tab
Includes two pages:
#### A page for all DataFrame/SQL queries
It will show the running, completed and failed queries in 3 tables. It also displays the jobs and their links for a query in each row.
#### A detail page for a DataFrame/SQL query
This page also shows the SparkPlan metrics in real-time. Run a long-running query, such as
```
val testData = sc.parallelize((1 to 1000000).map(i => (i, i.toString))).toDF()
testData.select($"_1").filter($"_1" < 1000).foreach(_ => Thread.sleep(60))
```
and you will see the metrics keep updating in real-time.


Author: zsxwing <zsxwing@gmail.com>

Closes #7774 from zsxwing/sql-ui and squashes the following commits:

5a2bc99 [zsxwing] Remove UISeleniumSuite and its dependency
57d4cd2 [zsxwing] Use VisibleForTesting annotation
cc1c736 [zsxwing] Add SparkPlan.trackNumOfRowsEnabled to make subclasses easy to track the number of rows; fix the issue that the "save" action cannot collect metrics
3771ab0 [zsxwing] Register SQL metrics accmulators
3a101c0 [zsxwing] Change prepareCalled's type to AtomicBoolean for thread-safety
b8d5605 [zsxwing] Make prepare idempotent; call children's prepare in SparkPlan.prepare; change doPrepare to def
4ed11a1 [zsxwing] var -> val
332639c [zsxwing] Ignore UISeleniumSuite and SQLListenerSuite."no memory leak" because of SPARK-9580
bb52359 [zsxwing] Address other commens in SQLListener
c4d0f5d [zsxwing] Move newPredicate out of the iterator loop
957473c [zsxwing] Move STATIC_RESOURCE_DIR to object SQLTab
7ab4816 [zsxwing] Make SparkPlan accumulator API private[sql]
dae195e [zsxwing] Fix the code style and comments
3a66207 [zsxwing] Ignore irrelevant accumulators
b8484a1 [zsxwing] Merge branch 'master' into sql-ui
9406592 [zsxwing] Implement the SparkPlan viz
4ebce68 [zsxwing] Add SparkPlan.prepare to support BroadcastHashJoin to run background work in parallel
ca1811f [zsxwing] Merge branch 'master' into sql-ui
fef6fc6 [zsxwing] Fix a corner case
25f335c [zsxwing] Fix the code style
6eae828 [zsxwing] SQLSparkListener -> SQLListener; SparkSQLExecutionUIData -> SQLExecutionUIData; SparkSQLExecution -> SQLExecution
822af75 [zsxwing] Add SQLSparkListenerSuite and fix the issue about onExecutionEnd and onJobEnd
6be626f [zsxwing] Add UISeleniumSuite to test UI
d02a24d [zsxwing] Make ExecutionPage private
23abf73 [zsxwing] [SPARK-8862][SPARK-8862][SQL] Add basic instrumentation to each SparkPlan operator and add a new SQL tab
2015-08-05 01:51:22 -07:00
Takeshi YAMAMURO 6d8a6e4161 [SPARK-9360] [SQL] Support BinaryType in PrefixComparators for UnsafeExternalSort
The current implementation of UnsafeExternalSort uses NoOpPrefixComparator for binary-typed data,
so we need to add a BinaryPrefixComparator to PrefixComparators.
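A hedged sketch of a binary prefix (the real implementation may differ in details):

```
// Pack the first 8 bytes into a Long, left-aligned, so most comparisons can be
// decided on the prefix alone; ties fall back to a full byte-wise comparison.
def computePrefix(bytes: Array[Byte]): Long = {
  var p = 0L
  val n = math.min(bytes.length, 8)
  var i = 0
  while (i < n) { p = (p << 8) | (bytes(i) & 0xffL); i += 1 }
  p << (8 * (8 - n))  // left-align short keys
}
// Note: prefixes must be compared as UNSIGNED longs to match byte-wise ordering.
```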

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #7676 from maropu/BinaryTypePrefixComparator and squashes the following commits:

fe6f31b [Takeshi YAMAMURO] Apply comments
d943c04 [Takeshi YAMAMURO] Add a codegen'd entry for BinaryType in SortPrefix
ecf3ac5 [Takeshi YAMAMURO] Support BinaryType in PrefixComparator
2015-08-05 00:56:35 -07:00
Burak Yavuz c9a4c36d05 [SPARK-8313] R Spark packages support
shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit.

For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #7139 from brkyvz/r-submit and squashes the following commits:

0de384f [Burak Yavuz] remove unused imports 2
d253708 [Burak Yavuz] removed unused imports
6603d0d [Burak Yavuz] addressed comments
4258ffe [Burak Yavuz] merged master
ddfcc06 [Burak Yavuz] added zipping test
3a1be7d [Burak Yavuz] don't zip
77995df [Burak Yavuz] fix URI
ac45527 [Burak Yavuz] added zipping of all libs
e6bf7b0 [Burak Yavuz] add println ignores
1bc5554 [Burak Yavuz] add assumes for tests
9778e03 [Burak Yavuz] addressed comments
b42b300 [Burak Yavuz] merged master
ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
d867756 [Burak Yavuz] add apache header
eff5ba1 [Burak Yavuz] ready for review
8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
e5b5a06 [Burak Yavuz] added doc
bb751ce [Burak Yavuz] fix null bug
0226768 [Burak Yavuz] fixed issues
8810beb [Burak Yavuz] R packages support
2015-08-04 18:20:12 -07:00
CodingCat 9d668b7368 [SPARK-9602] remove "Akka/Actor" words from comments
https://issues.apache.org/jira/browse/SPARK-9602

Although we have hidden Akka behind RPC interface, I found that the Akka/Actor-related comments are still spreading everywhere. To make it consistent, we shall remove "actor"/"akka" words from the comments...

Author: CodingCat <zhunansjtu@gmail.com>

Closes #7936 from CodingCat/SPARK-9602 and squashes the following commits:

e8296a3 [CodingCat] remove actor words from comments
2015-08-04 14:54:11 -07:00
Josh Rosen ab8ee1a3b9 [SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter
This patch extends UnsafeExternalSorter to support records larger than the page size. The basic strategy is the same as in #7762: store large records in their own overflow pages.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7891 from JoshRosen/large-records-in-sql-sorter and squashes the following commits:

967580b [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter
948c344 [Josh Rosen] Add large records tests for KV sorter.
3c17288 [Josh Rosen] Combine memory and disk cleanup into general cleanupResources() method
380f217 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter
27eafa0 [Josh Rosen] Fix page size in PackedRecordPointerSuite
a49baef [Josh Rosen] Address initial round of review comments
3edb931 [Josh Rosen] Remove accidentally-committed debug statements.
2b164e2 [Josh Rosen] Support large records in UnsafeExternalSorter.
2015-08-04 14:42:11 -07:00
Carson Wang cb7fa0aa93 [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
Add pagination for the RDD page to avoid unresponsive UI when the number of the RDD partitions is large.
Before:
![rddpagebefore](https://cloud.githubusercontent.com/assets/9278199/8951533/3d9add54-3601-11e5-99d0-5653b473c49b.png)
After:
![rddpageafter](https://cloud.githubusercontent.com/assets/9278199/8951536/439d66e0-3601-11e5-9cee-1b380fe6620d.png)

Author: Carson Wang <carson.wang@intel.com>

Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits:

03c7168 [Carson Wang] Fix style issues
612c18c [Carson Wang] RDD partition table pagination for the RDD Page
2015-08-04 22:12:30 +09:00
Sean Owen 76d74090d6 [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.

I'll explain several of the changes inline in comments.

Author: Sean Owen <sowen@cloudera.com>

Closes #7862 from srowen/SPARK-9534 and squashes the following commits:

ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.
2015-08-04 12:02:26 +01:00
Reynold Xin 5eb89f67e3 [SPARK-9577][SQL] Surface concrete iterator types in various sort classes.
We often return abstract iterator types in various sort-related classes (e.g. UnsafeKVExternalSorter). It is actually better to return a more concrete type, so the callsite uses that type and JIT can inline the iterator calls.
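The pattern, in miniature (the class name is illustrative):

```
// Returning the concrete class makes next()/hasNext() calls at the call site
// monomorphic, which the JIT can devirtualize and inline.
final class KVSorterIterator(data: Array[Long]) extends Iterator[Long] {
  private var i = 0
  override def hasNext: Boolean = i < data.length
  override def next(): Long = { val v = data(i); i += 1; v }
}

def sortedIterator(data: Array[Long]): KVSorterIterator =  // not Iterator[Long]
  new KVSorterIterator(data.sorted)
```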

Author: Reynold Xin <rxin@databricks.com>

Closes #7911 from rxin/surface-concrete-type and squashes the following commits:

0422add [Reynold Xin] [SPARK-9577][SQL] Surface concrete iterator types in various sort classes.
2015-08-03 18:47:02 -07:00
CodingCat 3b0e44490a [SPARK-8416] Highlight and put on top the executor threads in the thread dump page
https://issues.apache.org/jira/browse/SPARK-8416

To facilitate debugging, I made this patch with three changes:

* render the executor-thread and non executor-thread entries with different background colors

* put the executor threads on the top of the list

* sort the threads alphabetically

Author: CodingCat <zhunansjtu@gmail.com>

Closes #7808 from CodingCat/SPARK-8416 and squashes the following commits:

34fc708 [CodingCat] fix className
d7b79dd [CodingCat] lowercase threadName
d032882 [CodingCat] sort alphabetically and change the css class name
f0513b1 [CodingCat] change the color & group threads by name
2da6e06 [CodingCat] small fix
3fc9f36 [CodingCat] define classes in webui.css
8ee125e [CodingCat] highlight and put on top the executor threads in thread dumping page
2015-08-03 18:20:40 -07:00