ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Liang-Chi Hsieh	7920296bf8	[SPARK-15430][SQL] Fix potential ConcurrentModificationException for ListAccumulator ## What changes were proposed in this pull request? In `ListAccumulator` we create an unmodifiable view for underlying list. However, it doesn't prevent the underlying to be modified further. So as we access the unmodifiable list, the underlying list can be modified in the same time. It could cause `java.util.ConcurrentModificationException`. We can observe such exception in recent tests. To fix it, we can copy a list of the underlying list and then create the unmodifiable view of this list instead. ## How was this patch tested? The exception might be difficult to test. Existing tests should be passed. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Closes #13211 from viirya/fix-concurrentmodify.	2016-05-22 08:08:46 -05:00
Shixiong Zhu	4e3cb7a5d9	[SPARK-15317][CORE] Don't store accumulators for every task in listeners ## What changes were proposed in this pull request? In general, the Web UI doesn't need to store the Accumulator/AccumulableInfo for every task. It only needs the Accumulator values. In this PR, it creates new UIData classes to store the necessary fields and make `JobProgressListener` store only these new classes, so that `JobProgressListener` won't store Accumulator/AccumulableInfo and the size of `JobProgressListener` becomes pretty small. I also eliminates `AccumulableInfo` from `SQLListener` so that we don't keep any references for those unused `AccumulableInfo`s. ## How was this patch tested? I ran two tests reported in JIRA locally: The first one is: ``` val data = spark.range(0, 10000, 1, 10000) data.cache().count() ``` The retained size of JobProgressListener decreases from 60.7M to 6.9M. The second one is: ``` import org.apache.spark.ml.CC import org.apache.spark.sql.SQLContext val sqlContext = SQLContext.getOrCreate(sc) CC.runTest(sqlContext) ``` This test won't cause OOM after applying this patch. Author: Shixiong Zhu <shixiong@databricks.com> Closes #13153 from zsxwing/memory.	2016-05-19 12:05:17 -07:00
Davies Liu	ad182086cc	[SPARK-15300] Fix writer lock conflict when remove a block ## What changes were proposed in this pull request? A writer lock could be acquired when 1) create a new block 2) remove a block 3) evict a block to disk. 1) and 3) could happen in the same time within the same task, all of them could happen in the same time outside a task. It's OK that when someone try to grab the write block for a block, but the block is acquired by another one that has the same task attempt id. This PR remove the check. ## How was this patch tested? Updated existing tests. Author: Davies Liu <davies@databricks.com> Closes #13082 from davies/write_lock_conflict.	2016-05-19 11:47:17 -07:00
Shixiong Zhu	5c9117a3ed	[SPARK-15395][CORE] Use getHostString to create RpcAddress ## What changes were proposed in this pull request? Right now the netty RPC uses `InetSocketAddress.getHostName` to create `RpcAddress` for network events. If we use an IP address to connect, then the RpcAddress's host will be a host name (if the reverse lookup successes) instead of the IP address. However, some places need to compare the original IP address and the RpcAddress in `onDisconnect` (e.g., CoarseGrainedExecutorBackend), and this behavior will make the check incorrect. This PR uses `getHostString` to resolve the issue. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #13185 from zsxwing/host-string.	2016-05-18 20:15:00 -07:00
Dongjoon Hyun	cc6a47dd81	[SPARK-15373][WEB UI] Spark UI should show consistent timezones. ## What changes were proposed in this pull request? Currently, SparkUI shows two timezones in a single page when the timezone of browser is different from the server JVM timezone. The following is an example on Databricks CE which uses 'Etc/UTC' timezone. - The time of `submitted` column of list and pop-up description shows `2016/05/18 00:03:07` - The time of `timeline chart` shows `2016/05/17 17:03:07`. ![Different Timezone](https://issues.apache.org/jira/secure/attachment/12804553/12804553_timezone.png) This PR fixes the timeline chart to use the same timezone by the followings. - Upgrade `vis` from 3.9.0(2015-01-16) to 4.16.1(2016-04-18) - Override `moment` of `vis` to get `offset` - Update `AllJobsPage`, `JobPage`, and `StagePage`. ## How was this patch tested? Manual. Run the following command and see the Spark UI's event timelines. ``` $ SPARK_SUBMIT_OPTS="-Dscala.usejavacp=true -Duser.timezone=Etc/UTC" bin/spark-submit --class org.apache.spark.repl.Main ... scala> sql("select 1").head ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13158 from dongjoon-hyun/SPARK-15373.	2016-05-18 23:19:55 +01:00
Davies Liu	8fb1d1c7f3	[SPARK-15357] Cooperative spilling should check consumer memory mode ## What changes were proposed in this pull request? Since we support forced spilling for Spillable, which only works in OnHeap mode, different from other SQL operators (could be OnHeap or OffHeap), we should considering the mode of consumer before calling trigger forced spilling. ## How was this patch tested? Add new test. Author: Davies Liu <davies@databricks.com> Closes #13151 from davies/fix_mode.	2016-05-18 09:44:21 -07:00
Shixiong Zhu	8e8bc9f957	[SPARK-11735][CORE][SQL] Add a check in the constructor of SQLContext/SparkSession to make sure its SparkContext is not stopped ## What changes were proposed in this pull request? Add a check in the constructor of SQLContext/SparkSession to make sure its SparkContext is not stopped. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #13154 from zsxwing/check-spark-context-stop.	2016-05-17 14:57:21 -07:00
Sean Owen	122302cbf5	[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into spark-tags ## What changes were proposed in this pull request? (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.) Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags` ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13074 from srowen/SPARK-15290.	2016-05-17 09:55:53 +01:00
Nicholas Tietz	0f1f31d3a6	[SPARK-15197][DOCS] Added Scaladoc for countApprox and countByValueApprox parameters This pull request simply adds Scaladoc documentation of the parameters for countApprox and countByValueApprox. This is an important documentation change, as it clarifies what should be passed in for the timeout. Without units, this was previously unclear. I did not open a JIRA ticket per my understanding of the project contribution guidelines; as they state, the description in the ticket would be essentially just what is in the PR. If I should open one, let me know and I will do so. Author: Nicholas Tietz <nicholas.tietz@crosschx.com> Closes #12955 from ntietz/rdd-countapprox-docs.	2016-05-14 09:44:20 +01:00
Holden Karau	382dbc12bb	[SPARK-15061][PYSPARK] Upgrade to Py4J 0.10.1 ## What changes were proposed in this pull request? This upgrades to Py4J 0.10.1 which reduces syscal overhead in Java gateway ( see https://github.com/bartdag/py4j/issues/201 ). Related https://issues.apache.org/jira/browse/SPARK-6728 . ## How was this patch tested? Existing doctests & unit tests pass Author: Holden Karau <holden@us.ibm.com> Closes #13064 from holdenk/SPARK-15061-upgrade-to-py4j-0.10.1.	2016-05-13 08:59:18 +01:00
Takuya UESHIN	a57aadae84	[SPARK-13902][SCHEDULER] Make DAGScheduler not to create duplicate stage. ## What changes were proposed in this pull request? `DAGScheduler`sometimes generate incorrect stage graph. Suppose you have the following DAG: ``` [A] <--(s_A)-- [B] <--(s_B)-- [C] <--(s_C)-- [D] \ / <------------- ``` Note: [] means an RDD, () means a shuffle dependency. Here, RDD `B` has a shuffle dependency on RDD `A`, and RDD `C` has shuffle dependency on both `B` and `A`. The shuffle dependency IDs are numbers in the `DAGScheduler`, but to make the example easier to understand, let's call the shuffled data from `A` shuffle dependency ID `s_A` and the shuffled data from `B` shuffle dependency ID `s_B`. The `getAncestorShuffleDependencies` method in `DAGScheduler` (incorrectly) does not check for duplicates when it's adding ShuffleDependencies to the parents data structure, so for this DAG, when `getAncestorShuffleDependencies` gets called on `C` (previous of the final RDD), `getAncestorShuffleDependencies` will return `s_A`, `s_B`, `s_A` (`s_A` gets added twice: once when the method "visit"s RDD `C`, and once when the method "visit"s RDD `B`). This is problematic because this line of code: https://github.com/apache/spark/blob/8ef3399/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L289 then generates a new shuffle stage for each dependency returned by `getAncestorShuffleDependencies`, resulting in duplicate map stages that compute the map output from RDD `A`. As a result, `DAGScheduler` generates the following stages and their parents for each shuffle: \| \| stage \| parents \| \|----\|----\|----\| \| s_A \| ShuffleMapStage 2 \| List() \| \| s_B \| ShuffleMapStage 1 \| List(ShuffleMapStage 0) \| \| s_C \| ShuffleMapStage 3 \| List(ShuffleMapStage 1, ShuffleMapStage 2) \| \| - \| ResultStage 4 \| List(ShuffleMapStage 3) \| The stage for s_A should be `ShuffleMapStage 0`, but the stage for `s_A` is generated twice as `ShuffleMapStage 2` and `ShuffleMapStage 0` is overwritten by `ShuffleMapStage 2`, and the stage `ShuffleMap Stage1` keeps referring the old stage `ShuffleMapStage 0`. This patch is fixing it. ## How was this patch tested? I added the sample RDD graph to show the illegal stage graph to `DAGSchedulerSuite`. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #12655 from ueshin/issues/SPARK-13902.	2016-05-12 12:36:18 -07:00
bomeng	81bf870848	[SPARK-14897][SQL] upgrade to jetty 9.2.16 ## What changes were proposed in this pull request? Since Jetty 8 is EOL (end of life) and has critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using latest 9.2 since 9.3 requires Java 8+. `javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs corresponding version. ## How was this patch tested? Manual test and current test cases should cover it. Author: bomeng <bmeng@us.ibm.com> Closes #12916 from bomeng/SPARK-14897.	2016-05-12 20:07:44 +01:00
Sandeep Singh	ff92eb2e80	[SPARK-15080][CORE] Break copyAndReset into copy and reset ## What changes were proposed in this pull request? Break copyAndReset into two methods copy and reset instead of just one. ## How was this patch tested? Existing Tests Author: Sandeep Singh <sandeep@techaddict.me> Closes #12936 from techaddict/SPARK-15080.	2016-05-12 11:12:09 +08:00
Andrew Or	40a949aae9	[SPARK-15262] Synchronize block manager / scheduler executor state ## What changes were proposed in this pull request? If an executor is still alive even after the scheduler has removed its metadata, we may receive a heartbeat from that executor and tell its block manager to reregister itself. If that happens, the block manager master will know about the executor, but the scheduler will not. That is a dangerous situation, because when the executor does get disconnected later, the scheduler will not ask the block manager to also remove metadata for that executor. Later, when we try to clean up an RDD or a broadcast variable, we may try to send a message to that executor, triggering an exception. ## How was this patch tested? Jenkins. Author: Andrew Or <andrew@databricks.com> Closes #13055 from andrewor14/block-manager-remove.	2016-05-11 13:36:58 -07:00
Andrew Or	bb88ad4e0e	[SPARK-15260] Atomically resize memory pools ## What changes were proposed in this pull request? When we acquire execution memory, we do a lot of things between shrinking the storage memory pool and enlarging the execution memory pool. In particular, we call `memoryStore.evictBlocksToFreeSpace`, which may do a lot of I/O and can throw exceptions. If an exception is thrown, the pool sizes on that executor will be in a bad state. This patch minimizes the things we do between the two calls to make the resizing more atomic. ## How was this patch tested? Jenkins. Author: Andrew Or <andrew@databricks.com> Closes #13039 from andrewor14/safer-pool.	2016-05-11 12:58:57 -07:00
cody koeninger	89e67d6667	[SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact ## What changes were proposed in this pull request? Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions ## How was this patch tested? Unit tests Author: cody koeninger <cody@koeninger.org> Closes #12946 from koeninger/SPARK-15085.	2016-05-11 12:15:41 -07:00
Eric Liang	6d0368ab8d	[SPARK-15259] Sort time metric should not include spill and record insertion time ## What changes were proposed in this pull request? After SPARK-14669 it seems the sort time metric includes both spill and record insertion time. This makes it not very useful since the metric becomes close to the total execution time of the node. We should track just the time spent for in-memory sort, as before. ## How was this patch tested? Verified metric in the UI, also unit test on UnsafeExternalRowSorter. cc davies Author: Eric Liang <ekl@databricks.com> Author: Eric Liang <ekhliang@gmail.com> Closes #13035 from ericl/fix-metrics.	2016-05-11 11:25:46 -07:00
Kousuke Saruta	ba181c0c7a	[SPARK-15235][WEBUI] Corresponding row cannot be highlighted even though cursor is on the job on Web UI's timeline ## What changes were proposed in this pull request? To extract job descriptions and stage name, there are following regular expressions in timeline-view.js ``` var jobIdText = $($(baseElem).find(".application-timeline-content")[0]).text(); var jobId = jobIdText.match("\$Job (\\d+)\$")[1]; ... var stageIdText = $($(baseElem).find(".job-timeline-content")[0]).text(); var stageIdAndAttempt = stageIdText.match("\$Stage (\\d+\\.\\d+)\$")[1].split("."); ``` But if job descriptions include patterns like "(Job x)" or stage names include patterns like "(Stage x.y)", the regular expressions cannot be match as we expected, ending up with corresponding row cannot be highlighted even though we move the cursor onto the job on Web UI's timeline. ## How was this patch tested? Manually tested with spark-shell and Web UI. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #13016 from sarutak/SPARK-15235.	2016-05-10 22:32:38 -07:00
Lianhui Wang	9f0a642f84	[SPARK-15246][SPARK-4452][CORE] Fix code style and improve volatile for ## What changes were proposed in this pull request? 1. Fix code style 2. remove volatile of elementsRead method because there is only one thread to use it. 3. avoid volatile of _elementsRead because Collection increases number of _elementsRead when it insert a element. It is very expensive. So we can avoid it. After this PR, I will push another PR for branch 1.6. ## How was this patch tested? unit tests Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #13020 from lianhuiwang/SPARK-4452-hotfix.	2016-05-10 22:30:39 -07:00
Wenchen Fan	bcfee153b1	[SPARK-12837][CORE] reduce network IO for accumulators Sending un-updated accumulators back to driver makes no sense, as merging a zero value accumulator is a no-op. We should only send back updated accumulators, to save network IO. new test in `TaskContextSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #12899 from cloud-fan/acc.	2016-05-10 11:16:56 -07:00
Marcelo Vanzin	0b9cae4242	[SPARK-11249][LAUNCHER] Throw error if app resource is not provided. Without this, the code would build an invalid spark-submit command line, and a more cryptic error would be presented to the user. Also, expose a constant that allows users to set a dummy resource in cases where they don't need an actual resource file; for backwards compatibility, that uses the same "spark-internal" resource that Spark itself uses. Tested via unit tests, run-example, spark-shell, and running the thrift server with mixed spark and hive command line arguments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #12909 from vanzin/SPARK-11249.	2016-05-10 10:35:54 -07:00
Sital Kedia	a019e6efb7	[SPARK-14542][CORE] PipeRDD should allow configurable buffer size for… ## What changes were proposed in this pull request? Currently PipedRDD internally uses PrintWriter to write data to the stdin of the piped process, which by default uses a BufferedWriter of buffer size 8k. In our experiment, we have seen that 8k buffer size is too small and the job spends significant amount of CPU time in system calls to copy the data. We should have a way to configure the buffer size for the writer. ## How was this patch tested? Ran PipedRDDSuite tests. Author: Sital Kedia <skedia@fb.com> Closes #12309 from sitalkedia/bufferedPipedRDD.	2016-05-10 15:28:35 +01:00
Josh Rosen	3323d0f931	[SPARK-15209] Fix display of job descriptions with single quotes in web UI timeline ## What changes were proposed in this pull request? This patch fixes an escaping bug in the Web UI's event timeline that caused Javascript errors when displaying timeline entries whose descriptions include single quotes. The original bug can be reproduced by running ```scala sc.setJobDescription("double quote: \" ") sc.parallelize(1 to 10).count() sc.setJobDescription("single quote: ' ") sc.parallelize(1 to 10).count() ``` and then browsing to the driver UI. Previously, this resulted in an "Uncaught SyntaxError" because the single quote from the description was not escaped and ended up closing a Javascript string literal too early. The fix implemented here is to change the relevant Javascript to define its string literals using double-quotes. Our escaping logic already properly escapes double quotes in the description, so this is safe to do. ## How was this patch tested? Tested manually in `spark-shell` using the following cases: ```scala sc.setJobDescription("double quote: \" ") sc.parallelize(1 to 10).count() sc.setJobDescription("single quote: ' ") sc.parallelize(1 to 10).count() sc.setJobDescription("ampersand: &") sc.parallelize(1 to 10).count() sc.setJobDescription("newline: \n text after newline ") sc.parallelize(1 to 10).count() sc.setJobDescription("carriage return: \r text after return ") sc.parallelize(1 to 10).count() ``` /cc sarutak for review. Author: Josh Rosen <joshrosen@databricks.com> Closes #12995 from JoshRosen/SPARK-15209.	2016-05-10 08:21:32 +09:00
Alex Bozarth	c3e23bc0c3	[SPARK-10653][CORE] Remove unnecessary things from SparkEnv ## What changes were proposed in this pull request? Removed blockTransferService and sparkFilesDir from SparkEnv since they're rarely used and don't need to be in stored in the env. Edited their few usages to accommodate the change. ## How was this patch tested? ran dev/run-tests locally Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #12970 from ajbozarth/spark10653.	2016-05-09 11:51:37 -07:00
mwws	f8aca5b4a9	[SAPRK-15220][UI] add hyperlink to running application and completed application ## What changes were proposed in this pull request? Add hyperlink to "running application" and "completed application", so user can jump to application table directly, In my environment, I set up 1000+ works and it's painful to scroll down to skip worker list. ## How was this patch tested? manual tested (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) ![sceenshot](https://cloud.githubusercontent.com/assets/13216322/15105718/97e06768-15f6-11e6-809d-3574046751a9.png) Author: mwws <wei.mao@intel.com> Closes #12997 from mwws/SPARK_UI.	2016-05-09 11:17:14 -07:00
Sandeep Singh	a21a3bbe69	[SPARK-15087][MINOR][DOC] Follow Up: Fix the Comments ## What changes were proposed in this pull request? Remove the Comment, since it not longer applies. see the discussion here(https://github.com/apache/spark/pull/12865#discussion-diff-61946906) Author: Sandeep Singh <sandeep@techaddict.me> Closes #12953 from techaddict/SPARK-15087-FOLLOW-UP.	2016-05-07 11:10:14 +08:00
Thomas Graves	cc95f1ed5f	[SPARK-1239] Improve fetching of map output statuses The main issue we are trying to solve is the memory bloat of the Driver when tasks request the map output statuses. This means with a large number of tasks you either need a huge amount of memory on Driver or you have to repartition to smaller number. This makes it really difficult to run over say 50000 tasks. The main issues that cause the memory bloat are: 1) no flow control on sending the map output status responses. We serialize the map status output and then hand off to netty to send. netty is sending asynchronously and it can't send them fast enough to keep up with incoming requests so we end up with lots of copies of the serialized map output statuses sitting there and this causes huge bloat when you have 10's of thousands of tasks and map output status is in the 10's of MB. 2) When initial reduce tasks are started up, they all request the map output statuses from the Driver. These requests are handled by multiple threads in parallel so even though we check to see if we have a cached version, initially when we don't have a cached version yet, many of initial requests can all end up serializing the exact same map output statuses. This patch does a couple of things: - When the map output status size is over a threshold (default 512K) then it uses broadcast to send the map statuses. This means we no longer serialize a large map output status and thus we don't have issues with memory bloat. the messages sizes are now in the 300-400 byte range and the map status output are broadcast. If its under the threadshold it sends it as before, the message contains the DIRECT indicator now. - synchronize the incoming requests to allow one thread to cache the serialized output and broadcast the map output status that can then be used by everyone else. This ensures we don't create multiple broadcast variables when we don't need to. To ensure this happens I added a second thread pool which the Dispatcher hands the requests to so that those threads can block without blocking the main dispatcher threads (which would cause things like heartbeats and such not to come through) Note that some of design and code was contributed by mridulm ## How was this patch tested? Unit tests and a lot of manually testing. Ran with akka and netty rpc. Ran with both dynamic allocation on and off. one of the large jobs I used to test this was a join of 15TB of data. it had 200,000 map tasks, and 20,000 reduce tasks. Executors ranged from 200 to 2000. This job ran successfully with 5GB of memory on the driver with these changes. Without these changes I was using 20GB and only had 500 reduce tasks. The job has 50mb of serialized map output statuses and took roughly the same amount of time for the executors to get the map output statuses as before. Ran a variety of other jobs, from large wordcounts to small ones not using broadcasts. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #12113 from tgravescs/SPARK-1239.	2016-05-06 19:31:26 -07:00
Jacek Laskowski	bbb7773437	[SPARK-15152][DOC][MINOR] Scaladoc and Code style Improvements ## What changes were proposed in this pull request? Minor doc and code style fixes ## How was this patch tested? local build Author: Jacek Laskowski <jacek@japila.pl> Closes #12928 from jaceklaskowski/SPARK-15152.	2016-05-05 16:34:27 -07:00
Ryan Blue	08db491265	[SPARK-9926] Parallelize partition logic in UnionRDD. This patch has the new logic from #8512 that uses a parallel collection to compute partitions in UnionRDD. The rest of #8512 added an alternative code path for calculating splits in S3, but that isn't necessary to get the same speedup. The underlying problem wasn't that bulk listing wasn't used, it was that an extra FileStatus was retrieved for each file. The fix was just committed as [HADOOP-12810](https://issues.apache.org/jira/browse/HADOOP-12810). (I think the original commit also used a single prefix to enumerate all paths, but that isn't always helpful and it was removed in later versions so there is no need for SparkS3Utils.) I tested this using the same table that piapiaozhexiu was using. Calculating splits for a 10-day period took 25 seconds with this change and HADOOP-12810, which is on par with the results from #8512. Author: Ryan Blue <blue@apache.org> Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #11242 from rdblue/SPARK-9926-parallelize-union-rdd.	2016-05-05 14:40:37 -07:00
depend	5c47db0657	[SPARK-15158][CORE] downgrade shouldRollover message to debug level ## What changes were proposed in this pull request? set log level to debug when check shouldRollover ## How was this patch tested? It's tested manually. Author: depend <depend@gmail.com> Closes #12931 from depend/master.	2016-05-05 14:39:35 -07:00
Jason Moore	77361a433a	[SPARK-14915][CORE] Don't re-queue a task if another attempt has already succeeded ## What changes were proposed in this pull request? Don't re-queue a task if another attempt has already succeeded. This currently happens when a speculative task is denied from committing the result due to another copy of the task already having succeeded. ## How was this patch tested? I'm running a job which has a fair bit of skew in the processing time across the tasks for speculation to trigger in the last quarter (default settings), causing many commit denied exceptions to be thrown. Previously, these tasks were then being retried over and over again until the stage possibly completes (despite using compute resources on these superfluous tasks). With this change (applied to the 1.6 branch), they no longer retry and the stage completes successfully without these extra task attempts. Author: Jason Moore <jasonmoore2k@outlook.com> Closes #12751 from jasonmoore2k/SPARK-14915.	2016-05-05 11:02:35 +01:00
mcheah	b7fdc23ccc	[SPARK-12154] Upgrade to Jersey 2 ## What changes were proposed in this pull request? Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things. ## How was this patch tested? I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications. Author: mcheah <mcheah@palantir.com> Closes #12715 from mccheah/feature/upgrade-jersey.	2016-05-05 10:51:03 +01:00
Abhinav Gupta	1a5c6fcef1	[SPARK-15045] [CORE] Remove dead code in TaskMemoryManager.cleanUpAllAllocatedMemory for pageTable ## What changes were proposed in this pull request? Removed the DeadCode as suggested. Author: Abhinav Gupta <abhi.951990@gmail.com> Closes #12829 from abhi951990/master.	2016-05-04 22:22:01 -07:00
Sebastien Rainville	eb019af9a9	[SPARK-13001][CORE][MESOS] Prevent getting offers when reached max cores Similar to https://github.com/apache/spark/pull/8639 This change rejects offers for 120s when reached `spark.cores.max` in coarse-grained mode to mitigate offer starvation. This prevents Mesos to send us offers again and again, starving other frameworks. This is especially problematic when running many small frameworks on the same Mesos cluster, e.g. many small Sparks streaming jobs, and cause the bigger spark jobs to stop receiving offers. By rejecting the offers for a long period of time, they become available to those other frameworks. Author: Sebastien Rainville <sebastien@hopper.com> Closes #10924 from sebastienrainville/master.	2016-05-04 14:32:36 -07:00
Bryan Cutler	cf2e9da612	[SPARK-12299][CORE] Remove history serving functionality from Master Remove history server functionality from standalone Master. Previously, the Master process rebuilt a SparkUI once the application was completed which sometimes caused problems, such as OOM, when the application event log is large (see SPARK-6270). Keeping this functionality out of the Master will help to simplify the process and increase stability. Testing for this change included running core unit tests and manually running an application on a standalone cluster to verify that it completed successfully and that the Master UI functioned correctly. Also added 2 unit tests to verify killing an application and driver from MasterWebUI makes the correct request to the Master. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10991 from BryanCutler/remove-history-master-SPARK-12299.	2016-05-04 14:29:54 -07:00
Reynold Xin	6274a520fa	[SPARK-15115][SQL] Reorganize whole stage codegen benchmark suites ## What changes were proposed in this pull request? We currently have a single suite that is very large, making it difficult to maintain and play with specific primitives. This patch reorganizes the file by creating multiple benchmark suites in a single package. Most of the changes are straightforward move of code. On top of the code moving, I did: 1. Use SparkSession instead of SQLContext. 2. Turned most benchmark scenarios into a their own test cases, rather than having multiple scenarios in a single test case, which takes forever to run. ## How was this patch tested? This is a test only change. Author: Reynold Xin <rxin@databricks.com> Closes #12891 from rxin/SPARK-15115.	2016-05-04 11:00:01 -07:00
Dhruve Ashar	a45647746d	[SPARK-4224][CORE][YARN] Support group acls ## What changes were proposed in this pull request? Currently only a list of users can be specified for view and modify acls. This change enables a group of admins/devs/users to be provisioned for viewing and modifying Spark jobs. Changes Proposed in the fix Three new corresponding config entries have been added where the user can specify the groups to be given access. ``` spark.admin.acls.groups spark.modify.acls.groups spark.ui.view.acls.groups ``` New config entries were added because specifying the users and groups explicitly is a better and cleaner way compared to specifying them in the existing config entry using a delimiter. A generic trait has been introduced to provide the user to group mapping which makes it pluggable to support a variety of mapping protocols - similar to the one used in hadoop. A default unix shell based implementation has been provided. Custom user to group mapping protocol can be specified and configured by the entry ```spark.user.groups.mapping``` How the patch was Tested We ran different spark jobs setting the config entries in combinations of admin, modify and ui acls. For modify acls we tried killing the job stages from the ui and using yarn commands. For view acls we tried accessing the UI tabs and the logs. Headless accounts were used to launch these jobs and different users tried to modify and view the jobs to ensure that the groups mapping applied correctly. Additional Unit tests have been added without modifying the existing ones. These test for different ways of setting the acls through configuration and/or API and validate the expected behavior. Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #12760 from dhruve/impr/SPARK-4224.	2016-05-04 08:45:43 -05:00
Reynold Xin	695f0e9195	[SPARK-15107][SQL] Allow varying # iterations by test case in Benchmark ## What changes were proposed in this pull request? This patch changes our micro-benchmark util to allow setting different iteration numbers for different test cases. For some of our benchmarks, turning off whole-stage codegen can make the runtime 20X slower, making it very difficult to run a large number of times without substantially shortening the input cardinality. With this change, I set the default num iterations to 2 for whole stage codegen off, and 5 for whole stage codegen on. I also updated some results. ## How was this patch tested? N/A - this is a test util. Author: Reynold Xin <rxin@databricks.com> Closes #12884 from rxin/SPARK-15107.	2016-05-03 22:56:40 -07:00
Timothy Chen	c1839c9911	[SPARK-14645][MESOS] Fix python running on cluster mode mesos to have non local uris ## What changes were proposed in this pull request? Fix SparkSubmit to allow non-local python uris ## How was this patch tested? Manually tested with mesos-spark-dispatcher Author: Timothy Chen <tnachen@gmail.com> Closes #12403 from tnachen/enable_remote_python.	2016-05-03 18:04:04 -07:00
Andrew Ash	dbacd99983	[SPARK-15104] Fix spacing in log line Otherwise get logs that look like this (note no space before NODE_LOCAL) ``` INFO [2016-05-03 21:18:51,477] org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes) ``` Author: Andrew Ash <andrew@andrewash.com> Closes #12880 from ash211/patch-7.	2016-05-03 14:54:43 -07:00
Thomas Graves	83ee92f603	[SPARK-11316] coalesce doesn't handle UnionRDD with partial locality properly ## What changes were proposed in this pull request? coalesce doesn't handle UnionRDD with partial locality properly. I had a user who had a UnionRDD that was made up of mapPartitionRDD without preferred locations and a checkpointedRDD with preferred locations (getting from hdfs). It took the driver over 20 minutes to setup the groups and put the partitions into those groups before it even started any tasks. Even perhaps worse is it didn't end up with the number of partitions he was asking for because it didn't put a partition in each of the groups properly. The changes in this patch get rid of a n^2 while loop that was causing the 20 minutes, it properly distributes the partitions to have at least one per group, and it changes from using the rotation iterator which got the preferred locations many times to get all the preferred locations once up front. Note that the n^2 while loop that I removed in setupGroups took so long because all of the partitions with preferred locations were already assigned to group, so it basically looped through every single one and wasn't ever able to assign it. At the time I had 960 partitions with preferred locations and 1020 without and did the outer while loop 319 times because that is the # of groups left to create. Note that each of those times through the inner while loop is going off to hdfs to get the block locations, so this is extremely inefficient. ## How was the this patch tested? Added unit tests for this case and ran existing ones that applied to make sure no regressions. Also manually tested on the users production job to make sure it fixed their issue. It created the proper number of partitions and now it takes about 6 seconds rather then 20 minutes. I did also run some basic manual tests with spark-shell doing coalesced to smaller number, same number, and then greater with shuffle. Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com> Closes #11327 from tgravescs/SPARK-11316.	2016-05-03 13:43:20 -07:00
Devaraj K	659f635d3b	[SPARK-14234][CORE] Executor crashes for TaskRunner thread interruption ## What changes were proposed in this pull request? Resetting the task interruption status before updating the task status. ## How was this patch tested? I have verified it manually by running multiple applications, Executor doesn't crash and updates the status to the driver without any exceptions with the patch changes. Author: Devaraj K <devaraj@apache.org> Closes #12031 from devaraj-kavali/SPARK-14234.	2016-05-03 13:25:28 -07:00
Zheng Tan	f5623b4602	[SPARK-15059][CORE] Remove fine-grained lock in ChildFirstURLClassLoader to avoid dead lock ## What changes were proposed in this pull request? In some cases, fine-grained lock have race condition with class-loader lock and have caused dead lock issue. It is safe to drop this fine grained lock and load all classes by single class-loader lock. Author: Zheng Tan <zheng.tan@hulu.com> Closes #12857 from tankkyo/master.	2016-05-03 12:22:52 -07:00
Sandeep Singh	ca813330c7	[SPARK-15087][CORE][SQL] Remove AccumulatorV2.localValue and keep only value ## What changes were proposed in this pull request? Remove AccumulatorV2.localValue and keep only value ## How was this patch tested? existing tests Author: Sandeep Singh <sandeep@techaddict.me> Closes #12865 from techaddict/SPARK-15087.	2016-05-03 11:38:43 -07:00
Reynold Xin	d557a5e01e	[SPARK-15081] Move AccumulatorV2 and subclasses into util package ## What changes were proposed in this pull request? This patch moves AccumulatorV2 and subclasses into util package. ## How was this patch tested? Updated relevant tests. Author: Reynold Xin <rxin@databricks.com> Closes #12863 from rxin/SPARK-15081.	2016-05-03 19:45:12 +08:00
Holden Karau	f10ae4b1e1	[SPARK-6717][ML] Clear shuffle files after checkpointing in ALS ## What changes were proposed in this pull request? When ALS is run with a checkpoint interval, during the checkpoint materialize the current state and cleanup the previous shuffles (non-blocking). ## How was this patch tested? Existing ALS unit tests, new ALS checkpoint cleanup unit tests added & shuffle files checked after ALS w/checkpointing run. Author: Holden Karau <holden@us.ibm.com> Author: Holden Karau <holden@pigscanfly.ca> Closes #11919 from holdenk/SPARK-6717-clear-shuffle-files-after-checkpointing-in-ALS.	2016-05-03 00:18:10 -07:00
Reynold Xin	bb9ab56b96	[SPARK-15079] Support average/count/sum in Long/DoubleAccumulator ## What changes were proposed in this pull request? This patch removes AverageAccumulator and adds the ability to compute average to LongAccumulator and DoubleAccumulator. The patch also improves documentation for the two accumulators. ## How was this patch tested? Added unit tests for this. Author: Reynold Xin <rxin@databricks.com> Closes #12858 from rxin/SPARK-15079.	2016-05-02 21:12:48 -07:00
Marcin Tustin	8028f3a0b4	[SPARK-14685][CORE] Document heritability of localProperties ## What changes were proposed in this pull request? This updates the java-/scala- doc for setLocalProperty to document heritability of localProperties. This also adds tests for that behaviour. ## How was this patch tested? Tests pass. New tests were added. Author: Marcin Tustin <marcin.tustin@gmail.com> Closes #12455 from marcintustin/SPARK-14685.	2016-05-02 19:37:57 -07:00
Reynold Xin	d5c79f564f	[SPARK-15054] Deprecate old accumulator API ## What changes were proposed in this pull request? This patch deprecates the old accumulator API. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #12832 from rxin/SPARK-15054.	2016-05-02 14:57:00 -07:00
Jeff Zhang	0a3026990b	[SPARK-14845][SPARK_SUBMIT][YARN] spark.files in properties file is n… ## What changes were proposed in this pull request? initialize SparkSubmitArgument#files first from spark-submit arguments then from properties file, so that sys property spark.yarn.dist.files will be set correctly. ``` OptionAssigner(args.files, YARN, ALL_DEPLOY_MODES, sysProp = "spark.yarn.dist.files"), ``` ## How was this patch tested? manul test. file defined in properties file is also distributed to driver in yarn-cluster mode. Author: Jeff Zhang <zjffdu@apache.org> Closes #12656 from zjffdu/SPARK-14845.	2016-05-02 11:03:37 -07:00
Reynold Xin	44da8d8eab	[SPARK-15049] Rename NewAccumulator to AccumulatorV2 ## What changes were proposed in this pull request? NewAccumulator isn't the best name if we ever come up with v3 of the API. ## How was this patch tested? Updated tests to reflect the change. Author: Reynold Xin <rxin@databricks.com> Closes #12827 from rxin/SPARK-15049.	2016-05-01 20:21:02 -07:00
Allen	cdf9e9753d	[SPARK-14505][CORE] Fix bug : creating two SparkContext objects in the same jvm, the first one will can not run any task! After creating two SparkContext objects in the same jvm(the second one can not be created successfully!), use the first one to run job will throw exception like below: ![image](https://cloud.githubusercontent.com/assets/7162889/14402832/0c8da2a6-fe73-11e5-8aba-68ee3ddaf605.png) Author: Allen <yufan_1990@163.com> Closes #12273 from the-sea/context-create-bug.	2016-05-01 15:39:14 +01:00
Herman van Hovell	e5fb78baf9	[SPARK-14952][CORE][ML] Remove methods that were deprecated in 1.6.0 #### What changes were proposed in this pull request? This PR removes three methods the were deprecated in 1.6.0: - `PortableDataStream.close()` - `LinearRegression.weights` - `LogisticRegression.weights` The rationale for doing this is that the impact is small and that Spark 2.0 is a major release. #### How was this patch tested? Compilation succeded. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #12732 from hvanhovell/SPARK-14952.	2016-04-30 16:06:20 +01:00
Reynold Xin	8dc3987d09	[SPARK-15028][SQL] Remove HiveSessionState.setDefaultOverrideConfs ## What changes were proposed in this pull request? This patch removes some code that are no longer relevant -- mainly HiveSessionState.setDefaultOverrideConfs. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #12806 from rxin/SPARK-15028.	2016-04-30 01:32:00 -07:00
Wenchen Fan	b056e8cb0a	[SPARK-15010][CORE] new accumulator shoule be tolerant of local RPC message delivery ## What changes were proposed in this pull request? The RPC framework will not serialize and deserialize messages in local mode, we should not call `acc.value` when receive heartbeat message, because the serialization hook of new accumulator may not be triggered and the `atDriverSide` flag may not be set. ## How was this patch tested? tested it locally via spark shell Author: Wenchen Fan <wenchen@databricks.com> Closes #12795 from cloud-fan/bug.	2016-04-29 19:01:38 -07:00
tedyu	dcfaeadea7	[SPARK-15003] Use ConcurrentHashMap in place of HashMap for NewAccumulator.originals ## What changes were proposed in this pull request? This PR proposes to use ConcurrentHashMap in place of HashMap for NewAccumulator.originals This should result in better performance. ## How was this patch tested? Existing unit test suite cloud-fan Author: tedyu <yuzhihong@gmail.com> Closes #12776 from tedyu/master.	2016-04-30 07:54:53 +08:00
Sun Rui	4ae9fe091c	[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. ## What changes were proposed in this pull request? dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame. The function signature is: dapply(df, function(localDF) {}, schema = NULL) R function input: local data.frame from the partition on local node R function output: local data.frame Schema specifies the Row format of the resulting DataFrame. It must match the R function's output. If schema is not specified, each partition of the result DataFrame will be serialized in R into a single byte array. Such resulting DataFrame can be processed by successive calls to dapply(). ## How was this patch tested? SparkR unit tests. Author: Sun Rui <rui.sun@intel.com> Author: Sun Rui <sunrui2016@gmail.com> Closes #12493 from sun-rui/SPARK-12919.	2016-04-29 16:41:07 -07:00
Wenchen Fan	6f9a18fe31	[HOTFIX][CORE] fix a concurrence issue in NewAccumulator ## What changes were proposed in this pull request? `AccumulatorContext` is not thread-safe, that's why all of its methods are synchronized. However, there is one exception: the `AccumulatorContext.originals`. `NewAccumulator` use it to check if it's registered, which is wrong as it's not synchronized. This PR mark `AccumulatorContext.originals` as `private` and now all access to `AccumulatorContext` is synchronized. ## How was this patch tested? I verified it locally. To be safe, we can let jenkins test it many times to make sure this problem is gone. Author: Wenchen Fan <wenchen@databricks.com> Closes #12773 from cloud-fan/debug.	2016-04-28 21:57:58 -07:00
Yin Huai	9c7c42bc6a	Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local" This reverts commit `dae538a4d7`.	2016-04-28 19:57:41 -07:00
Pravin Gadakh	dae538a4d7	[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in spark-mllib-local. ## How was this patch tested? Scala-style checks passed. Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #12416 from pravingadakh/SPARK-14613.	2016-04-28 15:59:18 -07:00
Ergin Seyfe	23256be0d0	[SPARK-14576][WEB UI] Spark console should display Web UI url ## What changes were proposed in this pull request? This is a proposal to print the Spark Driver UI link when spark-shell is launched. ## How was this patch tested? Launched spark-shell in local mode and cluster mode. Spark-shell console output included following line: "Spark context Web UI available at <Spark web url>" Author: Ergin Seyfe <eseyfe@fb.com> Closes #12341 from seyfe/spark_console_display_webui_link.	2016-04-28 16:16:28 +01:00
Wenchen Fan	bf5496dbda	[SPARK-14654][CORE] New accumulator API ## What changes were proposed in this pull request? This PR introduces a new accumulator API which is much simpler than before: 1. the type hierarchy is simplified, now we only have an `Accumulator` class 2. Combine `initialValue` and `zeroValue` concepts into just one concept: `zeroValue` 3. there in only one `register` method, the accumulator registration and cleanup registration are combined. 4. the `id`,`name` and `countFailedValues` are combined into an `AccumulatorMetadata`, and is provided during registration. `SQLMetric` is a good example to show the simplicity of this new API. What we break: 1. no `setValue` anymore. In the new API, the intermedia type can be different from the result type, it's very hard to implement a general `setValue` 2. accumulator can't be serialized before registered. Problems need to be addressed in follow-ups: 1. with this new API, `AccumulatorInfo` doesn't make a lot of sense, the partial output is not partial updates, we need to expose the intermediate value. 2. `ExceptionFailure` should not carry the accumulator updates. Why do users care about accumulator updates for failed cases? It looks like we only use this feature to update the internal metrics, how about we sending a heartbeat to update internal metrics after the failure event? 3. the public event `SparkListenerTaskEnd` carries a `TaskMetrics`. Ideally this `TaskMetrics` don't need to carry external accumulators, as the only method of `TaskMetrics` that can access external accumulators is `private[spark]`. However, `SQLListener` use it to retrieve sql metrics. ## How was this patch tested? existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes #12612 from cloud-fan/acc.	2016-04-28 00:26:39 -07:00
Jakob Odersky	be317d4a90	[SPARK-10001][CORE] Don't short-circuit actions in signal handlers ## What changes were proposed in this pull request? The current signal handlers have a subtle bug that stops evaluating registered actions as soon as one of them returns true, this is because `forall` is short-circuited. This PR adds a strict mapping stage before evaluating returned result. There are no known occurrences of the bug and this is a preemptive fix. ## How was this patch tested? As with the original introduction of signal handlers, this was tested manually (unit testing with signals is not straightforward). Author: Jakob Odersky <jakob@odersky.com> Closes #12745 from jodersky/SPARK-10001-hotfix.	2016-04-27 22:46:43 -07:00
Josh Rosen	8c49cebce5	[SPARK-14966] SizeEstimator should ignore classes in the scala.reflect package In local profiling, I noticed SizeEstimator spending tons of time estimating the size of objects which contain TypeTag or ClassTag fields. The problem with these tags is that they reference global Scala reflection objects, which, in turn, reference many singletons, such as TestHive. This throws off the accuracy of the size estimation and wastes tons of time traversing a huge object graph. As a result, I think that SizeEstimator should ignore any classes in the `scala.reflect` package. Author: Josh Rosen <joshrosen@databricks.com> Closes #12741 from JoshRosen/ignore-scala-reflect-in-size-estimator.	2016-04-27 17:34:55 -07:00
Hemant Bhanawat	e4d439c831	[SPARK-14729][SCHEDULER] Refactored YARN scheduler creation code to use newly added ExternalClusterManager ## What changes were proposed in this pull request? With the addition of ExternalClusterManager(ECM) interface in PR #11723, any cluster manager can now be integrated with Spark. It was suggested in ExternalClusterManager PR that one of the existing cluster managers should start using the new interface to ensure that the API is correct. Ideally, all the existing cluster managers should eventually use the ECM interface but as a first step yarn will now use the ECM interface. This PR refactors YARN code from SparkContext.createTaskScheduler function into YarnClusterManager that implements ECM interface. ## How was this patch tested? Since this is refactoring, no new tests has been added. Existing tests have been run. Basic manual testing with YARN was done too. Author: Hemant Bhanawat <hemant@snappydata.io> Closes #12641 from hbhanawat/yarnClusterMgr.	2016-04-27 10:59:23 -07:00
Victor Chima	08dc89361d	Unintentional white spaces in kryo classes configuration parameters ## What changes were proposed in this pull request? Pruned off white spaces present in the user provided comma separated list of classes for spark.kryo.classesToRegister and spark.kryo.registrator. ## How was this patch tested? Manual tests Author: Victor Chima <blazy2k9@gmail.com> Closes #12701 from blazy2k9/master.	2016-04-27 16:52:34 +01:00
Dongjoon Hyun	c5443560b7	[MINOR][BUILD] Enable RAT checking on `LZ4BlockInputStream.java`. ## What changes were proposed in this pull request? Since `LZ4BlockInputStream.java` is not licensed to Apache Software Foundation (ASF), the Apache License header of that file is not monitored until now. This PR aims to enable RAT checking on `LZ4BlockInputStream.java` by excluding from `dev/.rat-excludes`. This will prevent accidental removal of Apache License header from that file. ## How was this patch tested? Pass the Jenkins tests (Specifically, RAT check stage). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12677 from dongjoon-hyun/minor_rat_exclusion_file.	2016-04-27 09:15:06 +01:00
Liwei Lin	b2a4560648	[SPARK-14911] [CORE] Fix a potential data race in TaskMemoryManager ## What changes were proposed in this pull request? [[SPARK-13210][SQL] catch OOM when allocate memory and expand array](`37bc203c8d`) introduced an `acquiredButNotUsed` field, but it might not be correctly synchronized: - the write `acquiredButNotUsed += acquired` is guarded by `this` lock (see [here](https://github.com/apache/spark/blame/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L271)); - the read `memoryManager.releaseExecutionMemory(acquiredButNotUsed, taskAttemptId, tungstenMemoryMode)` (see [here](https://github.com/apache/spark/blame/master/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L400)) might not be correctly synchronized, and thus might not see `acquiredButNotUsed`'s most recent value. This patch makes `acquiredButNotUsed` volatile to fix this. ## How was this patch tested? This should be covered by existing suits. Author: Liwei Lin <lwlin7@gmail.com> Closes #12681 from lw-lin/fix-acquiredButNotUsed.	2016-04-26 23:08:40 -07:00
Azeem Jiva	de6e633420	[SPARK-14756][CORE] Use parseLong instead of valueOf ## What changes were proposed in this pull request? Use Long.parseLong which returns a primative. Use a series of appends() reduces the creation of an extra StringBuilder type ## How was this patch tested? Unit tests Author: Azeem Jiva <azeemj@gmail.com> Closes #12520 from javawithjiva/minor.	2016-04-26 11:49:04 +01:00
Subhobrata Dey	f70e4fff0e	[SPARK-14889][SPARK CORE] scala.MatchError: NONE (of class scala.Enumeration) when spark.scheduler.mode=NONE ## What changes were proposed in this pull request? Handling exception for the below mentioned issue ``` ➜ spark git:(master) ✗ ./bin/spark-shell -c spark.scheduler.mode=NONE 16/04/25 09:15:00 ERROR SparkContext: Error initializing SparkContext. scala.MatchError: NONE (of class scala.Enumeration$Val) at org.apache.spark.scheduler.Pool.<init>(Pool.scala:53) at org.apache.spark.scheduler.TaskSchedulerImpl.initialize(TaskSchedulerImpl.scala:131) at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2352) at org.apache.spark.SparkContext.<init>(SparkContext.scala:492) ``` The exception now looks like ``` java.lang.RuntimeException: The scheduler mode NONE is not supported by Spark. ``` ## How was this patch tested? manual tests Author: Subhobrata Dey <sbcd90@gmail.com> Closes #12666 from sbcd90/schedulerModeIssue.	2016-04-26 11:46:24 +01:00
Lianhui Wang	6bfe42a3be	[SPARK-14731][shuffle]Revert SPARK-12130 to make 2.0 shuffle service compatible with 1.x ## What changes were proposed in this pull request? SPARK-12130 make 2.0 shuffle service incompatible with 1.x. So from discussion: [http://apache-spark-developers-list.1001551.n3.nabble.com/YARN-Shuffle-service-and-its-compatibility-td17222.html](url) we should maintain compatibility between Spark 1.x and Spark 2.x's shuffle service. I put string comparison into executor's register at first avoid string comparison in getBlockData every time. ## How was this patch tested? N/A Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #12568 from lianhuiwang/SPARK-14731.	2016-04-25 12:33:32 -07:00
Peter Ableda	cef77d1f68	[SPARK-14636] Add minimum memory checks for drivers and executors ## What changes were proposed in this pull request? Implement the same memory size validations for the StaticMemoryManager (Legacy) as the UnifiedMemoryManager has. ## How was this patch tested? Manual tests were done in CDH cluster. Test with small executor memory: ` spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn --executor-memory 15m --conf spark.memory.useLegacyMode=true /opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples*.jar 10 ` Exception thrown: ``` ERROR spark.SparkContext: Error initializing SparkContext. java.lang.IllegalArgumentException: Executor memory 15728640 must be at least 471859200. Please increase executor memory using the --executor-memory option or spark.executor.memory in Spark configuration. at org.apache.spark.memory.StaticMemoryManager$.org$apache$spark$memory$StaticMemoryManager$$getMaxExecutionMemory(StaticMemoryManager.scala:127) at org.apache.spark.memory.StaticMemoryManager.<init>(StaticMemoryManager.scala:46) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:352) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:289) at org.apache.spark.SparkContext.<init>(SparkContext.scala:462) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ``` Author: Peter Ableda <peter.ableda@cloudera.com> Closes #12395 from peterableda/SPARK-14636.	2016-04-25 10:42:49 +02:00
felixcheung	c752b6c5ec	[SPARK-14881] [PYTHON] [SPARKR] pyspark and sparkR shell default log level should match spark-shell/Scala ## What changes were proposed in this pull request? Change default logging to WARN for pyspark shell and sparkR shell for a much cleaner environment. ## How was this patch tested? Manually running pyspark and sparkR shell Author: felixcheung <felixcheung_m@hotmail.com> Closes #12648 from felixcheung/pylogging.	2016-04-24 22:51:18 -07:00
Dongjoon Hyun	d34d650378	[SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix lint-java errors ## What changes were proposed in this pull request? Spark uses `NewLineAtEofChecker` rule in Scala by ScalaStyle. And, most Java code also comply with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The followings are the items. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 ArrayTypeStyle, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins dose not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.	2016-04-24 20:40:03 -07:00
Sean Owen	be0d5d3bbe	[SPARK-14873][CORE] Java sampleByKey methods take ju.Map but with Scala Double values; results in type Object ## What changes were proposed in this pull request? Java `sampleByKey` methods should accept `Map` with `java.lang.Double` values ## How was this patch tested? Existing (updated) Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #12637 from srowen/SPARK-14873.	2016-04-23 10:47:50 -07:00
Davies Liu	0dcf9dbebb	[SPARK-14669] [SQL] Fix some SQL metrics in codegen and added more ## What changes were proposed in this pull request? 1. Fix the "spill size" of TungstenAggregate and Sort 2. Rename "data size" to "peak memory" to match the actual meaning (also consistent with task metrics) 3. Added "data size" for ShuffleExchange and BroadcastExchange 4. Added some timing for Sort, Aggregate and BroadcastExchange (this requires another patch to work) ## How was this patch tested? Existing tests. ![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png) Author: Davies Liu <davies@databricks.com> Closes #12425 from davies/fix_metrics.	2016-04-22 12:59:32 -07:00
Reynold Xin	c089c6f4e8	[SPARK-10001] Consolidate Signaling and SignalLogger. ## What changes were proposed in this pull request? This is a follow-up to #12557, with the following changes: 1. Fixes some of the style issues. 2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and it was also confusing to have two classes named Signaling and one called the other. 3. Made logging registration idempotent. ## How was this patch tested? N/A. Author: Reynold Xin <rxin@databricks.com> Closes #12605 from rxin/SPARK-10001.	2016-04-22 09:36:59 -07:00
Joan	bf95b8da27	[SPARK-6429] Implement hashCode and equals together ## What changes were proposed in this pull request? Implement some `hashCode` and `equals` together in order to enable the scalastyle. This is a first batch, I will continue to implement them but I wanted to know your thoughts. Author: Joan <joan@goyeau.com> Closes #12157 from joan38/SPARK-6429-HashCode-Equals.	2016-04-22 12:24:12 +01:00
Jakob Odersky	80127935df	[SPARK-10001] [CORE] Interrupt tasks in repl with Ctrl+C ## What changes were proposed in this pull request? Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C). If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler resulting in the usual termination of the application. This PR is a rewrite of -- and therefore closes #8216 -- as per piaozhexiu's request ## How was this patch tested? Signal handling is not easily testable therefore no unit tests were added. Nevertheless, the new functionality is implemented in a best-effort approach, soft-failing in case signals aren't available on a specific OS. Author: Jakob Odersky <jakob@odersky.com> Closes #12557 from jodersky/SPARK-10001-sigint.	2016-04-21 22:04:08 -07:00
Reynold Xin	0bf8df250e	[HOTFIX] Fix Java 7 compilation break	2016-04-21 17:52:10 -07:00
Eric Liang	e2b5647ab9	[SPARK-14724] Use radix sort for shuffles and sort operator when possible ## What changes were proposed in this pull request? Spark currently uses TimSort for all in-memory sorts, including sorts done for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. sorting by integer keys). This PR adds a radix sort implementation to the unsafe sort package and switches shuffles and sorts to use it when possible. The current implementation does not have special support for null values, so we cannot radix-sort `LongType`. I will address this in a follow-up PR. ## How was this patch tested? Unit tests, enabling radix sort on existing tests. Microbenchmark results: ``` Running benchmark: radix sort 25000000 Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17 on Linux 3.13.0-44-generic Intel(R) Core(TM) i7-4600U CPU 2.10GHz radix sort 25000000: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- reference TimSort key prefix array 15546 / 15859 1.6 621.9 1.0X reference Arrays.sort 2416 / 2446 10.3 96.6 6.4X radix sort one byte 133 / 137 188.4 5.3 117.2X radix sort two bytes 255 / 258 98.2 10.2 61.1X radix sort eight bytes 991 / 997 25.2 39.6 15.7X radix sort key prefix array 1540 / 1563 16.2 61.6 10.1X ``` I also ran a mix of the supported TPCDS queries and compared TimSort vs RadixSort metrics. The overall benchmark ran ~10% faster with radix sort on. In the breakdown below, the radix-enabled sort phases averaged about 20x faster than TimSort, however sorting is only a small fraction of the overall runtime. About half of the TPCDS queries were able to take advantage of radix sort. ``` TPCDS on master: 2499s real time, 8185s executor - 1171s in TimSort, avg 267 MB/s (note the /s accounting is weird here since dataSize counts the record sizes too) TPCDS with radix enabled: 2294s real time, 7391s executor - 596s in TimSort, avg 254 MB/s - 26s in radix sort, avg 4.2 GB/s ``` cc davies rxin Author: Eric Liang <ekl@databricks.com> Closes #12490 from ericl/sort-benchmark.	2016-04-21 16:48:51 -07:00
Shixiong Zhu	e4904d870a	[SPARK-14699][CORE] Stop endpoints before closing the connections and don't stop client in Outbox ## What changes were proposed in this pull request? In general, `onDisconnected` is for dealing with unexpected network disconnections. When RpcEnv.shutdown is called, the disconnections are expected so RpcEnv should not fire these events. This PR moves `dispatcher.stop()` above closing the connections so that when stopping RpcEnv, the endpoints won't receive `onDisconnected` events. In addition, Outbox should not close the client since it will be reused by others. This PR fixes it as well. ## How was this patch tested? test("SPARK-14699: RpcEnv.shutdown should not fire onDisconnected events") Author: Shixiong Zhu <shixiong@databricks.com> Closes #12481 from zsxwing/SPARK-14699.	2016-04-21 11:51:04 -07:00
Lianhui Wang	4f369176b7	[SPARK-4452] [CORE] Shuffle data structures can starve others on the same thread for memory ## What changes were proposed in this pull request? In #9241 It implemented a mechanism to call spill() on those SQL operators that support spilling if there is not enough memory for execution. But ExternalSorter and AppendOnlyMap in Spark core are not worked. So this PR make them benefit from #9241. Now when there is not enough memory for execution, it can get memory by spilling ExternalSorter and AppendOnlyMap in Spark core. ## How was this patch tested? add two unit tests for it. Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #10024 from lianhuiwang/SPARK-4452-2.	2016-04-21 10:02:23 -07:00
Parth Brahmbhatt	6fdd0e32a6	[SPARK-13988][CORE] Make replaying event logs multi threaded in Histo…ry server to ensure a single large log does not block other logs from being rendered. ## What changes were proposed in this pull request? The patch makes event log processing multi threaded. ## How was this patch tested? Existing tests pass, there is no new tests needed to test the functionality as this is a perf improvement. I tested the patch locally by generating one big event log (big1), one small event log(small1) and again a big event log(big2). Without this patch UI does not render any app for almost 30 seconds and then big2 and small1 appears. another 30 second delay and finally big1 also shows up in UI. With this change small1 shows up immediately and big1 and big2 comes up in 30 seconds. Locally it also displays them in the correct order in the UI. Author: Parth Brahmbhatt <pbrahmbhatt@netflix.com> Closes #11800 from Parth-Brahmbhatt/SPARK-13988.	2016-04-21 06:58:00 -05:00
Bryan Cutler	d53a51c1e5	[SPARK-14779][CORE] Corrected log message in Worker case KillExecutor In o.a.s.deploy.worker.Worker.scala, when receiving a KillExecutor message from an invalid Master, fixed typo by changing the log message to read "..attemped to kill executor.." Author: Bryan Cutler <cutlerb@gmail.com> Closes #12546 from BryanCutler/worker-killexecutor-log-message.	2016-04-21 11:33:42 +01:00
Wenchen Fan	cb51680d22	[SPARK-14753][CORE] remove internal flag in Accumulable ## What changes were proposed in this pull request? the `Accumulable.internal` flag is only used to avoid registering internal accumulators for 2 certain cases: 1. `TaskMetrics.createTempShuffleReadMetrics`: the accumulators in the temp shuffle read metrics should not be registered. 2. `TaskMetrics.fromAccumulatorUpdates`: the created task metrics is only used to post event, accumulators inside it should not be registered. For 1, we can create a `TempShuffleReadMetrics` that don't create accumulators, just keep the data and merge it at last. For 2, we can un-register these accumulators immediately. TODO: remove `internal` flag in `AccumulableInfo` with followup PR ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12525 from cloud-fan/acc.	2016-04-21 01:06:22 -07:00
Marcelo Vanzin	f47dbf27fa	[SPARK-14602][YARN] Use SparkConf to propagate the list of cached files. This change avoids using the environment to pass this information, since with many jars it's easy to hit limits on certain OSes. Instead, it encodes the information into the Spark configuration propagated to the AM. The first problem that needed to be solved is a chicken & egg issue: the config file is distributed using the cache, and it needs to contain information about the files that are being distributed. To solve that, the code now treats the config archive especially, and uses slightly different code to distribute it, so that only its cache path needs to be saved to the config file. The second problem is that the extra information would show up in the Web UI, which made the environment tab even more noisy than it already is when lots of jars are listed. This is solved by two changes: the list of cached files is now read only once in the AM, and propagated down to the ExecutorRunnable code (which actually sends the list to the NMs when starting containers). The second change is to unset those config entries after the list is read, so that the SparkContext never sees them. Tested with both client and cluster mode by running "run-example SparkPi". This uploads a whole lot of files when run from a build dir (instead of a distribution, where the list is cleaned up), and I verified that the configs do not show up in the UI. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #12487 from vanzin/SPARK-14602.	2016-04-20 16:57:23 -07:00
Andrew Or	8fc267ab33	[SPARK-14720][SPARK-13643] Move Hive-specific methods into HiveSessionState and Create a SparkSession class ## What changes were proposed in this pull request? This PR has two main changes. 1. Move Hive-specific methods from HiveContext to HiveSessionState, which help the work of removing HiveContext. 2. Create a SparkSession Class, which will later be the entry point of Spark SQL users. ## How was this patch tested? Existing tests This PR is trying to fix test failures of https://github.com/apache/spark/pull/12485. Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12522 from yhuai/spark-session.	2016-04-20 12:58:48 -07:00
jerryshao	90cbc82fd4	[SPARK-14725][CORE] Remove HttpServer class ## What changes were proposed in this pull request? This proposal removes the class `HttpServer`, with the changing of internal file/jar/class transmission to RPC layer, currently there's no code using this `HttpServer`, so here propose to remove it. ## How was this patch tested? Unit test is verified locally. Author: jerryshao <sshao@hortonworks.com> Closes #12526 from jerryshao/SPARK-14725.	2016-04-20 10:48:11 -07:00
Alex Bozarth	834277884f	[SPARK-8171][WEB UI] Javascript based infinite scrolling for the log page Updated the log page by replacing the current pagination with a javascript-based infinite scroll solution Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #10910 from ajbozarth/spark8171.	2016-04-20 21:24:11 +09:00
Liwei Lin	17db4bfeaa	[SPARK-14687][CORE][SQL][MLLIB] Call path.getFileSystem(conf) instead of call FileSystem.get(conf) ## What changes were proposed in this pull request? - replaced `FileSystem.get(conf)` calls with `path.getFileSystem(conf)` ## How was this patch tested? N/A Author: Liwei Lin <lwlin7@gmail.com> Closes #12450 from lw-lin/fix-fs-get.	2016-04-20 11:28:51 +01:00
Ryan Blue	a3451119d9	[SPARK-14679][UI] Fix UI DAG visualization OOM. ## What changes were proposed in this pull request? The DAG visualization can cause an OOM when generating the DOT file. This happens because clusters are not correctly deduped by a contains check because they use the default equals implementation. This adds a working equals implementation. ## How was this patch tested? This adds a test suite that checks the new equals implementation. Author: Ryan Blue <blue@apache.org> Closes #12437 from rdblue/SPARK-14679-fix-ui-oom.	2016-04-20 11:26:42 +01:00
Wenchen Fan	85d759ca3a	[SPARK-14704][CORE] create accumulators in TaskMetrics ## What changes were proposed in this pull request? Before this PR, we create accumulators at driver side(and register them) and send them to executor side, then we create `TaskMetrics` with these accumulators at executor side. After this PR, we will create `TaskMetrics` at driver side and send it to executor side, so that we can create accumulators inside `TaskMetrics` directly, which is cleaner. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12472 from cloud-fan/acc.	2016-04-19 21:20:24 -07:00
felixcheung	ecd877e833	[SPARK-12224][SPARKR] R support for JDBC source Add R API for `read.jdbc`, `write.jdbc`. Tested this quite a bit manually with different combinations of parameters. It's not clear if we could have automated tests in R for this - Scala `JDBCSuite` depends on Java H2 in-memory database. Refactored some code into util so they could be tested. Core's R SerDe code needs to be updated to allow access to java.util.Properties as `jobj` handle which is required by DataFrameReader/Writer's `jdbc` method. It would be possible, though more code to add a `sql/r/SQLUtils` helper function. Tested: ``` # with postgresql ../bin/sparkR --driver-class-path /usr/share/java/postgresql-9.4.1207.jre7.jar # read.jdbc df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345") df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = 12345) # partitionColumn and numPartitions test df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, numPartitions = 4, user = "user", password = 12345) a <- SparkR:::toRDD(df) SparkR:::getNumPartitions(a) [1] 4 SparkR:::collectPartition(a, 2L) # defaultParallelism test df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, user = "user", password = 12345) SparkR:::getNumPartitions(a) [1] 2 # predicates test df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", predicates = list("did<=105"), user = "user", password = 12345) count(df) == 1 # write.jdbc, default save mode "error" irisDf <- as.DataFrame(sqlContext, iris) write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345") "error, already exists" write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "iris", user = "user", password = "12345") ``` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10480 from felixcheung/rreadjdbc.	2016-04-19 15:59:47 -07:00
Eric Liang	008a8bbef0	[SPARK-14733] Allow custom timing control in microbenchmarks ## What changes were proposed in this pull request? The current benchmark framework runs a code block for several iterations and reports statistics. However there is no way to exclude per-iteration setup time from the overall results. This PR adds a timer control object passed into the closure that can be used for this purpose. ## How was this patch tested? Existing benchmark code. Also see https://github.com/apache/spark/pull/12490 Author: Eric Liang <ekl@databricks.com> Closes #12502 from ericl/spark-14733.	2016-04-19 15:55:21 -07:00
Nezih Yigitbasi	3c91afec20	[SPARK-14042][CORE] Add custom coalescer support ## What changes were proposed in this pull request? This PR adds support for specifying an optional custom coalescer to the `coalesce()` method. Currently I have only added this feature to the `RDD` interface, and once we sort out the details we can proceed with adding this feature to the other APIs (`Dataset` etc.) ## How was this patch tested? Added a unit test for this functionality. /cc rxin (per our discussion on the mailing list) Author: Nezih Yigitbasi <nyigitbasi@netflix.com> Closes #11865 from nezihyigitbasi/custom_coalesce_policy.	2016-04-19 14:35:26 -07:00
Kazuaki Ishizaki	0b8369d854	[SPARK-14656][CORE] Fix Benchmark.getPorcessorName() always return "Unknown processor" on Linux ## What changes were proposed in this pull request? This PR returns correct processor name in ```/proc/cpuinfo``` on Linux from ```Benchmark.getPorcessorName()```. Now, this return ```Unknown processor```. Since ```Utils.executeAndGetOutput(Seq("which", "grep"))``` return ```/bin/grep\n```, it is failed to execute ```/bin/grep\n```. This PR strips ```\n``` at the end of the line of a result of ```Utils.executeAndGetOutput()``` Before applying this PR ```` Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17 on Linux 2.6.32-504.el6.x86_64 Unknown processor back-to-back filter: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- Dataset 472 / 503 21.2 47.2 1.0X DataFrame 51 / 58 198.0 5.1 9.3X RDD 189 / 211 52.8 18.9 2.5X ```` After applying this PR ``` Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17 on Linux 2.6.32-504.el6.x86_64 Intel(R) Xeon(R) CPU E5-2667 v2 3.30GHz back-to-back filter: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- Dataset 490 / 502 20.4 49.0 1.0X DataFrame 55 / 61 183.4 5.5 9.0X RDD 210 / 237 47.7 21.0 2.3X ``` ## How was this patch tested? Run Benchmark programs on Linux by hand Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #12411 from kiszk/SPARK-14656.	2016-04-19 23:30:34 +02:00
Josh Rosen	947b9020b0	[SPARK-14676] Wrap and re-throw Await.result exceptions in order to capture full stacktrace When `Await.result` throws an exception which originated from a different thread, the resulting stacktrace doesn't include the path leading to the `Await.result` call itself, making it difficult to identify the impact of these exceptions. For example, I've seen cases where broadcast cleaning errors propagate to the main thread and crash it but the resulting stacktrace doesn't include any of the main thread's code, making it difficult to pinpoint which exception crashed that thread. This patch addresses this issue by explicitly catching, wrapping, and re-throwing exceptions that are thrown by `Await.result`. I tested this manually using `16b31c8251`, a patch which reproduces an issue where an RPC exception which occurs while unpersisting RDDs manages to crash the main thread without any useful stacktrace, and verified that informative, full stacktraces were generated after applying the fix in this PR. /cc rxin nongli yhuai anabranch Author: Josh Rosen <joshrosen@databricks.com> Closes #12433 from JoshRosen/wrap-and-rethrow-await-exceptions.	2016-04-19 10:38:10 -07:00
tedyu	e89633605e	[SPARK-13904] Add exit code parameter to exitExecutor() ## What changes were proposed in this pull request? This PR adds exit code parameter to exitExecutor() so that caller can specify different exit code. ## How was this patch tested? Existing test rxin hbhanawat Author: tedyu <yuzhihong@gmail.com> Closes #12457 from tedyu/master.	2016-04-19 10:12:36 -07:00
Reynold Xin	5e92583d38	[SPARK-14667] Remove HashShuffleManager ## What changes were proposed in this pull request? The sort shuffle manager has been the default since Spark 1.2. It is time to remove the old hash shuffle manager. ## How was this patch tested? Removed some tests related to the old manager. Author: Reynold Xin <rxin@databricks.com> Closes #12423 from rxin/SPARK-14667.	2016-04-18 19:30:00 -07:00
CodingCat	4b3d1294ae	[SPARK-13227] Risky apply() in OpenHashMap https://issues.apache.org/jira/browse/SPARK-13227 It might confuse the future developers when they use OpenHashMap.apply() with a numeric value type. null.asInstance[Int], null.asInstance[Long], null.asInstace[Float] and null.asInstance[Double] will return 0/0.0/0L, which might confuse the developer if the value set contains 0/0.0/0L with an existing key The current patch only adds the comments describing the issue, with the respect to apply the minimum changes to the code base The more direct, yet more aggressive, approach is use Option as the return type andrewor14 JoshRosen any thoughts about how to avoid the potential issue? Author: CodingCat <zhunansjtu@gmail.com> Closes #11107 from CodingCat/SPARK-13227.	2016-04-18 18:51:23 -07:00
Wenchen Fan	602734084c	[SPARK-14628][CORE][FOLLLOW-UP] Always tracking read/write metrics ## What changes were proposed in this pull request? This PR is a follow up for https://github.com/apache/spark/pull/12417, now we always track input/output/shuffle metrics in spark JSON protocol and status API. Most of the line changes are because of re-generating the gold answer for `HistoryServerSuite`, and we add a lot of 0 values for read/write metrics. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12462 from cloud-fan/follow.	2016-04-18 15:17:29 -07:00
Reynold Xin	8a87f7d5c8	Mark ExternalClusterManager as private[spark].	2016-04-16 23:49:26 -07:00
Hemant Bhanawat	af1f4da762	[SPARK-13904][SCHEDULER] Add support for pluggable cluster manager ## What changes were proposed in this pull request? This commit adds support for pluggable cluster manager. And also allows a cluster manager to clean up tasks without taking the parent process down. To plug a new external cluster manager, ExternalClusterManager trait should be implemented. It returns task scheduler and backend scheduler that will be used by SparkContext to schedule tasks. An external cluster manager is registered using the java.util.ServiceLoader mechanism (This mechanism is also being used to register data sources like parquet, json, jdbc etc.). This allows auto-loading implementations of ExternalClusterManager interface. Currently, when a driver fails, executors exit using system.exit. This does not bode well for cluster managers that would like to reuse the parent process of an executor. Hence, 1. Moving system.exit to a function that can be overriden in subclasses of CoarseGrainedExecutorBackend. 2. Added functionality of killing all the running tasks in an executor. ## How was this patch tested? ExternalClusterManagerSuite.scala was added to test this patch. Author: Hemant Bhanawat <hemant@snappydata.io> Closes #11723 from hbhanawat/pluggableScheduler.	2016-04-16 23:43:32 -07:00
hyukjinkwon	9f678e9754	[MINOR] Remove inappropriate type notation and extra anonymous closure within functional transformations ## What changes were proposed in this pull request? This PR removes - Inappropriate type notations For example, from ```scala words.foreachRDD { (rdd: RDD[String], time: Time) => ... ``` to ```scala words.foreachRDD { (rdd, time) => ... ``` - Extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` and corrects some obvious style nits. ## How was this patch tested? This was tested after adding rules in `scalastyle-config.xml`, which ended up with not finding all perfectly. The rules applied were below: - For the first correction, ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">(?m)\.[a-zA-Z_][a-zA-Z0-9]$\s[^,]+s=>\s\{[^\}]+\}\s$</parameter></parameters> </check> ``` ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]\s[\{\|\(]([^\n>,]+=>)?\s\{([^()]\|(?R))\}^[,]</parameter></parameters> </check> ``` - For the second correction ```xml <check customId="TypeNotation" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]\s[\{\|\(]\s\([^):]:R))\}^[,]</parameter></parameters> </check> ``` Those rules were not added Author: hyukjinkwon <gurwls223@gmail.com> Closes #12413 from HyukjinKwon/SPARK-style.	2016-04-16 14:56:23 +01:00
Reynold Xin	8028a28885	[SPARK-14628][CORE] Simplify task metrics by always tracking read/write metrics ## What changes were proposed in this pull request? Part of the reason why TaskMetrics and its callers are complicated are due to the optional metrics we collect, including input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of map, foreach, flatMap, getOrElse(0L) calls throughout Spark. This patch also changes a few behaviors. 1. Removed the distinction of data read/write methods (e.g. Hadoop, Memory, Network, etc). 2. Accumulate all data reads and writes, rather than only the first method. (Fixes SPARK-5225) ## How was this patch tested? existing tests. This is bases on https://github.com/apache/spark/pull/12388, with more test fixes. Author: Reynold Xin <rxin@databricks.com> Author: Wenchen Fan <wenchen@databricks.com> Closes #12417 from cloud-fan/metrics-refactor.	2016-04-15 15:39:39 -07:00
Peter Ableda	06b9d623e8	[SPARK-14633] Use more readable format to show memory bytes in Error Message ## What changes were proposed in this pull request? Round memory bytes and convert it to Long to it’s original type. This change fixes the formatting issue in the Exception message. ## How was this patch tested? Manual tests were done in CDH cluster. Author: Peter Ableda <peter.ableda@cloudera.com> Closes #12392 from peterableda/SPARK-14633.	2016-04-15 13:18:48 +01:00
Mark Grover	ff9ae61a3b	[SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previously undocumented) usage of spark-submit to run examples. ## How was this patch tested? Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit. Author: Mark Grover <mark@apache.org> Closes #12365 from markgrover/spark-14601.	2016-04-14 18:51:43 -07:00
Wenchen Fan	1d04c86fc5	[SPARK-14558][CORE] In ClosureCleaner, clean the outer pointer if it's a REPL line object ## What changes were proposed in this pull request? When we clean a closure, if its outermost parent is not a closure, we won't clone and clean it as cloning user's objects is dangerous. However, if it's a REPL line object, which may carry a lot of unnecessary references(like hadoop conf, spark conf, etc.), we should clean it as it's not a user object. This PR improves the check for user's objects to exclude REPL line object. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12327 from cloud-fan/closure.	2016-04-14 10:58:06 -07:00
Reynold Xin	a46f98d3f4	[SPARK-14617] Remove deprecated APIs in TaskMetrics ## What changes were proposed in this pull request? This patch removes some of the deprecated APIs in TaskMetrics. This is part of my bigger effort to simplify accumulators and task metrics. ## How was this patch tested? N/A - only removals Author: Reynold Xin <rxin@databricks.com> Closes #12375 from rxin/SPARK-14617.	2016-04-14 10:56:13 -07:00
Reynold Xin	dac40b68dc	[SPARK-14619] Track internal accumulators (metrics) by stage attempt ## What changes were proposed in this pull request? When there are multiple attempts for a stage, we currently only reset internal accumulator values if all the tasks are resubmitted. It would make more sense to reset the accumulator values for each stage attempt. This will allow us to eventually get rid of the internal flag in the Accumulator class. This is part of my bigger effort to simplify accumulators and task metrics. ## How was this patch tested? Covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12378 from rxin/SPARK-14619.	2016-04-14 10:54:57 -07:00
Liwei Lin	3e27940a19	[SPARK-14630][BUILD][CORE][SQL][STREAMING] Code style: public abstract methods should have explicit return types ## What changes were proposed in this pull request? Currently many public abstract methods (in abstract classes as well as traits) don't declare return types explicitly, such as in [o.a.s.streaming.dstream.InputDStream](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/InputDStream.scala#L110): ```scala def start() // should be: def start(): Unit def stop() // should be: def stop(): Unit ``` These methods exist in core, sql, streaming; this PR fixes them. ## How was this patch tested? N/A ## Which piece of scala style rule led to the changes? the rule was added separately in https://github.com/apache/spark/pull/12396 Author: Liwei Lin <lwlin7@gmail.com> Closes #12389 from lw-lin/public-abstract-methods.	2016-04-14 10:14:38 -07:00
Reynold Xin	de2ad52855	[SPARK-14625] TaskUIData and ExecutorUIData shouldn't be case classes ## What changes were proposed in this pull request? I was trying to understand the accumulator and metrics update source code and these two classes don't really need to be case classes. It would also be more consistent with other UI classes if they are not case classes. This is part of my bigger effort to simplify accumulators and task metrics. ## How was this patch tested? This is a straightforward refactoring without behavior change. Author: Reynold Xin <rxin@databricks.com> Closes #12386 from rxin/SPARK-14625.	2016-04-14 10:12:29 -07:00
hyukjinkwon	6fc3dc8839	[MINOR][SQL] Remove extra anonymous closure within functional transformations ## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` ## How was this patch tested? Related unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12382 from HyukjinKwon/minor-extra-closers.	2016-04-14 09:43:41 +01:00
hyukjinkwon	b4819404a6	[SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports ## What changes were proposed in this pull request? Old `HadoopFsRelation` API includes `buildInternalScan()` which uses `SqlNewHadoopRDD` in `ParquetRelation`. Because now the old API is removed, `SqlNewHadoopRDD` is not used anymore. So, this PR removes `SqlNewHadoopRDD` and several unused imports. This was discussed in https://github.com/apache/spark/pull/12326. ## How was this patch tested? Several related existing unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12354 from HyukjinKwon/SPARK-14596.	2016-04-14 15:43:44 +08:00
Charles Allen	dd11e401e4	[SPARK-14537][CORE] Make TaskSchedulerImpl waiting fail if context is shut down This patch makes the postStartHook throw an IllegalStateException if the SparkContext is shutdown while it is waiting for the backend to be ready Author: Charles Allen <charles@allen-net.com> Closes #12301 from drcrallen/SPARK-14537.	2016-04-13 16:02:49 +01:00
Liwei Lin	23f93f559c	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api ## What changes were proposed in this pull request? - updated `OFF_HEAP` semantics for `StorageLevels.java` - updated `OFF_HEAP` semantics for `storagelevel.py` ## How was this patch tested? no need to test Author: Liwei Lin <lwlin7@gmail.com> Closes #12126 from lw-lin/storagelevel.py.	2016-04-12 23:06:55 -07:00
Sital Kedia	d187e7dea9	[SPARK-14363] Fix executor OOM due to memory leak in the Sorter ## What changes were proposed in this pull request? Fix memory leak in the Sorter. When the UnsafeExternalSorter spills the data to disk, it does not free up the underlying pointer array. As a result, we see a lot of executor OOM and also memory under utilization. This is a regression partially introduced in PR https://github.com/apache/spark/pull/9241 ## How was this patch tested? Tested by running a job and observed around 30% speedup after this change. Author: Sital Kedia <skedia@fb.com> Closes #12285 from sitalkedia/executor_oom.	2016-04-12 16:10:07 -07:00
Davies Liu	1ef5f8cfa6	[SPARK-14544] [SQL] improve performance of SQL UI tab ## What changes were proposed in this pull request? This PR improve the performance of SQL UI by: 1) remove the details column in all executions page (the first page in SQL tab). We can check the details by enter the execution page. 2) break-all is super slow in Chrome recently, so switch to break-word. 3) Using "display: none" to hide a block. 4) using one js closure for for all the executions, not one for each. 5) remove the height limitation of details, don't need to scroll it in the tiny window. ## How was this patch tested? Exists tests. ![ui](https://cloud.githubusercontent.com/assets/40902/14445712/68d7b258-0004-11e6-9b48-5d329b05d165.png) Author: Davies Liu <davies@databricks.com> Closes #12311 from davies/ui_perf.	2016-04-12 15:03:00 -07:00
Terence Yim	3e53de4bdd	[SPARK-14513][CORE] Fix threads left behind after stopping SparkContext ## What changes were proposed in this pull request? Shutting down `QueuedThreadPool` used by Jetty `Server` to avoid threads leakage after SparkContext is stopped. Note: If this fix is going to apply to the `branch-1.6`, one more patch on the `NettyRpcEnv` class is needed so that the `NettyRpcEnv._fileServer.shutdown` is called in the `NettyRpcEnv.cleanup` method. This is due to the removal of `_fileServer` field in the `NettyRpcEnv` class in the master branch. Please advice if a second PR is necessary for bring this fix back to `branch-1.6` ## How was this patch tested? Ran the ./dev/run-tests locally Author: Terence Yim <terence@cask.co> Closes #12318 from chtyim/fixes/SPARK-14513-thread-leak.	2016-04-12 13:46:39 -07:00
Dongjoon Hyun	b0f5497e95	[SPARK-14508][BUILD] Add a new ScalaStyle Rule `OmitBracesInCase` ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we had better enforce the following rule. ``` case: Always omit braces in case clauses. ``` This PR makes a new ScalaStyle rule, 'OmitBracesInCase', and enforces it to the code. ## How was this patch tested? Pass the Jenkins tests (including Scala style checking) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12280 from dongjoon-hyun/SPARK-14508.	2016-04-12 00:43:28 -07:00
Eric Liang	6f27027d96	[SPARK-14475] Propagate user-defined context from driver to executors ## What changes were proposed in this pull request? This adds a new API call `TaskContext.getLocalProperty` for getting properties set in the driver from executors. These local properties are automatically propagated from the driver to executors. For streaming, the context for streaming tasks will be the initial driver context when ssc.start() is called. ## How was this patch tested? Unit tests. cc JoshRosen Author: Eric Liang <ekl@databricks.com> Closes #12248 from ericl/sc-2813.	2016-04-11 18:33:54 -07:00
Jason Moore	22014e6fb9	[SPARK-14357][CORE] Properly handle the root cause being a commit denied exception ## What changes were proposed in this pull request? When deciding whether a CommitDeniedException caused a task to fail, consider the root cause of the Exception. ## How was this patch tested? Added a test suite for the component that extracts the root cause of the error. Made a distribution after cherry-picking this commit to branch-1.6 and used to run our Spark application that would quite often fail due to the CommitDeniedException. Author: Jason Moore <jasonmoore2k@outlook.com> Closes #12228 from jasonmoore2k/SPARK-14357.	2016-04-09 23:34:57 -07:00
Dongjoon Hyun	aea30a1a9b	[SPARK-14465][BUILD] Checkstyle should check all Java files ## What changes were proposed in this pull request? Currently, `checkstyle` is configured to check the files under `src/main/java`. However, Spark has Java files in `src/main/scala`, too. This PR fixes the following configuration in `pom.xml` and the unchecked-so-far violations on those files. ```xml -<sourceDirectory>${basedir}/src/main/java</sourceDirectory> +<sourceDirectories>${basedir}/src/main/java,${basedir}/src/main/scala</sourceDirectories> ``` ## How was this patch tested? After passing the Jenkins build and manually `dev/lint-java`. (Note that Jenkins does not run `lint-java`) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12242 from dongjoon-hyun/SPARK-14465.	2016-04-09 21:31:20 -07:00
Davies Liu	5cb5edaf9c	[SPARK-14419] [SQL] Improve HashedRelation for key fit within Long ## What changes were proposed in this pull request? Currently, we use java HashMap for HashedRelation if the key could fit within a Long. The java HashMap and CompactBuffer are not memory efficient, the memory used by them is also accounted accurately. This PR introduce a LongToUnsafeRowMap (similar to BytesToBytesMap) for better memory efficiency and performance. This PR reopen #12190 to fix bugs. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12278 from davies/long_map3.	2016-04-09 17:44:38 -07:00
Sameer Agarwal	813e96e6fa	[SPARK-14454] Better exception handling while marking tasks as failed ## What changes were proposed in this pull request? This patch adds support for better handling of exceptions inside catch blocks if the code within the block throws an exception. For instance here is the code in a catch block before this change in `WriterContainer.scala`: ```scala logError("Aborting task.", cause) // call failure callbacks first, so we could have a chance to cleanup the writer. TaskContext.get().asInstanceOf[TaskContextImpl].markTaskFailed(cause) if (currentWriter != null) { currentWriter.close() } abortTask() throw new SparkException("Task failed while writing rows.", cause) ``` If `markTaskFailed` or `currentWriter.close` throws an exception, we currently lose the original cause. This PR fixes this problem by implementing a utility function `Utils.tryWithSafeCatch` that suppresses (`Throwable.addSuppressed`) the exception that are thrown within the catch block and rethrowing the original exception. ## How was this patch tested? No new functionality added Author: Sameer Agarwal <sameer@databricks.com> Closes #12234 from sameeragarwal/fix-exception.	2016-04-08 17:23:32 -07:00
Shixiong Zhu	4d7c359263	[SPARK-14437][CORE] Use the address that NettyBlockTransferService listens to create BlockManagerId ## What changes were proposed in this pull request? Here is why SPARK-14437 happens: BlockManagerId is created using NettyBlockTransferService.hostName which comes from `customHostname`. And `Executor` will set `customHostname` to the hostname which is detected by the driver. However, the driver may not be able to detect the correct address in some complicated network (Netty's Channel.remoteAddress doesn't always return a connectable address). In such case, `BlockManagerId` will be created using a wrong hostname. To fix this issue, this PR uses `hostname` provided by `SparkEnv.create` to create `NettyBlockTransferService` and set `NettyBlockTransferService.hostname` to this one directly. A bonus of this approach is NettyBlockTransferService won't bound to `0.0.0.0` which is much safer. ## How was this patch tested? Manually checked the bound address using local-cluster. Author: Shixiong Zhu <shixiong@databricks.com> Closes #12240 from zsxwing/SPARK-14437.	2016-04-08 17:18:19 -07:00
Michael Armbrust	692c74840b	[SPARK-14449][SQL] SparkContext should use SparkListenerInterface Currently all `SparkFirehoseListener` implementations are broken since we expect listeners to extend `SparkListener`, while the fire hose only extends `SparkListenerInterface`. This changes the addListener function and the config based injection to use the interface instead. The existing tests in SparkListenerSuite are improved such that they would have caught this. Follow-up to #12142 Author: Michael Armbrust <michael@databricks.com> Closes #12227 from marmbrus/fixListener.	2016-04-07 18:05:54 -07:00
Andrew Or	3e29e372ff	[SPARK-14468] Always enable OutputCommitCoordinator ## What changes were proposed in this pull request? `OutputCommitCoordinator` was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the [JIRA description](https://issues.apache.org/jira/browse/SPARK-14468). Before: `OutputCommitCoordinator` is enabled only if speculation is enabled. After: `OutputCommitCoordinator` is always enabled. Users may still disable this through `spark.hadoop.outputCommitCoordination.enabled`, but they really shouldn't... ## How was this patch tested? `OutputCommitCoordinator*Suite` Author: Andrew Or <andrew@databricks.com> Closes #12244 from andrewor14/always-occ.	2016-04-07 17:49:39 -07:00
Dhruve Ashar	033d808152	[SPARK-12384] Enables spark-clients to set the min(-Xms) and max(.memory config) j… ## What changes were proposed in this pull request? Currently Spark clients are started with the same memory setting for Xms and Xms leading to reserving unnecessary higher amounts of memory. This behavior is changed and the clients can now specify an initial heap size using the extraJavaOptions in the config for driver,executor and am individually. Note, that only -Xms can be provided through this config option, if the client wants to set the max size(-Xmx), this has to be done via the .memory configuration knobs which are currently supported. ## How was this patch tested? Monitored executor and yarn logs in debug mode to verify the commands through which they are being launched in client and cluster mode. The driver memory was verified locally using jps -v. Setting up -Xmx parameter in the javaExtraOptions raises exception with the info provided. Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #12115 from dhruve/impr/SPARK-12384.	2016-04-07 10:39:21 -05:00
Alex Bozarth	35e0db2d45	[SPARK-14245][WEB UI] Display the user in the application view ## What changes were proposed in this pull request? The Spark UI (both active and history) should show the user who ran the application somewhere when you are in the application view. This was added under the Jobs view by total uptime and scheduler mode. ## How was this patch tested? Manual testing <img width="191" alt="username" src="https://cloud.githubusercontent.com/assets/13952758/14222830/6d1fe542-f82a-11e5-885f-c05ee2cdf857.png"> Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #12123 from ajbozarth/spark14245.	2016-04-07 09:15:00 -05:00
Marcelo Vanzin	21d5ca128b	[SPARK-14134][CORE] Change the package name used for shading classes. The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.	2016-04-06 19:33:51 -07:00
Shixiong Zhu	f1def573f4	[SPARK-13112][CORE] Make sure RegisterExecutorResponse arrive before LaunchTask ## What changes were proposed in this pull request? Send `RegisterExecutorResponse` using `executorRef` in order to make sure RegisterExecutorResponse and LaunchTask are both sent using the same channel. Then RegisterExecutorResponse will always arrive before LaunchTask ## How was this patch tested? Existing unit tests Closes #12078 Author: Shixiong Zhu <shixiong@databricks.com> Closes #12211 from zsxwing/SPARK-13112.	2016-04-06 16:18:04 -07:00
Dongjoon Hyun	d717ae1fd7	[SPARK-14444][BUILD] Add a new scalastyle `NoScalaDoc` to prevent ScalaDoc-style multiline comments ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation), this PR adds a new scalastyle rule to prevent the followings. ``` /** In Spark, we don't use the ScalaDoc style so this * is not correct. */ ``` ## How was this patch tested? Pass the Jenkins tests (including `lint-scala`). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12221 from dongjoon-hyun/SPARK-14444.	2016-04-06 16:02:55 -07:00
Tathagata Das	9af5423ec2	[SPARK-12133][STREAMING] Streaming dynamic allocation ## What changes were proposed in this pull request? Added a new Executor Allocation Manager for the Streaming scheduler for doing Streaming Dynamic Allocation. ## How was this patch tested Unit tests, and cluster tests. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #12154 from tdas/streaming-dynamic-allocation.	2016-04-06 15:46:20 -07:00
Eric Liang	78c1076d04	[SPARK-14252] Executors do not try to download remote cached blocks ## What changes were proposed in this pull request? As mentioned in the ticket this was because one get path in the refactored `BlockManager` did not check for remote storage. ## How was this patch tested? Unit test, also verified manually with reproduction in the ticket. cc JoshRosen Author: Eric Liang <ekl@databricks.com> Closes #12193 from ericl/spark-14252.	2016-04-05 22:37:51 -07:00
Shixiong Zhu	48467f4eb0	[SPARK-14416][CORE] Add thread-safe comments for CoarseGrainedSchedulerBackend's fields ## What changes were proposed in this pull request? While I was reviewing #12078, I found most of CoarseGrainedSchedulerBackend's mutable fields doesn't have any comments about the thread-safe assumptions and it's hard for people to figure out which part of codes should be protected by the lock. This PR just added comments/annotations for them and also added strict access modifiers for some fields. ## How was this patch tested? Existing unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #12188 from zsxwing/comments.	2016-04-05 22:32:37 -07:00
Marcelo Vanzin	d5ee9d5c24	[SPARK-529][SQL] Modify SQLConf to use new config API from core. Because SQL keeps track of all known configs, some customization was needed in SQLConf to allow that, since the core API does not have that feature. Tested via existing (and slightly updated) unit tests. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11570 from vanzin/SPARK-529-sql.	2016-04-05 15:19:51 -07:00
Kousuke Saruta	e4bd504120	[SPARK-14397][WEBUI] <html> and <body> tags are nested in LogPage ## What changes were proposed in this pull request? In `LogPage`, the content to be rendered is defined as follows. ``` val content = <html> <body> {linkToMaster} <div> <div style="float:left; margin-right:10px">{backButton}</div> <div style="float:left;">{range}</div> <div style="float:right; margin-left:10px">{nextButton}</div> </div> <br /> <div style="height:500px; overflow:auto; padding:5px;"> <pre>{logText}</pre> </div> </body> </html> UIUtils.basicSparkPage(content, logType + " log page for " + pageName) ``` As you can see, <html> and <body> tags will be rendered. On the other hand, `UIUtils.basicSparkPage` will render those tags so those tags will be nested. ``` def basicSparkPage( content: => Seq[Node], title: String, useDataTables: Boolean = false): Seq[Node] = { <html> <head> {commonHeaderNodes} {if (useDataTables) dataTablesHeaderNodes else Seq.empty} <title>{title}</title> </head> <body> <div class="container-fluid"> <div class="row-fluid"> <div class="span12"> <h3 style="vertical-align: middle; display: inline-block;"> <a style="text-decoration: none" href={prependBaseUri("/")}> <img src={prependBaseUri("/static/spark-logo-77x50px-hd.png")} /> <span class="version" style="margin-right: 15px;">{org.apache.spark.SPARK_VERSION}</span> </a> {title} </h3> </div> </div> {content} </div> </body> </html> } ``` These are the screen shots before this patch is applied. ![before1](https://cloud.githubusercontent.com/assets/4736016/14273236/03cbed8a-fb44-11e5-8786-bc1bfa4d3f8c.png) ![before2](https://cloud.githubusercontent.com/assets/4736016/14273237/03d1741c-fb44-11e5-9dee-ea93022033a6.png) And these are the ones after this patch is applied. ![after1](https://cloud.githubusercontent.com/assets/4736016/14273248/1b6a7d8a-fb44-11e5-8a3b-69964f3434f6.png) ![after2](https://cloud.githubusercontent.com/assets/4736016/14273249/1b6b9c38-fb44-11e5-9d6f-281d64c842e4.png) The appearance is not changed but the html source code is changed. ## How was this patch tested? Manually run some jobs on my standalone-cluster and check the WebUI. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #12170 from sarutak/SPARK-14397.	2016-04-05 10:51:23 -07:00
Guillaume Poulin	7201f033ce	[SPARK-12425][STREAMING] DStream union optimisation Use PartitionerAwareUnionRDD when possbile for optimizing shuffling and preserving the partitioner. Author: Guillaume Poulin <poulin.guillaume@gmail.com> Closes #10382 from gpoulin/dstream_union_optimisation.	2016-04-05 02:54:38 +01:00
Marcelo Vanzin	24d7d2e453	[SPARK-13579][BUILD] Stop building the main Spark assembly. This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.	2016-04-04 16:52:22 -07:00
Davies Liu	cc70f17416	[SPARK-14334] [SQL] add toLocalIterator for Dataset/DataFrame ## What changes were proposed in this pull request? RDD.toLocalIterator() could be used to fetch one partition at a time to reduce the memory usage. Right now, for Dataset/Dataframe we have to use df.rdd.toLocalIterator, which is super slow also requires lots of memory (because of the Java serializer or even Kyro serializer). This PR introduce an optimized toLocalIterator for Dataset/DataFrame, which is much faster and requires much less memory. For a partition with 5 millions rows, `df.rdd.toIterator` took about 100 seconds, but df.toIterator took less than 7 seconds. For 10 millions row, rdd.toIterator will crash (not enough memory) with 4G heap, but df.toLocalIterator could finished in 12 seconds. The JDBC server has been updated to use DataFrame.toIterator. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12114 from davies/local_iterator.	2016-04-04 13:31:44 -07:00
Reynold Xin	7143904700	[SPARK-14358] Change SparkListener from a trait to an abstract class ## What changes were proposed in this pull request? Scala traits are difficult to maintain binary compatibility on, and as a result we had to introduce JavaSparkListener. In Spark 2.0 we can change SparkListener from a trait to an abstract class and then remove JavaSparkListener. ## How was this patch tested? Updated related unit tests. Author: Reynold Xin <rxin@databricks.com> Closes #12142 from rxin/SPARK-14358.	2016-04-04 13:26:18 -07:00
Reynold Xin	27dad6f658	[SPARK-14364][SPARK] HeartbeatReceiver object should be private ## What changes were proposed in this pull request? It's a mistake that HeartbeatReceiver object was made public in Spark 1.x. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #12148 from rxin/SPARK-14364.	2016-04-04 13:19:34 -07:00
Dongjoon Hyun	3f749f7ed4	[SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results ## What changes were proposed in this pull request? This PR contains the following 5 types of maintenance fix over 59 files (+94 lines, -93 lines). - Fix typos(exception/log strings, testcase name, comments) in 44 lines. - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011) - Use diamond operators in 40 lines. (New codes after SPARK-13702) - Fix redundant semicolon in 5 lines. - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala. ## How was this patch tested? Manual and pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12139 from dongjoon-hyun/SPARK-14355.	2016-04-03 18:14:16 -07:00
Marcin Tustin	9023015f05	[SPARK-14163][CORE] SumEvaluator and countApprox cannot reliably handle RDDs of size 1 ## What changes were proposed in this pull request? This special cases 0 and 1 counts to avoid passing 0 degrees of freedom. ## How was this patch tested? Tests run successfully. New test added. ## Note: This recreates #11982 which was closed to due to non-updated diff. rxin srowen Commented there. This also adds tests, reworks the code to perform the special casing (based on srowen's comments), and adds equality machinery for BoundedDouble, as well as changing how it is transformed to string. Author: Marcin Tustin <mtustin@handybook.com> Author: Marcin Tustin <mtustin@handy.com> Closes #12016 from mtustin-handy/SPARK-14163.	2016-04-03 17:42:33 -07:00
Sital Kedia	1cf7018342	[SPARK-14056] Appends s3 specific configurations and spark.hadoop con… ## What changes were proposed in this pull request? Appends s3 specific configurations and spark.hadoop configurations to hive configuration. ## How was this patch tested? Tested by running a job on cluster. …figurations to hive configuration. Author: Sital Kedia <skedia@fb.com> Closes #11876 from sitalkedia/hiveConf.	2016-04-02 19:17:25 -07:00
Liwei Lin	03d130f973	[SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon ## What changes were proposed in this pull request? Straggler references to Tachyon were removed: - for docs, `tachyon` has been generalized as `off-heap memory`; - for Mesos test suits, the key-value `tachyon:true`/`tachyon:false` has been changed to `os:centos`/`os:ubuntu`, since `os` is an example constrain used by the [Mesos official docs](http://mesos.apache.org/documentation/attributes-resources/). ## How was this patch tested? Existing test suites. Author: Liwei Lin <lwlin7@gmail.com> Closes #12129 from lw-lin/tachyon-cleanup.	2016-04-02 17:55:46 -07:00
Dongjoon Hyun	4a6e78abd9	[MINOR][DOCS] Use multi-line JavaDoc comments in Scala code. ## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.	2016-04-02 17:50:40 -07:00
Alex Bozarth	abc6c42c2d	[SPARK-13241][WEB UI] Added long values for dates in ApplicationAttemptInfo API ## What changes were proposed in this pull request? Adding long values for each Date in the ApplicationAttemptInfo API for easier use in code ## How was the this patch tested? Tested with dev/run-tests Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11326 from ajbozarth/spark13241.	2016-04-01 16:18:09 -07:00
Josh Rosen	e41acb7573	[SPARK-13992] Add support for off-heap caching This patch adds support for caching blocks in the executor processes using direct / off-heap memory. ## User-facing changes Updated semantics of `OFF_HEAP` storage level: In Spark 1.x, the `OFF_HEAP` storage level indicated that an RDD should be cached in Tachyon. Spark 2.x removed the external block store API that Tachyon caching was based on (see #10752 / SPARK-12667), so `OFF_HEAP` became an alias for `MEMORY_ONLY_SER`. As of this patch, `OFF_HEAP` means "serialized and cached in off-heap memory or on disk". Via the `StorageLevel` constructor, `useOffHeap` can be set if `serialized == true` and can be used to construct custom storage levels which support replication. Storage UI reporting: the storage UI will now report whether in-memory blocks are stored on- or off-heap. Only supported by UnifiedMemoryManager: for simplicity, this feature is only supported when the default UnifiedMemoryManager is used; applications which use the legacy memory manager (`spark.memory.useLegacyMode=true`) are not currently able to allocate off-heap storage memory, so using off-heap caching will fail with an error when legacy memory management is enabled. Given that we plan to eventually remove the legacy memory manager, this is not a significant restriction. Memory management policies: the policies for dividing available memory between execution and storage are the same for both on- and off-heap memory. For off-heap memory, the total amount of memory available for use by Spark is controlled by `spark.memory.offHeap.size`, which is an absolute size. Off-heap storage memory obeys `spark.memory.storageFraction` in order to control the amount of unevictable storage memory. For example, if `spark.memory.offHeap.size` is 1 gigabyte and Spark uses the default `storageFraction` of 0.5, then up to 500 megabytes of off-heap cached blocks will be protected from eviction due to execution memory pressure. If necessary, we can split `spark.memory.storageFraction` into separate on- and off-heap configurations, but this doesn't seem necessary now and can be done later without any breaking changes. Use of off-heap memory does not imply use of off-heap execution (or vice-versa): for now, the settings controlling the use of off-heap execution memory (`spark.memory.offHeap.enabled`) and off-heap caching are completely independent, so Spark SQL can be configured to use off-heap memory for execution while continuing to cache blocks on-heap. If desired, we can change this in a followup patch so that `spark.memory.offHeap.enabled` affect the default storage level for cached SQL tables. ## Internal changes - Rename `ByteArrayChunkOutputStream` to `ChunkedByteBufferOutputStream` - It now returns a `ChunkedByteBuffer` instead of an array of byte arrays. - Its constructor now accept an `allocator` function which is called to allocate `ByteBuffer`s. This allows us to control whether it allocates regular ByteBuffers or off-heap DirectByteBuffers. - Because block serialization is now performed during the unroll process, a `ChunkedByteBufferOutputStream` which is configured with a `DirectByteBuffer` allocator will use off-heap memory for both unroll and storage memory. - The `MemoryStore`'s MemoryEntries now tracks whether blocks are stored on- or off-heap. - `evictBlocksToFreeSpace()` now accepts a `MemoryMode` parameter so that we don't try to evict off-heap blocks in response to on-heap memory pressure (or vice-versa). - Make sure that off-heap buffers are properly de-allocated during MemoryStore eviction. - The JVM limits the total size of allocated direct byte buffers using the `-XX:MaxDirectMemorySize` flag and the default tends to be fairly low (< 512 megabytes in some JVMs). To work around this limitation, this patch adds a custom DirectByteBuffer allocator which ignores this memory limit. Author: Josh Rosen <joshrosen@databricks.com> Closes #11805 from JoshRosen/off-heap-caching.	2016-04-01 14:34:59 -07:00
zhonghaihua	bd7b91cefb	[SPARK-12864][YARN] initialize executorIdCounter after ApplicationMaster killed for max n… Currently, when max number of executor failures reached the `maxNumExecutorFailures`, `ApplicationMaster` will be killed and re-register another one.This time, `YarnAllocator` will be created a new instance. But, the value of property `executorIdCounter` in `YarnAllocator` will reset to `0`. Then the Id of new executor will starting from `1`. This will confuse with the executor has already created before, which will cause FetchFailedException. This situation is just in yarn client mode, so this is an issue in yarn client mode. For more details, [link to jira issues SPARK-12864](https://issues.apache.org/jira/browse/SPARK-12864) This PR introduce a mechanism to initialize `executorIdCounter` after `ApplicationMaster` killed. Author: zhonghaihua <793507405@qq.com> Closes #10794 from zhonghaihua/initExecutorIdCounterAfterAMKilled.	2016-04-01 16:23:14 -05:00
Liang-Chi Hsieh	3e991dbc31	[SPARK-13674] [SQL] Add wholestage codegen support to Sample JIRA: https://issues.apache.org/jira/browse/SPARK-13674 ## What changes were proposed in this pull request? Sample operator doesn't support wholestage codegen now. This pr is to add support to it. ## How was this patch tested? A test is added into `BenchmarkWholeStageCodegen`. Besides, all tests should be passed. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11517 from viirya/add-wholestage-sample.	2016-04-01 14:02:32 -07:00
jerryshao	8ba2b7f28f	[SPARK-12343][YARN] Simplify Yarn client and client argument ## What changes were proposed in this pull request? Currently in Spark on YARN, configurations can be passed through SparkConf, env and command arguments, some parts are duplicated, like client argument and SparkConf. So here propose to simplify the command arguments. ## How was this patch tested? This patch is tested manually with unit test. CC vanzin tgravescs , please help to suggest this proposal. The original purpose of this JIRA is to remove `ClientArguments`, through refactoring some arguments like `--class`, `--arg` are not so easy to replace, so here I remove the most part of command line arguments, only keep the minimal set. Author: jerryshao <sshao@hortonworks.com> Closes #11603 from jerryshao/SPARK-12343.	2016-04-01 10:52:13 -07:00
Davies Liu	f0afafdc5d	[SPARK-14267] [SQL] [PYSPARK] execute multiple Python UDFs within single batch ## What changes were proposed in this pull request? This PR support multiple Python UDFs within single batch, also improve the performance. ```python >>> from pyspark.sql.types import IntegerType >>> sqlContext.registerFunction("double", lambda x: x * 2, IntegerType()) >>> sqlContext.registerFunction("add", lambda x, y: x + y, IntegerType()) >>> sqlContext.sql("SELECT double(add(1, 2)), add(double(2), 1)").explain(True) == Parsed Logical Plan == 'Project [unresolvedalias('double('add(1, 2)), None),unresolvedalias('add('double(2), 1), None)] +- OneRowRelation$ == Analyzed Logical Plan == double(add(1, 2)): int, add(double(2), 1): int Project [double(add(1, 2))#14,add(double(2), 1)#15] +- Project [double(add(1, 2))#14,add(double(2), 1)#15] +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15] +- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18] +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17] +- OneRowRelation$ == Optimized Logical Plan == Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15] +- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18] +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17] +- OneRowRelation$ == Physical Plan == WholeStageCodegen : +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15] : +- INPUT +- !BatchPythonEvaluation [add(pythonUDF1#17, 1)], [pythonUDF0#16,pythonUDF1#17,pythonUDF0#18] +- !BatchPythonEvaluation [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17] +- Scan OneRowRelation[] ``` ## How was this patch tested? Added new tests. Using the following script to benchmark 1, 2 and 3 udfs, ``` df = sqlContext.range(1, 1 << 23, 1, 4) double = F.udf(lambda x: x * 2, LongType()) print df.select(double(df.id)).count() print df.select(double(df.id), double(df.id + 1)).count() print df.select(double(df.id), double(df.id + 1), double(df.id + 2)).count() ``` Here is the results: N \| Before \| After \| speed up ---- \|------------ \| -------------\|------ 1 \| 22 s \| 7 s \| 3.1X 2 \| 38 s \| 13 s \| 2.9X 3 \| 58 s \| 16 s \| 3.6X This benchmark ran locally with 4 CPUs. For 3 UDFs, it launched 12 Python before before this patch, 4 process after this patch. After this patch, it will use less memory for multiple UDFs than before (less buffering). Author: Davies Liu <davies@databricks.com> Closes #12057 from davies/multi_udfs.	2016-03-31 16:40:20 -07:00
Jo Voordeckers	10508f36ad	[SPARK-11327][MESOS] Dispatcher does not respect all args from the Submit request Supersedes https://github.com/apache/spark/pull/9752 Author: Jo Voordeckers <jo.voordeckers@gmail.com> Author: Iulian Dragos <jaguarul@gmail.com> Closes #10370 from jayv/mesos_cluster_params.	2016-03-31 12:08:10 -07:00
Wenchen Fan	0abee534f0	[SPARK-14069][SQL] Improve SparkStatusTracker to also track executor information ## What changes were proposed in this pull request? Track executor information like host and port, cache size, running tasks. TODO: tests ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #11888 from cloud-fan/status-tracker.	2016-03-31 12:07:19 -07:00
jeanlyn	8a333d2da8	[SPARK-14243][CORE] update task metrics when removing blocks ## What changes were proposed in this pull request? This PR try to use `incUpdatedBlockStatuses ` to update the `updatedBlockStatuses ` when removing blocks, making sure `BlockManager` correctly updates `updatedBlockStatuses` ## How was this patch tested? test("updated block statuses") in BlockManagerSuite.scala Author: jeanlyn <jeanlyn92@gmail.com> Closes #12091 from jeanlyn/updateBlock.	2016-03-31 12:04:42 -07:00
Nishkam Ravi	ac1b8b302a	[SPARK-13796] Redirect error message to logWarning ## What changes were proposed in this pull request? Redirect error message to logWarning ## How was this patch tested? Unit tests, manual tests JoshRosen Author: Nishkam Ravi <nishkamravi@gmail.com> Closes #12052 from nishkamravi2/master_warning.	2016-03-31 12:03:05 -07:00
tedyu	e1f6845391	[SPARK-12181] Check Cached unaligned-access capability before using Unsafe ## What changes were proposed in this pull request? For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction. However, the Oracle implementation uses these methods only if the class variable unaligned (commented as "Cached unaligned-access capability") is true, which seems to be calculated whether the architecture is i386, x86, amd64, or x86_64. I think we should perform similar check for the use of Unsafe. Reference: https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent0.java#L112 ## How was this patch tested? Unit test suite Author: tedyu <yuzhihong@gmail.com> Closes #11943 from tedyu/master.	2016-03-29 17:16:53 -07:00
Davies Liu	a7a93a116d	[SPARK-14215] [SQL] [PYSPARK] Support chained Python UDFs ## What changes were proposed in this pull request? This PR brings the support for chained Python UDFs, for example ```sql select udf1(udf2(a)) select udf1(udf2(a) + 3) select udf1(udf2(a) + udf3(b)) ``` Also directly chained unary Python UDFs are put in single batch of Python UDFs, others may require multiple batches. For example, ```python >>> sqlContext.sql("select double(double(1))").explain() == Physical Plan == WholeStageCodegen : +- Project [pythonUDF#10 AS double(double(1))#9] : +- INPUT +- !BatchPythonEvaluation double(double(1)), [pythonUDF#10] +- Scan OneRowRelation[] >>> sqlContext.sql("select double(double(1) + double(2))").explain() == Physical Plan == WholeStageCodegen : +- Project [pythonUDF#19 AS double((double(1) + double(2)))#16] : +- INPUT +- !BatchPythonEvaluation double((pythonUDF#17 + pythonUDF#18)), [pythonUDF#17,pythonUDF#18,pythonUDF#19] +- !BatchPythonEvaluation double(2), [pythonUDF#17,pythonUDF#18] +- !BatchPythonEvaluation double(1), [pythonUDF#17] +- Scan OneRowRelation[] ``` TODO: will support multiple unrelated Python UDFs in one batch (another PR). ## How was this patch tested? Added new unit tests for chained UDFs. Author: Davies Liu <davies@databricks.com> Closes #12014 from davies/py_udfs.	2016-03-29 15:06:29 -07:00
Jakob Odersky	d26c42982c	[SPARK-10570][CORE] Add version info to json api Add a new api endpoint `/api/v1/version` to retrieve various version info. This PR only adds support for finding the current spark version, however other version info such as jvm or scala versions can easily be added. Author: Jakob Odersky <jodersky@gmail.com> Closes #10760 from jodersky/SPARK-10570.	2016-03-29 11:10:15 -07:00
Carson Wang	15c0b0006b	[SPARK-14232][WEBUI] Fix event timeline display issue when an executor is removed with a multiple line reason. ## What changes were proposed in this pull request? The event timeline doesn't show on job page if an executor is removed with a multiple line reason. This PR replaces all new line characters in the reason string with spaces. ![timelineerror](https://cloud.githubusercontent.com/assets/9278199/14100211/5fd4cd30-f5be-11e5-9cea-f32651a4cd62.jpg) ## How was this patch tested? Verified on the Web UI. Author: Carson Wang <carson.wang@intel.com> Closes #12029 from carsonwang/eventTimeline.	2016-03-29 11:07:58 -07:00
Sun Rui	d3638d7bff	[SPARK-12792] [SPARKR] Refactor RRDD to support R UDF. ## What changes were proposed in this pull request? Refactor RRDD by separating the common logic interacting with the R worker to a new class RRunner, which can be used to evaluate R UDFs. Now RRDD relies on RRuner for RDD computation and RRDD could be reomved if we want to remove RDD API in SparkR later. ## How was this patch tested? dev/lint-r SparkR unit tests Author: Sun Rui <rui.sun@intel.com> Closes #12024 from sun-rui/SPARK-12792_new.	2016-03-28 21:51:02 -07:00
jerryshao	2bc7c96d61	[SPARK-13447][YARN][CORE] Clean the stale states for AM failure and restart situation ## What changes were proposed in this pull request? This is a follow-up fix of #9963, in #9963 we handle this stale states clean-up work only for dynamic allocation enabled scenario. Here we should also clean the states in `CoarseGrainedSchedulerBackend` for dynamic allocation disabled scenario. Please review, CC andrewor14 lianhuiwang , thanks a lot. ## How was this patch tested? Run the unit test locally, also with integration test manually. Author: jerryshao <sshao@hortonworks.com> Closes #11366 from jerryshao/SPARK-13447.	2016-03-28 17:03:21 -07:00
jeanlyn	ad9e3d50f7	[SPARK-13845][CORE] Using onBlockUpdated to replace onTaskEnd avioding driver OOM ## What changes were proposed in this pull request? We have a streaming job using `FlumePollInputStream` always driver OOM after few days, here is some driver heap dump before OOM ``` num #instances #bytes class name ---------------------------------------------- 1: 13845916 553836640 org.apache.spark.storage.BlockStatus 2: 14020324 336487776 org.apache.spark.storage.StreamBlockId 3: 13883881 333213144 scala.collection.mutable.DefaultEntry 4: 8907 89043952 [Lscala.collection.mutable.HashEntry; 5: 62360 65107352 [B 6: 163368 24453904 [Ljava.lang.Object; 7: 293651 20342664 [C ... ``` `BlockStatus` and `StreamBlockId` keep on growing, and the driver OOM in the end. After investigated, i found the `executorIdToStorageStatus` in `StorageStatusListener` seems never remove the blocks from `StorageStatus`. In order to fix the issue, i try to use `onBlockUpdated` replace `onTaskEnd ` , so we can update the block informations(add blocks, drop the block from memory to disk and delete the blocks) in time. ## How was this patch tested? Existing unit tests and manual tests Author: jeanlyn <jeanlyn92@gmail.com> Closes #11779 from jeanlyn/fix_driver_oom.	2016-03-28 16:56:25 -07:00
Shixiong Zhu	2f98ee67df	[SPARK-14169][CORE] Add UninterruptibleThread ## What changes were proposed in this pull request? Extract the workaround for HADOOP-10622 introduced by #11940 into UninterruptibleThread so that we can test and reuse it. ## How was this patch tested? Unit tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11971 from zsxwing/uninterrupt.	2016-03-28 16:29:11 -07:00
Shixiong Zhu	34c0638ee6	[SPARK-14180][CORE] Fix a deadlock in CoarseGrainedExecutorBackend Shutdown ## What changes were proposed in this pull request? Call `executor.stop` in a new thread to eliminate deadlock. ## How was this patch tested? Existing unit tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #12012 from zsxwing/SPARK-14180.	2016-03-28 16:23:29 -07:00
Davies Liu	d7b58f1461	[SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation ## What changes were proposed in this pull request? Currently, for the key that can not fit within a long, we build a hash map for UnsafeHashedRelation, it's converted to BytesToBytesMap after serialization and deserialization. We should build a BytesToBytesMap directly to have better memory efficiency. In order to do that, BytesToBytesMap should support multiple (K,V) pair with the same K, Location.putNewKey() is renamed to Location.append(), which could append multiple values for the same key (same Location). `Location.newValue()` is added to find the next value for the same key. ## How was this patch tested? Existing tests. Added benchmark for broadcast hash join with duplicated keys. Author: Davies Liu <davies@databricks.com> Closes #11870 from davies/map2.	2016-03-28 13:07:32 -07:00
Davies Liu	e5a1b301fb	Revert "[SPARK-12792] [SPARKR] Refactor RRDD to support R UDF." This reverts commit `40984f6706`.	2016-03-28 10:21:02 -07:00
Sun Rui	40984f6706	[SPARK-12792] [SPARKR] Refactor RRDD to support R UDF. Refactor RRDD by separating the common logic interacting with the R worker to a new class RRunner, which can be used to evaluate R UDFs. Now RRDD relies on RRuner for RDD computation and RRDD could be reomved if we want to remove RDD API in SparkR later. Author: Sun Rui <rui.sun@intel.com> Closes #10947 from sun-rui/SPARK-12792.	2016-03-28 10:14:28 -07:00
Liang-Chi Hsieh	68c0c460bf	[SPARK-13742] [CORE] Add non-iterator interface to RandomSampler JIRA: https://issues.apache.org/jira/browse/SPARK-13742 ## What changes were proposed in this pull request? `RandomSampler.sample` currently accepts iterator as input and output another iterator. This makes it inappropriate to use in wholestage codegen of `Sampler` operator #11517. This change is to add non-iterator interface to `RandomSampler`. This change adds a new method `def sample(): Int` to the trait `RandomSampler`. As we don't need to know the actual values of the sampling items, so this new method takes no arguments. This method will decide whether to sample the next item or not. It returns how many times the next item will be sampled. For `BernoulliSampler` and `BernoulliCellSampler`, the returned sampling times can only be 0 or 1. It simply means whether to sample the next item or not. For `PoissonSampler`, the returned value can be more than 1, meaning the next item will be sampled multiple times. ## How was this patch tested? Tests are added into `RandomSamplerSuite`. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Author: Liang-Chi Hsieh <viirya@appier.com> Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11578 from viirya/random-sampler-no-iterator.	2016-03-28 09:58:47 -07:00
Josh Rosen	20c0bcd972	[SPARK-14135] Add off-heap storage memory bookkeeping support to MemoryManager This patch extends Spark's `UnifiedMemoryManager` to add bookkeeping support for off-heap storage memory, an requirement for enabling off-heap caching (which will be done by #11805). The `MemoryManager`'s `storageMemoryPool` has been split into separate on- and off-heap pools and the storage and unroll memory allocation methods have been updated to accept a `memoryMode` parameter to specify whether allocations should be performed on- or off-heap. In order to reduce the testing surface, the `StaticMemoryManager` does not support off-heap caching (we plan to eventually remove the `StaticMemoryManager`, so this isn't a significant limitation). Author: Josh Rosen <joshrosen@databricks.com> Closes #11942 from JoshRosen/off-heap-storage-memory-bookkeeping.	2016-03-26 11:03:25 -07:00
Liwei Lin	62a85eb09f	[SPARK-14089][CORE][MLLIB] Remove methods that has been deprecated since 1.1, 1.2, 1.3, 1.4, and 1.5 ## What changes were proposed in this pull request? Removed methods that has been deprecated since 1.1, 1.2, 1.3, 1.4, and 1.5. ## How was this patch tested? - manully checked that no codes in Spark call these methods any more - existing test suits Author: Liwei Lin <lwlin7@gmail.com> Author: proflin <proflin.me@gmail.com> Closes #11910 from lw-lin/remove-deprecates.	2016-03-26 12:41:34 +00:00
Dongjoon Hyun	1808465855	[MINOR] Fix newly added java-lint errors ## What changes were proposed in this pull request? This PR fixes some newly added java-lint errors(unused-imports, line-lengsth). ## How was this patch tested? Pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11968 from dongjoon-hyun/SPARK-14167.	2016-03-26 11:55:49 +00:00
Rajesh Balamohan	ff7cc45f52	[SPARK-14091][CORE] Improve performance of SparkContext.getCallSite() Currently SparkContext.getCallSite() makes a call to Utils.getCallSite(). ``` private[spark] def getCallSite(): CallSite = { val callSite = Utils.getCallSite() CallSite( Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm), Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm) ) } ``` However, in some places utils.withDummyCallSite(sc) is invoked to avoid expensive threaddumps within getCallSite(). But Utils.getCallSite() is evaluated earlier causing threaddumps to be computed. This can have severe impact on smaller queries (that finish in 10-20 seconds) having large number of RDDs. Creating this patch for lazy evaluation of getCallSite. No new test cases are added. Following standalone test was tried out manually. Also, built entire spark binary and tried with few SQL queries in TPC-DS and TPC-H in multi node cluster ``` def run(): Unit = { val conf = new SparkConf() val sc = new SparkContext("local[1]", "test-context", conf) val start: Long = System.currentTimeMillis(); val confBroadcast = sc.broadcast(new SerializableConfiguration(new Configuration())) Utils.withDummyCallSite(sc) { //Large tables end up creating 5500 RDDs for(i <- 1 to 5000) { //ignore nulls in RDD as its mainly for testing callSite val testRDD = new HadoopRDD(sc, confBroadcast, None, null, classOf[NullWritable], classOf[Writable], 10) } } val end: Long = System.currentTimeMillis(); println("Time taken : " + (end - start)) } def main(args: Array[String]): Unit = { run } ``` Author: Rajesh Balamohan <rbalamohan@apache.org> Closes #11911 from rajeshbalamohan/SPARK-14091.	2016-03-25 15:09:52 -07:00
Reynold Xin	70a6f0bb57	[SPARK-14149] Log exceptions in tryOrIOException ## What changes were proposed in this pull request? We ran into a problem today debugging some class loading problem during deserialization, and JVM was masking the underlying exception which made it very difficult to debug. We can however log the exceptions using try/catch ourselves in serialization/deserialization. The good thing is that all these methods are already using Utils.tryOrIOException, so we can just put the try catch and logging in a single place. ## How was this patch tested? A logging change with a manual test. Author: Reynold Xin <rxin@databricks.com> Closes #11951 from rxin/SPARK-14149.	2016-03-25 01:17:23 -07:00
Josh Rosen	fdd460f5f4	[SPARK-13980] Incrementally serialize blocks while unrolling them in MemoryStore When a block is persisted in the MemoryStore at a serialized storage level, the current MemoryStore.putIterator() code will unroll the entire iterator as Java objects in memory, then will turn around and serialize an iterator obtained from the unrolled array. This is inefficient and doubles our peak memory requirements. Instead, I think that we should incrementally serialize blocks while unrolling them. A downside to incremental serialization is the fact that we will need to deserialize the partially-unrolled data in case there is not enough space to unroll the block and the block cannot be dropped to disk. However, I'm hoping that the memory efficiency improvements will outweigh any performance losses as a result of extra serialization in that hopefully-rare case. Author: Josh Rosen <joshrosen@databricks.com> Closes #11791 from JoshRosen/serialize-incrementally.	2016-03-24 17:33:21 -07:00
Tejas Patil	01849da080	[SPARK-14110][CORE] PipedRDD to print the command ran on non zero exit ## What changes were proposed in this pull request? In case of failure in subprocess launched in PipedRDD, the failure exception reads “Subprocess exited with status XXX”. Debugging this is not easy for users especially if there are multiple pipe() operations in the Spark application. Changes done: - Changed the exception message when non-zero exit code is seen - If the reader and writer threads see exception, simply logging the command ran. The current model is to propagate the exception "as is" so that upstream Spark logic will take the right action based on what the exception was (eg. for fetch failure, it needs to retry; but for some fatal exception, it will decide to fail the stage / job). So wrapping the exception with a generic exception will not work. Altering the exception message will keep that guarantee but that is ugly (plus not all exceptions might have a constructor for a string message) ## How was this patch tested? - Added a new test case - Ran all existing tests for PipedRDD Author: Tejas Patil <tejasp@fb.com> Closes #11927 from tejasapatil/SPARK-14110-piperdd-failure.	2016-03-24 00:31:13 -07:00
Liwei Lin	de4e48b62b	[SPARK-14025][STREAMING][WEBUI] Fix streaming job descriptions on the event timeline ## What changes were proposed in this pull request? Removed the extra `<a href=...>...</a>` for each streaming job's description on the event timeline. ### [Before] ![before](https://cloud.githubusercontent.com/assets/15843379/13898653/0a6c1838-ee13-11e5-9761-14bb7b114c13.png) ### [After] ![after](https://cloud.githubusercontent.com/assets/15843379/13898650/012b8808-ee13-11e5-92a6-64aff0799c83.png) ## How was this patch tested? test suits, manual checks (see screenshots above) Author: Liwei Lin <proflin.me@gmail.com> Author: proflin <proflin.me@gmail.com> Closes #11845 from lw-lin/description-event-line.	2016-03-23 15:15:55 -07:00
Ernest	48ee16d801	[SPARK-14055] writeLocksByTask need to be update when removeBlock ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14055 ## How was this patch tested? manual tests by running LiveJournalPageRank on a large dataset ( the dataset must larger enough to incure RDD partition eviction). Author: Ernest <earneyzxl@gmail.com> Closes #11875 from Earne/issue-14055.	2016-03-23 10:29:36 -07:00
Josh Rosen	3de24ae2ed	[SPARK-14075] Refactor MemoryStore to be testable independent of BlockManager This patch refactors the `MemoryStore` so that it can be tested without needing to construct / mock an entire `BlockManager`. - The block manager's serialization- and compression-related methods have been moved from `BlockManager` to `SerializerManager`. - `BlockInfoManager `is now passed directly to classes that need it, rather than being passed via the `BlockManager`. - The `MemoryStore` now calls `dropFromMemory` via a new `BlockEvictionHandler` interface rather than directly calling the `BlockManager`. This change helps to enforce a narrow interface between the `MemoryStore` and `BlockManager` functionality and makes this interface easier to mock in tests. - Several of the block unrolling tests have been moved from `BlockManagerSuite` into a new `MemoryStoreSuite`. Author: Josh Rosen <joshrosen@databricks.com> Closes #11899 from JoshRosen/reduce-memorystore-blockmanager-coupling.	2016-03-23 10:15:23 -07:00
Kazuaki Ishizaki	0d51b60443	[SPARK-14072][CORE] Show JVM/OS version information when we run a benchmark program ## What changes were proposed in this pull request? This PR allows us to identify what JVM is used when someone ran a benchmark program. In some cases, a JVM version may affect performance result. Thus, it would be good to show processor information and JVM version information. ``` model name : Intel(R) Xeon(R) CPU E5-2697 v2 2.70GHz JVM information : OpenJDK 64-Bit Server VM, 1.7.0_65-mockbuild_2014_07_14_06_19-b00 Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- SQL Parquet Vectorized 981 / 994 10.7 93.5 1.0X SQL Parquet MR 2518 / 2542 4.2 240.1 0.4X ``` ``` model name : Intel(R) Xeon(R) CPU E5-2697 v2 2.70GHz JVM information : IBM J9 VM, pxa6480sr2-20151023_01 (SR2) String Dictionary: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- SQL Parquet Vectorized 693 / 740 15.1 66.1 1.0X SQL Parquet MR 2501 / 2562 4.2 238.5 0.3X ``` ## How was this patch tested? Tested by using existing benchmark programs (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #11893 from kiszk/SPARK-14072.	2016-03-22 21:01:52 -07:00
Josh Rosen	b5f1ab701a	[SPARK-13990] Automatically pick serializer when caching RDDs Building on the `SerializerManager` introduced in SPARK-13926/ #11755, this patch Spark modifies Spark's BlockManager to use RDD's ClassTags in order to select the best serializer to use when caching RDD blocks. When storing a local block, the BlockManager `put()` methods use implicits to record ClassTags and stores those tags in the blocks' BlockInfo records. When reading a local block, the stored ClassTag is used to pick the appropriate serializer. When a block is stored with replication, the class tag is written into the block transfer metadata and will also be stored in the remote BlockManager. There are two or three places where we don't properly pass ClassTags, including TorrentBroadcast and BlockRDD. I think this happens to work because the missing ClassTag always happens to be `ClassTag.Any`, but it might be worth looking more carefully at those places to see whether we should be more explicit. Author: Josh Rosen <joshrosen@databricks.com> Closes #11801 from JoshRosen/pick-best-serializer-for-caching.	2016-03-21 17:19:39 -07:00
Davies Liu	9b4e15ba13	[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join ## What changes were proposed in this pull request? This PR try acquire the memory for hash map in shuffled hash join, fail the task if there is no enough memory (otherwise it could OOM the executor). It also removed unused HashedRelation. ## How was this patch tested? Existing unit tests. Manual tests with TPCDS Q78. Author: Davies Liu <davies@databricks.com> Closes #11826 from davies/cleanup_hash2.	2016-03-21 11:21:39 -07:00
Dongjoon Hyun	df61fbd978	[SPARK-13986][CORE][MLLIB] Remove `DeveloperApi`-annotations for non-publics ## What changes were proposed in this pull request? Spark uses `DeveloperApi` annotation, but sometimes it seems to conflict with visibility. This PR tries to fix those conflict by removing annotations for non-publics. The following is the example. JobResult.scala ```scala DeveloperApi sealed trait JobResult DeveloperApi case object JobSucceeded extends JobResult -DeveloperApi private[spark] case class JobFailed(exception: Exception) extends JobResult ``` ## How was this patch tested? Pass the existing Jenkins test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11797 from dongjoon-hyun/SPARK-13986.	2016-03-21 14:57:52 +00:00
Dongjoon Hyun	761c2d1b6e	[MINOR][DOCS] Add proper periods and spaces for CLI help messages and `config` doc. ## What changes were proposed in this pull request? This PR adds some proper periods and spaces to Spark CLI help messages and SQL/YARN conf docs for consistency. ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11848 from dongjoon-hyun/add_proper_period_and_space.	2016-03-21 08:00:09 +00:00
Dongjoon Hyun	20fd254101	[SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule ## What changes were proposed in this pull request? [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has 100-character limit on lines, but it's disabled for Java since 11/09/15. This PR enables LineLength checkstyle again. To help that, this also introduces RedundantImport and RedundantModifier, too. The following is the diff on `checkstyle.xml`. ```xml - <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places --> - <!-- <module name="LineLength"> <property name="max" value="100"/> <property name="ignorePattern" value="^package.\|^import.\|a href\|href\|http://\|https://\|ftp://"/> </module> - --> <module name="NoLineWrap"/> <module name="EmptyBlock"> <property name="option" value="TEXT"/> -167,5 +164,7 </module> <module name="CommentsIndentation"/> <module name="UnusedImports"/> + <module name="RedundantImport"/> + <module name="RedundantModifier"/> ``` ## How was this patch tested? Currently, `lint-java` is disabled in Jenkins. It needs a manual test. After passing the Jenkins tests, `dev/lint-java` should passes locally. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11831 from dongjoon-hyun/SPARK-14011.	2016-03-21 07:58:57 +00:00
Sital Kedia	2e0c5284fd	[SPARK-13958] Executor OOM due to unbounded growth of pointer array in… ## What changes were proposed in this pull request? This change fixes the executor OOM which was recently introduced in PR apache/spark#11095 (Please fill in changes proposed in this fix) ## How was this patch tested? Tested by running a spark job on the cluster. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) … Sorter Author: Sital Kedia <skedia@fb.com> Closes #11794 from sitalkedia/SPARK-13958.	2016-03-18 12:56:06 -07:00
jerryshao	3537782168	[SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn ## What changes were proposed in this pull request? This regression is introduced in #9182, previously attempt id is simply as counter "1" or "2". With the change of #9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code. Also revert back this patch #11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it. Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now. ## How was this patch tested? Test it with unit test and manually test different scenario: 1. application running in yarn-client mode. 2. application running in yarn-cluster mode. 3. application running in yarn-cluster mode with multiple attempts. Checked both the event log file name and url link. CC vanzin tgravescs , please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #11721 from jerryshao/SPARK-13885.	2016-03-18 12:39:49 -07:00
Josh Rosen	6c2d894a2f	[SPARK-13921] Store serialized blocks as multiple chunks in MemoryStore This patch modifies the BlockManager, MemoryStore, and several other storage components so that serialized cached blocks are stored as multiple small chunks rather than as a single contiguous ByteBuffer. This change will help to improve the efficiency of memory allocation and the accuracy of memory accounting when serializing blocks. Our current serialization code uses a ByteBufferOutputStream, which doubles and re-allocates its backing byte array; this increases the peak memory requirements during serialization (since we need to hold extra memory while expanding the array). In addition, we currently don't account for the extra wasted space at the end of the ByteBuffer's backing array, so a 129 megabyte serialized block may actually consume 256 megabytes of memory. After switching to storing blocks in multiple chunks, we'll be able to efficiently trim the backing buffers so that no space is wasted. This change is also a prerequisite to being able to cache blocks which are larger than 2GB (although full support for that depends on several other changes which have not bee implemented yet). Author: Josh Rosen <joshrosen@databricks.com> Closes #11748 from JoshRosen/chunked-block-serialization.	2016-03-17 20:00:56 -07:00
Shixiong Zhu	65b75e66e8	[SPARK-13776][WEBUI] Limit the max number of acceptors and selectors for Jetty ## What changes were proposed in this pull request? As each acceptor/selector in Jetty will use one thread, the number of threads should at least be the number of acceptors and selectors plus 1. Otherwise, the thread pool of Jetty server may be exhausted by acceptors/selectors and not be able to response any request. To avoid wasting threads, the PR limits the max number of acceptors and selectors and also updates the max thread number if necessary. ## How was this patch tested? Just make sure we don't break any existing tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11615 from zsxwing/SPARK-13776.	2016-03-17 13:05:29 +00:00
Wenchen Fan	8ef3399aff	[SPARK-13928] Move org.apache.spark.Logging into org.apache.spark.internal.Logging ## What changes were proposed in this pull request? Logging was made private in Spark 2.0. If we move it, then users would be able to create a Logging trait themselves to avoid changing their own code. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11764 from cloud-fan/logger.	2016-03-17 19:23:38 +08:00
trueyao	ea9ca6f04c	[SPARK-13901][CORE] correct the logDebug information when jump to the next locality level JIRA Issue:https://issues.apache.org/jira/browse/SPARK-13901 In getAllowedLocalityLevel method of TaskSetManager,we get wrong logDebug information when jump to the next locality level.So we should fix it. Author: trueyao <501663994@qq.com> Closes #11719 from trueyao/logDebug-localityWait.	2016-03-17 09:45:06 +00:00
Josh Rosen	de1a84e56e	[SPARK-13926] Automatically use Kryo serializer when shuffling RDDs with simple types Because ClassTags are available when constructing ShuffledRDD we can use them to automatically use Kryo for shuffle serialization when the RDD's types are known to be compatible with Kryo. This patch introduces `SerializerManager`, a component which picks the "best" serializer for a shuffle given the elements' ClassTags. It will automatically pick a Kryo serializer for ShuffledRDDs whose key, value, and/or combiner types are primitives, arrays of primitives, or strings. In the future we can use this class as a narrow extension point to integrate specialized serializers for other types, such as ByteBuffers. In a planned followup patch, I will extend the BlockManager APIs so that we're able to use similar automatic serializer selection when caching RDDs (this is a little trickier because the ClassTags need to be threaded through many more places). Author: Josh Rosen <joshrosen@databricks.com> Closes #11755 from JoshRosen/automatically-pick-best-serializer.	2016-03-16 22:52:55 -07:00
Wesley Tang	5f6bdf97c5	[SPARK-13281][CORE] Switch broadcast of RDD to exception from warning ## What changes were proposed in this pull request? In SparkContext, throw Illegalargumentexception when trying to broadcast rdd directly, instead of logging the warning. ## How was this patch tested? mvn clean install Add UT in BroadcastSuite Author: Wesley Tang <tangmingjun@mininglamp.com> Closes #11735 from breakdawn/master.	2016-03-16 16:12:17 +00:00
Tejas Patil	1d95fb6785	[SPARK-13793][CORE] PipedRDD doesn't propagate exceptions while reading parent RDD ## What changes were proposed in this pull request? PipedRDD creates a child thread to read output of the parent stage and feed it to the pipe process. Used a variable to save the exception thrown in the child thread and then propagating the exception in the main thread if the variable was set. ## How was this patch tested? - Added a unit test - Ran all the existing tests in PipedRDDSuite and they all pass with the change - Tested the patch with a real pipe() job, bounced the executor node which ran the parent stage to simulate a fetch failure and observed that the parent stage was re-ran. Author: Tejas Patil <tejasp@fb.com> Closes #11628 from tejasapatil/pipe_rdd.	2016-03-16 09:58:53 +00:00
GayathriMurali	56d88247f1	[SPARK-13396] Stop using our internal deprecated .metrics on Exceptio… JIRA: https://issues.apache.org/jira/browse/SPARK-13396 Stop using our internal deprecated .metrics on ExceptionFailure instead use accumUpdates Author: GayathriMurali <gayathri.m.softie@gmail.com> Closes #11544 from GayathriMurali/SPARK-13396.	2016-03-16 09:39:41 +00:00
Sean Owen	3b461d9ecd	[SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset follow up ## What changes were proposed in this pull request? Follow up to https://github.com/apache/spark/pull/11657 - Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8` - And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests) - And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11725 from srowen/SPARK-13823.2.	2016-03-16 09:36:34 +00:00
Yonathan Randolph	05ab2948ab	[SPARK-13906] Ensure that there are at least 2 dispatcher threads. ## What changes were proposed in this pull request? Force at least two dispatcher-event-loop threads. Since SparkDeploySchedulerBackend (in AppClient) calls askWithRetry to CoarseGrainedScheduler in the same process, there the driver needs at least two dispatcher threads to prevent the dispatcher thread from hanging. ## How was this patch tested? Manual. Author: Yonathan Randolph <yonathangmail.com> Author: Yonathan Randolph <yonathan@liftigniter.com> Closes #11728 from yonran/SPARK-13906.	2016-03-16 09:34:04 +00:00
Marcelo Vanzin	41eaabf593	[SPARK-13626][CORE] Revert change to SparkConf's constructor. It shouldn't be private. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11734 from vanzin/SPARK-13626-api.	2016-03-15 14:51:25 -07:00
CodingCat	dddf2f2d87	[MINOR] a minor fix for the comments of a method in RPC Dispatcher ## What changes were proposed in this pull request? a minor fix for the comments of a method in RPC Dispatcher ## How was this patch tested? existing unit tests Author: CodingCat <zhunansjtu@gmail.com> Closes #11738 from CodingCat/minor_rpc.	2016-03-15 14:46:21 -07:00
CodingCat	bd5365bbe9	[SPARK-13803] restore the changes in SPARK-3411 ## What changes were proposed in this pull request? This patch contains the functionality to balance the load of the cluster-mode drivers among workers This patch restores the changes in https://github.com/apache/spark/pull/1106 which was erased due to the merging of https://github.com/apache/spark/pull/731 ## How was this patch tested? test with existing test cases Author: CodingCat <zhunansjtu@gmail.com> Closes #11702 from CodingCat/SPARK-13803.	2016-03-15 10:10:23 +00:00
Marcelo Vanzin	8301fadd8d	[SPARK-13626][CORE] Avoid duplicate config deprecation warnings. Three different things were needed to get rid of spurious warnings: - silence deprecation warnings when cloning configuration - change the way SparkHadoopUtil instantiates SparkConf to silence warnings - avoid creating new SparkConf instances where it's not needed. On top of that, I changed the way that Logging.scala detects the repl; now it uses a method that is overridden in the repl's Main class, and the hack in Utils.scala is not needed anymore. This makes the 2.11 repl behave like the 2.10 one and set the default log level to WARN, which is a lot better. Previously, this wasn't working because the 2.11 repl triggers log initialization earlier than the 2.10 one. I also removed and simplified some other code in the 2.11 repl's Main to avoid replicating logic that already exists elsewhere in Spark. Tested the 2.11 repl in local and yarn modes. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11510 from vanzin/SPARK-13626.	2016-03-14 14:27:33 -07:00
Josh Rosen	38529d8f23	[SPARK-10907][SPARK-6157] Remove pendingUnrollMemory from MemoryStore This patch refactors the MemoryStore to remove the concept of `pendingUnrollMemory`. It also fixes fixes SPARK-6157: "Unrolling with MEMORY_AND_DISK should always release memory". Key changes: - Inline `MemoryStore.tryToPut` at its three call sites in the `MemoryStore`. - Inline `Memory.unrollSafely` at its only call site (in `MemoryStore.putIterator`). - Inline `MemoryManager.acquireStorageMemory` at its call sites. - Simplify the code as a result of this inlining (some parameters have fixed values after inlining, so lots of branches can be removed). - Remove the `pendingUnrollMemory` map by returning the amount of unrollMemory allocated when returning an iterator after a failed `putIterator` call. - Change `putIterator` to return an instance of `PartiallyUnrolledIterator`, a special iterator subclass which will automatically free the unroll memory of its partially-unrolled elements when the iterator is consumed. To handle cases where the iterator is not consumed (e.g. when a MEMORY_ONLY put fails), `PartiallyUnrolledIterator` exposes a `close()` method which may be called to discard the unrolled values and free their memory. Author: Josh Rosen <joshrosen@databricks.com> Closes #11613 from JoshRosen/cleanup-unroll-memory.	2016-03-14 14:26:39 -07:00
Thomas Graves	23385e853e	[SPARK-13054] Always post TaskEnd event for tasks I am using dynamic container allocation and speculation and am seeing issues with the active task accounting. The Executor UI still shows active tasks on the an executor but the job/stage is all completed. I think its also affecting the dynamic allocation being able to release containers because it thinks there are still tasks. There are multiple issues with this: - If the task end for tasks (in this case probably because of speculation) comes in after the stage is finished, then the DAGScheduler.handleTaskCompletion will skip the task completion event Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com> Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@yahoo-inc.com> Closes #10951 from tgravescs/SPARK-11701.	2016-03-14 12:31:46 -07:00
Bertrand Bossy	310981d49a	[SPARK-12583][MESOS] Mesos shuffle service: Don't delete shuffle files before application has stopped ## Problem description: Mesos shuffle service is completely unusable since Spark 1.6.0 . The problem seems to occur since the move from akka to netty in the networking layer. Until now, a connection from the driver to each shuffle service was used as a signal for the shuffle service to determine, whether the driver is still running. Since 1.6.0, this connection is closed after spark.shuffle.io.connectionTimeout (or spark.network.timeout if the former is not set) due to it being idle. The shuffle service interprets this as a signal that the driver has stopped, despite the driver still being alive. Thus, shuffle files are deleted before the application has stopped. ### Context and analysis: spark shuffle fails with mesos after 2mins: https://issues.apache.org/jira/browse/SPARK-12583 External shuffle service broken w/ Mesos: https://issues.apache.org/jira/browse/SPARK-13159 This is a follow up on #11207 . ## What changes were proposed in this pull request? This PR adds a heartbeat signal from the Driver (in MesosExternalShuffleClient) to all registered external mesos shuffle service instances. In MesosExternalShuffleBlockHandler, a thread periodically checks whether a driver has timed out and cleans an application's shuffle files if this is the case. ## How was the this patch tested? This patch has been tested on a small mesos test cluster using the spark-shell. Log output from mesos shuffle service: ``` 16/02/19 15:13:45 INFO mesos.MesosExternalShuffleBlockHandler: Received registration request from app 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 (remote address /xxx.xxx.xxx.xxx:52391, heartbeat timeout 120000 ms). 16/02/19 15:13:47 INFO shuffle.ExternalShuffleBlockResolver: Registered executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=3} with ExecutorShuffleInfo{localDirs=[/foo/blockmgr-c84c0697-a3f9-4f61-9c64-4d3ee227c047], subDirsPerLocalDir=64, shuffleManager=sort} 16/02/19 15:13:47 INFO shuffle.ExternalShuffleBlockResolver: Registered executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=7} with ExecutorShuffleInfo{localDirs=[/foo/blockmgr-bf46497a-de80-47b9-88f9-563123b59e03], subDirsPerLocalDir=64, shuffleManager=sort} 16/02/19 15:16:02 INFO mesos.MesosExternalShuffleBlockHandler: Application 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 timed out. Removing shuffle files. 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Application 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 removed, cleanupLocalDirs = true 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=3}'s 1 local dirs 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=7}'s 1 local dirs ``` Note: there are 2 executors running on this slave. Author: Bertrand Bossy <bertrand.bossy@teralytics.net> Closes #11272 from bbossy/SPARK-12583-mesos-shuffle-service-heartbeat.	2016-03-14 12:22:57 -07:00
Josh Rosen	07cb323e7a	[SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading issue This patch upgrades Py4J from 0.9.1 to 0.9.2 in order to include a patch which modifies Py4J to use the current thread's ContextClassLoader when performing reflection / class loading. This is necessary in order to fix [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185), a longstanding issue affecting the use of `--jars` and `--packages` in PySpark. In order to demonstrate that the fix works, I removed the workarounds which were added as part of [SPARK-6027](https://issues.apache.org/jira/browse/SPARK-6027) / #4779 and other patches. Py4J diff: https://github.com/bartdag/py4j/compare/0.9.1...0.9.2 /cc zsxwing tdas davies brkyvz Author: Josh Rosen <joshrosen@databricks.com> Closes #11687 from JoshRosen/py4j-0.9.2.	2016-03-14 12:22:02 -07:00
Josh Rosen	9a87afd7d1	[SPARK-13833] Guard against race condition when re-caching disk blocks in memory When reading data from the DiskStore and attempting to cache it back into the memory store, we should guard against race conditions where multiple readers are attempting to re-cache the same block in memory. This patch accomplishes this by synchronizing on the block's `BlockInfo` object while trying to re-cache a block. (Will file JIRA as soon as ASF JIRA stops being down / laggy). Author: Josh Rosen <joshrosen@databricks.com> Closes #11660 from JoshRosen/concurrent-recaching-fixes.	2016-03-14 10:48:24 -07:00
Dongjoon Hyun	acdf219703	[MINOR][DOCS] Fix more typos in comments/strings. ## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.	2016-03-14 09:07:39 +00:00
Sean Owen	1840852841	[SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit `1deecd8d9c` ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.	2016-03-13 21:03:49 -07:00
Bjorn Jonsson	515e4afbc7	[SPARK-13810][CORE] Add Port Configuration Suggestions on Bind Exceptions ## What changes were proposed in this pull request? Currently, when a java.net.BindException is thrown, it displays the following message: java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries! This change adds port configuration suggestions to the BindException, for example, for the UI, it now displays java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries. ## How was this patch tested? Manual tests Author: Bjorn Jonsson <bjornjon@gmail.com> Closes #11644 from bjornjon/master.	2016-03-13 10:18:24 +00:00
Davies Liu	2ef4c5963b	[SPARK-13830] prefer block manager than direct result for large result ## What changes were proposed in this pull request? The current RPC can't handle large blocks very well, it's very slow to fetch 100M block (about 1 minute). Once switch to block manager to fetch that, it took about 10 seconds (still could be improved). ## How was this patch tested? existing unit tests. Author: Davies Liu <davies@databricks.com> Closes #11659 from davies/direct_result.	2016-03-11 15:39:21 -08:00
Nezih Yigitbasi	ff776b2fc1	[SPARK-13328][CORE] Poor read performance for broadcast variables with dynamic resource allocation When dynamic resource allocation is enabled fetching broadcast variables from removed executors were causing job failures and SPARK-9591 fixed this problem by trying all locations of a block before giving up. However, the locations of a block is retrieved only once from the driver in this process and the locations in this list can be stale due to dynamic resource allocation. This situation gets worse when running on a large cluster as the size of this location list can be in the order of several hundreds out of which there may be tens of stale entries. What we have observed is with the default settings of 3 max retries and 5s between retries (that's 15s per location) the time it takes to read a broadcast variable can be as high as ~17m (70 failed attempts * 15s/attempt) Author: Nezih Yigitbasi <nyigitbasi@netflix.com> Closes #11241 from nezihyigitbasi/SPARK-13328.	2016-03-11 11:11:53 -08:00
Marcelo Vanzin	07f1c54477	[SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive. In preparation for the demise of assemblies, this change allows the YARN backend to use multiple jars and globs as the "Spark jar". The config option has been renamed to "spark.yarn.jars" to reflect that. A second option "spark.yarn.archive" was also added; if set, this takes precedence and uploads an archive expected to contain the jar files with the Spark code and its dependencies. Existing deployments should keep working, mostly. This change drops support for the "SPARK_JAR" environment variable, and also does not fall back to using "jarOfClass" if no configuration is set, falling back to finding files under SPARK_HOME instead. This should be fine since "jarOfClass" probably wouldn't work unless you were using spark-submit anyway. Tested with the unit tests, and trying the different config options on a YARN cluster. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11500 from vanzin/SPARK-13577.	2016-03-11 07:54:57 -06:00
Marcelo Vanzin	e33bc67c8f	[MINOR][CORE] Fix a duplicate "and" in a log message. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11642 from vanzin/spark-conf-typo.	2016-03-10 22:15:30 -08:00
Shixiong Zhu	27fe6bacc5	[SPARK-13604][CORE] Sync worker's state after registering with master ## What changes were proposed in this pull request? Here lists all cases that Master cannot talk with Worker for a while and then network is back. 1. Master doesn't know the network issue (not yet timeout) a. Worker doesn't know the network issue (onDisconnected is not called) - Worker keeps sending Heartbeat. Both Worker and Master don't know the network issue. Nothing to do. (Finally, Master will notice the heartbeat timeout if network is not recovered) b. Worker knows the network issue (onDisconnected is called) - Worker stops sending Heartbeat and sends `RegisterWorker` to master. Master will reply `RegisterWorkerFailed("Duplicate worker ID")`. Worker calls "System.exit(1)" (Finally, Master will notice the heartbeat timeout if network is not recovered) (May leak driver processes. See [SPARK-13602](https://issues.apache.org/jira/browse/SPARK-13602)) 2. Worker timeout (Master knows the network issue). In such case, master removes Worker and its executors and drivers. a. Worker doesn't know the network issue (onDisconnected is not called) - Worker keeps sending Heartbeat. - If the network is back, say Master receives Heartbeat, Master sends `ReconnectWorker` to Worker - Worker send `RegisterWorker` to master. - Master accepts `RegisterWorker` but doesn't know executors and drivers in Worker. (may leak executors) b. Worker knows the network issue (onDisconnected is called) - Worker stop sending `Heartbeat`. Worker will send "RegisterWorker" to master. - Master accepts `RegisterWorker` but doesn't know executors and drivers in Worker. (may leak executors) This PR fixes executors and drivers leak in 2.a and 2.b when Worker reregisters with Master. The approach is making Worker send `WorkerLatestState` to sync the state after registering with master successfully. Then Master will ask Worker to kill unknown executors and drivers. Note: Worker cannot just kill executors after registering with master because in the worker, `LaunchExecutor` and `RegisteredWorker` are processed in two threads. If `LaunchExecutor` happens before `RegisteredWorker`, Worker's executor list will contain new executors after Master accepts `RegisterWorker`. We should not kill these executors. So sending the list to Master and let Master tell Worker which executors should be killed. ## How was this patch tested? test("SPARK-13604: Master should ask Worker kill unknown executors and drivers") Author: Shixiong Zhu <shixiong@databricks.com> Closes #11455 from zsxwing/orphan-executors.	2016-03-10 16:59:14 -08:00
Dongjoon Hyun	91fed8e9c5	[SPARK-3854][BUILD] Scala style: require spaces before `{`. ## What changes were proposed in this pull request? Since the opening curly brace, '{', has many usages as discussed in [SPARK-3854](https://issues.apache.org/jira/browse/SPARK-3854), this PR adds a ScalaStyle rule to prevent '){' pattern for the following majority pattern and fixes the code accordingly. If we enforce this in ScalaStyle from now, it will improve the Scala code quality and reduce review time. ``` // Correct: if (true) { println("Wow!") } // Incorrect: if (true){ println("Wow!") } ``` IntelliJ also shows new warnings based on this. ## How was this patch tested? Pass the Jenkins ScalaStyle test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11637 from dongjoon-hyun/SPARK-3854.	2016-03-10 15:57:22 -08:00
Josh Rosen	81d48532d9	[SPARK-13696] Remove BlockStore class & simplify interfaces of mem. & disk stores Today, both the MemoryStore and DiskStore implement a common `BlockStore` API, but I feel that this API is inappropriate because it abstracts away important distinctions between the behavior of these two stores. For instance, the disk store doesn't have a notion of storing deserialized objects, so it's confusing for it to expose object-based APIs like putIterator() and getValues() instead of only exposing binary APIs and pushing the responsibilities of serialization and deserialization to the client. Similarly, the DiskStore put() methods accepted a `StorageLevel` parameter even though the disk store can only store blocks in one form. As part of a larger BlockManager interface cleanup, this patch remove the BlockStore interface and refines the MemoryStore and DiskStore interfaces to reflect more narrow sets of responsibilities for those components. Some of the benefits of this interface cleanup are reflected in simplifications to several unit tests to eliminate now-unnecessary mocking, significant simplification of the BlockManager's `getLocal()` and `doPut()` methods, and a narrower API between the MemoryStore and DiskStore. Author: Josh Rosen <joshrosen@databricks.com> Closes #11534 from JoshRosen/remove-blockstore-interface.	2016-03-10 15:08:41 -08:00
bomeng	235f4ac6fc	[SPARK-13727][CORE] SparkConf.contains does not consider deprecated keys The contains() method does not return consistently with get() if the key is deprecated. For example, import org.apache.spark.SparkConf val conf = new SparkConf() conf.set("spark.io.compression.lz4.block.size", "12345") # display some deprecated warning message conf.get("spark.io.compression.lz4.block.size") # return 12345 conf.get("spark.io.compression.lz4.blockSize") # return 12345 conf.contains("spark.io.compression.lz4.block.size") # return true conf.contains("spark.io.compression.lz4.blockSize") # return false The fix will make the contains() and get() more consistent. I've added a test case for this. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Unit tests should be sufficient. Author: bomeng <bmeng@us.ibm.com> Closes #11568 from bomeng/SPARK-13727.	2016-03-10 11:17:40 -08:00
mwws	74267beb35	[SPARK-13758][STREAMING][CORE] enhance exception message to avoid misleading We have a recoverable Spark streaming job with checkpoint enabled, it could be executed correctly at first time, but throw following exception when restarted and recovered from checkpoint. ``` org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063. at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:87) at org.apache.spark.rdd.RDD.withScope(RDD.scala:352) at org.apache.spark.rdd.RDD.union(RDD.scala:565) at org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:23) at org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:19) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) ``` According to exception, it shows I invoked transformations and actions in other transformations, but I did not. The real reason is that I used external RDD in DStream operation. External RDD data is not stored in checkpoint, so that during recovering, the initial value of _sc in this RDD is assigned to null and hit above exception. But you can find the error message is misleading, it indicates nothing about the real issue Here is the code to reproduce it. ```scala object Repo { def createContext(ip: String, port: Int, checkpointDirectory: String):StreamingContext = { println("Creating new context") val sparkConf = new SparkConf().setAppName("Repo").setMaster("local[2]") val ssc = new StreamingContext(sparkConf, Seconds(2)) ssc.checkpoint(checkpointDirectory) var cached = ssc.sparkContext.parallelize(Seq("apple, banana")) val words = ssc.socketTextStream(ip, port).flatMap(_.split(" ")) words.foreachRDD((rdd: RDD[String]) => { val res = rdd.map(word => (word, word.length)).collect() println("words: " + res.mkString(", ")) cached = cached.union(rdd) cached.checkpoint() println("cached words: " + cached.collect.mkString(", ")) }) ssc } def main(args: Array[String]) { val ip = "localhost" val port = 9999 val dir = "/home/maowei/tmp" val ssc = StreamingContext.getOrCreate(dir, () => { createContext(ip, port, dir) }) ssc.start() ssc.awaitTermination() } } ``` Author: mwws <wei.mao@intel.com> Closes #11595 from mwws/SPARK-MissleadingLog.	2016-03-10 15:45:06 +00:00
Sergiusz Urbaniak	a4a0addccf	[SPARK-13492][MESOS] Configurable Mesos framework webui URL. ## What changes were proposed in this pull request? Previously the Mesos framework webui URL was being derived only from the Spark UI address leaving no possibility to configure it. This commit makes it configurable. If unset it falls back to the previous behavior. Motivation: This change is necessary in order to be able to install Spark on DCOS and to be able to give it a custom service link. The configured `webui_url` is configured to point to a reverse proxy in the DCOS environment. ## How was this patch tested? Locally, using unit tests and on DCOS testing and stable revision. Author: Sergiusz Urbaniak <sur@mesosphere.io> Closes #11369 from s-urbaniak/sur-webui-url.	2016-03-09 18:10:01 -08:00
zhuol	238447db56	[SPARK-13775] History page sorted by completed time desc by default. ## What changes were proposed in this pull request? Originally the page is sorted by AppID by default. After tests with users' feedback, we think it might be best to sort by completed time (desc). ## How was this patch tested? Manually test, with screenshot as follows. ![sorted-by-complete-time-desc](https://cloud.githubusercontent.com/assets/11683054/13647686/d6dea924-e5fa-11e5-8fc5-68e039b74b6f.png) Author: zhuol <zhuol@yahoo-inc.com> Closes #11608 from zhuoliu/13775.	2016-03-09 17:58:09 -08:00
Shixiong Zhu	40e0676757	[SPARK-13778][CORE] Set the executor state for a worker when removing it ## What changes were proposed in this pull request? When a worker is lost, the executors on this worker are also lost. But Master's ApplicationPage still displays their states as running. This patch just sets the executor state to `LOST` when a worker is lost. ## How was this patch tested? manual tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11609 from zsxwing/SPARK-13778.	2016-03-09 17:54:34 -08:00
Andrew Or	37fcda3e6c	[SPARK-13747][SQL] Fix concurrent query with fork-join pool ## What changes were proposed in this pull request? Fix this use case, which was already fixed in SPARK-10548 in 1.6 but was broken in master due to #9264: ``` (1 to 100).par.foreach { _ => sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count() } ``` This threw `IllegalArgumentException` consistently before this patch. For more detail, see the JIRA. ## How was this patch tested? New test in `SQLExecutionSuite`. Author: Andrew Or <andrew@databricks.com> Closes #11586 from andrewor14/fix-concurrent-sql.	2016-03-09 17:34:28 -08:00
Ahmed Kamal	8e8633e0b2	[SPARK-13769][CORE] Update Java Doc in Spark Submit JIRA : https://issues.apache.org/jira/browse/SPARK-13769 The java doc here (`e97fc7f176/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (L51)`) needs to be updated from "The latter two operations are currently supported only for standalone cluster mode." to "The latter two operations are currently supported only for standalone and mesos cluster modes." Author: Ahmed Kamal <ahmed.kamal@badrit.com> Closes #11600 from AhmedKamal/SPARK-13769.	2016-03-09 12:28:58 +00:00
Dongjoon Hyun	c3689bc24e	[SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports diamond operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.	2016-03-09 10:31:26 +00:00
Andy Sloane	cbff2803ef	[SPARK-13631][CORE] Thread-safe getLocationsWithLargestOutputs ## What changes were proposed in this pull request? If a job is being scheduled in one thread which has a dependency on an RDD currently executing a shuffle in another thread, Spark would throw a NullPointerException. This patch synchronizes access to `mapStatuses` and skips null status entries (which are in-progress shuffle tasks). ## How was this patch tested? Our client code unit test suite, which was reliably reproducing the race condition with 10 threads, shows that this fixes it. I have not found a minimal test case to add to Spark, but I will attempt to do so if desired. The same test case was tripping up on SPARK-4454, which was fixed by making other DAGScheduler code thread-safe. shivaram srowen Author: Andy Sloane <asloane@tetrationanalytics.com> Closes #11505 from a1k0n/SPARK-13631.	2016-03-09 10:25:47 +00:00
Dongjoon Hyun	f3201aeeb0	[SPARK-13692][CORE][SQL] Fix trivial Coverity/Checkstyle defects ## What changes were proposed in this pull request? This issue fixes the following potential bugs and Java coding style detected by Coverity and Checkstyle. - Implement both null and type checking in equals functions. - Fix wrong type casting logic in SimpleJavaBean2.equals. - Add `implement Cloneable` to `UTF8String` and `SortedIterator`. - Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`. - Fix coding style: Add '{}' to single `for` statement in mllib examples. - Remove unused imports in `ColumnarBatch` and `JavaKinesisStreamSuite`. - Remove unused fields in `ChunkFetchIntegrationSuite`. - Add `stop()` to prevent resource leak. Please note that the last two checkstyle errors exist on newly added commits after [SPARK-13583](https://issues.apache.org/jira/browse/SPARK-13583). ## How was this patch tested? manual via `./dev/lint-java` and Coverity site. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11530 from dongjoon-hyun/SPARK-13692.	2016-03-09 10:12:23 +00:00
Josh Rosen	ad3c9a9730	[SPARK-13695] Don't cache MEMORY_AND_DISK blocks as bytes in memory after spills When a cached block is spilled to disk and read back in serialized form (i.e. as bytes), the current BlockManager implementation will attempt to re-insert the serialized block into the MemoryStore even if the block's storage level requests deserialized caching. This behavior adds some complexity to the MemoryStore but I don't think it offers many performance benefits and I'd like to remove it in order to simplify a larger refactoring patch. Therefore, this patch changes the behavior so that disk store reads will only cache bytes in the memory store for blocks with serialized storage levels. There are two places where we request serialized bytes from the BlockStore: 1. getLocalBytes(), which is only called when reading local copies of TorrentBroadcast pieces. Broadcast pieces are always cached using a serialized storage level, so this won't lead to a mismatch in serialization forms if spilled bytes read from disk are cached as bytes in the memory store. 2. the non-shuffle-block branch in getBlockData(), which is only called by the NettyBlockRpcServer when responding to requests to read remote blocks. Caching the serialized bytes in memory will only benefit us if those cached bytes are read before they're evicted and the likelihood of that happening seems low since the frequency of remote reads of non-broadcast cached blocks seems very low. Caching these bytes when they have a low probability of being read is bad if it risks the eviction of blocks which are cached in their expected serialized/deserialized forms, since those blocks seem more likely to be read in local computation. Given the argument above, I think this change is unlikely to cause performance regressions. Author: Josh Rosen <joshrosen@databricks.com> Closes #11533 from JoshRosen/remove-memorystore-level-mismatch.	2016-03-08 10:40:27 -08:00
jerryshao	9e86e6efd1	[SPARK-13675][UI] Fix wrong historyserver url link for application running in yarn cluster mode ## What changes were proposed in this pull request? Current URL for each application to access history UI is like: http://localhost:18080/history/application_1457058760338_0016/1/jobs/ or http://localhost:18080/history/application_1457058760338_0016/2/jobs/ Here 1 or 2 represents the number of attempts in `historypage.js`, but it will parse to attempt id in `HistoryServer`, while the correct attempt id should be like "appattempt_1457058760338_0016_000002", so it will fail to parse to a correct attempt id in HistoryServer. This is OK in yarn client mode, since we don't need this attempt id to fetch out the app cache, but it is failed in yarn cluster mode, where attempt id "1" or "2" is actually wrong. So here we should fix this url to parse the correct application id and attempt id. Also the suffix "jobs/" is not needed. Here is the screenshot: ![screen shot 2016-02-29 at 3 57 32 pm](https://cloud.githubusercontent.com/assets/850797/13524377/d4b44348-e235-11e5-8b3e-bc06de306e87.png) ## How was this patch tested? This patch is tested manually, with different master and deploy mode. ![image](https://cloud.githubusercontent.com/assets/850797/13524419/118be5a0-e236-11e5-8022-3ff613ccde46.png) Author: jerryshao <sshao@hortonworks.com> Closes #11518 from jerryshao/SPARK-13675.	2016-03-08 09:09:42 -06:00
Devaraj K	9bf76ddde5	[SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0 ## What changes were proposed in this pull request? In WebUI, now Jetty Server starts with SPARK_LOCAL_IP config value if it is configured otherwise it starts with default value as '0.0.0.0'. It is continuation as per the closed PR https://github.com/apache/spark/pull/11133 for the JIRA SPARK-13117 and discussion in SPARK-13117. ## How was this patch tested? This has been verified using the command 'netstat -tnlp \| grep <PID>' to check on which IP/hostname is binding with the below steps. In the below results, mentioned PID in the command is the corresponding process id. #### Without the patch changes, Web UI(Jetty Server) is not taking the value configured for SPARK_LOCAL_IP and it is listening to all the interfaces. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 3930 tcp6 0 0 :::8080 :::* LISTEN 3930/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 4090 tcp6 0 0 :::8081 :::* LISTEN 4090/java ``` ###### History Server Process, ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 2471 tcp6 0 0 :::18080 :::* LISTEN 2471/java ``` ###### Driver ``` [devarajstobdtserver2 spark-master]$ netstat -tnlp \| grep 6556 tcp6 0 0 :::4040 :::* LISTEN 6556/java ``` #### With the patch changes ##### i. With SPARK_LOCAL_IP configured If the SPARK_LOCAL_IP is configured then all the processes Web UI(Jetty Server) is getting bind to the configured value. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 1561 tcp6 0 0 x.x.x.x:8080 :::* LISTEN 1561/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 2229 tcp6 0 0 x.x.x.x:8081 :::* LISTEN 2229/java ``` ###### History Server ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 3747 tcp6 0 0 x.x.x.x:18080 :::* LISTEN 3747/java ``` ###### Driver ``` [devarajstobdtserver2 spark-master]$ netstat -tnlp \| grep 6013 tcp6 0 0 x.x.x.x:4040 :::* LISTEN 6013/java ``` ##### ii. Without SPARK_LOCAL_IP configured If the SPARK_LOCAL_IP is not configured then all the processes Web UI(Jetty Server) will start with the '0.0.0.0' as default value. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 4573 tcp6 0 0 :::8080 :::* LISTEN 4573/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 4703 tcp6 0 0 :::8081 :::* LISTEN 4703/java ``` ###### History Server ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 4846 tcp6 0 0 :::18080 :::* LISTEN 4846/java ``` ###### Driver ``` [devarajstobdtserver2 sbin]$ netstat -tnlp \| grep 5437 tcp6 0 0 :::4040 :::* LISTEN 5437/java ``` Author: Devaraj K <devaraj@apache.org> Closes #11490 from devaraj-kavali/SPARK-13117-v1.	2016-03-08 10:48:31 +00:00
Josh Rosen	e52e597db4	[SPARK-13659] Refactor BlockStore put*() APIs to remove returnValues In preparation for larger refactoring, this patch removes the confusing `returnValues` option from the BlockStore put() APIs: returning the value is only useful in one place (caching) and in other situations, such as block replication, it's simpler to put() and then get(). As part of this change, I needed to refactor `BlockManager.doPut()`'s block replication code. I also changed `doPut()` to access the memory and disk stores directly rather than calling them through the BlockStore interface; this is in anticipation of a followup patch to remove the BlockStore interface so that the disk store can expose a binary-data-oriented API which is not concerned with Java objects or serialization. These changes should be covered by the existing storage unit tests. The best way to review this patch is probably to look at the individual commits, all of which are small and have useful descriptions to guide the review. /cc davies for review. Author: Josh Rosen <joshrosen@databricks.com> Closes #11502 from JoshRosen/remove-returnvalues.	2016-03-07 21:50:01 -08:00
Shixiong Zhu	017cdf2be6	[SPARK-13711][CORE] Don't call SparkUncaughtExceptionHandler in AppClient as it's in driver ## What changes were proposed in this pull request? AppClient runs in the driver side. It should not call `Utils.tryOrExit` as it will send exception to SparkUncaughtExceptionHandler and call `System.exit`. This PR just removed `Utils.tryOrExit`. ## How was this patch tested? manual tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11566 from zsxwing/SPARK-13711.	2016-03-07 20:56:08 -08:00
Michael Armbrust	e720dda42e	[SPARK-13665][SQL] Separate the concerns of HadoopFsRelation `HadoopFsRelation` is used for reading most files into Spark SQL. However today this class mixes the concerns of file management, schema reconciliation, scan building, bucketing, partitioning, and writing data. As a result, many data sources are forced to reimplement the same functionality and the various layers have accumulated a fair bit of inefficiency. This PR is a first cut at separating this into several components / interfaces that are each described below. Additionally, all implementations inside of Spark (parquet, csv, json, text, orc, svmlib) have been ported to the new API `FileFormat`. External libraries, such as spark-avro will also need to be ported to work with Spark 2.0. ### HadoopFsRelation A simple `case class` that acts as a container for all of the metadata required to read from a datasource. All discovery, resolution and merging logic for schemas and partitions has been removed. This an internal representation that no longer needs to be exposed to developers. ```scala case class HadoopFsRelation( sqlContext: SQLContext, location: FileCatalog, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String]) extends BaseRelation ``` ### FileFormat The primary interface that will be implemented by each different format including external libraries. Implementors are responsible for reading a given format and converting it into `InternalRow` as well as writing out an `InternalRow`. A format can optionally return a schema that is inferred from a set of files. ```scala trait FileFormat { def inferSchema( sqlContext: SQLContext, options: Map[String, String], files: Seq[FileStatus]): Option[StructType] def prepareWrite( sqlContext: SQLContext, job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory def buildInternalScan( sqlContext: SQLContext, dataSchema: StructType, requiredColumns: Array[String], filters: Array[Filter], bucketSet: Option[BitSet], inputFiles: Array[FileStatus], broadcastedConf: Broadcast[SerializableConfiguration], options: Map[String, String]): RDD[InternalRow] } ``` The current interface is based on what was required to get all the tests passing again, but still mixes a couple of concerns (i.e. `bucketSet` is passed down to the scan instead of being resolved by the planner). Additionally, scans are still returning `RDD`s instead of iterators for single files. In a future PR, bucketing should be removed from this interface and the scan should be isolated to a single file. ### FileCatalog This interface is used to list the files that make up a given relation, as well as handle directory based partitioning. ```scala trait FileCatalog { def paths: Seq[Path] def partitionSpec(schema: Option[StructType]): PartitionSpec def allFiles(): Seq[FileStatus] def getStatus(path: Path): Array[FileStatus] def refresh(): Unit } ``` Currently there are two implementations: - `HDFSFileCatalog` - based on code from the old `HadoopFsRelation`. Infers partitioning by recursive listing and caches this data for performance - `HiveFileCatalog` - based on the above, but it uses the partition spec from the Hive Metastore. ### ResolvedDataSource Produces a logical plan given the following description of a Data Source (which can come from DataFrameReader or a metastore): - `paths: Seq[String] = Nil` - `userSpecifiedSchema: Option[StructType] = None` - `partitionColumns: Array[String] = Array.empty` - `bucketSpec: Option[BucketSpec] = None` - `provider: String` - `options: Map[String, String]` This class is responsible for deciding which of the Data Source APIs a given provider is using (including the non-file based ones). All reconciliation of partitions, buckets, schema from metastores or inference is done here. ### DataSourceAnalysis / DataSourceStrategy Responsible for analyzing and planning reading/writing of data using any of the Data Source APIs, including: - pruning the files from partitions that will be read based on filters. - appending partition columns* - applying additional filters when a data source can not evaluate them internally. - constructing an RDD that is bucketed correctly when required* - sanity checking schema match-up and other analysis when writing. *In the future we should do that following: - Break out file handling into its own Strategy as its sufficiently complex / isolated. - Push the appending of partition columns down in to `FileFormat` to avoid an extra copy / unvectorization. - Use a custom RDD for scans instead of `SQLNewNewHadoopRDD2` Author: Michael Armbrust <michael@databricks.com> Author: Wenchen Fan <wenchen@databricks.com> Closes #11509 from marmbrus/fileDataSource.	2016-03-07 15:15:10 -08:00
Marcelo Vanzin	e1fb857992	[SPARK-529][CORE][YARN] Add type-safe config keys to SparkConf. This is, in a way, the basics to enable SPARK-529 (which was closed as won't fix but I think is still valuable). In fact, Spark SQL created something for that, and this change basically factors out that code and inserts it into SparkConf, with some extra bells and whistles. To showcase the usage of this pattern, I modified the YARN backend to use the new config keys (defined in the new `config` package object under `o.a.s.deploy.yarn`). Most of the changes are mechanic, although logic had to be slightly modified in a handful of places. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10205 from vanzin/conf-opts.	2016-03-07 14:13:44 -08:00
Alex Bozarth	5f42c28b11	[SPARK-13459][WEB UI] Separate Alive and Dead Executors in Executor Totals Table ## What changes were proposed in this pull request? Now that dead executors are shown in the executors table (#10058) the totals table is updated to include the separate totals for alive and dead executors as well as the current total, as originally discussed in #10668 ## How was this patch tested? Manually verified by running the Standalone Web UI in the latest Safari and Firefox ESR Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11381 from ajbozarth/spark13459.	2016-03-04 17:04:09 -06:00
Holden Karau	c04dc27ced	[SPARK-13398][STREAMING] Move away from thread pool task support to forkjoin ## What changes were proposed in this pull request? Remove old deprecated ThreadPoolExecutor and replace with ExecutionContext using a ForkJoinPool. The downside of this is that scala's ForkJoinPool doesn't give us a way to specify the thread pool name (and is a wrapper of Java's in 2.12) except by providing a custom factory. Note that we can't use Java's ForkJoinPool directly in Scala 2.11 since it uses a ExecutionContext which reports system parallelism. One other implicit change that happens is the old ExecutionContext would have reported a different default parallelism since it used system parallelism rather than threadpool parallelism (this was likely not intended but also likely not a huge difference). The previous version of this PR attempted to use an execution context constructed on the ThreadPool (but not the deprecated ThreadPoolExecutor class) so as to keep the ability to have human readable named threads but this reported system parallelism. ## How was this patch tested? unit tests: streaming/testOnly org.apache.spark.streaming.util.* Author: Holden Karau <holden@us.ibm.com> Closes #11423 from holdenk/SPARK-13398-move-away-from-ThreadPoolTaskSupport-java-forkjoin.	2016-03-04 10:56:58 +00:00
Dongjoon Hyun	941b270b70	[MINOR] Fix typos in comments and testcase name of code ## What changes were proposed in this pull request? This PR fixes typos in comments and testcase name of code. ## How was this patch tested? manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.	2016-03-03 22:42:12 +00:00
Sean Owen	645c3a85e2	[SPARK-13423][HOTFIX] Static analysis fixes for 2.x / fixed for Scala 2.10 ## What changes were proposed in this pull request? Fixes compile problem due to inadvertent use of `Option.contains`, only in Scala 2.11. The change should have been to replace `Option.exists(_ == x)` with `== Some(x)`. Replacing exists with contains only makes sense for collections. Replacing use of `Option.exists` still makes sense though as it's misleading. ## How was this patch tested? Jenkins tests / compilation (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Sean Owen <sowen@cloudera.com> Closes #11493 from srowen/SPARK-13423.2.	2016-03-03 15:11:02 +00:00
Dongjoon Hyun	b5f02d6743	[SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule ## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.	2016-03-03 10:12:32 +00:00
Sean Owen	e97fc7f176	[SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.x ## What changes were proposed in this pull request? Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly: - Inner class should be static - Mismatched hashCode/equals - Overflow in compareTo - Unchecked warnings - Misuse of assert, vs junit.assert - get(a) + getOrElse(b) -> getOrElse(a,b) - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions - Dead code - tailrec - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count - reduce(_+_) -> sum map + flatten -> map The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places. ## How was the this patch tested? Existing Jenkins unit tests. Author: Sean Owen <sowen@cloudera.com> Closes #11292 from srowen/SPARK-13423.	2016-03-03 09:54:09 +00:00
Devaraj K	56e3d00715	[SPARK-13621][CORE] TestExecutor.scala needs to be moved to test package Moved TestExecutor.scala from src to test package and removed the unused file TestClient.scala. Author: Devaraj K <devaraj@apache.org> Closes #11474 from devaraj-kavali/SPARK-13621.	2016-03-02 22:34:44 -08:00
Davies Liu	b5a59a0fe2	[SPARK-13601] call failure callbacks before writer.close() ## What changes were proposed in this pull request? In order to tell OutputStream that the task has failed or not, we should call the failure callbacks BEFORE calling writer.close(). ## How was this patch tested? Added new unit tests. Author: Davies Liu <davies@databricks.com> Closes #11450 from davies/callback.	2016-03-02 14:35:44 -08:00
Josh Rosen	d6969ffc0f	[SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling graceful fallback to disk when cached data does not fit in memory. However, this logic also exists inside of the MemoryStore itself, so this appears to be unnecessary duplication. Thanks to the addition of block-level read/write locks in #10705, we can refactor the code to remove the CacheManager and replace it with an atomic `BlockManager.getOrElseUpdate()` method. This pull request replaces / subsumes #10748. /cc andrewor14 and nongli for review. Note that this changes the locking semantics of a couple of internal BlockManager methods (`doPut()` and `lockNewBlockForWriting`), so please pay attention to the Scaladoc changes and new test cases for those methods. Author: Josh Rosen <joshrosen@databricks.com> Closes #11436 from JoshRosen/remove-cachemanager.	2016-03-02 10:26:47 -08:00
Marcelo Vanzin	c7fccb56cd	[SPARK-13478][YARN] Use real user when fetching delegation tokens. The Hive client library is not smart enough to notice that the current user is a proxy user; so when using a proxy user, it fails to fetch delegation tokens from the metastore because of a missing kerberos TGT for the current user. To fix it, just run the code that fetches the delegation token as the real logged in user. Tested on a kerberos cluster both submitting normally and with a proxy user; Hive and HBase tokens are retrieved correctly in both cases. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11358 from vanzin/SPARK-13478.	2016-02-29 13:01:27 -08:00
Shixiong Zhu	644dbb641a	[SPARK-13522][CORE] Fix the exit log place for heartbeat ## What changes were proposed in this pull request? Just fixed the log place introduced by #11401 ## How was this patch tested? unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11432 from zsxwing/SPARK-13522-follow-up.	2016-02-29 11:52:11 -08:00
Shixiong Zhu	17a253cbf4	[SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat to driver more than N times ## What changes were proposed in this pull request? Sometimes, network disconnection event won't be triggered for other potential race conditions that we may not have thought of, then the executor will keep sending heartbeats to driver and won't exit. This PR adds a new configuration `spark.executor.heartbeat.maxFailures` to kill Executor when it's unable to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times. ## How was this patch tested? unit tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11401 from zsxwing/SPARK-13522.	2016-02-29 11:02:45 -08:00
zhuol	2f91f5ac0d	[SPARK-13481] Desc order of appID by default for history server page. ## What changes were proposed in this pull request? Now by default, it shows as ascending order of appId. We might prefer to display as descending order by default, which will show the latest application at the top. ## How was this patch tested? Manual tested. See screenshot below: ![desc-sort](https://cloud.githubusercontent.com/assets/11683054/13307473/102f4cf8-db31-11e5-8dd5-391edbf32f0d.png) Author: zhuol <zhuol@yahoo-inc.com> Closes #11357 from zhuoliu/13481.	2016-02-29 08:37:33 -06:00
Jeff Zhang	99fe8993f5	[SPARK-12994][CORE] It is not necessary to create ExecutorAllocationM… …anager in local mode Author: Jeff Zhang <zjffdu@apache.org> Closes #10914 from zjffdu/SPARK-12994.	2016-02-29 12:08:37 +00:00
Shixiong Zhu	ad615291fe	[SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning executor's state ## What changes were proposed in this pull request? When the driver removes an executor's state, the connection between the driver and the executor may be still alive so that the executor cannot exit automatically (E.g., Master will send RemoveExecutor when a work is lost but the executor is still alive), so the driver should try to tell the executor to stop itself. Otherwise, we will leak an executor. This PR modified the driver to send `StopExecutor` to the executor when it's removed. ## How was this patch tested? manual test: increase the worker heartbeat interval to force it's always timeout and the leak executors are gone. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11399 from zsxwing/SPARK-13519.	2016-02-26 15:11:57 -08:00
Reynold Xin	391755dc6e	[SPARK-13465] Add a task failure listener to TaskContext ## What changes were proposed in this pull request? TaskContext supports task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there is an error. This patch adds a new listener that gets called when a task fails. ## How was the this patch tested? New unit test case and integration test case covering the code path Author: Reynold Xin <rxin@databricks.com> Closes #11340 from rxin/SPARK-13465.	2016-02-26 12:49:16 -08:00
Josh Rosen	633d63a48a	[SPARK-12757] Add block-level read/write locks to BlockManager ## Motivation As a pre-requisite to off-heap caching of blocks, we need a mechanism to prevent pages / blocks from being evicted while they are being read. With on-heap objects, evicting a block while it is being read merely leads to memory-accounting problems (because we assume that an evicted block is a candidate for garbage-collection, which will not be true during a read), but with off-heap memory this will lead to either data corruption or segmentation faults. ## Changes ### BlockInfoManager and reader/writer locks This patch adds block-level read/write locks to the BlockManager. It introduces a new `BlockInfoManager` component, which is contained within the `BlockManager`, holds the `BlockInfo` objects that the `BlockManager` uses for tracking block metadata, and exposes APIs for locking blocks in either shared read or exclusive write modes. `BlockManager`'s `get()` and `put()` methods now implicitly acquire the necessary locks. After a `get()` call successfully retrieves a block, that block is locked in a shared read mode. A `put()` call will block until it acquires an exclusive write lock. If the write succeeds, the write lock will be downgraded to a shared read lock before returning to the caller. This `put()` locking behavior allows us store a block and then immediately turn around and read it without having to worry about it having been evicted between the write and the read, which will allow us to significantly simplify `CacheManager` in the future (see #10748). See `BlockInfoManagerSuite`'s test cases for a more detailed specification of the locking semantics. ### Auto-release of locks at the end of tasks Our locking APIs support explicit release of locks (by calling `unlock()`), but it's not always possible to guarantee that locks will be released prior to the end of the task. One reason for this is our iterator interface: since our iterators don't support an explicit `close()` operator to signal that no more records will be consumed, operations like `take()` or `limit()` don't have a good means to release locks on their input iterators' blocks. Another example is broadcast variables, whose block locks can only be released at the end of the task. To address this, `BlockInfoManager` uses a pair of maps to track the set of locks acquired by each task. Lock acquisitions automatically record the current task attempt id by obtaining it from `TaskContext`. When a task finishes, code in `Executor` calls `BlockInfoManager.unlockAllLocksForTask(taskAttemptId)` to free locks. ### Locking and the MemoryStore In order to prevent in-memory blocks from being evicted while they are being read, the `MemoryStore`'s `evictBlocksToFreeSpace()` method acquires write locks on blocks which it is considering as candidates for eviction. These lock acquisitions are non-blocking, so a block which is being read will not be evicted. By holding write locks until the eviction is performed or skipped (in case evicting the blocks would not free enough memory), we avoid a race where a new reader starts to read a block after the block has been marked as an eviction candidate but before it has been removed. ### Locking and remote block transfer This patch makes small changes to to block transfer and network layer code so that locks acquired by the BlockTransferService are released as soon as block transfer messages are consumed and released by Netty. This builds on top of #11193, a bug fix related to freeing of network layer ManagedBuffers. ## FAQ - Why not use Java's built-in [`ReadWriteLock`](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html)? Our locks operate on a per-task rather than per-thread level. Under certain circumstances a task may consist of multiple threads, so using `ReadWriteLock` would mean that we might call `unlock()` from a thread which didn't hold the lock in question, an operation which has undefined semantics. If we could rely on Java 8 classes, we might be able to use [`StampedLock`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/StampedLock.html) to work around this issue. - Why not detect "leaked" locks in tests?: See above notes about `take()` and `limit`. Author: Josh Rosen <joshrosen@databricks.com> Closes #10705 from JoshRosen/pin-pages.	2016-02-25 17:17:56 -08:00
Josh Rosen	f2cfafdfe0	[SPARK-13501] Remove use of Guava Stopwatch Our nightly doc snapshot builds are failing due to some issue involving the Guava Stopwatch constructor: ``` [error] /home/jenkins/workspace/spark-master-docs/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala:496: constructor Stopwatch in class Stopwatch cannot be accessed in class CoarseMesosSchedulerBackend [error] val stopwatch = new Stopwatch() [error] ^ ``` This Stopwatch constructor was deprecated in newer versions of Guava (`fd0cbc2c5c`) and it's possible that some classpath issues affecting Unidoc could be causing this to trigger compilation failures. In order to work around this issue, this patch removes this use of Stopwatch since we don't use it anywhere else in the Spark codebase. Author: Josh Rosen <joshrosen@databricks.com> Closes #11376 from JoshRosen/remove-stopwatch.	2016-02-25 17:04:43 -08:00
Liwei Lin	dc6c5ea4c9	[SPARK-13468][WEB UI] Fix a corner case where the Stage UI page should show DAG but it doesn't show When uses clicks more than one time on any stage in the DAG graph on the Job web UI page, many new Stage web UI pages are opened, but only half of their DAG graphs are expanded. After this PR's fix, every newly opened Stage page's DAG graph is expanded. Before: ![](https://cloud.githubusercontent.com/assets/15843379/13279144/74808e86-db10-11e5-8514-cecf31af8908.png) After: ![](https://cloud.githubusercontent.com/assets/15843379/13279145/77ca5dec-db10-11e5-9457-8e1985461328.png) ## What changes were proposed in this pull request? - Removed the `expandDagViz` parameter for _Stage_ page and related codes - Added a `onclick` function setting `expandDagVizArrowKey(false)` as `true` ## How was this patch tested? Manual tests (with this fix) to verified this fix work: - clicked many times on _Job_ Page's DAG Graph → each newly opened Stage page's DAG graph is expanded Manual tests (with this fix) to verified this fix do not break features we already had: - refreshed many times for a same _Stage_ page (whose DAG already expanded) → DAG remained expanded upon every refresh - refreshed many times for a same _Stage_ page (whose DAG unexpanded) → DAG remained unexpanded upon every refresh - refreshed many times for a same _Job_ page (whose DAG already expanded) → DAG remained expanded upon every refresh - refreshed many times for a same _Job_ page (whose DAG unexpanded) → DAG remained unexpanded upon every refresh Author: Liwei Lin <proflin.me@gmail.com> Closes #11368 from proflin/SPARK-13468.	2016-02-25 15:36:25 -08:00
Shixiong Zhu	46f6e79316	Revert "[SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0" This reverts commit `2e44031faf`.	2016-02-25 11:39:26 -08:00
Devaraj K	2e44031faf	[SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0 Fixed the HTTP Server Host Name/IP issue i.e. HTTP Server to take the configured host name/IP and not '0.0.0.0' always. Author: Devaraj K <devaraj@apache.org> Closes #11133 from devaraj-kavali/SPARK-13117.	2016-02-25 12:18:43 +00:00
Wenchen Fan	a60f91284c	[SPARK-13467] [PYSPARK] abstract python function to simplify pyspark code ## What changes were proposed in this pull request? When we pass a Python function to JVM side, we also need to send its context, e.g. `envVars`, `pythonIncludes`, `pythonExec`, etc. However, it's annoying to pass around so many parameters at many places. This PR abstract python function along with its context, to simplify some pyspark code and make the logic more clear. ## How was the this patch tested? by existing unit tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11342 from cloud-fan/python-clean.	2016-02-24 12:44:54 -08:00
Daniel Jalova	bcfd55fa98	[SPARK-12759][Core][Spark should fail fast if --executor-memory is too small for spark to start] Added an exception to be thrown in UnifiedMemoryManager.scala if the configuration given for executor memory is too low. Also modified the exception message thrown when driver memory is too low. This patch was tested manually by passing in config options to Spark shell. I also added a test in UnifiedMemoryManagerSuite.scala Author: Daniel Jalova <djalova@us.ibm.com> Closes #11255 from djalova/SPARK-12759.	2016-02-24 12:15:11 +00:00
Davies Liu	9cdd867da9	[SPARK-13373] [SQL] generate sort merge join ## What changes were proposed in this pull request? Generates code for SortMergeJoin. ## How was the this patch tested? Unit tests and manually tested with TPCDS Q72, which showed 70% performance improvements (from 42s to 25s), but micro benchmark only show minor improvements, it may depends the distribution of data and number of columns. Author: Davies Liu <davies@databricks.com> Closes #11248 from davies/gen_smj.	2016-02-23 15:00:10 -08:00
Lianhui Wang	9f4263392e	[SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab andrewor14 squito Dead Executors should also be displayed on Executor Tab. as following: ![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png) Author: Lianhui Wang <lianhuiwang09@gmail.com> This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com> Closes #10058 from lianhuiwang/SPARK-7729.	2016-02-23 11:08:39 -08:00
zhuol	4d1e5f92e1	[SPARK-13364] Sort appId as num rather than str in history page. ## What changes were proposed in this pull request? History page now sorts the appID as a string, which can lead to unexpected order for the case "application_11111_9" and "application_11111_20". Add a new sort type called appId-numeric can fix it. ## How was the this patch tested? This patch was manually tested with UI. See the screenshot below: ![sortappidbetter](https://cloud.githubusercontent.com/assets/11683054/13185564/7f941a16-d707-11e5-8fb7-0316368d3030.png) Author: zhuol <zhuol@yahoo-inc.com> Closes #11259 from zhuoliu/13364.	2016-02-23 11:16:42 -06:00
Liang-Chi Hsieh	87d7f8904a	[SPARK-13358] [SQL] Retrieve grep path when do benchmark JIRA: https://issues.apache.org/jira/browse/SPARK-13358 When trying to run a benchmark, I found that on my Ubuntu linux grep is not in /usr/bin/ but /bin/. So wondering if it is better to use which to retrieve grep path. cc davies Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11231 from viirya/benchmark-grep-path.	2016-02-23 07:56:08 -08:00
jerryshao	e99d017098	[SPARK-13220][CORE] deprecate yarn-client and yarn-cluster mode Author: jerryshao <sshao@hortonworks.com> Closes #11229 from jerryshao/SPARK-13220.	2016-02-23 12:30:57 +00:00
Shixiong Zhu	a11b399519	[SPARK-13298][CORE][UI] Escape "label" to avoid DAG being broken by some special character ## What changes were proposed in this pull request? When there are some special characters (e.g., `"`, `\`) in `label`, DAG will be broken. This patch just escapes `label` to avoid DAG being broken by some special characters ## How was the this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11309 from zsxwing/SPARK-13298.	2016-02-22 17:42:30 -08:00
Reynold Xin	4a91806a45	[SPARK-13413] Remove SparkContext.metricsSystem ## What changes were proposed in this pull request? This patch removes SparkContext.metricsSystem. SparkContext.metricsSystem returns MetricsSystem, which is a private class. I think it was added by accident. In addition, I also removed an unused private[spark] method schedulerBackend setter. ## How was the this patch tested? N/A. Author: Reynold Xin <rxin@databricks.com> This patch had conflicts when merged, resolved by Committer: Josh Rosen <joshrosen@databricks.com> Closes #11282 from rxin/SPARK-13413.	2016-02-22 14:01:35 -08:00
Timothy Chen	00461bb911	[SPARK-10749][MESOS] Support multiple roles with mesos cluster mode. Currently the Mesos cluster dispatcher is not using offers from multiple roles correctly, as it simply aggregates all the offers resource values into one, but doesn't apply them correctly before calling the driver as Mesos needs the resources from the offers to be specified which role it originally belongs to. Multiple roles is already supported with fine/coarse grain scheduler, so porting that logic here to the cluster scheduler. https://issues.apache.org/jira/browse/SPARK-10749 Author: Timothy Chen <tnachen@gmail.com> Closes #8872 from tnachen/cluster_multi_roles.	2016-02-22 11:11:33 -08:00
Dongjoon Hyun	024482bf51	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.	2016-02-22 09:52:07 +00:00
jerryshao	39ff154570	[SPARK-13426][CORE] Remove the support of SIMR ## What changes were proposed in this pull request? This PR removes the support of SIMR, since SIMR is not actively used and maintained for a long time, also is not supported from `SparkSubmit`, so here propose to remove it. ## How was the this patch tested? This patch is tested locally by running unit tests. Author: jerryshao <sshao@hortonworks.com> Closes #11296 from jerryshao/SPARK-13426.	2016-02-22 00:57:10 -08:00
Shixiong Zhu	dfb2ae2f14	[SPARK-13408] [CORE] Ignore errors when it's already reported in JobWaiter ## What changes were proposed in this pull request? `JobWaiter.taskSucceeded` will be called for each task. When `resultHandler` throws an exception, `taskSucceeded` will also throw it for each task. DAGScheduler just catches it and reports it like this: ```Scala try { job.listener.taskSucceeded(rt.outputId, event.result) } catch { case e: Exception => // TODO: Perhaps we want to mark the resultStage as failed? job.listener.jobFailed(new SparkDriverExecutionException(e)) } ``` Therefore `JobWaiter.jobFailed` may be called multiple times. So `JobWaiter.jobFailed` should use `Promise.tryFailure` instead of `Promise.failure` because the latter one doesn't support calling multiple times. ## How was the this patch tested? Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11280 from zsxwing/SPARK-13408.	2016-02-19 23:00:08 -08:00
Josh Rosen	983fa2d620	[SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates `TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver. Author: Josh Rosen <joshrosen@databricks.com> Closes #11276 from JoshRosen/accum-updates-fix.	2016-02-19 15:57:23 -08:00
Sean Owen	fb7e21797e	[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold Clarify that reduce functions need to be commutative, and fold functions do not See https://github.com/apache/spark/pull/11091 Author: Sean Owen <sowen@cloudera.com> Closes #11217 from srowen/SPARK-13339.	2016-02-19 10:26:38 +00:00
Sean Owen	78562535fe	[SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly. ## What changes were proposed in this pull request? Fix some comparisons between unequal types that cause IJ warnings and in at least one case a likely bug (TaskSetManager) ## How was the this patch tested? Running Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11253 from srowen/SPARK-13371.	2016-02-18 12:14:30 -08:00
Sital Kedia	1e1e31e03d	[SPARK-13279] Remove O(n^2) operation from scheduler. This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11175 from sitalkedia/fix_stuck_driver.	2016-02-16 22:27:39 -08:00
Sean Owen	388cd9ea8d	[SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace it is deprecated Replace `getStackTraceString` with `Utils.exceptionString` Author: Sean Owen <sowen@cloudera.com> Closes #11182 from srowen/SPARK-13172.	2016-02-13 21:05:48 -08:00
markpavey	374c4b2869	[SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)? I confirm this is my own original work and license it to the Spark project under its open source license. Author: markpavey <mark.pavey@thefilter.com> Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.	2016-02-13 08:39:43 +00:00
Michael Gummelt	38bc6018e9	[SPARK-5095] Fix style in mesos coarse grained scheduler code andrewor14 This addressed your style comments from #10993 Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #11187 from mgummelt/fix_mesos_style.	2016-02-12 14:57:31 -08:00
Sanket	894921d813	[SPARK-6166] Limit number of in flight outbound requests This JIRA is related to https://github.com/apache/spark/pull/5852 Had to do some minor rework and test to make sure it works with current version of spark. Author: Sanket <schintap@untilservice-lm> Closes #10838 from redsanket/limit-outbound-connections.	2016-02-11 22:40:00 -08:00
Steve Loughran	a2c7dcf61f	[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger filesize. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Author: Steve Loughran <stevel@hortonworks.com> Author: Imran Rashid <irashid@cloudera.com> Closes #11118 from squito/SPARK-7889-alternate.	2016-02-11 21:37:53 -06:00
Reynold Xin	c86009ceb9	Revert "[SPARK-13279] Remove O(n^2) operation from scheduler." This reverts commit `50fa6fd1b3`.	2016-02-11 13:31:13 -08:00
Sital Kedia	50fa6fd1b3	[SPARK-13279] Remove O(n^2) operation from scheduler. This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2. Author: Sital Kedia <skedia@fb.com> Closes #11167 from sitalkedia/fix_stuck_driver and squashes the following commits: 3fe1af8 [Sital Kedia] [SPARK-13279] Remove unnecessary duplicate check in addPendingTask function	2016-02-11 13:28:14 -08:00
Alex Bozarth	13c17cbb05	[SPARK-13124][WEB UI] Fixed CSS and JS issues caused by addition of JQuery DataTables Made sure the old tables continue to use the old css and the new DataTables use the new css. Also fixed it so the Safari Web Inspector doesn't throw errors when on the new DataTables pages. Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11038 from ajbozarth/spark13124.	2016-02-11 08:50:27 -06:00
Junyang	f9ae99fee1	[SPARK-13074][CORE] Add JavaSparkContext. getPersistentRDDs method The "getPersistentRDDs()" is a useful API of SparkContext to get cached RDDs. However, the JavaSparkContext does not have this API. Add a simple getPersistentRDDs() to get java.util.Map<Integer, JavaRDD> for Java users. Author: Junyang <fly.shenjy@gmail.com> Closes #10978 from flyjy/master.	2016-02-11 09:33:11 +00:00
Sean Owen	29c547303f	[SPARK-12414][CORE] Remove closure serializer Remove spark.closure.serializer option and use JavaSerializer always CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be. Author: Sean Owen <sowen@cloudera.com> Closes #11150 from srowen/SPARK-12414.	2016-02-10 13:34:53 -08:00
zhuol	4b80026f07	[SPARK-13126] fix the right margin of history page. The right margin of the history page is little bit off. A simple fix for that issue. Author: zhuol <zhuol@yahoo-inc.com> Closes #11029 from zhuoliu/13126.	2016-02-10 14:23:41 -06:00
Alex Bozarth	39cc620e9c	[SPARK-13163][WEB UI] Column width on new History Server DataTables not getting set correctly The column width for the new DataTables now adjusts for the current page rather than being hard-coded for the entire table's data. Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11057 from ajbozarth/spark13163.	2016-02-10 14:07:50 -06:00
Michael Gummelt	80cb963ad9	[SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode. This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027 In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone. This PR implements that resolution. This PR implements two high-level features. These two features are co-dependent, so they're implemented both here: - Mesos support for spark.executor.cores - Multiple executors per slave We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR. The contribution is my original work and I license the work to the project under the project's open source license. Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #10993 from mgummelt/executor_sizing.	2016-02-10 10:53:33 -08:00
Sean Owen	c0b71e0b8f	[SPARK-9307][CORE][SPARK] Logging: Make it either stable or private Make Logging private[spark]. Pretty much all there is to it. Author: Sean Owen <sowen@cloudera.com> Closes #11103 from srowen/SPARK-9307.	2016-02-10 11:02:00 +00:00
Davies Liu	0e5ebac3c1	[SPARK-12950] [SQL] Improve lookup of BytesToBytesMap in aggregate This PR improve the lookup of BytesToBytesMap by: 1. Generate code for calculate the hash code of grouping keys. 2. Do not use MemoryLocation, fetch the baseObject and offset for key and value directly (remove the indirection). Author: Davies Liu <davies@databricks.com> Closes #11010 from davies/gen_map.	2016-02-09 16:41:21 -08:00
Shixiong Zhu	fae830d158	[SPARK-13245][CORE] Call shuffleMetrics methods only in one thread for ShuffleBlockFetcherIterator Call shuffleMetrics's incRemoteBytesRead and incRemoteBlocksFetched when polling FetchResult from `results` so as to always use shuffleMetrics in one thread. Also fix a race condition that could cause memory leak. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11138 from zsxwing/SPARK-13245.	2016-02-09 16:31:00 -08:00
Wenchen Fan	7fe4fe630a	[SPARK-12888] [SQL] [FOLLOW-UP] benchmark the new hash expression Adds the benchmark results as comments. The codegen version is slower than the interpreted version for `simple` case becasue of 3 reasons: 1. codegen version use a more complex hash algorithm than interpreted version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153). 2. codegen version will write the hash value to a row first and then read it out. I tried to create a `GenerateHasher` that can generate code to return hash value directly and got about 60% speed up for the `simple` case, does it worth? 3. the row in `simple` case only has one int field, so the runtime reflection may be removed because of branch prediction, which makes the interpreted version faster. The `array` case is also slow for similar reasons, e.g. array elements are of same type, so interpreted version can probably get rid of runtime reflection by branch prediction. Author: Wenchen Fan <wenchen@databricks.com> Closes #10917 from cloud-fan/hash-benchmark.	2016-02-09 13:06:36 -08:00
Jakob Odersky	f9307d8fc5	[SPARK-13176][CORE] Use native file linking instead of external process ln Since Spark requires at least JRE 1.7, it is safe to use built-in java.nio.Files. Author: Jakob Odersky <jakob@odersky.com> Closes #11098 from jodersky/SPARK-13176.	2016-02-09 08:43:46 +00:00
Andrew Or	eeaf45b926	[SPARK-10620][SPARK-13054] Minor addendum to #10835 Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility. Author: Andrew Or <andrew@databricks.com> Closes #10958 from andrewor14/task-metrics-to-accums-followups.	2016-02-08 17:23:33 -08:00
Davies Liu	37bc203c8d	[SPARK-13210][SQL] catch OOM when allocate memory and expand array There is a bug when we try to grow the buffer, OOM is ignore wrongly (the assert also skipped by JVM), then we try grow the array again, this one will trigger spilling free the current page, the current record we inserted will be invalid. The root cause is that JVM has less free memory than MemoryManager thought, it will OOM when allocate a page without trigger spilling. We should catch the OOM, and acquire memory again to trigger spilling. And also, we could not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing). Author: Davies Liu <davies@databricks.com> Closes #11095 from davies/fix_expand.	2016-02-08 12:09:20 -08:00
Tommy YU	81da3bee66	[SPARK-5865][API DOC] Add doc warnings for methods that return local data structures rxin srowen I work out note message for rdd.take function, please help to review. If it's fine, I can apply to all other function later. Author: Tommy YU <tummyyu@163.com> Closes #10874 from Wenpei/spark-5865-add-warning-for-localdatastructure.	2016-02-06 17:29:09 +00:00
Davies Liu	4f28291f85	[HOTFIX] fix float part of avgRate	2016-02-05 22:40:40 -08:00
Jakob Odersky	6883a5120c	[SPARK-13171][CORE] Replace future calls with Future Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11. Also works with 2.10 Author: Jakob Odersky <jakob@odersky.com> Closes #11085 from jodersky/SPARK-13171.	2016-02-05 19:00:12 -08:00
Luc Bourlier	0bb5b73387	[SPARK-13002][MESOS] Send initial request of executors for dyn allocation Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos. Instead of fixing it just for the Mesos case, made the change in `ExecutorAllocationManager`. It is already driving the number of executors running on Mesos, only no the initial value. The `None` and `Some(0)` are internal details on the computation of resources to reserved, in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly, otherwise the Mesos backend scheduler will, either, create to many executors at launch, or not create any executors and not be able to recover from this state. Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore. This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode. Somebody knowing this part of the yarn support should review this change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #11047 from skyluc/issue/initial-dyn-alloc-2.	2016-02-05 14:37:42 -08:00
Jakob Odersky	352102ed0b	[SPARK-13208][CORE] Replace use of Pairs with Tuple2s Another trivial deprecation fix for Scala 2.11 Author: Jakob Odersky <jakob@odersky.com> Closes #11089 from jodersky/SPARK-13208.	2016-02-04 22:22:41 -08:00
Raafat Akkad	6dbfc40776	[SPARK-13052] waitingApps metric doesn't show the number of apps currently in the WAITING state Author: Raafat Akkad <raafat.akkad@gmail.com> Closes #10959 from RaafatAkkad/master.	2016-02-04 16:09:31 -08:00

... 4 5 6 7 8 ...

5339 commits