This PR fixes the issue with calling sum on an empty RDD and also adds `emptyRDD` so that it's easy to create a test case.
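For context, a minimal Scala sketch of the behaviour this patch brings to PySpark (the Scala API already has `SparkContext.emptyRDD`; the PySpark equivalent added here is `sc.emptyRDD()`):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmptyRDDSumSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("empty-rdd-sum").setMaster("local[*]"))
    // An RDD with no partitions and no elements -- handy for writing test cases.
    val empty = sc.emptyRDD[Double]
    // sum() on an empty RDD should return 0.0 rather than failing.
    println(empty.sum()) // 0.0
    sc.stop()
  }
}
```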
Author: zsxwing <zsxwing@gmail.com>
Closes #6826 from zsxwing/python-emptyRDD and squashes the following commits:
b36993f [zsxwing] Update the return type to JavaRDD[T]
71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD
(cherry picked from commit 0fc4b96f3e)
Signed-off-by: Andrew Or <andrew@databricks.com>
The history server may show an incorrect App ID for an incomplete application, such as <App ID>.inprogress. This app info never disappears, even after the app is completed.
![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
The cause of the issue is that the log path name is used as the app ID when the app ID cannot be obtained during replay.
Author: Carson Wang <carson.wang@intel.com>
Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:
cdbb089 [Carson Wang] Fix code style
3e46b35 [Carson Wang] Update code style
90f5dde [Carson Wang] Add a unit test
d8c9cd0 [Carson Wang] Replaying events only return information when app is started
(cherry picked from commit 2837e06709)
Signed-off-by: Andrew Or <andrew@databricks.com>
externalBlockStoreInitialized is never set to true, which means the blocks stored in ExternalBlockStore cannot be removed.
Author: Mingfei <mingfei.shi@intel.com>
Closes #6702 from shimingfei/SetTrue and squashes the following commits:
add61d8 [Mingfei] Set externalBlockStoreInitialized to be true, after ExternalBlockStore is initialized
(cherry picked from commit 7ad8c5d869)
Signed-off-by: Andrew Or <andrew@databricks.com>
The problem occurs because the position mask `0xEFFFFFF` is incorrect: its 25th bit is zero, so when the capacity grows beyond 2^24, `OpenHashMap` calculates an incorrect index for values in the `_values` array.
I've also added a size check in `rehash()`, so that it fails instead of reporting invalid item indices.
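To make the bad bit concrete, here is a small self-contained check (illustrative only; the mask value is the one quoted above):

```scala
object PositionMaskCheck {
  def main(args: Array[String]): Unit = {
    val brokenMask = 0xEFFFFFF // binary 1110 1111 ... 1111: bit 24 is zero
    val pos = 1 << 24          // the first position that needs bit 24
    // Masking clears bit 24, so this position collapses back to 0 and the map
    // reads/writes the wrong slot in its backing arrays.
    println(java.lang.Integer.toBinaryString(brokenMask)) // 1110111111111111111111111111
    println(pos & brokenMask)                             // 0 instead of 16777216
  }
}
```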
Author: Vyacheslav Baranov <slavik.baranov@gmail.com>
Closes #6763 from SlavikBaranov/SPARK-8309 and squashes the following commits:
8557445 [Vyacheslav Baranov] Resolved review comments
4d5b954 [Vyacheslav Baranov] Resolved review comments
eaf1e68 [Vyacheslav Baranov] Fixed failing test
f9284fd [Vyacheslav Baranov] Resolved review comments
3920656 [Vyacheslav Baranov] SPARK-8309: Support for more than 12M items in OpenHashMap
(cherry picked from commit c13da20a55)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Safeguard against DOM rewriting.
Author: Andrew Or <andrew@databricks.com>
Closes #6787 from andrewor14/dag-viz-trim and squashes the following commits:
0fb4afe [Andrew Or] Trim input metadata from DOM
(cherry picked from commit 8860405151)
Signed-off-by: Andrew Or <andrew@databricks.com>
IBM Java has an extra frame when we call getStackTrace(): "getStackTraceImpl", a native method. This causes two tests in "DStreamScopeSuite" to fail when running with IBM Java, because "getStackTrace" is returned as the method name instead of "map" or "filter". This commit addresses the issue by using dropWhile: given that our own method is withScope, we look for the next method that isn't ours and ignore whatever comes before us in the stack trace (e.g. getStackTrace, regardless of how many levels deep that goes).
IBM:
java.lang.Thread.getStackTraceImpl(Native Method)
java.lang.Thread.getStackTrace(Thread.java:1117)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:104)
Oracle:
PRINTING STACKTRACE!!!
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:106)
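A sketch of the dropWhile approach described above (names approximate, not the exact patch): skip frames until the withScope frame is reached, then take the first frame after it that isn't withScope; that frame is the real caller (e.g. map or filter), no matter how many JVM-internal frames such as getStackTraceImpl sit above it.

```scala
// Approximate sketch of the fix, not the exact patch:
val ourMethodName = "withScope"
val callerMethodName = Thread.currentThread.getStackTrace()
  .dropWhile(_.getMethodName != ourMethodName) // skip getStackTrace / getStackTraceImpl / ...
  .find(_.getMethodName != ourMethodName)      // first frame after our own withScope frames
  .map(_.getMethodName)
  .getOrElse("unknown")
```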
I've tested this with both Oracle and IBM Java; no side effects for other tests were introduced.
Author: Adam Roberts <aroberts@uk.ibm.com>
Author: a-roberts <aroberts@uk.ibm.com>
Closes #6740 from a-roberts/RDDScopeStackCrawlFix and squashes the following commits:
13ce390 [Adam Roberts] Ensure consistency with String equality checking
a4fc0e0 [a-roberts] Update RDDOperationScope.scala
(cherry picked from commit 19e30b48f3)
Signed-off-by: Andrew Or <andrew@databricks.com>
Read the number of threads for RBackend from configuration.
[SPARK-8282] #comment Linking with JIRA
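A minimal sketch of setting it (the property name `spark.r.numRBackendThreads` follows Spark's configuration docs; treat the exact key as an assumption here):

```scala
import org.apache.spark.SparkConf

// Sketch: RBackend now reads its thread count from the Spark configuration
// instead of using a hard-coded value (property key assumed).
val conf = new SparkConf()
  .setAppName("sparkr-app")
  .set("spark.r.numRBackendThreads", "4")
```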
Author: Hossein <hossein@databricks.com>
Closes #6730 from falaki/SPARK-8282 and squashes the following commits:
33b3d98 [Hossein] Documented new config parameter
70f2a9c [Hossein] Fixing import
ec44225 [Hossein] Read number of threads for RBackend from configuration
(cherry picked from commit 30ebf1a233)
Signed-off-by: Andrew Or <andrew@databricks.com>
Just as a safeguard against DOM rewriting.
Author: Andrew Or <andrew@databricks.com>
Closes #6732 from andrewor14/dag-viz-trim and squashes the following commits:
7e9bacb [Andrew Or] [MINOR] [UI] DAG visualization: trim whitespace from input
(cherry picked from commit 0d5892dc72)
Signed-off-by: Andrew Or <andrew@databricks.com>
This was caused by this commit: f271347
This patch does not attempt to fix the root cause of why the `VisibleForTesting` annotation causes an NPE in the shell. We should find a way to fix that separately.
Author: Andrew Or <andrew@databricks.com>
Closes #6711 from andrewor14/fix-spark-shell and squashes the following commits:
bf62ecc [Andrew Or] Prevent NPE in spark-shell
Even with all the efforts to clean up the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.
After a full sbt unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.
Also make a slight change to a unit test so that it does not pollute the
source directory with test data.
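A rough sbt-style sketch of the idea (assumed settings, not the actual build change):

```scala
// build.sbt fragment (illustrative, not the actual change): point test JVMs'
// java.io.tmpdir at a directory under the build output instead of /tmp.
// (The directory has to exist before the tests run.)
Test / fork := true  // javaOptions only take effect in forked test JVMs
Test / javaOptions += "-Djava.io.tmpdir=" + (target.value / "tmp").getAbsolutePath
```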
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6674 from vanzin/SPARK-8126 and squashes the following commits:
0f8ad41 [Marcelo Vanzin] Make sure tmp dir exists when tests run.
643e916 [Marcelo Vanzin] [MINOR] [BUILD] Use custom temp directory during build.
Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
This is a work in progress. This patch ensures that an executor that has cached RDD blocks is not removed,
but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
approach; if it makes sense, then I will look at choosing another executor to remove. No testing has been done yet either.
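A minimal configuration sketch of the resulting behaviour (the `cachedExecutorIdleTimeout` key is an assumption based on the "Add timeout for cache executors" and "Add documentation for new config" commits below):

```scala
import org.apache.spark.SparkConf

// Sketch (cachedExecutorIdleTimeout key assumed): with dynamic allocation on,
// executors holding cached RDD blocks get a longer idle timeout than ordinary
// idle executors, so they are not removed while their blocks may be needed.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
```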
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
dddf1eb [Hari Shreedharan] Minor configuration description update.
10130e2 [Hari Shreedharan] Fix compile issue.
5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
875916a [Hari Shreedharan] Make some code more readable.
39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
ec2fd7e [Hari Shreedharan] Add file missed in last commit
5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
ae932ff [Hari Shreedharan] Fix config param name.
272969d [Hari Shreedharan] Fix seconds to millis bug.
5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
(cherry picked from commit 3285a51121)
Signed-off-by: Andrew Or <andrew@databricks.com>
Even with all the efforts to clean up the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.
After a full sbt unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.
Also make a slight change to a unit test so that it does not pollute the
source directory with test data.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
(cherry picked from commit b16b5434ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
Author: Sean Owen <sowen@cloudera.com>
Closes #6650 from srowen/Interpolation and squashes the following commits:
518687a [Sean Owen] Actually interpolate log string
7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
(cherry picked from commit 3a5c4da473)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The log page should only show the desired number of bytes. Currently it shows all bytes from startIndex to the end of the file, and the "Next" button on the page is always disabled.
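As a small illustrative sketch (names made up, not the patch itself), the intended range is clamped to the requested byte length rather than running to the end of the file:

```scala
// Illustrative only: compute the [start, end) byte range the log page should show.
def logPageRange(totalLength: Long, startIndex: Long, byteLength: Long): (Long, Long) = {
  val start = math.max(0L, math.min(startIndex, totalLength))
  val end = math.min(start + byteLength, totalLength) // not totalLength unconditionally
  (start, end)
}
```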
Author: Carson Wang <carson.wang@intel.com>
Closes #6640 from carsonwang/logpage and squashes the following commits:
58cb3fd [Carson Wang] Show correct length of bytes on log page
(cherry picked from commit 63bc0c4430)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This also helps us get rid of the sparkr-docs Maven profile, as the docs are now built by just using -Psparkr when the roxygen2 package is available.
Related to discussion in #6567
cc pwendell srowen -- Let me know if this looks better
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
(cherry picked from commit 3dc005282a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it's just this comment that was off. It's easy to make this mistake, so can you please double-check that I'm correct? Thanks!
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #6621 from darabos/patch-2 and squashes the following commits:
dfebdec [Daniel Darabos] Fix comment.
(cherry picked from commit 10ba188087)
Signed-off-by: Sean Owen <sowen@cloudera.com>
This includes the following commits:
original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492
---
Original patch #6441
Branch-1.3 patch #6602
Author: Andrew Or <andrew@databricks.com>
Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:
4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #6624 from ryan-williams/execs and squashes the following commits:
b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0
(cherry picked from commit 51898b5158)
Signed-off-by: Andrew Or <andrew@databricks.com>
The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.
This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.
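A hedged sketch of how a cluster-mode test might use it (the exact signature isn't given in this description; an executor count and a timeout in milliseconds are assumed):

```scala
// Assumed usage: block until the expected executors have registered, or fail
// the test if they are not up within the timeout, before submitting jobs.
assert(sc.jobProgressListener.waitUntilExecutorsUp(2, 10000),
  "Executors were not up before running the jobs")
```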
Author: zsxwing <zsxwing@gmail.com>
Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:
5560e09 [zsxwing] Fix a typo
3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
(cherry picked from commit f27134782e)
Signed-off-by: Andrew Or <andrew@databricks.com>
Conflicts:
core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala
core/src/test/scala/org/apache/spark/scheduler/SparkListenerWithClusterSuite.scala
Some call sites forget to call `assert` on the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in those places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw a `TimeoutException` on timeout.
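A hypothetical sketch of the new contract (helper and names invented for illustration): fail loudly on timeout instead of returning a value that call sites may forget to assert on.

```scala
import java.util.concurrent.TimeoutException

// Hypothetical sketch, not the actual implementation: poll until the listener
// bus queue is empty, throwing TimeoutException if the deadline passes.
def waitUntilEmpty(isQueueEmpty: () => Boolean, timeoutMillis: Long): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMillis
  while (!isQueueEmpty()) {
    if (System.currentTimeMillis() > deadline) {
      throw new TimeoutException(
        s"The event queue is not empty after $timeoutMillis milliseconds")
    }
    Thread.sleep(10)
  }
}
```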
Author: zsxwing <zsxwing@gmail.com>
Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
(cherry picked from commit 1d8669f15c)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Timothy Chen <tnachen@gmail.com>
Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
(cherry picked from commit bfbf12b349)
Signed-off-by: Andrew Or <andrew@databricks.com>
Also use that profile in create-release.sh
cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
(cherry picked from commit cae9306c4f)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
This prevents the spark.jars setting from being cleared when using `--packages` or `--jars`.
cc pwendell davies brkyvz
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared
(cherry picked from commit 6b44278ef7)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Sun Rui <rui.sun@intel.com>
Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
(cherry picked from commit 46576ab303)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Reynold Xin <rxin@databricks.com>
Closes #6533 from rxin/whitespace-2 and squashes the following commits:
038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
(cherry picked from commit 74fdc97c72)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
Only parse the master URL as a standalone master URL when it starts with spark://.
Author: Timothy Chen <tnachen@gmail.com>
Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
(cherry picked from commit 78657d53d7)
Signed-off-by: Andrew Or <andrew@databricks.com>
cc JoshRosen
Thanks for noticing this!
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit
(cherry picked from commit 7ed06c3992)
Signed-off-by: Reynold Xin <rxin@databricks.com>