Even with all the efforts to clean up the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.
After an sbt full unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.
Also make a slight change to a unit test to make it not pollute the
source directory with test data.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6674 from vanzin/SPARK-8126 and squashes the following commits:
0f8ad41 [Marcelo Vanzin] Make sure tmp dir exists when tests run.
643e916 [Marcelo Vanzin] [MINOR] [BUILD] Use custom temp directory during build.
…moved if dynamic allocation is enabled.
This is a work in progress. This patch ensures that executors that have cached RDD blocks are not removed,
but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
approach; if it makes sense, I will look at choosing another executor to remove. No testing has been done yet.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:
dddf1eb [Hari Shreedharan] Minor configuration description update.
10130e2 [Hari Shreedharan] Fix compile issue.
5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
875916a [Hari Shreedharan] Make some code more readable.
39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
ec2fd7e [Hari Shreedharan] Add file missed in last commit
5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
ae932ff [Hari Shreedharan] Fix config param name.
272969d [Hari Shreedharan] Fix seconds to millis bug.
5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.
(cherry picked from commit 3285a51121)
Signed-off-by: Andrew Or <andrew@databricks.com>
Even with all the efforts to clean up the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.
After an sbt full unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.
Also make a slight change to a unit test to make it not pollute the
source directory with test data.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:
31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.
(cherry picked from commit b16b5434ff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.
Author: Sean Owen <sowen@cloudera.com>
Closes #6650 from srowen/Interpolation and squashes the following commits:
518687a [Sean Owen] Actually interpolate log string
7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message
(cherry picked from commit 3a5c4da473)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The log page should only show the requested number of bytes. Currently it shows everything from the startIndex to the end of the file, and the "Next" button on the page is always disabled.
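As a rough illustration of the intended behaviour, here is a minimal Python sketch (not the Spark log page code; `log_window` and its parameter names are hypothetical). The window ends at the start index plus the requested length, clamped to the file size, rather than at the end of the file:

```python
def log_window(data: bytes, start_index: int, byte_length: int) -> bytes:
    """Return at most byte_length bytes of data, beginning at start_index."""
    # Clamp the end so that a window near the end of the file stays in range.
    end_index = min(start_index + byte_length, len(data))
    return data[start_index:end_index]


sample = b"0123456789"
window = log_window(sample, 2, 3)   # only the requested 3 bytes
to_end = sample[2:]                 # the behaviour described as current
```

Because `window` stops before the end of the file, a "Next" control has something left to page to.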
Author: Carson Wang <carson.wang@intel.com>
Closes #6640 from carsonwang/logpage and squashes the following commits:
58cb3fd [Carson Wang] Show correct length of bytes on log page
(cherry picked from commit 63bc0c4430)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
Related to discussion in #6567
cc pwendell srowen -- Let me know if this looks better
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6593 from shivaram/sparkr-pom-cleanup and squashes the following commits:
b282241 [Shivaram Venkataraman] Remove sparkr-docs from release script as well
8f100a5 [Shivaram Venkataraman] Move man pages creation to install-dev.sh This also helps us get rid of the sparkr-docs maven profile as docs are now built by just using -Psparkr when the roxygen2 package is available
(cherry picked from commit 3dc005282a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
If maxTaskFailures is 1, the task set is aborted after 1 task failure. Other documentation and the code support this reading; I think it's just this comment that was off. It's easy to make this mistake, so can you please double-check that I'm correct? Thanks!
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #6621 from darabos/patch-2 and squashes the following commits:
dfebdec [Daniel Darabos] Fix comment.
(cherry picked from commit 10ba188087)
Signed-off-by: Sean Owen <sowen@cloudera.com>
This includes the following commits:
original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492
---
Original patch #6441
Branch-1.3 patch #6602
Author: Andrew Or <andrew@databricks.com>
Closes #6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:
4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #6624 from ryan-williams/execs and squashes the following commits:
b6f71d4 [Ryan Williams] don't attempt to lower number of executors by 0
(cherry picked from commit 51898b5158)
Signed-off-by: Andrew Or <andrew@databricks.com>
The flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite will fail if there are not enough executors up before running the jobs.
This PR adds `JobProgressListener.waitUntilExecutorsUp`. The tests for the cluster mode can use it to wait until the expected executors are up.
Author: zsxwing <zsxwing@gmail.com>
Closes #6546 from zsxwing/SPARK-7989 and squashes the following commits:
5560e09 [zsxwing] Fix a typo
3b69840 [zsxwing] Fix flaky tests in ExternalShuffleServiceSuite and SparkListenerWithClusterSuite
(cherry picked from commit f27134782e)
Signed-off-by: Andrew Or <andrew@databricks.com>
Conflicts:
core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala
core/src/test/scala/org/apache/spark/scheduler/SparkListenerWithClusterSuite.scala
Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.
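The design choice can be sketched in Python (an illustrative analogue only, not the Scala implementation; `wait_until_empty`, `is_empty` and `poll_s` are hypothetical names). A wait that raises on timeout cannot be ignored silently, whereas a boolean return value can:

```python
import time


def wait_until_empty(is_empty, timeout_s, poll_s=0.01):
    """Block until is_empty() returns True; raise TimeoutError on timeout."""
    deadline = time.monotonic() + timeout_s
    while not is_empty():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"queue not empty after {timeout_s} s")
        time.sleep(poll_s)


# A caller that forgets to check anything still fails loudly on timeout.
timed_out = False
try:
    wait_until_empty(lambda: False, timeout_s=0.05)
except TimeoutError:
    timed_out = True
```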
Author: zsxwing <zsxwing@gmail.com>
Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
(cherry picked from commit 1d8669f15c)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Timothy Chen <tnachen@gmail.com>
Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
(cherry picked from commit bfbf12b349)
Signed-off-by: Andrew Or <andrew@databricks.com>
Also use that profile in create-release.sh
cc pwendell -- Note that this means that we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs Also use that profile in create-release.sh
(cherry picked from commit cae9306c4f)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
This prevents the spark.jars from being cleared while using `--packages` or `--jars`
cc pwendell davies brkyvz
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR This prevents the spark.jars from being cleared
(cherry picked from commit 6b44278ef7)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Sun Rui <rui.sun@intel.com>
Closes #6183 from sun-rui/SPARK-7227 and squashes the following commits:
dd6f5b3 [Sun Rui] Rename readEnv() back to readMap(). Add alias na.omit() for dropna().
41cf725 [Sun Rui] [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.
(cherry picked from commit 46576ab303)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Reynold Xin <rxin@databricks.com>
Closes #6533 from rxin/whitespace-2 and squashes the following commits:
038314c [Reynold Xin] [SPARK-3850] Trim trailing spaces for core.
(cherry picked from commit 74fdc97c72)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
Only parse standalone master url when master url starts with spark://
Author: Timothy Chen <tnachen@gmail.com>
Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
61a1198 [Timothy Chen] Fix master url parsing in rest submission client.
(cherry picked from commit 78657d53d7)
Signed-off-by: Andrew Or <andrew@databricks.com>
cc JoshRosen
Thanks for noticing this!
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit
(cherry picked from commit 7ed06c3992)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Add alias names for supported cipher suites to the sample SSL configuration.
The IBM JSSE provider reports its cipher suite with an SSL_ prefix, but accepts TLS_ prefixed suite names as an alias. However, Jetty filters the requested ciphers based on the provider's reported supported suites, so the TLS_ versions are never passed through to JSSE causing an SSL handshake failure.
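The failure mode can be modelled with a small Python sketch. This is a simplified stand-in for the filtering step, not Jetty's actual code, and the suite names are chosen only for illustration:

```python
def filter_requested(requested, provider_reported):
    """Keep only the requested suites that the provider reports as supported."""
    return [suite for suite in requested if suite in provider_reported]


# Assumed example: a provider that reports its suite with an SSL_ prefix.
ibm_reported = {"SSL_RSA_WITH_AES_128_CBC_SHA"}

tls_only = ["TLS_RSA_WITH_AES_128_CBC_SHA"]
with_alias = ["TLS_RSA_WITH_AES_128_CBC_SHA", "SSL_RSA_WITH_AES_128_CBC_SHA"]

dropped = filter_requested(tls_only, ibm_reported)     # nothing survives
enabled = filter_requested(with_alias, ibm_reported)   # the SSL_ alias survives
```

With only TLS_ names configured, the filter leaves no suite to negotiate; listing the SSL_ alias as well gives the handshake a suite both sides accept.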
Author: Tim Ellison <t.p.ellison@gmail.com>
Closes #6282 from tellison/SSLFailure and squashes the following commits:
8de8a3e [Tim Ellison] Update SecurityManagerSuite with new expected suite names
96158b2 [Tim Ellison] Update the sample configs to use ciphers that are common to both the Oracle and IBM security providers.
705421b [Tim Ellison] Merge branch 'master' of github.com:tellison/spark into SSLFailure
68b9425 [Tim Ellison] Merge branch 'master' of https://github.com/apache/spark into SSLFailure
b0c35f6 [Tim Ellison] [CORE] Add aliases used for cipher suites in IBM provider
(cherry picked from commit bf46580708)
Signed-off-by: Sean Owen <sowen@cloudera.com>
The shutdown hook for temp directories had priority 100 while SparkContext was 50, so the local root directory was deleted before SparkContext was shut down. This leads to scary errors on running jobs at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shut down.
The fix in this PR is to make the temp directory shutdown priority lower than SparkContext, so that the temp dirs are the last thing to get deleted, after the SparkContext has been shut down. Also, the DiskBlockManager shutdown priority is changed from the default 100 to temp_dir_prio + 1, so that it gets invoked just before all temp dirs are cleared.
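The ordering can be sketched in Python under two assumptions: hooks with a higher priority run first (as the description implies), and the new temp-dir priority is 25, a made-up value since the message only says it is lower than SparkContext's 50. The names below are illustrative, not Spark identifiers:

```python
def run_order(hooks):
    """Return hook names in execution order: higher priority runs first."""
    ordered = sorted(hooks.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ordered]


SPARK_CONTEXT_PRIO = 50  # stated in the description above

# Before the fix: the temp-dir hook (100) runs ahead of SparkContext (50).
before = {"temp_dirs": 100, "spark_context": SPARK_CONTEXT_PRIO}

# After the fix: temp dirs go last. 25 is an assumed value for illustration.
TEMP_DIR_PRIO = 25
after = {
    "spark_context": SPARK_CONTEXT_PRIO,
    "disk_block_manager": TEMP_DIR_PRIO + 1,
    "temp_dirs": TEMP_DIR_PRIO,
}

order_before = run_order(before)
order_after = run_order(after)
```

In `order_after` the SparkContext stops first, the DiskBlockManager hook follows, and the temp directories are removed last.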
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #6482 from tdas/SPARK-7930 and squashes the following commits:
d7cbeb5 [Tathagata Das] Removed unnecessary line
1514d0b [Tathagata Das] Fixed shutdown hook priorities
(cherry picked from commit cd3d9a5c0c)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
The existing code rounds down to the nearest percent when computing the proportion
of a task's time that was spent on each phase of execution, and then computes
the scheduler delay proportion as 100 - sum(all other proportions). As a result,
a few extra percent can end up in the scheduler delay. This commit eliminates
the rounding so that the time visualizations correspond properly to the real times.
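A small worked example in Python shows the effect. The phase names and timings are made up for illustration and do not come from the UI code:

```python
import math

# Hypothetical task timings in milliseconds (illustrative values only).
phases = {"deserialization": 15, "shuffle_read": 125, "compute": 845, "serialization": 9}
scheduler_delay = 6
total = sum(phases.values()) + scheduler_delay


def scheduler_delay_percent(round_down):
    """Scheduler delay computed as 100 minus the sum of the other proportions."""
    others = 0.0
    for ms in phases.values():
        pct = 100.0 * ms / total
        others += math.floor(pct) if round_down else pct
    return 100.0 - others


rounded = scheduler_delay_percent(round_down=True)   # each phase rounded down
exact = scheduler_delay_percent(round_down=False)    # no rounding
true_pct = 100.0 * scheduler_delay / total
```

With these numbers the rounded-down phases sum to 97 percent, so the scheduler delay is drawn as 3 percent although it is really about 0.6 percent of the task time. Without rounding, the computed value matches the true proportion.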
sarutak If you could take a look at this, that would be great! Not sure if there's a good
reason to round here that I missed.
cc shivaram
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #6484 from kayousterhout/SPARK-7932 and squashes the following commits:
1723cc4 [Kay Ousterhout] [SPARK-7932] Fix misleading scheduler delay visualization
(cherry picked from commit 04ddcd4db7)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
Switch to the official Pyrolite release from the one published under `org.spark-project`. Thanks irmen for making the releases on Maven Central. We didn't upgrade to 4.6 because we don't have enough time for QA. I excluded `serpent` from its dependencies because we don't use it in Spark.
~~~
[info] +-net.jpountz.lz4:lz4:1.3.0
[info] +-net.razorvine:pyrolite:4.4
[info] +-net.sf.py4j:py4j:0.8.2.1
~~~
davies
Author: Xiangrui Meng <meng@databricks.com>
Closes #6472 from mengxr/SPARK-7926 and squashes the following commits:
7b3c6bf [Xiangrui Meng] use the official Pyrolite release
(cherry picked from commit c45d58c143)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes #6473 from rxin/whitespace-core and squashes the following commits:
058195d [Reynold Xin] Fixed tests.
fce11e9 [Reynold Xin] [SPARK-7927] whitespace fixes for core.
(cherry picked from commit 7f7505d8db)
Signed-off-by: Reynold Xin <rxin@databricks.com>