Commit graph

23 commits

Author SHA1 Message Date
Jakub Nowacki b4edafa99b [SPARK-22495] Fix setup of SPARK_HOME variable on Windows
## What changes were proposed in this pull request?

Fixing the way `SPARK_HOME` is resolved on Windows. While the previous version worked with the built release download, the set of directories changed slightly for the PySpark `pip` or `conda` install. This had been reflected in the Linux scripts in `bin`, but not in the Windows `cmd` files.

The first fix improves the way the `jars` directory is found, as the old lookup was stopping the Windows `pip/conda` install from working; JARs were not found on Session/Context setup.

The second fix adds a `find-spark-home.cmd` script which, like the Linux version, uses the `find_spark_home.py` script to resolve `SPARK_HOME`. It is based on the `find-spark-home` bash script, though some operations are done in a different order due to limitations of the `cmd` script language. If the environment variable is already set, the Python script `find_spark_home.py` will not be run. The process can fail if Python is not installed, but this path is mostly taken when PySpark was installed via `pip/conda`, so some Python should be present on the system.
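
A minimal sketch of that resolution order, assuming the layout described above (the exact contents of `find-spark-home.cmd` may differ):

```
@echo off
rem Keep a pre-set SPARK_HOME; otherwise ask the pip/conda-installed
rem find_spark_home.py where the Spark files live.
if not "x%SPARK_HOME%"=="x" goto :done

rem Assumes find_spark_home.py sits next to this script, as in the pip layout.
for /f "delims=" %%i in ('python "%~dp0find_spark_home.py"') do set "SPARK_HOME=%%i"

:done
```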

## How was this patch tested?

Tested on local installation.

Author: Jakub Nowacki <j.s.nowacki@gmail.com>

Closes #19370 from jsnowacki/fix_spark_cmds.
2017-11-23 12:47:38 +09:00
Jakub Nowacki c11f24a940 [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows
## What changes were proposed in this pull request?

Fix the setup of `SPARK_JARS_DIR` on Windows: the script looks for the `%SPARK_HOME%\RELEASE` file when it should look for the `%SPARK_HOME%\jars` directory, and the RELEASE file is not included in the `pip` build of PySpark.
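
A sketch of the corrected check, keying off the `jars` directory itself (present in both the full release and the `pip`/`conda` install) rather than the RELEASE marker file; the fallback path is illustrative of a development build:

```
if exist "%SPARK_HOME%\jars" (
  set "SPARK_JARS_DIR=%SPARK_HOME%\jars"
) else (
  set "SPARK_JARS_DIR=%SPARK_HOME%\assembly\target\scala-%SPARK_SCALA_VERSION%\jars"
)
```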

## How was this patch tested?

Local install of PySpark on Anaconda 4.4.0 (Python 3.6.1).

Author: Jakub Nowacki <j.s.nowacki@gmail.com>

Closes #19310 from jsnowacki/master.
2017-09-23 21:04:10 +09:00
Jarrett Meyer b9ad2d1916 [SPARK-20613] Remove excess quotes in Windows executable
## What changes were proposed in this pull request?

Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark.

'""C:\Program' is not recognized as an internal or external command, operable program or batch file.

## How was this patch tested?

Tested manually on Windows 10.

Author: Jarrett Meyer <jarrettmeyer@gmail.com>

Closes #17861 from jarrettmeyer/fix-windows-cmd.
2017-05-05 08:30:42 -07:00
Felix Cheung a8877bdbba [SPARK-19237][SPARKR][CORE] On Windows spark-submit should handle when java is not installed
## What changes were proposed in this pull request?

When SparkR is installed as an R package, there might not be any Java runtime.
If it is not there, SparkR's `sparkR.session()` will block until the connection times out, hanging the R IDE/shell without any notification or message.
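
A sketch of the kind of guard this adds (message text illustrative): detect the missing runtime up front and fail with an explanation instead of letting the client block until the connection times out.

```
where /q java
if ERRORLEVEL 1 (
  echo Java not found. Install Java or set JAVA_HOME before running Spark. 1>&2
  exit /b 1
)
```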

## How was this patch tested?

manually

- [x] need to test on Windows

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16596 from felixcheung/rcheckjava.
2017-03-21 14:24:41 -07:00
Sean Owen 623aae5907 [SPARK-15531][DEPLOY] spark-class tries to use too much memory when running Launcher
## What changes were proposed in this pull request?

Explicitly limit the launcher JVM memory to a modest 128m.
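
A sketch of the capped invocation (variable names illustrative): the launcher JVM only builds a command line for the real JVM, so it gets a fixed, small heap instead of inheriting driver- or executor-sized memory settings.

```
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %*
```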

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13360 from srowen/SPARK-15531.
2016-05-27 11:28:28 -07:00
Mark Grover ff9ae61a3b [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly
## What changes were proposed in this pull request?

Removing references to the assembly jar in documentation.
Adding an additional (previously undocumented) usage of spark-submit to run examples.

## How was this patch tested?

Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.

Author: Mark Grover <mark@apache.org>

Closes #12365 from markgrover/spark-14601.
2016-04-14 18:51:43 -07:00
Marcelo Vanzin 24d7d2e453 [SPARK-13579][BUILD] Stop building the main Spark assembly.
This change modifies the "assembly/" module to just copy needed
dependencies to its build directory, and modifies the packaging
script to pick those up (and removes duplicate jars packaged in the
examples module).

I also made some minor adjustments to dependencies to remove some
test jars from the final packaging, and remove jars that conflict with each
other when packaged separately (e.g. servlet api).

Also note that this change restores guava in applications' classpaths, even
though it's still shaded inside Spark. This is now needed for the Hadoop
libraries that are packaged with Spark, which now are not processed by
the shade plugin.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11796 from vanzin/SPARK-13579.
2016-04-04 16:52:22 -07:00
Marcelo Vanzin 45f8053be5 [SPARK-13578][CORE] Modify launch scripts to not use assemblies.
Instead of looking for a specially-named assembly, the scripts will now
blindly add all jars under the libs directory to the classpath. This
libs directory is still currently the old assembly dir, so things should
keep working the same way as before until we make more packaging changes.
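
A sketch of the new lookup (directory name illustrative): a JVM classpath wildcard picks up every jar under the libs directory, so no specially-named assembly has to be located first.

```
set "LAUNCH_CLASSPATH=%SPARK_HOME%\lib\*"
"%RUNNER%" -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %*
```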

The only lost feature is the detection of multiple assemblies; I consider
that a minor nicety that only really affects a few developers, so it's
probably ok.

Tested locally by running spark-shell; also did some minor Win32 testing
(just made sure spark-shell started).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #11591 from vanzin/SPARK-13578.
2016-03-14 11:13:26 -07:00
Jon Maurer 2ba9b6a2df [SPARK-11518][DEPLOY, WINDOWS] Handle spaces in Windows command scripts
Author: Jon Maurer <tritab@gmail.com>
Author: Jonathan Maurer <jmaurer@Jonathans-MacBook-Pro.local>

Closes #10789 from tritab/cmd_updates.
2016-02-10 09:54:22 +00:00
Masayoshi TSUZUKI 268c419f15 [SPARK-6435] spark-shell --jars option does not add all jars to classpath
Modified spark-shell.cmd to properly accept double-quoted args.
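
An invocation of the kind this fix is meant to support (paths illustrative): the double-quoted `--jars` value, spaces and all, must reach the JVM intact.

```
spark-shell.cmd --jars "C:\My Jars\extra1.jar,C:\My Jars\extra2.jar"
```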

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #5227 from tsudukim/feature/SPARK-6435-2 and squashes the following commits:

ac55787 [Masayoshi TSUZUKI] removed unnecessary argument.
60789a7 [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6435-2
1fee420 [Masayoshi TSUZUKI] fixed test code for escaping '='.
0d4dc41 [Masayoshi TSUZUKI] - escaped comma and semicolon in CommandBuilderUtils.java - added random string to the temporary filename - double-quotation followed by `cmd /c` did not work properly - no need to escape `=` by `^` - if a double-quoted string ended with `\` like a classpath, the last `\` was parsed as the escape character and the closing `"` didn't work properly
2a332e5 [Masayoshi TSUZUKI] Merge branch 'master' into feature/SPARK-6435-2
04f4291 [Masayoshi TSUZUKI] [SPARK-6435] spark-shell --jars option does not add all jars to classpath
2015-04-28 07:56:36 -04:00
Marcelo Vanzin 9717389365 [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES.
The fix for SPARK-6406 broke the case where sub-processes are launched
when SPARK_PREPEND_CLASSES is set, because the code would now only add
the launcher's build directory to the sub-process's classpath instead
of the complete assembly.

This patch fixes the problem by having the launch scripts stash the
assembly's location in an environment variable. This is not the prettiest
solution, but it avoids having to plumb that location all the way through
the Worker code that launches executors. The env variable is always
set by the launch scripts, so users cannot override it.
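
A sketch of the stash (the variable name is an assumption, not taken from this commit): the launch script records the assembly's location so that sub-processes started with SPARK_PREPEND_CLASSES set can restore the complete assembly on their classpath.

```
rem _SPARK_ASSEMBLY is an assumed name; always set, so users cannot override it.
set "_SPARK_ASSEMBLY=%SPARK_ASSEMBLY_JAR%"
```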

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5504 from vanzin/SPARK-6890 and squashes the following commits:

7aec921 [Marcelo Vanzin] Fix tests.
ff87a60 [Marcelo Vanzin] Merge branch 'master' into SPARK-6890
31d3ce8 [Marcelo Vanzin] [SPARK-6890] [core] Fix launcher lib work with SPARK_PREPEND_CLASSES.
2015-04-14 18:51:39 -07:00
Masayoshi TSUZUKI 49f38824a4 [SPARK-6673] spark-shell.cmd can't start in Windows even when spark was built
Added an equivalent `cmd` script for load-spark-env.sh.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #5328 from tsudukim/feature/SPARK-6673 and squashes the following commits:

aaefb19 [Masayoshi TSUZUKI] removed dust.
be3405e [Masayoshi TSUZUKI] [SPARK-6673] spark-shell.cmd can't start in Windows even when spark was built
2015-04-06 10:11:20 +01:00
Nishkam Ravi e3eb393961 [SPARK-6406] Launch Spark using assembly jar instead of a separate launcher jar
Author: Nishkam Ravi <nravi@cloudera.com>
Author: nishkamravi2 <nishkamravi@gmail.com>
Author: nravi <nravi@c1704.halxg.cloudera.com>

Closes #5085 from nishkamravi2/master_nravi and squashes the following commits:

bad4349 [nishkamravi2] Update Main.java
36a6f87 [Nishkam Ravi] Minor changes and bug fixes
b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
d9658d6 [Nishkam Ravi] Changes for SPARK-6406
ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ac58975 [Nishkam Ravi] spark-class changes
06bfeb0 [nishkamravi2] Update spark-class
35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
d453197 [nishkamravi2] Update NewHadoopRDD.scala
6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
0ce2c32 [nishkamravi2] Update HadoopRDD.scala
f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of the expensive operation in shutDown.
71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
494d8c0 [nishkamravi2] Update DiskBlockManager.scala
3c5ddba [nishkamravi2] Update DiskBlockManager.scala
f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
535295a [nishkamravi2] Update TaskSetManager.scala
3e1b616 [Nishkam Ravi] Modify test for maxResultSize
9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
2015-03-29 12:40:37 +01:00
Marcelo Vanzin 517975d89d [SPARK-4924] Add a library for launching Spark jobs programmatically.
This change encapsulates all the logic involved in launching a Spark job
into a small Java library that can be easily embedded into other applications.

The overall goal of this change is twofold, as described in the bug:

- Provide a public API for launching Spark processes. This is a common request
  from users and currently there's no good answer for it.

- Remove a lot of the duplicated code and other coupling that exists in the
  different parts of Spark that deal with launching processes.

A lot of the duplication was due to different code needed to build an
application's classpath (and the bootstrapper needed to run the driver in
certain situations), and also different code needed to parse spark-submit
command line options in different contexts. The change centralizes those
as much as possible so that all code paths can rely on the library for
handling those appropriately.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:

18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
2ce741f [Marcelo Vanzin] Add lots of quotes.
3b28a75 [Marcelo Vanzin] Update new pom.
a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
897141f [Marcelo Vanzin] Review feedback.
e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
28cd35e [Marcelo Vanzin] Remove stale comment.
b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
5f4ddcc [Marcelo Vanzin] Better usage messages.
92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
6184c07 [Marcelo Vanzin] Rename field.
4c19196 [Marcelo Vanzin] Update comment.
7e66c18 [Marcelo Vanzin] Fix pyspark tests.
0031a8e [Marcelo Vanzin] Review feedback.
c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
28b1434 [Marcelo Vanzin] Add a comment.
304333a [Marcelo Vanzin] Fix propagation of properties file arg.
bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
8ec0243 [Marcelo Vanzin] Add missing newline.
95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
de81da2 [Marcelo Vanzin] Fix CommandUtils.
86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
7cff919 [Marcelo Vanzin] Javadoc updates.
eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
7ed8859 [Marcelo Vanzin] Some more feedback.
54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
61919df [Marcelo Vanzin] Clean leftover debug statement.
aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
e584fc3 [Marcelo Vanzin] Rework command building a little bit.
525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
8ac4e92 [Marcelo Vanzin] Minor test cleanup.
e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
c617539 [Marcelo Vanzin] Review feedback round 1.
fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
a7936ef [Marcelo Vanzin] Fix pyspark tests.
656374e [Marcelo Vanzin] Mima fixes.
4d511e7 [Marcelo Vanzin] Fix tools search code.
7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programmatically.
2015-03-11 01:03:01 -07:00
Masayoshi TSUZUKI 358d7ffd01 [SPARK-3775] Not suitable error message in spark-shell.cmd
Modified some sentences of the error messages in bin\*.cmd.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits:

3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd
2014-10-03 13:09:48 -07:00
Andrew Or 7557c4cfef [SPARK-3167] Handle special driver configs in Windows
This is an effort to bring the Windows scripts up to speed after the recent sweeping changes in #1845.

Author: Andrew Or <andrewor14@gmail.com>

Closes #2129 from andrewor14/windows-config and squashes the following commits:

881a8f0 [Andrew Or] Add reference to Windows taskkill
92e6047 [Andrew Or] Update a few comments (minor)
22b1acd [Andrew Or] Fix style again (minor)
afcffea [Andrew Or] Fix style (minor)
72004c2 [Andrew Or] Actually respect --driver-java-options
803218b [Andrew Or] Actually respect SPARK_*_CLASSPATH
eeb34a0 [Andrew Or] Update outdated comment (minor)
35caecc [Andrew Or] In Windows, actually kill Java processes on exit
f97daa2 [Andrew Or] Fix Windows spark shell stdin issue
83ebe60 [Andrew Or] Parse special driver configs in Windows (broken)
2014-08-26 22:52:16 -07:00
Daoyuan Wang f3d65cd0bf [SPARK-3068]remove MaxPermSize option for jvm 1.8
In JVM 1.8.0, MaxPermSize is no longer supported.
In Spark's `stderr` output, there would be a line like:

    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
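
A rough sketch of the runtime check (the version parsing is illustrative and only distinguishes 1.7-era from 1.8-era JVMs): add MaxPermSize only where the permanent generation still exists.

```
for /f "tokens=3" %%v in ('java -version 2^>^&1 ^| findstr /i "version"') do set "JAVA_VER=%%v"
rem JAVA_VER now holds e.g. "1.7.0_80", quotes included.
if not "%JAVA_VER:~1,3%"=="1.8" set "JAVA_OPTS=%JAVA_OPTS% -XX:MaxPermSize=128m"
```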

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #2011 from adrian-wang/maxpermsize and squashes the following commits:

ef1d660 [Daoyuan Wang] direct get java version in runtime
37db9c1 [Daoyuan Wang] code refine
3c1d554 [Daoyuan Wang] remove MaxPermSize option for jvm 1.8
2014-08-23 08:09:30 -07:00
Matei Zaharia 5af99d7617 SPARK-1879. Increase MaxPermSize since some of our builds have many classes
See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell when those things added up with the Scala compiler.

Note that with this change users can still override it by setting their own Java options. Their options come later in the command string than the default `-XX:MaxPermSize=128m`, so they take precedence.
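
A sketch of that ordering (command line abbreviated): for a repeated flag the JVM honors the last occurrence, so user options placed after the default still win.

```
set "SPARK_JAVA_OPTS=-XX:MaxPermSize=256m"
rem The default comes first, the user's options after, so 256m takes effect:
rem     java -XX:MaxPermSize=128m %SPARK_JAVA_OPTS% ... org.apache.spark.repl.Main
```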

Author: Matei Zaharia <matei@databricks.com>

Closes #823 from mateiz/spark-1879 and squashes the following commits:

6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
2014-05-19 18:42:28 -07:00
Matei Zaharia 7b70a70718 [SPARK-1876] Windows fixes to deal with latest distribution layout changes
- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)

Author: Matei Zaharia <matei@databricks.com>

Closes #819 from mateiz/win-fixes and squashes the following commits:

d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
2014-05-19 15:02:35 -07:00
Andrew Or 79820fe825 [SPARK-1276] Add a HistoryServer to render persisted UI
The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use case of this new feature as many people also run Spark on Yarn / Mesos.

This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively lightweight in that it only maintains static information about applications and performs no scheduling.

To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.

Comments and feedback are most welcome.

---

A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.

A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.

Author: Andrew Or <andrewor14@gmail.com>

Closes #204 from andrewor14/master and squashes the following commits:

7b7234c [Andrew Or] Finished -> Completed
b158d98 [Andrew Or] Address Patrick's comments
69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
19d5dd0 [Andrew Or] Merge github.com:apache/spark
f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
2dfb494 [Andrew Or] Decouple checking for application completion from replaying
d02dbaa [Andrew Or] Expose Spark version and include it in event logs
2282300 [Andrew Or] Add documentation for the HistoryServer
567474a [Andrew Or] Merge github.com:apache/spark
6edf052 [Andrew Or] Merge github.com:apache/spark
19e1fb4 [Andrew Or] Address Thomas' comments
248cb3d [Andrew Or] Limit number of live applications + add configurability
a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
bc46fc8 [Andrew Or] Merge github.com:apache/spark
e2f4ff9 [Andrew Or] Merge github.com:apache/spark
050419e [Andrew Or] Merge github.com:apache/spark
81b568b [Andrew Or] Fix strange error messages...
0670743 [Andrew Or] Decouple page rendering from loading files from disk
1b2f391 [Andrew Or] Minor changes
a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
d5154da [Andrew Or] Styling and comments
5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
7584418 [Andrew Or] Report application start/end times to HistoryServer
8aac163 [Andrew Or] Add basic application table
c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
2014-04-10 10:39:34 -07:00
Aaron Davidson 52834d761b SPARK-929: Fully deprecate usage of SPARK_MEM
(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)

This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY

The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.

SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.

SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without the risk of affecting executor memory.
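
A sketch of the specialized variables in use (values illustrative):

```
set "SPARK_DAEMON_MEMORY=1g"
set "SPARK_DRIVER_MEMORY=2g"
rem Executor memory should normally come from the "spark.executor.memory"
rem property rather than from an environment variable.
```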

Other memory considerations:
- The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).

This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.

Author: Aaron Davidson <aaron@databricks.com>

Closes #99 from aarondav/sparkmem and squashes the following commits:

9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
2014-03-09 11:08:39 -07:00
Qiuzhuang Lian 4e510b0b0c Fixed Windows spark shell launch script error.
JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
2014-01-16 16:09:10 +08:00
Prashant Sharma 74ba97fcf7 sbin/spark-class* -> bin/spark-class* 2014-01-03 15:08:01 +05:30
Renamed from sbin/spark-class2.cmd