diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 0e753b8dd0..ec13b81f85 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -91,7 +91,7 @@ For a categorical feature with `$M$` possible values (categories), one could com
 `$2^{M-1}-1$` split candidates. For binary (0/1) classification and regression,
 we can reduce the number of split candidates to `$M-1$` by ordering the
 categorical feature values by the average label. (See Section 9.2.4 in
-[Elements of Statistical Machine Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
+[Elements of Statistical Machine Learning](https://web.stanford.edu/~hastie/ElemStatLearn/) for
 details.) For example, for a binary classification problem with one categorical
 feature with three categories A, B and C whose corresponding proportions of
 label 1 are 0.2, 0.6 and 0.4, the categorical features are ordered as A, C, B. The
 two split candidates are A \| C, B
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index 19ec7c1e0a..382cbfd530 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -47,7 +47,7 @@ To install Apache Mesos from source, follow these steps:
 
 1. Download a Mesos release from a
    [mirror](http://www.apache.org/dyn/closer.lua/mesos/{{site.MESOS_VERSION}}/)
-2. Follow the Mesos [Getting Started](http://mesos.apache.org/gettingstarted) page for compiling and
+2. Follow the Mesos [Getting Started](http://mesos.apache.org/getting-started) page for compiling and
    installing Mesos
 
 **Note:** If you want to run Mesos without installing it into the default paths on your system
@@ -159,7 +159,7 @@ By setting the Mesos proxy config property (requires mesos version >= 1.4), `--c
 If you like to run the `MesosClusterDispatcher` with Marathon, you need to run the `MesosClusterDispatcher` in the foreground (i.e. `bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher`). Note that the `MesosClusterDispatcher` does not yet support multiple instances for HA.
 
 The `MesosClusterDispatcher` also supports writing recovery state into Zookeeper. This will allow the `MesosClusterDispatcher` to recover all submitted and running containers on relaunch. In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env by configuring `spark.deploy.recoveryMode` and related spark.deploy.zookeeper.* configurations.
-For more information about these configurations please refer to the configurations [doc](configurations.html#deploy).
+For more information about these configurations please refer to the configurations [doc](configuration.html#deploy).
 
 You can also specify any additional jars required by the `MesosClusterDispatcher` in the classpath by setting the environment variable SPARK_DAEMON_CLASSPATH in spark-env.
 
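A note on the mllib-decision-tree.md hunk above: the category-ordering trick in that passage is easy to verify by hand. Below is a minimal, self-contained Scala sketch, not MLlib's implementation; the toy data is invented to reproduce the 0.2/0.6/0.4 label-1 proportions from the doc text. It orders the categories by average label and enumerates the `$M-1$` prefix splits instead of all `$2^{M-1}-1$` subsets.

```scala
object CategoricalSplits {
  def main(args: Array[String]): Unit = {
    // Toy (category, label) pairs chosen so that the label-1 proportions are
    // A = 0.2, B = 0.6, C = 0.4, matching the example in the doc text.
    val samples: Seq[(String, Int)] = Seq(
      "A" -> 0, "A" -> 0, "A" -> 0, "A" -> 0, "A" -> 1,
      "B" -> 0, "B" -> 0, "B" -> 1, "B" -> 1, "B" -> 1,
      "C" -> 0, "C" -> 0, "C" -> 0, "C" -> 1, "C" -> 1)

    // Order the M categories by their average label.
    val ordered: Seq[String] = samples
      .groupBy(_._1)
      .map { case (cat, xs) => cat -> xs.map(_._2).sum.toDouble / xs.size }
      .toSeq
      .sortBy(_._2)
      .map(_._1) // A (0.2), C (0.4), B (0.6)

    // Only the M-1 prefix cuts of the ordered sequence need to be considered.
    (1 until ordered.size)
      .map(ordered.splitAt)
      .foreach { case (left, right) =>
        println(s"${left.mkString(",")} | ${right.mkString(",")}")
      }
    // Prints:
    //   A | C,B
    //   A,C | B
  }
}
```

Running it prints the same two split candidates the passage names, A \| C, B and A, C \| B.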
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index f51c5cc38f..8fa643abf1 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -364,7 +364,7 @@ By default, standalone scheduling clusters are resilient to Worker failures (ins
 
 Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected "leader" and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master's state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling _new_ applications -- applications that were already running during Master failover are unaffected.
 
-Learn more about getting started with ZooKeeper [here](http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html).
+Learn more about getting started with ZooKeeper [here](http://zookeeper.apache.org/doc/current/zookeeperStarted.html).
 
 **Configuration**
 
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b76be9132d..f02f46236e 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -501,6 +501,7 @@ To load a CSV file you can use:
 
+
 ### Run SQL on files directly
 
 Instead of using the read API to load a file into a DataFrame and query it, you can also query that
diff --git a/docs/tuning.md b/docs/tuning.md
index 7d5f97a02f..fc27713f28 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -219,7 +219,7 @@ temporary objects created during task execution. Some steps which may be useful
 * Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve performance in some
   situations where garbage collection is a bottleneck. Note that with large executor heap sizes,
   it may be important to
-  increase the [G1 region size](https://blogs.oracle.com/g1gc/entry/g1_gc_tuning_a_case)
+  increase the [G1 region size](http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html)
   with `-XX:G1HeapRegionSize`
 
 * As an example, if your task is reading data from HDFS, the amount of memory used by the task
   can be estimated using
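For the sql-programming-guide.md hunk above: the "Run SQL on files directly" section it touches refers to Spark SQL's `format`.`path` table syntax. A minimal Scala sketch of that usage; the file path and app name are placeholders, not taken from the patch:

```scala
import org.apache.spark.sql.SparkSession

object RunSqlOnFiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-on-files").getOrCreate()

    // Query the Parquet file in place instead of loading it first with
    // spark.read.parquet(...); the backticked `format`.`path` identifier
    // is resolved directly as a table.
    val df = spark.sql("SELECT * FROM parquet.`/tmp/users.parquet`")
    df.show()

    spark.stop()
  }
}
```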
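And for the tuning.md hunk: `-XX:+UseG1GC` and `-XX:G1HeapRegionSize` are executor JVM flags, typically carried in `spark.executor.extraJavaOptions`. A sketch under the assumption of a deploy mode where executors launch after the session is created (e.g. standalone or client mode); the 16m region size is an illustrative value, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

object G1gcExecutors {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("g1gc-tuning")
      // Enable G1 and raise the region size on each executor JVM
      // (assumed example values).
      .config("spark.executor.extraJavaOptions",
              "-XX:+UseG1GC -XX:G1HeapRegionSize=16m")
      .getOrCreate()

    // ... job code ...

    spark.stop()
  }
}
```

In cluster deployments the same string is more commonly supplied at launch time, e.g. `spark-submit --conf spark.executor.extraJavaOptions="-XX:+UseG1GC -XX:G1HeapRegionSize=16m" ...`.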