SPARK-6454 [DOCS] Fix links to pyspark api
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>

Closes #5120 from kamilsmuga/master and squashes the following commits:

fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
This commit is contained in:
parent adb2ff752f
commit 6ef48632fb
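Every change in this diff follows one pattern: epydoc-style pages such as `pyspark.mllib.linalg.SparseVector-class.html` became Sphinx pages with anchors such as `pyspark.mllib.html#pyspark.mllib.linalg.SparseVector`. A hedged sketch of how such a migration could be scripted — the page map below is an illustrative assumption, not the script actually used, and names re-exported at the package level (e.g. `pyspark.RDD`) would need an extra renaming pass not shown here:

```python
import re

# Illustrative mapping from dotted-package prefixes to Sphinx doc pages;
# based on the hunks in this commit, not an exhaustive map of the Spark docs.
SPHINX_PAGES = {
    "pyspark.mllib": "pyspark.mllib.html",
    "pyspark.sql": "pyspark.sql.html",
    "pyspark": "pyspark.html",
}

def migrate_link(old):
    """Rewrite an epydoc-style doc URL into a Sphinx-style one."""
    m = re.fullmatch(r"api/python/([\w.]+)-class\.html(?:#(\w+))?", old)
    if not m:
        return old  # not an epydoc class link; leave untouched
    dotted, member = m.groups()
    # choose the longest matching package prefix for the target page
    for prefix in sorted(SPHINX_PAGES, key=len, reverse=True):
        if dotted.startswith(prefix + "."):
            anchor = dotted + ("." + member if member else "")
            return "api/python/%s#%s" % (SPHINX_PAGES[prefix], anchor)
    return old
```

For example, the epydoc method anchor `Statistics-class.html#colStats` becomes the dotted Sphinx anchor `pyspark.mllib.stat.Statistics.colStats`, matching the hunks below.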
@@ -78,13 +78,13 @@ MLlib recognizes the following types as dense vectors:
 and the following as sparse vectors:
 
-* MLlib's [`SparseVector`](api/python/pyspark.mllib.linalg.SparseVector-class.html).
+* MLlib's [`SparseVector`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.SparseVector).
 * SciPy's
   [`csc_matrix`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix)
   with a single column
 
 We recommend using NumPy arrays over lists for efficiency, and using the factory methods implemented
-in [`Vectors`](api/python/pyspark.mllib.linalg.Vectors-class.html) to create sparse vectors.
+in [`Vectors`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors) to create sparse vectors.
 
 {% highlight python %}
 import numpy as np
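The hunk above lists the types MLlib's Python API accepts as sparse vectors. pyspark may not be importable here, but the underlying representation — a size plus parallel index/value arrays — is easy to mimic; `SparseVec` below is an illustrative stand-in, not MLlib's API:

```python
# Minimal stand-in for a sparse vector: a size plus parallel arrays of
# indices and values, as in MLlib's SparseVector(3, [0, 2], [1.0, 3.0]).
class SparseVec:
    def __init__(self, size, indices, values):
        self.size = size
        self.entries = dict(zip(indices, values))

    def dot(self, dense):
        # only stored (nonzero) positions contribute to the product
        return sum(v * dense[i] for i, v in self.entries.items())

sv = SparseVec(3, [0, 2], [1.0, 3.0])
```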
@@ -151,7 +151,7 @@ LabeledPoint neg = new LabeledPoint(1.0, Vectors.sparse(3, new int[] {0, 2}, new
 <div data-lang="python" markdown="1">
 
 A labeled point is represented by
-[`LabeledPoint`](api/python/pyspark.mllib.regression.LabeledPoint-class.html).
+[`LabeledPoint`](api/python/pyspark.mllib.html#pyspark.mllib.regression.LabeledPoint).
 
 {% highlight python %}
 from pyspark.mllib.linalg import SparseVector
@@ -211,7 +211,7 @@ JavaRDD<LabeledPoint> examples =
 </div>
 
 <div data-lang="python" markdown="1">
-[`MLUtils.loadLibSVMFile`](api/python/pyspark.mllib.util.MLUtils-class.html) reads training
+[`MLUtils.loadLibSVMFile`](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) reads training
 examples stored in LIBSVM format.
 
 {% highlight python %}
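`MLUtils.loadLibSVMFile`, referenced in the hunk above, parses lines of the form `label index1:value1 index2:value2 ...` with one-based indices. A hedged pure-Python sketch of that format, not pyspark's implementation:

```python
def parse_libsvm_line(line, num_features):
    """Parse one LIBSVM-format line into (label, dense_features).

    LIBSVM indices are one-based; only nonzero features are listed.
    """
    parts = line.split()
    label = float(parts[0])
    features = [0.0] * num_features
    for token in parts[1:]:
        idx, val = token.split(":")
        features[int(idx) - 1] = float(val)
    return label, features
```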
@@ -106,11 +106,11 @@ NaiveBayesModel sameModel = NaiveBayesModel.load(sc.sc(), "myModelPath");
 
 <div data-lang="python" markdown="1">
 
-[NaiveBayes](api/python/pyspark.mllib.classification.NaiveBayes-class.html) implements multinomial
+[NaiveBayes](api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayes) implements multinomial
 naive Bayes. It takes an RDD of
-[LabeledPoint](api/python/pyspark.mllib.regression.LabeledPoint-class.html) and an optionally
+[LabeledPoint](api/python/pyspark.mllib.html#pyspark.mllib.regression.LabeledPoint) and an optionally
 smoothing parameter `lambda` as input, and output a
-[NaiveBayesModel](api/python/pyspark.mllib.classification.NaiveBayesModel-class.html), which can be
+[NaiveBayesModel](api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayesModel), which can be
 used for evaluation and prediction.
 
 Note that the Python API does not yet support model save/load but will in the future.
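The NaiveBayes docs in the hunk above describe a multinomial model with a smoothing parameter `lambda`. A toy pure-Python version of that training/prediction loop — illustrative only, not pyspark's distributed implementation (`lam` stands in for `lambda`, which is reserved in Python):

```python
import math
from collections import defaultdict

def train_nb(points, lam=1.0):
    # points: list of (label, feature_count_vector) pairs
    n_features = len(points[0][1])
    class_count = defaultdict(int)
    feat_sum = defaultdict(lambda: [0.0] * n_features)
    for label, x in points:
        class_count[label] += 1
        for j, v in enumerate(x):
            feat_sum[label][j] += v
    model = {}
    for c, cnt in class_count.items():
        # additive smoothing: add lam to every feature count
        total = sum(feat_sum[c]) + lam * n_features
        log_theta = [math.log((feat_sum[c][j] + lam) / total)
                     for j in range(n_features)]
        model[c] = (math.log(cnt / len(points)), log_theta)
    return model

def predict_nb(model, x):
    # highest posterior log-probability wins
    return max(model, key=lambda c: model[c][0] +
               sum(v * t for v, t in zip(x, model[c][1])))
```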
@@ -81,8 +81,8 @@ System.out.println(summary.numNonzeros()); // number of nonzeros in each column
 </div>
 
 <div data-lang="python" markdown="1">
-[`colStats()`](api/python/pyspark.mllib.stat.Statistics-class.html#colStats) returns an instance of
-[`MultivariateStatisticalSummary`](api/python/pyspark.mllib.stat.MultivariateStatisticalSummary-class.html),
+[`colStats()`](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics.colStats) returns an instance of
+[`MultivariateStatisticalSummary`](api/python/pyspark.mllib.html#pyspark.mllib.stat.MultivariateStatisticalSummary),
 which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
 total count.
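A pure-Python version of the column summary the hunk above describes: per-column max, min, mean, variance, and nonzero count. Illustrative only — MLlib's `colStats()` computes the same summaries over a distributed RDD, and the choice of unbiased (sample) variance here is an assumption:

```python
def col_stats(rows):
    """Column-wise summary of a small in-memory matrix (list of rows)."""
    cols = list(zip(*rows))
    n = len(rows)
    means = [sum(c) / n for c in cols]
    # unbiased (sample) variance, dividing by n - 1
    variances = [sum((v - m) ** 2 for v in c) / (n - 1)
                 for c, m in zip(cols, means)]
    return {
        "max": [max(c) for c in cols],
        "min": [min(c) for c in cols],
        "mean": means,
        "variance": variances,
        "numNonzeros": [sum(1 for v in c if v != 0) for c in cols],
        "count": n,
    }
```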
@@ -169,7 +169,7 @@ Matrix correlMatrix = Statistics.corr(data.rdd(), "pearson");
 </div>
 
 <div data-lang="python" markdown="1">
-[`Statistics`](api/python/pyspark.mllib.stat.Statistics-class.html) provides methods to
+[`Statistics`](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics) provides methods to
 calculate correlations between series. Depending on the type of input, two `RDD[Double]`s or
 an `RDD[Vector]`, the output will be a `Double` or the correlation `Matrix` respectively.
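For the two-series case mentioned above, the `Double` that `Statistics.corr` returns is the Pearson correlation coefficient. A plain-Python sketch of that computation, not the pyspark call itself:

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```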
@@ -258,7 +258,7 @@ JavaPairRDD<K, V> exactSample = data.sampleByKeyExact(false, fractions);
 {% endhighlight %}
 </div>
 <div data-lang="python" markdown="1">
-[`sampleByKey()`](api/python/pyspark.rdd.RDD-class.html#sampleByKey) allows users to
+[`sampleByKey()`](api/python/pyspark.html#pyspark.RDD.sampleByKey) allows users to
 sample approximately $\lceil f_k \cdot n_k \rceil \, \forall k \in K$ items, where $f_k$ is the
 desired fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K$ is the
 set of keys.
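The guarantee quoted above is that `sampleByKey` draws roughly $\lceil f_k \cdot n_k \rceil$ items per key $k$. This small helper just evaluates that bound for given per-key fractions and counts (the function name is illustrative, not a pyspark API):

```python
import math

def expected_sample_sizes(fractions, counts):
    """Approximate per-key sample sizes: ceil(f_k * n_k) for each key k."""
    return {k: math.ceil(fractions[k] * counts[k]) for k in fractions}
```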
@@ -476,7 +476,7 @@ JavaDoubleRDD v = u.map(
 </div>
 
 <div data-lang="python" markdown="1">
-[`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
+[`RandomRDDs`](api/python/pyspark.mllib.html#pyspark.mllib.random.RandomRDDs) provides factory
 methods to generate random double RDDs or vector RDDs.
 The following example generates a random double RDD, whose values follows the standard normal
 distribution `N(0, 1)`, and then map it to `N(1, 4)`.
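The hunk above ends by mapping a standard-normal RDD to `N(1, 4)`. Since `N(1, 4)` has mean 1 and variance 4 (standard deviation 2), the element-wise map is `v = 1 + 2 * u`; here applied to a plain list instead of an RDD:

```python
import random

def to_n_1_4(u):
    # shift by the target mean (1) and scale by the target std dev (2)
    return 1.0 + 2.0 * u

random.seed(0)
sample = [to_n_1_4(random.gauss(0.0, 1.0)) for _ in range(1000)]
```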
@@ -142,8 +142,8 @@ JavaSparkContext sc = new JavaSparkContext(conf);
 
 <div data-lang="python" markdown="1">
 
-The first thing a Spark program must do is to create a [SparkContext](api/python/pyspark.context.SparkContext-class.html) object, which tells Spark
-how to access a cluster. To create a `SparkContext` you first need to build a [SparkConf](api/python/pyspark.conf.SparkConf-class.html) object
+The first thing a Spark program must do is to create a [SparkContext](api/python/pyspark.html#pyspark.SparkContext) object, which tells Spark
+how to access a cluster. To create a `SparkContext` you first need to build a [SparkConf](api/python/pyspark.html#pyspark.SparkConf) object
 that contains information about your application.
 
 {% highlight python %}
@@ -912,7 +912,7 @@ The following table lists some of the common transformations supported by Spark.
 RDD API doc
 ([Scala](api/scala/index.html#org.apache.spark.rdd.RDD),
 [Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
-[Python](api/python/pyspark.rdd.RDD-class.html))
+[Python](api/python/pyspark.html#pyspark.RDD))
 and pair RDD functions doc
 ([Scala](api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions),
 [Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
@@ -1025,7 +1025,7 @@ The following table lists some of the common actions supported by Spark. Refer t
 RDD API doc
 ([Scala](api/scala/index.html#org.apache.spark.rdd.RDD),
 [Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
-[Python](api/python/pyspark.rdd.RDD-class.html))
+[Python](api/python/pyspark.html#pyspark.RDD))
 and pair RDD functions doc
 ([Scala](api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions),
 [Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
@@ -1105,7 +1105,7 @@ replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-proj
 These levels are set by passing a
 `StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
 [Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
-[Python](api/python/pyspark.storagelevel.StorageLevel-class.html))
+[Python](api/python/pyspark.html#pyspark.StorageLevel))
 to `persist()`. The `cache()` method is a shorthand for using the default storage level,
 which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The full set of
 storage levels is:
@@ -1374,7 +1374,7 @@ scala> accum.value
 {% endhighlight %}
 
 While this code used the built-in support for accumulators of type Int, programmers can also
-create their own types by subclassing [AccumulatorParam](api/python/pyspark.accumulators.AccumulatorParam-class.html).
+create their own types by subclassing [AccumulatorParam](api/python/pyspark.html#pyspark.AccumulatorParam).
 The AccumulatorParam interface has two methods: `zero` for providing a "zero value" for your data
 type, and `addInPlace` for adding two values together. For example, supposing we had a `Vector` class
 representing mathematical vectors, we could write:
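The hunk above describes the two-method `AccumulatorParam` contract (`zero`, `addInPlace`) and supposes a `Vector` class. A sketch of that contract using a plain list-backed vector — an illustration of the interface, not the example from the Spark docs:

```python
class Vector:
    def __init__(self, values):
        self.values = list(values)

class VectorAccumulatorParam:
    def zero(self, initial_value):
        # a "zero value" shaped like the initial value
        return Vector([0.0] * len(initial_value.values))

    def addInPlace(self, v1, v2):
        # merge v2 into v1 and return v1
        v1.values = [a + b for a, b in zip(v1.values, v2.values)]
        return v1
```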
@@ -56,7 +56,7 @@ SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
 <div data-lang="python" markdown="1">
 
 The entry point into all relational functionality in Spark is the
-[`SQLContext`](api/python/pyspark.sql.SQLContext-class.html) class, or one
+[`SQLContext`](api/python/pyspark.sql.html#pyspark.sql.SQLContext) class, or one
 of its decedents. To create a basic `SQLContext`, all you need is a SparkContext.
 
 {% highlight python %}