[SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in examples and user guide doc

ML ```LinearRegression``` uses ```data/mllib/sample_libsvm_data.txt``` as the dataset in the examples and user guide doc, but it is actually a classification dataset rather than a regression dataset. We should use ```data/mllib/sample_linear_regression_data.txt``` instead.
The deeper cause is that ```LinearRegression``` with the "normal" solver cannot solve this dataset correctly, possibly because the problem is ill-conditioned and the labels are unreasonable for regression. This issue has been reported at [SPARK-11918](https://issues.apache.org/jira/browse/SPARK-11918).
Users will be confused if they run the example code and get an exception, so we should make this change, which clearly illustrates the usage of the ```LinearRegression``` algorithm.
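
As a rough illustration of the distinction above, the following sketch inspects the label column of LIBSVM-formatted lines and guesses whether the data is meant for classification. The helper names, the `max_classes` threshold, and the inline sample rows are all hypothetical; they are not part of Spark and are not taken from the actual data files.

```python
def label_summary(lines):
    """Return the sorted distinct labels found in LIBSVM-formatted lines."""
    labels = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # LIBSVM format: "<label> <index>:<value> <index>:<value> ..."
        labels.add(float(line.split()[0]))
    return sorted(labels)


def looks_like_classification(lines, max_classes=10):
    """Heuristic (an assumption, not a Spark API): a small number of
    distinct, integer-valued labels suggests class ids, not targets."""
    labels = label_summary(lines)
    return 0 < len(labels) <= max_classes and all(v.is_integer() for v in labels)


# Made-up rows for illustration only.
classification_sample = ["0 128:51 129:159", "1 159:124 160:253", "1 125:145 126:255"]
regression_sample = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.15 2:0.21", "17.3 1:0.9 2:-0.4"]

print(looks_like_classification(classification_sample))  # True
print(looks_like_classification(regression_sample))      # False
```

To check a real file, pass an open file handle instead of a list, for example `looks_like_classification(open(path))`. The heuristic is deliberately crude: a regression dataset with few integer targets would be misclassified.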

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9905 from yanboliang/spark-11920.
Yanbo Liang 2015-11-23 11:51:29 -08:00 committed by Joseph K. Bradley
parent 5231cd5aca
commit 98d7ec7df4
3 changed files with 5 additions and 3 deletions

@@ -37,7 +37,7 @@ public class JavaLinearRegressionWithElasticNetExample {
     // $example on$
     // Load training data
     DataFrame training = sqlContext.read().format("libsvm")
-      .load("data/mllib/sample_libsvm_data.txt");
+      .load("data/mllib/sample_linear_regression_data.txt");
     LinearRegression lr = new LinearRegression()
       .setMaxIter(10)

@@ -29,7 +29,8 @@ if __name__ == "__main__":
     # $example on$
     # Load training data
-    training = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
+    training = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
     lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)

@@ -33,7 +33,8 @@ object LinearRegressionWithElasticNetExample {
     // $example on$
     // Load training data
-    val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
+    val training = sqlCtx.read.format("libsvm")
+      .load("data/mllib/sample_linear_regression_data.txt")
     val lr = new LinearRegression()
       .setMaxIter(10)