Commit graph

1237 commits

Author SHA1 Message Date
Yanbo Liang 72427c3e11 [SPARK-13429][MLLIB] Unify Logistic Regression convergence tolerance of ML & MLlib
## What changes were proposed in this pull request?
In order to provide better and consistent result, let's change the default value of MLlib ```LogisticRegressionWithLBFGS convergenceTol``` from ```1E-4``` to ```1E-6``` which will be equal to ML ```LogisticRegression```.
cc dbtsai
## How was the this patch tested?
unit tests

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11299 from yanboliang/spark-13429.
2016-02-22 23:37:09 -08:00
Narine Kokhlikyan 33ef3aa7ea [SPARK-13295][ ML, MLLIB ] AFTSurvivalRegression.AFTAggregator improvements - avoid creating new instances of arrays/vectors for each record
As also mentioned/marked by TODO in AFTAggregator.AFTAggregator.add(data: AFTPoint) method a new array is being created for intercept value and it is being concatenated
with another array which contains the betas, the resulted Array is being converted into a Dense vector which in its turn is being converted into breeze vector.
This is expensive and not necessarily beautiful.

I've tried to solve above mentioned problem by simple algebraic decompositions - keeping and treating intercept independently.

Please let me know what do you think and if you have any questions.

Thanks,
Narine

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #11179 from NarineK/survivaloptim.
2016-02-22 17:26:32 -08:00
Yanbo Liang 40e6d40fe7 [SPARK-13334][ML] ML KMeansModel / BisectingKMeansModel / QuantileDiscretizer should set parent
ML ```KMeansModel / BisectingKMeansModel / QuantileDiscretizer``` should set parent.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11214 from yanboliang/spark-13334.
2016-02-22 12:59:50 +02:00
Bryan Cutler e298ac91e3 [SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format
Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent.  This is for the fpm and recommendation modules.

Closes #10602
Closes #10897

Author: Bryan Cutler <cutlerb@gmail.com>
Author: somideshmukh <somilde@us.ibm.com>

Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632.
2016-02-22 12:48:37 +02:00
Dongjoon Hyun 024482bf51 [MINOR][DOCS] Fix all typos in markdown files of doc and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under `docs` module,
and fixes similar typos in other comments, too.

## How was the this patch tested?

manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
2016-02-22 09:52:07 +00:00
Yong Gang Cao ef1047fca7 [SPARK-12153][SPARK-7617][MLLIB] add support of arbitrary length sentence and other tuning for Word2Vec
add support of arbitrary length sentence by using the nature representation of sentences in the input.

add new similarity functions and add normalization option for distances in synonym finding
add new accessor for internal structure(the vocabulary and wordindex) for convenience

need instructions about how to set value for the Since annotation for newly added public functions. 1.5.3?

jira link: https://issues.apache.org/jira/browse/SPARK-12153

Author: Yong Gang Cao <ygcao@amazon.com>
Author: Yong-Gang Cao <ygcao@users.noreply.github.com>

Closes #10152 from ygcao/improvementForSentenceBoundary.
2016-02-22 09:47:36 +00:00
Yanbo Liang 8a4ed78869 [SPARK-13379][MLLIB] Fix MLlib LogisticRegressionWithLBFGS set regularization incorrectly
## What changes were proposed in this pull request?
Fix MLlib LogisticRegressionWithLBFGS regularization map as:
```SquaredL2Updater``` -> ```elasticNetParam = 0.0```
```L1Updater``` -> ```elasticNetParam = 1.0```
cc dbtsai
## How was the this patch tested?
unit tests

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11258 from yanboliang/spark-13379.
2016-02-21 20:20:41 -08:00
Xiangrui Meng 0088b252bf [MINOR][MLLIB] fix mllib compile warnings
This PR fixes some warnings found by `build/sbt mllib/test:compile`.

Author: Xiangrui Meng <meng@databricks.com>

Closes #11227 from mengxr/fix-mllib-warnings-201602.
2016-02-17 18:56:19 -08:00
BenFradet 00c72d27bf [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative filtering in general
This documents the implementation of ALS in `spark.ml` with example code in scala, java and python.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10411 from BenFradet/SPARK-12247.
2016-02-16 13:03:28 +00:00
seddonm1 cbeb006f23 [SPARK-13097][ML] Binarizer allowing Double AND Vector input types
This enhancement extends the existing SparkML Binarizer [SPARK-5891] to allow Vector in addition to the existing Double input column type.

A use case for this enhancement is for when a user wants to Binarize many similar feature columns at once using the same threshold value (for example a binary threshold applied to many pixels in an image).

This contribution is my original work and I license the work to the project under the project's open source license.

viirya mengxr

Author: seddonm1 <seddonm1@gmail.com>

Closes #10976 from seddonm1/master.
2016-02-15 20:15:27 -08:00
Liang-Chi Hsieh e3441e3f68 [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed test
JIRA: https://issues.apache.org/jira/browse/SPARK-12363

This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work.

Author: Liang-Chi Hsieh <viirya@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #10539 from viirya/fix-poweriter.
2016-02-13 15:56:20 -08:00
Earthson Lu 5f1c359069 [SPARK-12746][ML] ArrayType(_, true) should also accept ArrayType(_, false)
https://issues.apache.org/jira/browse/SPARK-12746

Author: Earthson Lu <Earthson.Lu@gmail.com>

Closes #10697 from Earthson/SPARK-12746.
2016-02-11 18:31:46 -08:00
Liu Xiang a5257048d7 [SPARK-12765][ML][COUNTVECTORIZER] fix CountVectorizer.transform's lost transformSchema
https://issues.apache.org/jira/browse/SPARK-12765

Author: Liu Xiang <lxmtlab@gmail.com>

Closes #10720 from sloth2012/sloth.
2016-02-11 17:28:37 -08:00
Yu ISHIKAWA 574571c870 [SPARK-11515][ML] QuantileDiscretizer should take random seed
cc jkbradley

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #9535 from yu-iskw/SPARK-11515.
2016-02-11 15:05:34 -08:00
Yu ISHIKAWA efb65e09bc [SPARK-13265][ML] Refactoring of basic ML import/export for other file system besides HDFS
jkbradley I tried to improve the function to export a model. When I tried to export a model to S3 under Spark 1.6, we couldn't do that. So, it should offer S3 besides HDFS. Can you review it when you have time? Thanks!

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #11151 from yu-iskw/SPARK-13265.
2016-02-11 15:00:23 -08:00
Sasaki Toru c2f21d8898 [SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template
In spark-env.sh.template, there are multi-byte characters, this PR will remove it.

Author: Sasaki Toru <sasakitoa@nttdata.co.jp>

Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
2016-02-11 09:30:36 +00:00
Liang-Chi Hsieh 9267bc68fa [SPARK-10524][ML] Use the soft prediction to order categories' bins
JIRA: https://issues.apache.org/jira/browse/SPARK-10524

Currently we use the hard prediction (`ImpurityCalculator.predict`) to order categories' bins. But we should use the soft prediction.

Author: Liang-Chi Hsieh <viirya@gmail.com>
Author: Liang-Chi Hsieh <viirya@appier.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8734 from viirya/dt-soft-centroids.
2016-02-09 17:10:55 -08:00
Holden Karau ce83fe9756 [SPARK-13201][SPARK-13200] Deprecation warning cleanups: KMeans & MFDataGenerator
KMeans:
Make a private non-deprecated version of setRuns API so that we can call it from the PythonAPI without deprecation warnings in our own build. Also use it internally when being called from train. Add a logWarning for non-1 values

MFDataGenerator:
Apparently we are calling round on an integer which now in Scala 2.11 results in a warning (it didn't make any sense before either). Figure out if this is a mistake we can just remove or if we got the types wrong somewhere.

I put these two together since they are both deprecation fixes in MLlib and pretty small, but I can split them up if we would prefer it that way.

Author: Holden Karau <holden@us.ibm.com>

Closes #11112 from holdenk/SPARK-13201-non-deprecated-setRuns-SPARK-mathround-integer.
2016-02-09 08:47:28 +00:00
Gary King bc8890b357 [SPARK-13132][MLLIB] cache standardization param value in LogisticRegression
cache the value of the standardization Param in LogisticRegression, rather than re-fetching it from the ParamMap for every index and every optimization step in the quasi-newton optimizer

also, fix Param#toString to cache the stringified representation, rather than re-interpolating it on every call, so any other implementations that have similar repeated access patterns will see a benefit.

this change improves training times for one of my test sets from ~7m30s to ~4m30s

Author: Gary King <gary@idibon.com>

Closes #11027 from idigary/spark-13132-optimize-logistic-regression.
2016-02-07 09:13:28 +00:00
Imran Younus 0557146619 [SPARK-12732][ML] bug fix in linear regression train
Fixed the bug in linear regression train for the case when the target variable is constant. The two cases for `fitIntercept=true` or `fitIntercept=false` should be treated differently.

Author: Imran Younus <iyounus@us.ibm.com>

Closes #10702 from iyounus/SPARK-12732_bug_fix_in_linear_regression_train.
2016-02-02 20:38:53 -08:00
Grzegorz Chilkiewicz b1835d7272 [SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication
Fixes problem and verifies fix by test suite.
Also - adds optional parameter: nullable (Boolean) to: SchemaUtils.appendColumn
and deduplicates SchemaUtils.appendColumn functions.

Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com>

Closes #10741 from grzegorz-chilkiewicz/master.
2016-02-02 11:16:24 -08:00
Bryan Cutler cba1d6b659 [SPARK-12631][PYSPARK][DOC] PySpark clustering parameter desc to consistent format
Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent.  This is for the clustering module.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #10610 from BryanCutler/param-desc-consistent-cluster-SPARK-12631.
2016-02-02 10:50:22 -08:00
Josh Rosen 289373b28c [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version
This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10608 from JoshRosen/SPARK-6363.
2016-01-30 00:20:28 -08:00
Yanbo Liang df78a934a0 [SPARK-9835][ML] Implement IterativelyReweightedLeastSquares solver
Implement ```IterativelyReweightedLeastSquares``` solver for GLM. I consider it as a solver rather than estimator, it only used internal so I keep it ```private[ml]```.
There are two limitations in the current implementation compared with R:
* It can not support ```Tuple``` as response for ```Binomial``` family, such as the following code:
```
glm( cbind(using, notUsing) ~  age + education + wantsMore , family = binomial)
```
* It does not support ```offset```.

Because I considered that ```RFormula``` did not support ```Tuple``` as label and ```offset``` keyword, so I simplified the implementation. But to add support for these two functions is not very hard, I can do it in follow-up PR if it is necessary. Meanwhile, we can also add R-like statistic summary for IRLS.
The implementation refers R, [statsmodels](https://github.com/statsmodels/statsmodels) and [sparkGLM](https://github.com/AlteryxLabs/sparkGLM).
Please focus on the main structure and overpass minor issues/docs that I will update later. Any comments and opinions will be appreciated.

cc mengxr jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10639 from yanboliang/spark-9835.
2016-01-28 14:29:47 -08:00
Holden Karau b72611f20a [SPARK-7780][MLLIB] intercept in logisticregressionwith lbfgs should not be regularized
The intercept in Logistic Regression represents a prior on categories which should not be regularized. In MLlib, the regularization is handled through Updater, and the Updater penalizes all the components without excluding the intercept which resulting poor training accuracy with regularization.
The new implementation in ML framework handles this properly, and we should call the implementation in ML from MLlib since majority of users are still using MLlib api.
Note that both of them are doing feature scalings to improve the convergence, and the only difference is ML version doesn't regularize the intercept. As a result, when lambda is zero, they will converge to the same solution.

Previously partially reviewed at https://github.com/apache/spark/pull/6386#issuecomment-168781424 re-opening for dbtsai to review.

Author: Holden Karau <holden@us.ibm.com>
Author: Holden Karau <holden@pigscanfly.ca>

Closes #10788 from holdenk/SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized.
2016-01-26 17:59:05 -08:00
Jeff Zhang 1dac964c1b [SPARK-11622][MLLIB] Make LibSVMRelation extends HadoopFsRelation and…
… Add LibSVMOutputWriter

The behavior of LibSVMRelation is not changed except adding LibSVMOutputWriter
* Partition is still not supported
* Multiple input paths is not supported

Author: Jeff Zhang <zjffdu@apache.org>

Closes #9595 from zjffdu/SPARK-11622.
2016-01-26 17:31:19 -08:00
Xusen Yin fbf7623d49 [SPARK-12952] EMLDAOptimizer initialize() should return EMLDAOptimizer other than its parent class
https://issues.apache.org/jira/browse/SPARK-12952

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10863 from yinxusen/SPARK-12952.
2016-01-26 13:18:01 -08:00
Xusen Yin ae47ba718a [SPARK-12834] Change ser/de of JavaArray and JavaList
https://issues.apache.org/jira/browse/SPARK-12834

We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in `PythonMLLibAPI`, then deserialize them with `PickleSerializer` in Python side. However, there is no need to transform them in such an inefficient way. Instead of it, we can use type conversion to convert them, e.g. `list(JavaArray)` or `list(JavaList)`. What's more, there is an issue to Ser/De Scala Array as I said in https://issues.apache.org/jira/browse/SPARK-12780

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10772 from yinxusen/SPARK-12834.
2016-01-25 22:41:52 -08:00
Yanbo Liang dcae355c64 [SPARK-12905][ML][PYSPARK] PCAModel return eigenvalues for PySpark
```PCAModel```  can output ```explainedVariance``` at Python side.

cc mengxr srowen

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10830 from yanboliang/spark-12905.
2016-01-25 13:54:21 -08:00
Yanbo Liang dd2325d9a7 [SPARK-11965][ML][DOC] Update user guide for RFormula feature interactions
Update user guide for RFormula feature interactions. Meanwhile we also update other new features such as supporting string label in Spark 1.6.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10222 from yanboliang/spark-11965.
2016-01-25 11:52:26 -08:00
Shixiong Zhu bc1babd63d [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult`  depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.
2016-01-22 21:20:04 -08:00
DB Tsai b4574e387d [SPARK-12908][ML] Add warning message for LogisticRegression for potential converge issue
When all labels are the same, it's a dangerous ground for LogisticRegression without intercept to converge. GLMNET doesn't support this case, and will just exit. GLM can train, but will have a warning message saying the algorithm doesn't converge.

Author: DB Tsai <dbt@netflix.com>

Closes #10862 from dbtsai/add-tests.
2016-01-21 17:24:48 -08:00
Takahashi Hiroshi e3727c409f [SPARK-10263][ML] Add @Since annotation to ml.param and ml.*
Add Since annotations to ml.param and ml.*

Author: Takahashi Hiroshi <takahashi.hiroshi@lab.ntt.co.jp>
Author: Hiroshi Takahashi <takahashi.hiroshi@lab.ntt.co.jp>

Closes #8935 from taishi-oss/issue10263.
2016-01-20 11:44:04 -08:00
Imran Younus 9753835cf3 [SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.
This fixes the behavior of WeightedLeastSquars.fit() when the standard deviation of the target variable is zero. If the fitIntercept is true, there is no need to train.

Author: Imran Younus <iyounus@us.ibm.com>

Closes #10274 from iyounus/SPARK-12230_bug_fix_in_weighted_least_squares.
2016-01-20 11:16:59 -08:00
Yu ISHIKAWA 9376ae723e [SPARK-6519][ML] Add spark.ml API for bisecting k-means
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #9604 from yu-iskw/SPARK-6519.
2016-01-20 10:48:10 -08:00
BenFradet f6f7ca9d2e [SPARK-9716][ML] BinaryClassificationEvaluator should accept Double prediction column
This PR aims to allow the prediction column of `BinaryClassificationEvaluator` to be of double type.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10472 from BenFradet/SPARK-9716.
2016-01-19 14:59:20 -08:00
Feynman Liang 2388de5191 [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data
CC jkbradley mengxr dbtsai

Author: Feynman Liang <feynman.liang@gmail.com>

Closes #10743 from feynmanliang/SPARK-12804.
2016-01-19 11:08:52 -08:00
Holden Karau 0ddba6d88f [SPARK-11944][PYSPARK][MLLIB] python mllib.clustering.bisecting k means
From the coverage issues for 1.6 : Add Python API for mllib.clustering.BisectingKMeans.

Author: Holden Karau <holden@us.ibm.com>

Closes #10150 from holdenk/SPARK-11937-python-api-coverage-SPARK-11944-python-mllib.clustering.BisectingKMeans.
2016-01-19 10:15:54 -08:00
Wojciech Jurczyk ebd9ce0f1f [MLLIB] Fix CholeskyDecomposition assertion's message
Change assertion's message so it's consistent with the code. The old message says that the invoked method was lapack.dports, where in fact it was lapack.dppsv method.

Author: Wojciech Jurczyk <wojtek.jurczyk@gmail.com>

Closes #10818 from wjur/wjur/rename_error_message.
2016-01-19 09:36:45 +00:00
Eric Liang 5e492e9d5b [SPARK-12346][ML] Missing attribute names in GLM for vector-type features
Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names.

cc mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #10323 from ericl/spark-12346.
2016-01-18 12:50:58 -08:00
Tommy YU 233d6cee96 [SPARK-10264][DOCUMENTATION] Added @Since to ml.recomendation
I create new pr since original pr long time no update.
Please help to review.

srowen

Author: Tommy YU <tummyyu@163.com>

Closes #10756 from Wenpei/add_since_to_recomm.
2016-01-18 13:46:14 +00:00
Reynold Xin fe7246fea6 [SPARK-12830] Java style: disallow trailing whitespaces.
Author: Reynold Xin <rxin@databricks.com>

Closes #10764 from rxin/SPARK-12830.
2016-01-14 23:33:45 -08:00
Yuhao Yang 021dafc6a0 [SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when number of features is large
jira: https://issues.apache.org/jira/browse/SPARK-12026

The issue is valid as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger.

I tested on local and the change can improve the performance and the running time was stable.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10146 from hhbyyh/chiSq.
2016-01-13 17:43:27 -08:00
Sean Owen c48f2a3a5f [SPARK-7615][MLLIB] MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
Cosine similarity with 0 vector should be 0

Related to https://github.com/apache/spark/pull/10152

Author: Sean Owen <sowen@cloudera.com>

Closes #10696 from srowen/SPARK-7615.
2016-01-12 11:50:33 +00:00
Yuhao Yang bbea88852c [SPARK-10809][MLLIB] Single-document topicDistributions method for LocalLDAModel
jira: https://issues.apache.org/jira/browse/SPARK-10809

We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents.

add some missing assert too.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9484 from hhbyyh/ldaTopicPre.
2016-01-11 14:55:44 -08:00
Yuhao Yang 4f8eefa36b [SPARK-12685][MLLIB] word2vec trainWordsCount gets overflow
jira: https://issues.apache.org/jira/browse/SPARK-12685
the log of `word2vec` reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
`alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))`

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10627 from hhbyyh/w2voverflow.
2016-01-11 14:48:35 -08:00
Yanbo Liang ee4ee02b86 [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft
PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like Scala do.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10552 from yanboliang/spark-12603.
2016-01-11 14:43:25 -08:00
Marcelo Vanzin 6439a82503 [SPARK-3873][BUILD] Enable import ordering error checking.
Turn import ordering violations into build errors, plus a few adjustments
to account for how the checker behaves. I'm a little on the fence about
whether the existing code is right, but it's easier to appease the checker
than to discuss what's the more correct order here.

Plus a few fixes to imports that cropped in since my recent cleanups.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #10612 from vanzin/SPARK-3873-enable.
2016-01-10 20:04:50 -08:00
Kousuke Saruta e5904bb5e7 [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space before "," or ":")
Fix the style violation (space before , and :).
This PR is a followup for #10643.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #10684 from sarutak/SPARK-12692-followup-mllib.
2016-01-10 12:38:57 -08:00
Sean Owen b9c8353378 [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition
Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.

Author: Sean Owen <sowen@cloudera.com>

Closes #10570 from srowen/SPARK-12618.
2016-01-08 17:47:44 +00:00