When we run FlumeStreamSuite on Jenkins, we sometimes get an error like the following.
sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456.
at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116)
at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74)
at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66)
at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
This error is caused by the check-then-act logic used to find a free port.
/** Find a free port */
private def findFreePort(): Int = {
  Utils.startServiceOnPort(23456, (trialPort: Int) => {
    val socket = new ServerSocket(trialPort)
    socket.close()
    (null, trialPort)
  }, conf)._2
}
Removing the check-then-act is not easy, but we can reduce the chance of hitting the error by choosing a random value for the initial port instead of 23456.
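A minimal sketch of the random-initial-port idea, not the code from this patch: `FreePortSketch`, its port bounds, and the retry count are hypothetical names and values chosen for illustration. The probe is still check-then-act, so the race is only made less likely, not removed.

```scala
import java.io.IOException
import java.net.ServerSocket
import scala.util.Random

object FreePortSketch {
  // Hypothetical bounds for non-privileged ports; not taken from the patch.
  val MinPort = 1024
  val MaxPort = 65535

  /** Pick a random starting port in [MinPort, MaxPort]. */
  def randomInitialPort(rng: Random = new Random()): Int =
    MinPort + rng.nextInt(MaxPort - MinPort + 1)

  /**
   * Probe up to `maxRetries + 1` ports starting at `start`, wrapping around
   * inside the allowed range. The socket is closed again right away, so
   * another process can still grab the port before the caller binds it.
   */
  def findFreePort(start: Int, maxRetries: Int = 16): Int = {
    require(start >= MinPort && start <= MaxPort,
      s"start port must be in [$MinPort, $MaxPort]")
    val span = MaxPort - MinPort + 1
    val candidates =
      (0 to maxRetries).iterator.map(i => MinPort + (start - MinPort + i) % span)
    candidates.find { port =>
      try { new ServerSocket(port).close(); true }
      catch { case _: IOException => false }
    }.getOrElse(throw new IOException(s"No free port found after $maxRetries retries"))
  }
}
```

With a random start, two suites running on the same Jenkins host are unlikely to probe the same port at the same time, which is the whole benefit claimed above.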
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #4337 from sarutak/SPARK-5559 and squashes the following commits:
16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort
c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559
1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port
a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
....
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #5143 from vanzin/SPARK-6473 and squashes the following commits:
a2e5e2d [Marcelo Vanzin] [SPARK-6473] [core] Do not try to figure out Scala version if not needed.
As "notebook --pylab inline" is no longer supported, update the related documentation.
Author: Cong Yue <yuecong1104@gmail.com>
Closes #5111 from yuecong/patch-1 and squashes the following commits:
872df76 [Cong Yue] Update the command to use IPython notebook
This moves the MIMA checks to before the full Spark test suite so that new PRs that fail the MIMA check return much faster, without having run the entire test suite. This is preferable to the current scenario, where a user has to wait until the entire test suite completes before realizing it failed on a MIMA check; once the MIMA issues are fixed, the user then has to resubmit and rerun the full test suite.
Author: Brennon York <brennon.york@capitalone.com>
Closes #5145 from brennonyork/SPARK-6477 and squashes the following commits:
12b0aee [Brennon York] updated to put the mima checks before the spark test suite
In `CheckAnalysis`, `Filter` and `Aggregate` are checked in separate case clauses, so they never reach the clauses that check for unresolved operators and missing input attributes.
This PR also removes the `prettyString` call when generating the error message for missing input attributes, because the result of `prettyString` doesn't contain the expression ID and may give confusing messages like
> resolved attributes a missing from a
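
To illustrate why dropping the expression ID is confusing, here is a small sketch. `Attr`, `pretty`, `withId`, and `missingMessage` are hypothetical stand-ins written for this example, not Catalyst classes or methods; the `name#id` rendering is likewise only an assumption about how an ID-bearing form might look.

```scala
object AttributeMessageSketch {
  // Hypothetical stand-in for an attribute: a name plus a unique expression ID.
  final case class Attr(name: String, exprId: Long) {
    def pretty: String = name              // drops the ID, as in the message quoted above
    def withId: String = s"$name#$exprId"  // keeps the ID
  }

  /** Build the error message with a caller-chosen way of rendering attributes. */
  def missingMessage(missing: Seq[Attr], input: Seq[Attr], render: Attr => String): String = {
    val m = missing.map(render).mkString(",")
    val i = input.map(render).mkString(",")
    s"resolved attributes $m missing from $i"
  }
}
```

Two attributes that share the name `a` but carry different IDs render identically through `pretty`, while `withId` shows that they are different attributes.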
cc rxin
Author: Cheng Lian <lian@databricks.com>
Closes #5129 from liancheng/spark-6452 and squashes the following commits:
52cdc69 [Cheng Lian] Addresses comments
029f9bd [Cheng Lian] Checks for missing attributes and unresolved operator for all types of operator
One more thing: if this PR is considered OK, it might make sense to add extra .jdbc() APIs to SQLContext that take Properties.
Author: Volodymyr Lyubinets <vlyubin@gmail.com>
Closes #4859 from vlyubin/jdbcProperties and squashes the following commits:
7a8cfda [Volodymyr Lyubinets] Support jdbc connection properties in OPTIONS part of the query
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #5118 from MechCoder/spark-6308 and squashes the following commits:
6c8ffab [MechCoder] Add test for simpleString
b966242 [MechCoder] [SPARK-6308] [MLlib][Sql] VectorUDT is displayed as vecto in dtypes
https://github.com/apache/spark/pull/5082
/cc liancheng
Author: Yadong Qi <qiyadong2010@gmail.com>
Closes #5132 from watermen/sql-missingInput-new and squashes the following commits:
1e5bdc5 [Yadong Qi] Check the missingInput simply
Author: q00251598 <qiyadong@huawei.com>
Closes #5082 from watermen/sql-missingInput and squashes the following commits:
25766b9 [q00251598] Check the missingInput simply
This PR might have some issues with #3732,
and it would have merge conflicts with #3820, so the review can be delayed until those two are merged.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3822 from adrian-wang/parquetdate and squashes the following commits:
2c5d54d [Daoyuan Wang] add a test case
faef887 [Daoyuan Wang] parquet support for primitive date
97e9080 [Daoyuan Wang] parquet support for date type
Author: vinodkc <vinod.kc.in@gmail.com>
Closes #5112 from vinodkc/spark_1.3_doc_fixes and squashes the following commits:
2c6aee6 [vinodkc] Spark 1.3 doc fixes
Changes the Tachyon client version from 0.5 to 0.6 in spark core and distribution script.
New dependencies in Tachyon 0.6.0 include
commons-codec:commons-codec:jar:1.5:compile
io.netty:netty-all:jar:4.0.23.Final:compile
These are already in spark core.
Author: Calvin Jia <jia.calvin@gmail.com>
Closes #4867 from calvinjia/upgrade_tachyon_0.6.0 and squashes the following commits:
eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
Author: Kamil Smuga <smugakamil@gmail.com>
Author: stderr <smugakamil@gmail.com>
Closes #5120 from kamilsmuga/master and squashes the following commits:
fee3281 [Kamil Smuga] more python api links fixed for docs
13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
6649b3b [Kamil Smuga] fix broken docs links to Python API
92f03d7 [stderr] Fix links to pyspark api
- Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos
Author: Jongyoul Lee <jongyoul@gmail.com>
Closes #5126 from jongyoul/SPARK-6453 and squashes the following commits:
4f24a3e [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Fixed imports orders
8ab149d [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos
Correct some typos. Correct a mistake in lib/PageRank.scala: the first PageRank implementation uses the standalone Graph interface, but the second uses the Pregel interface. This may mislead readers of the code.
Author: Hangchen Yu <yuhc@gitcafe.com>
Closes #5128 from yuhc/master and squashes the following commits:
53e5432 [Hangchen Yu] Merge branch 'master' of https://github.com/yuhc/spark
67b77b5 [Hangchen Yu] [SPARK-6455] [docs] Correct some mistakes and typos
206f2dc [Hangchen Yu] Correct some mistakes and typos.
This helped me to debug a parse error that was due to the event log format changing recently.
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #5122 from ryan-williams/histerror and squashes the following commits:
5831656 [Ryan Williams] line length
c3742ae [Ryan Williams] Make history server log parse exceptions
Author: Reynold Xin <rxin@databricks.com>
Closes #5108 from rxin/hive-public-type and squashes the following commits:
a320328 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods for Hive module.
This PR creates a trait `DataTypeParser` used to parse data types. This trait aims to be a single place that provides the functionality of parsing the string representation of data types. It is currently mixed in with `DDLParser` and `SqlParser`. It is also used to parse the data type for `DataFrame.cast` and to convert a Hive metastore data type string back to a `DataType`.
JIRA: https://issues.apache.org/jira/browse/SPARK-6250
Author: Yin Huai <yhuai@databricks.com>
Closes #5078 from yhuai/ddlKeywords and squashes the following commits:
0e66097 [Yin Huai] Special handle struct<>.
fea6012 [Yin Huai] Style.
c9733fb [Yin Huai] Create a trait to parse data types.
SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
Should give output as
0.0 NULL NULL NULL
This fixes hive udaf_number_format.q
Author: Venkata Ramana G <ramana.gollamudihuawei.com>
Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
Closes #4466 from gvramana/sum_fix and squashes the following commits:
42e14d1 [Venkata Ramana Gollamudi] Added comments
39415c0 [Venkata Ramana Gollamudi] Handled the partitioned Sum expression scenario
df66515 [Venkata Ramana Gollamudi] code style fix
4be2606 [Venkata Ramana Gollamudi] Add udaf_number_format to whitelist and golden answer
330fd64 [Venkata Ramana Gollamudi] fix sum function for all null data
NoRelation does not override statistics, in spite of the superclass saying 'LeafNode must override'.
fix issue
[SPARK-5320: Joins on simple table created using select gives error](https://issues.apache.org/jira/browse/SPARK-5320)
Author: x1- <viva008@gmail.com>
Closes #5105 from x1-/SPARK-5320 and squashes the following commits:
e561aac [x1-] Add statistics method at NoRelation (override super).
When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed.
Here we just throw an error to let users know what happened.
ParquetRelation2 may also hit this problem. I think restricting JSONRelation and ParquetRelation2 to be directory-based is more reasonable for access control. Maybe I can do that in follow-up work.
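The delete check described above can be sketched as follows. This is an illustration using `java.io.File` on the local file system, not the code from this patch; `OverwriteSketch` and `deleteOrFail` are hypothetical names, and the error text is made up for the example.

```scala
import java.io.{File, IOException}

object OverwriteSketch {
  /**
   * Delete `path` if it exists and fail loudly when the delete does not
   * succeed, instead of silently carrying on to the write.
   * Note: java.io.File.delete only removes files and empty directories.
   */
  def deleteOrFail(path: File): Unit = {
    if (path.exists() && !path.delete()) {
      throw new IOException(
        s"Unable to clear output path ${path.getPath} prior to writing to it")
    }
  }
}
```

Checking the boolean result of the delete is the important part: an ignored `false` is what lets the later write fail in a less obvious way.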
Author: Yanbo Liang <ybliang8@gmail.com>
Author: Yanbo Liang <yanbohappy@gmail.com>
Closes #4610 from yanboliang/jsonInsertImprovements and squashes the following commits:
c387fce [Yanbo Liang] fix typos
42d7fb6 [Yanbo Liang] add unittest & fix output format
46f0d9d [Yanbo Liang] Update JSONRelation.scala
e2df8d5 [Yanbo Liang] check path exisit when write
79f7040 [Yanbo Liang] Update JSONRelation.scala
e4bc229 [Yanbo Liang] Update JSONRelation.scala
5a42d83 [Yanbo Liang] JSONRelation CTAS should check if delete is successful
When writing Parquet files, Spark 1.1.x persists the schema string into Parquet metadata as the result of `StructType.toString`, which was deprecated in Spark 1.2 in favor of a schema string in JSON format. But we still need to take the old schema format into account while reading Parquet files.
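A rough sketch of reading both schema formats by falling back from one parser to the other. `parseJson` and `parseLegacy` are hypothetical placeholders that only inspect the prefix of the string, and trying JSON first is an assumption made for this example; the real parsers and their order are in the patch itself.

```scala
import scala.util.{Failure, Success, Try}

object SchemaFallbackSketch {
  // Hypothetical placeholder for the JSON schema parser.
  def parseJson(s: String): Try[String] =
    if (s.trim.startsWith("{")) Success(s"json:$s")
    else Failure(new IllegalArgumentException("not a JSON schema string"))

  // Hypothetical placeholder for the legacy `StructType.toString` parser.
  def parseLegacy(s: String): Try[String] =
    if (s.startsWith("StructType(")) Success(s"legacy:$s")
    else Failure(new IllegalArgumentException("not a legacy schema string"))

  /** Try the JSON format first, then fall back to the legacy format. */
  def parseSchema(s: String): Try[String] =
    parseJson(s).orElse(parseLegacy(s))
}
```

`Try.orElse` evaluates its argument lazily, so the legacy parser only runs when the JSON parser has failed.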
Author: Cheng Lian <lian@databricks.com>
Closes #5034 from liancheng/spark-6315 and squashes the following commits:
a182f58 [Cheng Lian] Adds a regression test
b9c6dbe [Cheng Lian] Also tries the case class string parser while reading Parquet schema
Do the same check as #4610 for ParquetRelation2.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits:
7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful
Added evaluateEachIteration to allow the user to manually extract the error for each iteration of GradientBoosting. The internal optimisation can be dealt with later.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #4906 from MechCoder/spark-6025 and squashes the following commits:
67146ab [MechCoder] Minor
352001f [MechCoder] Minor
6e8aa10 [MechCoder] Made the following changes Used mapPartition instead of map Refactored computeError and unpersisted broadcast variables
bc99ac6 [MechCoder] Refactor the method and stuff
dbda033 [MechCoder] [SPARK-6025] Add helper method evaluateEachIteration to extract learning curve
Also implemented equals/hashCode when they are missing.
This is done in order to enable automatic public method type checking.
Author: Reynold Xin <rxin@databricks.com>
Closes #5104 from rxin/sql-hashcode-explicittype and squashes the following commits:
ffce6f3 [Reynold Xin] Code review feedback.
8b36733 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods.
Weight parameters must be initialized correctly even when a numpy array is passed as the initial weights.
Author: lewuathe <lewuathe@me.com>
Closes #5101 from Lewuathe/SPARK-6421 and squashes the following commits:
7795201 [lewuathe] Fix lint-python errors
21d4fe3 [lewuathe] Fix init logic of weights
Utilities to serialize and deserialize Matrices in MLlib
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #5048 from MechCoder/spark-6309 and squashes the following commits:
05dc6f2 [MechCoder] Hashcode and organize imports
16d5d47 [MechCoder] Test some more
6e67020 [MechCoder] TST: Test using Array conversion instead of equals
7fa7a2c [MechCoder] [SPARK-6309] [SQL] [MLlib] Implement MatrixUDT
- Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead
- Added testCase
Author: Jongyoul Lee <jongyoul@gmail.com>
Closes #5099 from jongyoul/SPARK-6423 and squashes the following commits:
6747fce [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Changed a description of spark.mesos.executor.memoryOverhead
475a7c8 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fit the import rules
453c5a2 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead - Added testCase
Add checkpointInterval to ALS to prevent:
1. StackOverflow exceptions caused by long lineage,
2. large shuffle files generated during iterations,
3. slow recovery when some nodes fail.
srowen coderxiang
Author: Xiangrui Meng <meng@databricks.com>
Closes #5076 from mengxr/SPARK-5955 and squashes the following commits:
df56791 [Xiangrui Meng] update impl to reuse code
29affcb [Xiangrui Meng] do not materialize factors in implicit
20d3f7f [Xiangrui Meng] add checkpointInterval to ALS
This PR implements two functions
- `topByKey(num: Int): RDD[(K, Array[V])]` finds the top-k values for each key in a pair RDD. This can be used, e.g., in computing top recommendations.
- `takeOrderedByKey(num: Int): RDD[(K, Array[V])]` does the opposite of `topByKey`
`sorted` is used here because the `toArray` method of the PriorityQueue does not necessarily return a sorted array.
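
The top-k-per-key idea and the need for the explicit sort can be sketched on plain Scala collections. This is an illustration only, not the RDD implementation in this PR: `TopByKeySketch` is a hypothetical name, it works on a local `Seq` rather than an RDD, and it returns `Seq[V]` instead of `Array[V]` to avoid needing a `ClassTag`.

```scala
import scala.collection.mutable

object TopByKeySketch {
  /** Keep the `num` largest values per key in a bounded heap, then sort descending. */
  def topByKey[K, V](pairs: Seq[(K, V)], num: Int)
                    (implicit ord: Ordering[V]): Map[K, Seq[V]] = {
    val heaps = mutable.Map.empty[K, mutable.PriorityQueue[V]]
    for ((k, v) <- pairs) {
      // With the reversed ordering, dequeue() removes the smallest value,
      // so each heap never holds more than the `num` largest values seen.
      val heap = heaps.getOrElseUpdate(k, mutable.PriorityQueue.empty[V](ord.reverse))
      heap.enqueue(v)
      if (heap.size > num) heap.dequeue()
    }
    // The heap's internal layout is not sorted, so sort explicitly, largest first.
    heaps.map { case (k, heap) => k -> heap.toSeq.sorted(ord.reverse) }.toMap
  }
}
```

For example, `TopByKeySketch.topByKey(Seq("a" -> 1, "a" -> 5, "a" -> 3), 2)` keeps `5` and `3` for key `"a"`, in that order.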
Author: Shuo Xiang <shuoxiangpub@gmail.com>
Closes #5075 from coderxiang/topByKey and squashes the following commits:
1611c37 [Shuo Xiang] code clean up
6f565c0 [Shuo Xiang] naming
a80e0ec [Shuo Xiang] typo and warning
82dded9 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
d202745 [Shuo Xiang] move to MLPairRDDFunctions
901b0af [Shuo Xiang] style check
70c6e35 [Shuo Xiang] remove takeOrderedByKey, update doc and test
0895c17 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
b10e325 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
debccad [Shuo Xiang] topByKey
For Python's linear models, weights and intercept are stored in Python.
This PR implements save/load functions for Python's linear models which do the same thing as the Scala ones.
This also makes model import/export work across languages.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #5016 from yanboliang/spark-6095 and squashes the following commits:
d9bb824 [Yanbo Liang] fix python style
b3813ca [Yanbo Liang] linear model save/load for Python reuse the Scala implementation
...R
https://issues.apache.org/jira/browse/SPARK-6426
Author: WangTaoTheTonic <wangtao111@huawei.com>
Closes #5103 from WangTaoTheTonic/SPARK-6426 and squashes the following commits:
e6dd78d [WangTaoTheTonic] User could also point the yarn cluster config directory via YARN_CONF_DIR
The docs for the `sample` method were insufficient, now less so.
Author: mbonaci <mbonaci@gmail.com>
Closes #5097 from mbonaci/master and squashes the following commits:
a6a9d97 [mbonaci] [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
I want to add a checker to turn public type checking on, since future pull requests can accidentally expose a non-public type. This is the first cleanup task.
Author: Reynold Xin <rxin@databricks.com>
Closes #5102 from rxin/mllib-hashcode-publicmethodtypes and squashes the following commits:
617f19e [Reynold Xin] Fixed Scala compilation error.
52bc2d5 [Reynold Xin] [MLlib] Added explicit type for public methods and implemented hashCode when equals is defined.
Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
Author: Sean Owen <sowen@cloudera.com>
Closes #5029 from srowen/SPARK-6338 and squashes the following commits:
27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
4a212fa [Sean Owen] Standardize a bit more temp dir management
9004081 [Sean Owen] Revert some added recursive-delete calls
57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
Bump default Hadoop version to 2.2.0. (This is already the dependency version reported by published Maven artifacts.) See JIRA for further discussion.
Author: Sean Owen <sowen@cloudera.com>
Closes #5027 from srowen/SPARK-5134 and squashes the following commits:
acbee14 [Sean Owen] Bump default Hadoop version to 2.2.0. (This is already the dependency version reported by published Maven artifacts.)
- Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
- This is related to #5000
Author: Jongyoul Lee <jongyoul@gmail.com>
Closes #5088 from jongyoul/SPARK-6286-1 and squashes the following commits:
4f2362f [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Fixed scalastyle
ac4336a [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
I was reading Executor just now and found that some recent changes introduced weird code paths with too much monadic chaining and unnecessary fields. I cleaned it up a bit and tightened the visibility of various fields/methods. I also added some inline documentation to make this code easier to understand.
Author: Reynold Xin <rxin@databricks.com>
Closes #4850 from rxin/executor and squashes the following commits:
866fc60 [Reynold Xin] Code review feedback.
020efbb [Reynold Xin] Tighten up field/method visibility in Executor and made some code more clear to read.
This PR expands the Python lint checks so that they check for obvious compilation errors in our Python code.
For example:
```
$ ./dev/lint-python
Python lint checks failed.
Compiling ./ec2/spark_ec2.py ...
File "./ec2/spark_ec2.py", line 618
return (master_nodes,, slave_nodes)
^
SyntaxError: invalid syntax
./ec2/spark_ec2.py:618:25: E231 missing whitespace after ','
./ec2/spark_ec2.py:1117:101: E501 line too long (102 > 100 characters)
```
This PR also bumps up the version of `pep8`. It ignores new types of checks introduced by that version bump while fixing problems missed by the older version of `pep8` we were using.
Author: Nicholas Chammas <nicholas.chammas@gmail.com>
Closes #4941 from nchammas/compile-spark-ec2 and squashes the following commits:
75e31d8 [Nicholas Chammas] upgrade pep8 + check compile
b33651c [Nicholas Chammas] PEP8 line length
We define and update `visitedStages` in `DAGScheduler.stageDependsOn`, but never read it. So we can safely remove it.
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #5086 from cloud-fan/minor and squashes the following commits:
24663ea [Wenchen Fan] remove un-used variable
Built a simple framework with a `dev/tests` directory to house all pull request related tests. I've moved the two original tests (`pr_merge_ability` and `pr_public_classes`) into the new `dev/tests` directory and tested to the best of my ability. At this point I need to test against Jenkins actually running the new `run-tests-jenkins` script to ensure things aren't broken down the path.
Author: Brennon York <brennon.york@capitalone.com>
Closes #5072 from brennonyork/SPARK-5313 and squashes the following commits:
8ae990c [Brennon York] added dev/run-tests back, removed echo
5db4ed4 [Brennon York] removed the git checkout
1b50050 [Brennon York] adding echos to see what jenkins is seeing
b823959 [Brennon York] removed run-tests to further test the public_classes pr test
2b9ce12 [Brennon York] added the dev/run-tests call back in
ffd49c0 [Brennon York] remove -c from bash as that was removing the trailing args
735d615 [Brennon York] removed the actual dev/run-tests command to further test jenkins
d579662 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-5313
aa48029 [Brennon York] removed echo lines for testing jenkins
24cd965 [Brennon York] added test output to check within jenkins to verify
3a38e73 [Brennon York] removed the temporary read
9c881ff [Brennon York] updated test suite
183b7ee [Brennon York] added documentation on how to create tests
0bc2efe [Brennon York] ensure each test starts on the current pr branch
1743378 [Brennon York] added tests in test suite
abd7430 [Brennon York] updated to include test suite
GLM toString prints out intercept, numFeatures.
For LogisticRegression and SVM model, toString also prints out numClasses, threshold.
GLM toDebugString prints out the whole weights, intercept.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #5038 from yanboliang/spark-6291 and squashes the following commits:
2f578b0 [Yanbo Liang] code format
78b33f2 [Yanbo Liang] fix typos
1e8a023 [Yanbo Liang] GLM toString & toDebugString
Specifically, when calling JavaPairRDD.combineByKey(), there is a new
six-parameter method that exposes the map-side-combine boolean as the
fifth parameter and the serializer as the sixth parameter.
Author: mcheah <mcheah@palantir.com>
Closes #4634 from mccheah/pair-rdd-map-side-combine and squashes the following commits:
5c58319 [mcheah] Fixing compiler errors.
3ce7deb [mcheah] Addressing style and documentation comments.
7455c7a [mcheah] Allowing Java combineByKey to specify Serializer as well.
6ddd729 [mcheah] [SPARK-5843] Allowing map-side combine to be specified in Java.