spark-instrumented-optimizer

History

Liang-Chi Hsieh 8503aa3007 [SPARK-26646][TEST][PYSPARK] Fix flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction ## What changes were proposed in this pull request? The test pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction looks sometimes flaky. ``` ====================================================================== FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests) Test that the model improves on toy data with no. of batches ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction self._eventually(condition, timeout=60.0) File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 69, in _eventually lastValue = condition() File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 362, in condition self.assertGreater(errors[1] - errors[-1], 0.3) AssertionError: -0.070000000000000062 not greater than 0.3 ---------------------------------------------------------------------- Ran 13 tests in 198.327s FAILED (failures=1, skipped=1) Had test failures in pyspark.mllib.tests.test_streaming_algorithms with python3.4; see logs ``` The predict stream can possibly be consumed to the end before the input stream. When it happens, the model improvement is not high as expected and causes test failed. This patch tries to increase number of batches of streams. This won't increase test time because we have a timeout there. ## How was this patch tested? Manually test. Closes #23586 from viirya/SPARK-26646. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>		2019-01-18 23:53:11 +08:00
..
linalg	[SPARK-26638][PYSPARK][ML] Pyspark vector classes always return error for unary negation	2019-01-17 14:24:21 -06:00
stat	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
tests	[SPARK-26646][TEST][PYSPARK] Fix flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction	2019-01-18 23:53:11 +08:00
__init__.py	[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide	2016-07-15 13:38:23 -07:00
classification.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
clustering.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
common.py	[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch	2016-10-03 14:12:03 -07:00
evaluation.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
feature.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
fpm.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
random.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
recommendation.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
regression.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
tree.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
util.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00