spark-instrumented-optimizer

Dongjoon Hyun 534f5d409a [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

### What changes were proposed in this pull request?

This PR aims to increase the timeout of `StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy` from 30s (default) to 60s.

In this PR, before increasing the timeout,
1. I verified that this is not a JDK11 environmental issue by repeating 3 times first.
2. I reproduced the accuracy failure by reducing the timeout in Jenkins (https://github.com/apache/spark/pull/27424#issuecomment-580981262)

Then, the final commit passed the Jenkins.

### Why are the changes needed?

This seems to happen when Jenkins environment has congestion and the jobs are slowdown. The streaming job seems to be unable to repeat the designed iteration `numIteration=25` in 30 seconds. Since the error is decreasing at each iteration, the failure occurs.

By reducing the timeout, we can reproduce the similar issue locally like Jenkins.

```python
- eventually(condition, catch_assertions=True)
+ eventually(condition, timeout=10.0, catch_assertions=True)
```

```
$ python/run-tests --testname 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy' --python-executables=python
...
======================================================================
FAIL: test_parameter_accuracy (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dongjoon/PRS/SPARK-TEST/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 229, in test_parameter_accuracy
    eventually(condition, timeout=10.0, catch_assertions=True)
  File "/Users/dongjoon/PRS/SPARK-TEST/python/pyspark/testing/utils.py", line 86, in eventually
    raise lastValue
  File "/Users/dongjoon/PRS/SPARK-TEST/python/pyspark/testing/utils.py", line 77, in eventually
    lastValue = condition()
  File "/Users/dongjoon/PRS/SPARK-TEST/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 226, in condition
    self.assertAlmostEqual(rel, 0.1, 1)
AssertionError: 0.25749106949322637 != 0.1 within 1 places (0.15749106949322636 difference)

----------------------------------------------------------------------
Ran 1 test in 14.814s

FAILED (failures=1)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins (and manual check by reducing the timeout).

Since this is a flakiness issue depending on the Jenkins job situation, it's difficult to reproduce there.

Closes #27424 from dongjoon-hyun/SPARK-TEST.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

2020-02-01 15:38:16 +09:00
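The failure mode described in the commit message (an error that shrinks with every iteration, checked against a deadline that can expire first) can be illustrated with a small self-contained sketch. This is not the code in `pyspark/testing/utils.py`: the `eventually` below is a simplified stand-in, and `SimulatedJob`, `passes_within`, `interval`, `step_seconds`, and the 0.8 decay factor are hypothetical names and values chosen for illustration. The accuracy check is also reduced to a plain `rel < 0.1` threshold rather than `assertAlmostEqual(rel, 0.1, 1)`.

```python
import time


def eventually(condition, timeout=30.0, catch_assertions=False, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Simplified stand-in for the helper named in the commit message;
    `interval` is an assumption of this sketch.
    """
    deadline = time.monotonic() + timeout
    last_value = None
    while time.monotonic() < deadline:
        if catch_assertions:
            try:
                last_value = condition()
            except AssertionError as e:
                # Remember the most recent failure so it can be re-raised.
                last_value = e
        else:
            last_value = condition()
        if last_value is True:
            return
        time.sleep(interval)
    if isinstance(last_value, AssertionError):
        raise last_value
    raise AssertionError(
        "Timed out after %g sec, last condition value: %s" % (timeout, last_value)
    )


class SimulatedJob:
    """Hypothetical job whose relative error decays by 0.8 per iteration."""

    def __init__(self, step_seconds, num_iterations=25):
        self.step_seconds = step_seconds  # wall-clock cost of one iteration
        self.num_iterations = num_iterations
        self.start = time.monotonic()

    def relative_error(self):
        elapsed = time.monotonic() - self.start
        done = min(self.num_iterations, int(elapsed / self.step_seconds))
        return 0.8 ** done

    def condition(self):
        rel = self.relative_error()
        assert rel < 0.1, "relative error %.3f is still above 0.1" % rel
        return True


def passes_within(timeout, step_seconds=0.05):
    """Return True if a fresh simulated job converges before `timeout`."""
    job = SimulatedJob(step_seconds)
    try:
        eventually(job.condition, timeout=timeout, catch_assertions=True)
        return True
    except AssertionError:
        return False


print(passes_within(0.1))  # deadline shorter than the time needed to converge
print(passes_within(5.0))  # deadline with ample headroom
```

With these assumed numbers the job needs about 11 iterations (roughly 0.55 s) to drop below the threshold, so the short deadline should re-raise the last assertion while the longer one should succeed. A congested machine has the same effect as raising `step_seconds`, which is why more headroom on the timeout reduces flakiness.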
linalg	[SPARK-28140][MLLIB][PYTHON] Accept DataFrames in RowMatrix and IndexedRowMatrix constructors	2019-07-09 16:39:21 -05:00
stat	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
tests	[SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy	2020-02-01 15:38:16 +09:00
__init__.py	[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide	2016-07-15 13:38:23 -07:00
classification.py	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
clustering.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
common.py	[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch	2016-10-03 14:12:03 -07:00
evaluation.py	[SPARK-29489][ML][PYSPARK] ml.evaluation support log-loss	2019-10-18 17:57:13 +08:00
feature.py	[SPARK-26616][MLLIB] Expose document frequency in IDFModel	2019-01-22 07:41:54 -06:00
fpm.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
random.py	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
recommendation.py	[SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter	2019-03-23 11:26:09 -05:00
regression.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
tree.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
util.py	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00