[SPARK-30480][PYTHON][TESTS] Increases the memory limit being tested in 'WorkerMemoryTest.test_memory_limit'

### What changes were proposed in this pull request? This PR proposes to increase the memory in `WorkerMemoryTest.test_memory_limit` in order to make the test pass with PyPy. The test is currently failed only in PyPy as below in some PRs unexpectedly: ``` Current mem limits: 18446744073709551615 of max 18446744073709551615 Setting mem limits to 1048576 of max 1048576 RPython traceback: File "pypy_module_pypyjit_interp_jit.c", line 289, in portal_5 File "pypy_interpreter_pyopcode.c", line 3468, in handle_bytecode__AccessDirect_None File "pypy_interpreter_pyopcode.c", line 5558, in dispatch_bytecode__AccessDirect_None out of memory: couldn't allocate the next arena ERROR ``` It seems related to how PyPy allocates the memory and GC works PyPy-specifically. There seems nothing wrong in this configuration implementation itself in PySpark side. I roughly tested in higher PyPy versions on Ubuntu (PyPy v7.3.0) and this test seems passing fine so I suspect this might be an issue in old PyPy behaviours. The change only increases the limit so it would not affect actual memory allocations. It just needs to test if the limit is properly set in worker sides. For clarification, the memory is unlimited in the machine if not set. ### Why are the changes needed? To make the tests pass and unblock other PRs. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually and Jenkins should test it out. Closes #27186 from HyukjinKwon/SPARK-30480. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-01-13 18:47:15 +09:00 · 2020-01-13 18:47:15 +09:00 · 0823aec463
parent eefcc7d762
commit 0823aec463
2 changed files with 5 additions and 5 deletions
--- a/python/pyspark/tests/test_worker.py
+++ b/python/pyspark/tests/test_worker.py
@ -183,7 +183,7 @@ class WorkerReuseTest(PySparkTestCase):
 class WorkerMemoryTest(PySparkTestCase):

    def test_memory_limit(self):
-        self.sc._conf.set("spark.executor.pyspark.memory", "1m")
+        self.sc._conf.set("spark.executor.pyspark.memory", "2g")
        rdd = self.sc.parallelize(xrange(1), 1)

        def getrlimit():
@ -194,8 +194,8 @@ class WorkerMemoryTest(PySparkTestCase):
        self.assertTrue(len(actual) == 1)
        self.assertTrue(len(actual[0]) == 2)
        [(soft_limit, hard_limit)] = actual
-        self.assertEqual(soft_limit, 1024 * 1024)
-        self.assertEqual(hard_limit, 1024 * 1024)
+        self.assertEqual(soft_limit, 2 * 1024 * 1024 * 1024)
+        self.assertEqual(hard_limit, 2 * 1024 * 1024 * 1024)


 if __name__ == "__main__":
--- a/python/run-tests.py
+++ b/python/run-tests.py
@ -267,8 +267,8 @@ def main():
        python_implementation = subprocess_check_output(
            [python_exec, "-c", "import platform; print(platform.python_implementation())"],
            universal_newlines=True).strip()
-        LOGGER.debug("%s python_implementation is %s", python_exec, python_implementation)
-        LOGGER.debug("%s version is: %s", python_exec, subprocess_check_output(
+        LOGGER.info("%s python_implementation is %s", python_exec, python_implementation)
+        LOGGER.info("%s version is: %s", python_exec, subprocess_check_output(
            [python_exec, "--version"], stderr=subprocess.STDOUT, universal_newlines=True).strip())
        if should_test_modules:
            for module in modules_to_test: