[SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins.

## What changes were proposed in this pull request?

In the Jenkins pull request builder, PySpark tests take around [962 seconds](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/console) of end-to-end time to run, even though we run four Python test suites in parallel. According to the log, the main cause is that the long-running tests start last, because tasks are dispatched from a FIFO queue. As a first step, this patch reduces the total test time by starting a few long-running tests first, using a simple priority queue.

```
========================================================================
Running PySpark tests
========================================================================
...
Finished test(python3.4): pyspark.streaming.tests (213s)
Finished test(pypy): pyspark.sql.tests (92s)
Finished test(pypy): pyspark.streaming.tests (280s)
Tests passed in 962 seconds
```
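To see why start order matters, here is a minimal sketch, not the actual `run-tests` logic, that models four workers each picking up the next queued task as soon as they are free. The three long durations are taken from the log above; the twelve 30-second tasks and the `makespan` helper are hypothetical and exist only for illustration.

```python
import heapq


def makespan(durations, workers=4):
    """Total wall-clock time when each task, taken in the given order,
    is assigned to whichever worker becomes free first."""
    free_at = [0] * workers          # time at which each worker is next idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)            # earliest-free worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Long durations (seconds) come from the Jenkins log above.
long_tasks = [280, 213, 92]
# Hypothetical short tasks, assumed only for this illustration.
short_tasks = [30] * 12

fifo_order = short_tasks + long_tasks   # long tests happen to be queued last
long_first = long_tasks + short_tasks   # long tests are started first

print("long tests last :", makespan(fifo_order), "seconds")
print("long tests first:", makespan(long_first), "seconds")
```

In this toy model, queuing the long tests last leaves one worker running the 280-second suite alone while the others sit idle. Starting them first lets the short tasks fill the remaining workers in the meantime.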

## How was this patch tested?

Manually, by checking the 'Running PySpark tests' section of the Jenkins log.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11551 from dongjoon-hyun/SPARK-12243.
Dongjoon Hyun 2016-03-07 12:06:46 -08:00 committed by Josh Rosen
parent ef77003178
commit e72914f37d

@@ -157,7 +157,7 @@ def main():
     LOGGER.info("Will test against the following Python executables: %s", python_execs)
     LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
 
-    task_queue = Queue.Queue()
+    task_queue = Queue.PriorityQueue()
     for python_exec in python_execs:
         python_implementation = subprocess_check_output(
             [python_exec, "-c", "import platform; print(platform.python_implementation())"],
@@ -168,12 +168,17 @@ def main():
         for module in modules_to_test:
             if python_implementation not in module.blacklisted_python_implementations:
                 for test_goal in module.python_test_goals:
-                    task_queue.put((python_exec, test_goal))
+                    if test_goal in ('pyspark.streaming.tests', 'pyspark.mllib.tests',
+                                     'pyspark.tests', 'pyspark.sql.tests'):
+                        priority = 0
+                    else:
+                        priority = 100
+                    task_queue.put((priority, (python_exec, test_goal)))
 
     def process_queue(task_queue):
         while True:
             try:
-                (python_exec, test_goal) = task_queue.get_nowait()
+                (priority, (python_exec, test_goal)) = task_queue.get_nowait()
             except Queue.Empty:
                 break
             try:
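
The ordering produced by the `(priority, (python_exec, test_goal))` tuples can be sketched in isolation. This is a simplified illustration under a few assumptions, not the patched script. It uses the Python 3 `queue` module rather than the Python 2 `Queue` module seen in the diff, it drains the queue in a single thread, and `HEAVY_GOALS`, `build_queue`, `drain`, and the goal name `pyspark.light_module` are hypothetical names.

```python
import queue

# Goals given priority 0 in the diff; the set name is an assumption.
HEAVY_GOALS = {
    "pyspark.streaming.tests", "pyspark.mllib.tests",
    "pyspark.tests", "pyspark.sql.tests",
}


def build_queue(python_execs, test_goals):
    """Enqueue every (executable, goal) pair, heavy goals with priority 0."""
    task_queue = queue.PriorityQueue()
    for python_exec in python_execs:
        for test_goal in test_goals:
            priority = 0 if test_goal in HEAVY_GOALS else 100
            task_queue.put((priority, (python_exec, test_goal)))
    return task_queue


def drain(task_queue):
    """Pop tasks until the queue is empty and return them in pop order."""
    order = []
    while True:
        try:
            priority, (python_exec, test_goal) = task_queue.get_nowait()
        except queue.Empty:
            break
        order.append((python_exec, test_goal))
    return order


# "pyspark.light_module" is a made-up goal standing in for any quick suite.
goals = ["pyspark.light_module", "pyspark.sql.tests", "pyspark.streaming.tests"]
for task in drain(build_queue(["python3.4", "pypy"], goals)):
    print(task)
```

Lower numbers are popped first, so all priority-0 goals are dispatched before any priority-100 goal. Entries with equal priority fall back to comparing the inner `(python_exec, test_goal)` tuple, so ties are resolved by string order and not by insertion order.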