31b59bd805
### What changes were proposed in this pull request?

When starting python processes, set `OMP_NUM_THREADS` to the number of cores allocated to an executor or driver if `OMP_NUM_THREADS` is not already set. Each python process will use the same `OMP_NUM_THREADS` setting, even if workers are not shared.

Without this, OpenMP creates a thread pool for parallel processing with a number of threads equal to the number of cores on the executor's machine, which [significantly increases memory consumption](https://github.com/numpy/numpy/issues/10455). Instead, this thread pool should be sized by the number of cores allocated to the executor, if available. If a setting for the number of cores is not available, this doesn't change any behavior. OpenMP is used by numpy and pandas.

### Why are the changes needed?

To reduce memory consumption for PySpark jobs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Validated that this reduces python worker memory consumption by more than 1GB on our cluster.

Closes #25545 from rdblue/SPARK-28843-set-omp-num-cores.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
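The precedence described above (an existing `OMP_NUM_THREADS` always wins, and nothing is set when no core count is available) can be sketched as follows. `apply_omp_default` is a hypothetical helper for illustration, not Spark's actual code:

```python
def apply_omp_default(env, num_cores):
    """Set OMP_NUM_THREADS from the allocated core count, but only if
    the user has not already set it and a core count is available.

    env: dict of environment variables for the python worker process
    num_cores: cores allocated to the executor/driver, or None if unknown
    """
    if "OMP_NUM_THREADS" not in env and num_cores is not None:
        env["OMP_NUM_THREADS"] = str(num_cores)
    return env
```

For example, a worker launched with 4 allocated cores and no user override would get `OMP_NUM_THREADS=4`, while a user-provided value of `2` would be left untouched even if 8 cores were allocated.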