spark-instrumented-optimizer/python/pyspark
HyukjinKwon f984f6acfe Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)"
## What changes were proposed in this pull request?

This PR reverts 9c4eb99c52 for the reasons below:

1. An alternative was not considered properly, https://github.com/apache/spark/pull/24734#issuecomment-500101639 https://github.com/apache/spark/pull/24734#issuecomment-500102340 https://github.com/apache/spark/pull/24734#issuecomment-499202982 - I opened a PR https://github.com/apache/spark/pull/24826

2. 9c4eb99c52 fixed timely flushing which behaviour is somewhat hacky and the timing isn't also guaranteed (in case each batch takes longer to process).

3. For pipelining for smaller batches, looks it's better to allow to configure buffer size rather than having another factor to flush

## How was this patch tested?

N/A

Closes #24827 from HyukjinKwon/revert-flush.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-09 08:28:31 -07:00
..
ml [SPARK-18570][ML][R] RFormula support * and ^ operators 2019-06-04 08:59:30 -05:00
mllib [SPARK-27540][MLLIB] Add 'meanAveragePrecision_at_k' metric to RankingMetrics 2019-05-09 08:47:05 -05:00
sql [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled 2019-06-04 10:10:27 -07:00
streaming [SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs 2019-03-11 10:15:07 +09:00
testing Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)" 2019-06-09 08:28:31 -07:00
tests Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)" 2019-06-09 08:28:31 -07:00
__init__.py [SPARK-25248][.1][PYSPARK] update barrier Python API 2018-08-29 07:22:03 -07:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator 2018-10-08 15:18:08 +08:00
broadcast.py [SPARK-18161][PYTHON] Update cloudpickle to v0.6.1 2019-02-02 10:49:45 +08:00
cloudpickle.py [SPARK-27000][PYTHON] Upgrades cloudpickle to v0.8.0 2019-02-28 02:33:10 +09:00
conf.py [SPARK-23522][PYTHON] always use sys.exit over builtin exit 2018-03-08 20:38:34 +09:00
context.py [SPARK-27887][PYTHON] Add deprecation warning for Python 2 2019-06-04 15:36:52 +09:00
daemon.py [PYSPARK] Update py4j to version 0.10.7. 2018-05-09 10:47:35 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
find_spark_home.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
heapq3.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
java_gateway.py [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway 2019-02-15 18:08:06 -08:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-23961][SPARK-27548][PYTHON] Fix error when toLocalIterator goes out of scope and properly raise errors from worker 2019-05-07 14:47:39 -07:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)" 2019-06-09 08:28:31 -07:00
shell.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
shuffle.py [SPARK-25696] The storage memory displayed on spark Application UI is… 2018-12-10 18:27:01 -06:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 2018-11-07 22:48:50 -06:00
taskcontext.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs 2019-03-11 10:15:07 +09:00
version.py [SPARK-25592] Setting version to 3.0.0-SNAPSHOT 2018-10-02 08:48:24 -07:00
worker.py [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. 2019-03-25 11:26:09 -07:00