spark-instrumented-optimizer

HyukjinKwon f984f6acfe Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)"	2019-06-09 08:28:31 -07:00

## What changes were proposed in this pull request?

This PR reverts `9c4eb99c52` for the reasons below:

1. An alternative was not considered properly:
   - https://github.com/apache/spark/pull/24734#issuecomment-500101639
   - https://github.com/apache/spark/pull/24734#issuecomment-500102340
   - https://github.com/apache/spark/pull/24734#issuecomment-499202982
   - I opened a PR: https://github.com/apache/spark/pull/24826
2. `9c4eb99c52` added timely flushing, whose behaviour is somewhat hacky, and the timing is also not guaranteed (for example, when each batch takes longer to process).
3. For pipelining smaller batches, it looks better to allow configuring the buffer size rather than adding another factor that triggers a flush.

## How was this patch tested?

N/A

Closes #24827 from HyukjinKwon/revert-flush.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

ml	[SPARK-18570][ML][R] RFormula support * and ^ operators	2019-06-04 08:59:30 -05:00
mllib	[SPARK-27540][MLLIB] Add 'meanAveragePrecision_at_k' metric to RankingMetrics	2019-05-09 08:47:05 -05:00
sql	[SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled	2019-06-04 10:10:27 -07:00
streaming	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
testing	Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)"	2019-06-09 08:28:31 -07:00
tests	Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)"	2019-06-09 08:28:31 -07:00
__init__.py	[SPARK-25248][.1][PYSPARK] update barrier Python API	2018-08-29 07:22:03 -07:00
_globals.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
accumulators.py	[SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator	2018-10-08 15:18:08 +08:00
broadcast.py	[SPARK-18161][PYTHON] Update cloudpickle to v0.6.1	2019-02-02 10:49:45 +08:00
cloudpickle.py	[SPARK-27000][PYTHON] Upgrades cloudpickle to v0.8.0	2019-02-28 02:33:10 +09:00
conf.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
context.py	[SPARK-27887][PYTHON] Add deprecation warning for Python 2	2019-06-04 15:36:52 +09:00
daemon.py	[PYSPARK] Update py4j to version 0.10.7.	2018-05-09 10:47:35 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
find_spark_home.py	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
heapq3.py	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
java_gateway.py	[SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway	2019-02-15 18:08:06 -08:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
rdd.py	[SPARK-23961][SPARK-27548][PYTHON] Fix error when toLocalIterator goes out of scope and properly raise errors from worker	2019-05-07 14:47:39 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	Revert "[SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)"	2019-06-09 08:28:31 -07:00
shell.py	[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4	2018-09-13 11:19:43 +08:00
shuffle.py	[SPARK-25696] The storage memory displayed on spark Application UI is…	2018-12-10 18:27:01 -06:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3	2018-11-07 22:48:50 -06:00
taskcontext.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
version.py	[SPARK-25592] Setting version to 3.0.0-SNAPSHOT	2018-10-02 08:48:24 -07:00
worker.py	[SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.	2019-03-25 11:26:09 -07:00