spark-instrumented-optimizer

Davies Liu ce95bd8e13 [SPARK-4531] [MLlib] cache serialized java object	2014-11-21 15:02:31 -08:00

Pyrolite is quite slow compared to the ad hoc serializer in 1.1, and it causes a significant performance regression in 1.2, because we cache the serialized Python objects in the JVM and deserialize them into Java objects at each step.

This PR caches the deserialized JavaRDD instead of the PythonRDD to avoid repeated Pyrolite deserialization. Memory usage should be similar to before, but it is much faster.

Author: Davies Liu <davies@databricks.com>

Closes #3397 from davies/cache and squashes the following commits:

7f6e6ce [Davies Liu] Update -> Updater
4b52edd [Davies Liu] using named argument
63b984e [Davies Liu] fix
7da0332 [Davies Liu] add unpersist()
dff33e1 [Davies Liu] address comments
c2bdfc2 [Davies Liu] refactor
d572f00 [Davies Liu] Merge branch 'master' into cache
f1063e1 [Davies Liu] cache serialized java object
__init__.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
classification.py	[SPARK-4306] [MLlib] Python API for LogisticRegressionWithLBFGS	2014-11-18 15:57:33 -08:00
clustering.py	[SPARK-4531] [MLlib] cache serialized java object	2014-11-21 15:02:31 -08:00
common.py	[SPARK-4531] [MLlib] cache serialized java object	2014-11-21 15:02:31 -08:00
feature.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
linalg.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
rand.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
recommendation.py	[SPARK-4531] [MLlib] cache serialized java object	2014-11-21 15:02:31 -08:00
regression.py	[SPARK-4531] [MLlib] cache serialized java object	2014-11-21 15:02:31 -08:00
stat.py	[SPARK-4324] [PySpark] [MLlib] support numpy.array for all MLlib API	2014-11-10 22:26:16 -08:00
tests.py	[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD	2014-11-03 22:29:48 -08:00
tree.py	[SPARK-4439] [MLlib] add python api for random forest	2014-11-20 15:31:28 -08:00
util.py	[SPARK-4324] [PySpark] [MLlib] support numpy.array for all MLlib API	2014-11-10 22:26:16 -08:00