spark-instrumented-optimizer

History

Xiangrui Meng b54c6ab3c5 [SPARK-4396] allow lookup by index in Python's Rating		2014-11-18 10:35:29 -08:00

In PySpark, ALS can take an RDD of (user, product, rating) tuples as input. However, model.predict outputs an RDD of Rating. So on the input side, users can use r[0], r[1], r[2], while on the output side, users have to use r.user, r.product, r.rating. We should allow lookup by index in Rating by making Rating a namedtuple.

davies

Author: Xiangrui Meng <meng@databricks.com>

Closes #3261 from mengxr/SPARK-4396 and squashes the following commits:

543aef0 [Xiangrui Meng] use named tuple to implement ALS
0b61bae [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4396
d3bd7d4 [Xiangrui Meng] allow lookup by index in Python's Rating
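
The idea in the commit message above, making Rating a namedtuple so that both r[0] and r.user work, can be illustrated with a minimal sketch in plain Python. The Rating class below is a hypothetical stand-in written for illustration, not the code from this commit, and the sample values are made up.

```python
from collections import namedtuple


# Hypothetical stand-in for PySpark's Rating (an assumption for illustration,
# not the implementation from the commit).
class Rating(namedtuple("Rating", ["user", "product", "rating"])):
    """A (user, product, rating) record that supports both access styles."""
    __slots__ = ()  # keep instances as lightweight as a plain tuple


# Made-up sample values.
r = Rating(1, 2, 5.0)

# Attribute access, as on the output side of model.predict.
by_name = (r.user, r.product, r.rating)

# Index access, as with the plain tuples passed in on the input side.
by_index = (r[0], r[1], r[2])

# Both styles read the same fields, and the record unpacks like a tuple.
assert by_name == by_index
user, product, rating = r
```

Because a namedtuple is a tuple subclass, code written against plain (user, product, rating) tuples should keep working when it is handed such a record.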
mllib	[SPARK-4396] allow lookup by index in Python's Rating	2014-11-18 10:35:29 -08:00
streaming	replace awaitTransformation with awaitTermination in scaladoc/javadoc	2014-10-21 09:37:17 -07:00
__init__.py	[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py	2014-11-13 10:24:54 -08:00
accumulators.py	[SPARK-3478] [PySpark] Profile the Python tasks	2014-09-30 18:24:57 -07:00
broadcast.py	[SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx	2014-09-16 12:51:58 -07:00
cloudpickle.py	[SPARK-3679] [PySpark] pickle the exact globals of functions	2014-09-24 13:00:05 -07:00
conf.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
context.py	[SPARK-4398][PySpark] specialize sc.parallelize(xrange)	2014-11-14 12:43:17 -08:00
daemon.py	[SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM	2014-10-25 01:20:39 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	2014-08-26 16:57:40 -07:00
java_gateway.py	[SPARK-4415] [PySpark] JVM should exit after Python exit	2014-11-14 20:14:33 -08:00
join.py	[SPARK-546] Add full outer join to RDD and DStream.	2014-09-24 20:39:09 -07:00
rdd.py	[SPARK-4304] [PySpark] Fix sort on empty RDD	2014-11-07 20:53:03 -08:00
rddsampler.py	[SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample	2014-11-03 12:24:24 -08:00
resultiterable.py	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
serializers.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
shell.py	[SPARK-3273][SPARK-3301]We should read the version information from the same place	2014-09-06 15:08:43 -07:00
shuffle.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
sql.py	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
statcounter.py	StatCounter on NumPy arrays [PYSPARK][SPARK-2012]	2014-08-01 22:33:25 -07:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-4304] [PySpark] Fix sort on empty RDD	2014-11-07 20:53:03 -08:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-3993] [PySpark] fix bug while reuse worker after take()	2014-10-23 17:20:00 -07:00