spark-instrumented-optimizer/python/pyspark
Davies Liu 4fa2fda88f [SPARK-2871] [PySpark] add RDD.lookup(key)
RDD.lookup(key)

        Return the list of values in the RDD for key `key`. This operation
        is done efficiently if the RDD has a known partitioner by only
        searching the partition that the key maps to.

        >>> l = range(1000)
        >>> rdd = sc.parallelize(zip(l, l), 10)
        >>> rdd.lookup(42)  # slow
        [42]
        >>> sorted_rdd = rdd.sortByKey()  # avoid shadowing the built-in sorted()
        >>> sorted_rdd.lookup(42)  # fast
        [42]
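
To illustrate why a known partitioner makes lookup cheaper, here is a minimal plain-Python sketch. It is a toy model and not PySpark's implementation: `partition_for`, `build_partitions`, and the `partitioned_by_key` flag are hypothetical names introduced for this example, and a simple hash partitioner is assumed.

```python
# Toy model of partition-aware lookup. Plain Python, no Spark required.
# All names here are hypothetical and chosen for illustration only.


def partition_for(key, num_partitions):
    """Assumed hash partitioner: map a key to a partition index."""
    return hash(key) % num_partitions


def build_partitions(pairs, num_partitions):
    """Place every (key, value) pair in the partition its key maps to."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def lookup(partitions, key, partitioned_by_key):
    """Return (values stored under key, number of partitions scanned)."""
    if partitioned_by_key:
        # Known partitioner: only the partition the key maps to can hold it.
        candidates = [partitions[partition_for(key, len(partitions))]]
    else:
        # No partitioner information: every partition has to be scanned.
        candidates = partitions
    values = [v for part in candidates for k, v in part if k == key]
    return values, len(candidates)


pairs = [(i, i) for i in range(1000)]
parts = build_partitions(pairs, 10)
fast = lookup(parts, 42, partitioned_by_key=True)   # scans 1 partition
slow = lookup(parts, 42, partitioned_by_key=False)  # scans all 10 partitions
```

Both calls return the same values. Only the number of partitions scanned differs, which is the saving the docstring describes.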

It also cleans up the code in rdd.py and fixes several bugs (related to preservesPartitioning).

Author: Davies Liu <davies.liu@gmail.com>

Closes #2093 from davies/lookup and squashes the following commits:

1789cd4 [Davies Liu] `f` in foreach could be generator or not.
2871b80 [Davies Liu] Merge branch 'master' into lookup
c6390ea [Davies Liu] address all comments
0f1bce8 [Davies Liu] add test case for lookup()
be0e8ba [Davies Liu] fix preservesPartitioning
eb1305d [Davies Liu] add RDD.lookup(key)
2014-08-27 13:18:33 -07:00
..
mllib [SPARK-3136][MLLIB] Create Java-friendly methods in RandomRDDs 2014-08-19 16:06:48 -07:00
__init__.py [SPARK-2724] Python version of RandomRDDGenerators 2014-07-31 20:32:57 -07:00
accumulators.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
broadcast.py [SPARK-1065] [PySpark] improve supporting for large broadcast 2014-08-16 16:59:34 -07:00
cloudpickle.py [SPARK-791] [PySpark] fix pickle itemgetter with cloudpickle 2014-07-29 01:02:18 -07:00
conf.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
context.py [SPARK-1065] [PySpark] improve supporting for large broadcast 2014-08-16 16:59:34 -07:00
daemon.py [SPARK-2898] [PySpark] fix bugs in deamon.py 2014-08-10 13:00:38 -07:00
files.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-3167] Handle special driver configs in Windows 2014-08-26 22:52:16 -07:00
join.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
rdd.py [SPARK-2871] [PySpark] add RDD.lookup(key) 2014-08-27 13:18:33 -07:00
rddsampler.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-2790] [PySpark] fix zip with serializers which have different batch sizes. 2014-08-19 14:46:32 -07:00
shell.py [SPARK-2470] PEP8 fixes to PySpark 2014-07-21 22:30:53 -07:00
shuffle.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
sql.py [SPARK-2969][SQL] Make ScalaReflection be able to handle ArrayType.containsNull and MapType.valueContainsNull. 2014-08-26 13:22:55 -07:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
tests.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
worker.py [SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL. 2014-08-18 20:42:19 -07:00