Merge pull request #46 from mateiz/py-sort-update

Fix PySpark docs and an overly long line of code after #38. Just noticed these after merging that commit (https://github.com/apache/incubator-spark/pull/38).
commit 7827efc87b

```diff
@@ -16,7 +16,7 @@ This guide will show how to use the Spark features described there in Python.
 There are a few key differences between the Python and Scala APIs:
 
 * Python is dynamically typed, so RDDs can hold objects of multiple types.
-* PySpark does not yet support a few API calls, such as `lookup`, `sort`, and non-text input files, though these will be added in future releases.
+* PySpark does not yet support a few API calls, such as `lookup` and non-text input files, though these will be added in future releases.
 
 In PySpark, RDDs support the same methods as their Scala counterparts but take Python functions and return Python collection types.
 Short functions can be passed to RDD methods using Python's [`lambda`](http://www.diveintopython.net/power_of_introspection/lambda_functions.html) syntax:
```
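
The context lines at the end of that hunk describe how PySpark RDD methods take Python functions inline. A minimal sketch of the `lambda` point, assuming the PySpark shell of that era (Python 2, with a SparkContext already bound to `sc`) and a hypothetical input file `data.txt`:

```python
# Sketch only: `sc` and "data.txt" are assumptions, not part of this commit.
lines = sc.textFile("data.txt")

# Short functions go straight into RDD methods as lambdas.
lengths = lines.map(lambda line: len(line))
nonEmpty = lines.filter(lambda line: len(line) > 0)

# Results come back as plain Python collections (here, a list of ints).
print lengths.take(5)
```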

```diff
@@ -117,8 +117,6 @@ class RDD(object):
         else:
             return None
 
     # TODO persist(self, storageLevel)
 
     def map(self, f, preservesPartitioning=False):
         """
         Return a new RDD containing the distinct elements in this RDD.
```

```diff
@@ -309,7 +307,9 @@ class RDD(object):
         def mapFunc(iterator):
             yield sorted(iterator, reverse=(not ascending), key=lambda (k, v): keyfunc(k))
 
-        return self.partitionBy(numPartitions, partitionFunc=rangePartitionFunc).mapPartitions(mapFunc,preservesPartitioning=True).flatMap(lambda x: x, preservesPartitioning=True)
+        return (self.partitionBy(numPartitions, partitionFunc=rangePartitionFunc)
+                .mapPartitions(mapFunc,preservesPartitioning=True)
+                .flatMap(lambda x: x, preservesPartitioning=True))
 
     def glom(self):
         """
```
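
The wrapped line is the tail of the `sortByKey` implementation added in #38: range-partition the RDD so that partition order agrees with key order, sort each partition locally via `mapPartitions`, then flatten the per-partition lists with `flatMap`. A standalone sketch of that strategy on plain Python lists; every name in it (`sort_by_key`, `bounds`, and so on) is illustrative, not PySpark API:

```python
# Standalone sketch of the strategy above, on plain lists rather than RDDs.
# All names are illustrative; none of this is PySpark API.
def sort_by_key(pairs, num_partitions=2, ascending=True):
    # Choose range boundaries from the keys (the real code samples the RDD).
    keys = sorted(k for k, v in pairs)
    step = max(1, len(keys) // num_partitions)
    bounds = keys[step::step][:num_partitions - 1]

    # Range-partition: all keys in partition i sort before those in i+1,
    # so the partitions themselves already stand in key order.
    partitions = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        partitions[sum(1 for b in bounds if k > b)].append((k, v))

    # Sort each partition locally, then concatenate: the mapPartitions +
    # flatMap steps. A descending sort must also reverse partition order.
    ordered = partitions if ascending else list(reversed(partitions))
    out = []
    for part in ordered:
        out.extend(sorted(part, key=lambda kv: kv[0], reverse=not ascending))
    return out

print sort_by_key([(3, "c"), (1, "a"), (2, "b"), (4, "d")])
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

Note the descending case: flipping each local sort is not enough, since the partitions are concatenated in ascending range order; the real `rangePartitionFunc` presumably accounts for this by flipping the partition index when `ascending` is false.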