spark-instrumented-optimizer/python/pyspark/streaming
jerryshao 3ccebf36c5 [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python
This PR propose a simple way to expose OffsetRange in Python code, also the usage of offsetRanges is similar to Scala/Java way, here in Python we could get OffsetRange like:

```
dstream.foreachRDD(lambda r: KafkaUtils.offsetRanges(r))
```

Reason I didn't follow the way what SPARK-8389 suggested is that: Python Kafka API has one more step to decode the message compared to Scala/Java, Which makes Python API return a transformed RDD/DStream, not directly wrapped so-called JavaKafkaRDD, so it is hard to backtrack to the original RDD to get the offsetRange.

Author: jerryshao <saisai.shao@intel.com>

Closes #7185 from jerryshao/SPARK-8389 and squashes the following commits:

4c6d320 [jerryshao] Another way to fix subclass deserialization issue
e6a8011 [jerryshao] Address the comments
fd13937 [jerryshao] Fix serialization bug
7debf1c [jerryshao] bug fix
cff3893 [jerryshao] refactor the code according to the comments
2aabf9e [jerryshao] Style fix
848c708 [jerryshao] Add HasOffsetRanges for Python
2015-07-09 13:54:44 -07:00
..
__init__.py [SPARK-2377] Python API for Streaming 2014-10-12 02:46:56 -07:00
context.py [SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression 2015-04-21 00:08:18 -07:00
dstream.py [SPARK-8444] [STREAMING] Adding Python streaming example for queueStream 2015-06-19 00:07:53 -07:00
flume.py [SPARK-8378] [STREAMING] Add the Python API for Flume 2015-07-01 11:59:24 -07:00
kafka.py [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python 2015-07-09 13:54:44 -07:00
tests.py [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python 2015-07-09 13:54:44 -07:00
util.py [SPARK-8389] [STREAMING] [PYSPARK] Expose KafkaRDDs offsetRange in Python 2015-07-09 13:54:44 -07:00