4513f1c0dc
## What changes were proposed in this pull request?

This patch introduces two new options, "startingOffsetsByTimestamp" and "endingOffsetsByTimestamp", which set a specific timestamp per topic (since we're unlikely to set a different value per partition). The source starts reading from the offsets whose timestamp is equal to or greater than the starting timestamp, and stops reading at the offsets whose timestamp is equal to or greater than the ending timestamp.

The new options are optional and take precedence over the existing offset options.

## How was this patch tested?

New unit tests added. Also manually tested basic functionality with a Kafka 2.0.0 server.

Running the query below

```
val df = spark.read.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "spark_26848_test_v1,spark_26848_test_2_v1")
  .option("startingOffsetsByTimestamp",
    """{"spark_26848_test_v1": 1549669142193, "spark_26848_test_2_v1": 1549669240965}""")
  .option("endingOffsetsByTimestamp",
    """{"spark_26848_test_v1": 1549669265676, "spark_26848_test_2_v1": 1549699265676}""")
  .load().selectExpr("CAST(value AS STRING)")

df.show()
```

with the records below (each value is a string whose numeric part marks the timestamp after which the record was put) in topic `spark_26848_test_v1`

```
hello1 1549669142193
world1 1549669142193
hellow1 1549669240965
world1 1549669240965
hello1 1549669265676
world1 1549669265676
```

and in topic `spark_26848_test_2_v1`

```
hello2 1549669142193
world2 1549669142193
hello2 1549669240965
world2 1549669240965
hello2 1549669265676
world2 1549669265676
```

the result of `df.show()` is:

```
+--------------------+
|               value|
+--------------------+
|world1 1549669240965|
|world1 1549669142193|
|world2 1549669240965|
|hello2 1549669240965|
|hellow1 154966924...|
|hello2 1549669265676|
|hello1 1549669142193|
|world2 1549669265676|
+--------------------+
```

Note that endingOffsets (as well as endingOffsetsByTimestamp) are exclusive.

Closes #23747 from HeartSaVioR/SPARK-26848.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
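
The timestamp semantics described in the commit message can be illustrated with a small, self-contained Scala sketch. It is not the patch's implementation and does not use the Spark or Kafka APIs: `Record`, `TimestampRangeSketch`, `offsetForTimestamp`, and `read` are hypothetical names, the topic is modelled as a single partition with non-decreasing timestamps, and falling back to the end of the log when no record matches is an assumption made for the sketch.

```scala
// Hypothetical single-partition model; not Spark or Kafka API.
final case class Record(offset: Long, timestamp: Long, value: String)

object TimestampRangeSketch {
  // Earliest offset whose timestamp is equal to or greater than `ts`.
  // Assumption: if no record matches, fall back to the end of the log.
  def offsetForTimestamp(log: Seq[Record], ts: Long): Long =
    log.find(_.timestamp >= ts)
      .map(_.offset)
      .getOrElse(log.lastOption.fold(0L)(_.offset + 1))

  // Read [start, end): the start offset is inclusive, the end offset exclusive.
  def read(log: Seq[Record], startTs: Long, endTs: Long): Seq[String] = {
    val start = offsetForTimestamp(log, startTs)
    val end = offsetForTimestamp(log, endTs)
    log.filter(r => r.offset >= start && r.offset < end).map(_.value)
  }

  // Records of topic spark_26848_test_v1 from the manual test,
  // assumed here to sit in one partition in this order.
  val topic1: Seq[Record] = Seq(
    Record(0L, 1549669142193L, "hello1 1549669142193"),
    Record(1L, 1549669142193L, "world1 1549669142193"),
    Record(2L, 1549669240965L, "hellow1 1549669240965"),
    Record(3L, 1549669240965L, "world1 1549669240965"),
    Record(4L, 1549669265676L, "hello1 1549669265676"),
    Record(5L, 1549669265676L, "world1 1549669265676")
  )

  def main(args: Array[String]): Unit =
    read(topic1, 1549669142193L, 1549669265676L).foreach(println)
}
```

With the starting and ending timestamps used for `spark_26848_test_v1` in the manual test, the sketch selects the four records stamped 1549669142193 and 1549669240965 and leaves out the two stamped 1549669265676, because the ending offset is exclusive. That matches the four topic-1 rows in the `df.show()` output.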

| Name |
|---|
| avro |
| docker |
| docker-integration-tests |
| kafka-0-10 |
| kafka-0-10-assembly |
| kafka-0-10-sql |
| kafka-0-10-token-provider |
| kinesis-asl |
| kinesis-asl-assembly |
| spark-ganglia-lgpl |