[SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation
## What changes were proposed in this pull request?
The `Batch` class, which had been used to indicate progress in a stream, was made obsolete by [[SPARK-13985][SQL] Deterministic batches with ids](caea152145) and has been unused since.
This patch:
- removes the `Batch` class
- ~~does some related renaming~~ (update: this has been reverted)
- fixes some related comments
## How was this patch tested?
N/A
Author: Liwei Lin <lwlin7@gmail.com>
Closes #12638 from lw-lin/remove-batch.
Parent: 7dd01d9c01
Commit: a234cc6146
```diff
@@ -1,26 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.streaming
-
-import org.apache.spark.sql.DataFrame
-
-/**
- * Used to pass a batch of data through a streaming query execution along with an indication
- * of progress in the stream.
- */
-class Batch(val end: Offset, val data: DataFrame)
```
```diff
@@ -88,7 +88,7 @@ class FileStreamSource(
   }

   /**
-   * Returns the next batch of data that is available after `start`, if any is available.
+   * Returns the data that is between the offsets (`start`, `end`].
    */
   override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
     val startId = start.map(_.asInstanceOf[LongOffset].offset).getOrElse(-1L)
```
```diff
@@ -27,7 +27,7 @@ import org.apache.spark.sql.DataFrame
 trait Sink {

   /**
-   * Adds a batch of data to this sink. The data for a given `batchId` is deterministic and if
+   * Adds a batch of data to this sink. The data for a given `batchId` is deterministic and if
    * this method is called more than once with the same batchId (which will happen in the case of
    * failures), then `data` should only be added once.
    */
```
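The `addBatch` contract above is about idempotency: after a failure, the engine may replay a batch, and the sink must not duplicate its data. As an illustration only (a minimal plain-Scala sketch, not Spark's `Sink` trait; the `IdempotentSink` name and `Seq[T]` payload are hypothetical stand-ins for a `DataFrame`):

```scala
import scala.collection.mutable

// Hypothetical sketch of an idempotent sink: a replayed batchId is a no-op,
// so retrying addBatch after a failure adds the data only once.
class IdempotentSink[T] {
  private val committed = mutable.Map.empty[Long, Seq[T]]

  def addBatch(batchId: Long, data: Seq[T]): Unit = {
    // Ignore a batchId we have already committed.
    if (!committed.contains(batchId)) {
      committed(batchId) = data
    }
  }

  // All committed data, in batch order.
  def allData: Seq[T] = committed.toSeq.sortBy(_._1).flatMap(_._2)
}
```

Calling `addBatch(0, Seq("a"))` twice leaves `allData` with a single `"a"`, which is the behavior the comment requires of real sinks.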
```diff
@@ -34,7 +34,7 @@ trait Source {
   def getOffset: Option[Offset]

   /**
-   * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None` then
+   * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None` then
    * the batch should begin with the first available record. This method must always return the
    * same data for a particular `start` and `end` pair.
    */
```
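The `(start, end]` semantics in the `Source` comment above exclude the record at `start` (it was delivered in the previous batch) and include the record at `end`. A minimal plain-Scala sketch of that contract, assuming `Long` offsets over an in-memory log (the `InMemorySource` name is hypothetical, not Spark's implementation):

```scala
// Hypothetical source over a fixed in-memory log, with record index as offset.
class InMemorySource[T](log: IndexedSeq[T]) {
  // Highest offset for which data is available, if any.
  def getOffset: Option[Long] =
    if (log.isEmpty) None else Some(log.size - 1L)

  // Returns the records in the range (start, end]: the record at offset
  // `start` is excluded, the record at `end` is included. When `start` is
  // None, the batch begins with the first available record. For a fixed log
  // this is deterministic for any given (start, end) pair.
  def getBatch(start: Option[Long], end: Long): Seq[T] = {
    val startIdx = start.map(_.toInt + 1).getOrElse(0)
    log.slice(startIdx, end.toInt + 1)
  }
}
```

For example, with a log of three records, `getBatch(None, 1)` yields the first two, and a follow-up `getBatch(Some(1), 2)` yields only the third, so consecutive batches never overlap.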
```diff
@@ -91,7 +91,7 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
   }

   /**
-   * Returns the next batch of data that is available after `start`, if any is available.
+   * Returns the data that is between the offsets (`start`, `end`].
    */
   override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
     val startOrdinal =
```