[SPARK-34945][DOC] Fix Javadoc for classes in catalyst module

### What changes were proposed in this pull request?

Use proper Javadoc format for the Java classes in the `catalyst` module.

### Why are the changes needed?

Many Java classes in `catalyst`, especially those for DataSource V2, do not use proper Javadoc format. Fixing the format improves the readability of the generated docs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #32038 from sunchao/javadoc.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
parent 65da9287bc
commit f1d42bb68d
31 changed files with 201 additions and 172 deletions


@ -25,7 +25,7 @@ import org.apache.spark.sql.connector.read.PartitionReader;
* at the executor side. During query execution, Spark will collect the task metrics per partition
* by {@link PartitionReader} and update internal metrics based on collected metric values.
* For streaming query, Spark will collect and combine metrics for a final result per micro batch.
*
* <p>
* The metrics will be gathered during query execution back to the driver and then combined. How
* the task metrics are combined is defined by corresponding {@link CustomMetric} with same metric
* name. The final result will be shown up in the data source scan operator in Spark UI.


@ -30,15 +30,16 @@ import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException;
* <p>
* These APIs are used to modify table partition or partition metadata,
* they will change the table data as well.
* ${@link #createPartitions}:
* add an array of partitions and any data they contain to the table
* ${@link #dropPartitions}:
* remove an array of partitions and any data they contain from the table
* ${@link #purgePartitions}:
* remove an array of partitions and any data they contain from the table by skipping
* a trash even if it is supported
* ${@link #truncatePartitions}:
* truncate an array of partitions by removing partitions data
* <ul>
* <li>{@link #createPartitions}: add an array of partitions and any data they contain to the
* table</li>
* <li>{@link #dropPartitions}: remove an array of partitions and any data they contain from
* the table</li>
* <li>{@link #purgePartitions}: remove an array of partitions and any data they contain from
* the table by skipping a trash even if it is supported</li>
* <li>{@link #truncatePartitions}: truncate an array of partitions by removing partitions
* data</li>
* </ul>
*
* @since 3.1.0
*/


@ -32,17 +32,15 @@ import org.apache.spark.sql.types.StructType;
* <p>
* These APIs are used to modify table partition identifier or partition metadata.
* In some cases, they will change the table data as well.
* ${@link #createPartition}:
* add a partition and any data it contains to the table
* ${@link #dropPartition}:
* remove a partition and any data it contains from the table
* ${@link #purgePartition}:
* remove a partition and any data it contains from the table by skipping a trash
* even if it is supported.
* ${@link #replacePartitionMetadata}:
* point a partition to a new location, which will swap one location's data for the other
* ${@link #truncatePartition}:
* remove partition data from the table
* <ul>
* <li>{@link #createPartition}: add a partition and any data it contains to the table</li>
* <li>{@link #dropPartition}: remove a partition and any data it contains from the table</li>
* <li>{@link #purgePartition}: remove a partition and any data it contains from the table by
* skipping a trash even if it is supported.</li>
* <li>{@link #replacePartitionMetadata}: point a partition to a new location, which will swap
* one location's data for the other</li>
* <li>{@link #truncatePartition}: remove partition data from the table</li>
* </ul>
*
* @since 3.1.0
*/
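
To make these partition operations concrete, here is a toy, in-memory analogue of the operations listed above (all names here are hypothetical; the real interfaces take `InternalRow` partition identifiers and throw the catalyst analysis exceptions):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: partition identifiers are plain strings and the
// "data" is a list of rows, only to illustrate the semantics of the methods.
final class ToyPartitionedTable {
  private final Map<String, List<String>> partitions = new HashMap<>();
  private final Map<String, Map<String, String>> metadata = new HashMap<>();

  // createPartition: add a partition and any data it contains to the table.
  void createPartition(String ident, Map<String, String> props) {
    if (partitions.containsKey(ident)) {
      throw new IllegalStateException("partition already exists: " + ident);
    }
    partitions.put(ident, new java.util.ArrayList<>());
    metadata.put(ident, new HashMap<>(props));
  }

  // dropPartition: remove the partition and its data
  // (a real source would typically move the data to a trash location).
  boolean dropPartition(String ident) {
    metadata.remove(ident);
    return partitions.remove(ident) != null;
  }

  // purgePartition: like drop, but skips the trash entirely.
  boolean purgePartition(String ident) {
    return dropPartition(ident); // this toy store has no trash to skip
  }

  // truncatePartition: keep the partition, remove its data.
  boolean truncatePartition(String ident) {
    List<String> rows = partitions.get(ident);
    if (rows == null) return false;
    rows.clear();
    return true;
  }
}
```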


@ -78,14 +78,17 @@ public interface TableProvider {
/**
* Returns true if the source has the ability of accepting external table metadata when getting
* tables. The external table metadata includes:
* 1. For table reader: user-specified schema from `DataFrameReader`/`DataStreamReader` and
* schema/partitioning stored in Spark catalog.
* 2. For table writer: the schema of the input `Dataframe` of
* `DataframeWriter`/`DataStreamWriter`.
* <ol>
* <li>For table reader: user-specified schema from {@code DataFrameReader}/{@code
* DataStreamReader} and schema/partitioning stored in Spark catalog.</li>
* <li>For table writer: the schema of the input {@code Dataframe} of
* {@code DataframeWriter}/{@code DataStreamWriter}.</li>
* </ol>
* <p>
* By default this method returns false, which means the schema and partitioning passed to
* `getTable` are from the infer methods. Please override it if this source has expensive
* schema/partitioning inference and wants external table metadata to avoid inference.
* {@link #getTable(StructType, Transform[], Map)} are from the infer methods. Please override it
* if this source has expensive schema/partitioning inference and wants external table metadata
* to avoid inference.
*/
default boolean supportsExternalMetadata() {
return false;
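
A sketch of a provider that opts in to external table metadata (hypothetical `MyTableProvider`; the actual table construction is elided):

```java
import java.util.Map;

import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// A source whose schema inference is expensive, so it accepts externally
// provided schema/partitioning whenever Spark already has them.
public class MyTableProvider implements TableProvider {

  @Override
  public StructType inferSchema(CaseInsensitiveStringMap options) {
    // Only reached when no external metadata is supplied, e.g. a read with no
    // user-specified schema and no catalog entry.
    return expensiveSchemaScan(options);
  }

  @Override
  public Table getTable(StructType schema, Transform[] partitioning, Map<String, String> properties) {
    // schema/partitioning may come from DataFrameReader, the catalog, or the infer methods.
    throw new UnsupportedOperationException(
        "table construction omitted in this sketch; a real source builds its Table from these arguments");
  }

  @Override
  public boolean supportsExternalMetadata() {
    return true; // skip inference whenever Spark already knows the table metadata
  }

  private StructType expensiveSchemaScan(CaseInsensitiveStringMap options) {
    return new StructType().add("id", "long").add("value", "string");
  }
}
```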


@ -36,10 +36,8 @@ public interface Batch {
* <p>
* If the {@link Scan} supports filter pushdown, this Batch is likely configured with a filter
* and is responsible for creating splits for that filter, which is not a full scan.
* </p>
* <p>
* This method will be called only once during a data source scan, to launch one Spark job.
* </p>
*/
InputPartition[] planInputPartitions();


@ -24,7 +24,7 @@ import org.apache.spark.annotation.Evolving;
/**
* A serializable representation of an input partition returned by
* {@link Batch#planInputPartitions()} and the corresponding ones in streaming .
*
* <p>
* Note that {@link InputPartition} will be serialized and sent to executors, then
* {@link PartitionReader} will be created by
* {@link PartitionReaderFactory#createReader(InputPartition)} or
@ -42,11 +42,11 @@ public interface InputPartition extends Serializable {
* faster, but Spark does not guarantee to run the input partition reader on these locations.
* The implementations should make sure that it can be run on any location.
* The location is a string representing the host name.
*
* <p>
* Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in
* the returned locations. The default return value is empty string array, which means this
* input partition's reader has no location preference.
*
* <p>
* If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*/
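
A minimal sketch of an input partition that reports preferred locations (hypothetical `FileSplitInputPartition`):

```java
import org.apache.spark.sql.connector.read.InputPartition;

// A partition that reads one file split; the host names hint where the split's
// data blocks live, but Spark may still schedule the reader elsewhere.
public class FileSplitInputPartition implements InputPartition {
  private final String path;
  private final long start;
  private final long length;
  private final String[] hosts;

  public FileSplitInputPartition(String path, long start, long length, String[] hosts) {
    this.path = path;
    this.start = start;
    this.length = length;
    this.hosts = hosts;
  }

  @Override
  public String[] preferredLocations() {
    // Returning an empty array (the default) would mean "no preference".
    return hosts;
  }
}
```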


@ -27,7 +27,7 @@ import org.apache.spark.sql.connector.CustomTaskMetric;
* A partition reader returned by {@link PartitionReaderFactory#createReader(InputPartition)} or
* {@link PartitionReaderFactory#createColumnarReader(InputPartition)}. It's responsible for
* outputting data for a RDD partition.
*
* <p>
* Note that, Currently the type `T` can only be {@link org.apache.spark.sql.catalyst.InternalRow}
* for normal data sources, or {@link org.apache.spark.sql.vectorized.ColumnarBatch} for columnar
* data sources(whose {@link PartitionReaderFactory#supportColumnarReads(InputPartition)}


@ -25,7 +25,7 @@ import org.apache.spark.sql.vectorized.ColumnarBatch;
/**
* A factory used to create {@link PartitionReader} instances.
*
* <p>
* If Spark fails to execute any methods in the implementations of this interface or in the returned
* {@link PartitionReader} (by throwing an exception), corresponding Spark task would fail and
* get retried until hitting the maximum retry times.
@ -37,7 +37,7 @@ public interface PartitionReaderFactory extends Serializable {
/**
* Returns a row-based partition reader to read data from the given {@link InputPartition}.
*
* <p>
* Implementations probably need to cast the input partition to the concrete
* {@link InputPartition} class defined for the data source.
*/
@ -45,7 +45,7 @@ public interface PartitionReaderFactory extends Serializable {
/**
* Returns a columnar partition reader to read data from the given {@link InputPartition}.
*
* <p>
* Implementations probably need to cast the input partition to the concrete
* {@link InputPartition} class defined for the data source.
*/
@ -57,7 +57,7 @@ public interface PartitionReaderFactory extends Serializable {
* Returns true if the given {@link InputPartition} should be read by Spark in a columnar way.
* This means, implementations must also implement {@link #createColumnarReader(InputPartition)}
* for the input partitions that this method returns true.
*
* <p>
* As of Spark 2.4, Spark can only read all input partition in a columnar way, or none of them.
* Data source can't mix columnar and row-based partitions. This may be relaxed in future
* versions.
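
For reference, a row-based factory and reader for a hypothetical range source (the class names are illustrative; only the `PartitionReaderFactory`/`PartitionReader` methods documented above are used):

```java
import java.io.IOException;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.connector.read.InputPartition;
import org.apache.spark.sql.connector.read.PartitionReader;
import org.apache.spark.sql.connector.read.PartitionReaderFactory;

// Row-based factory: executors deserialize it and call createReader() per partition.
public class RangeReaderFactory implements PartitionReaderFactory {

  // Hypothetical partition planned by the driver; InputPartition is Serializable.
  public static class RangeInputPartition implements InputPartition {
    final long start;
    final long end;

    public RangeInputPartition(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  @Override
  public PartitionReader<InternalRow> createReader(InputPartition partition) {
    // Cast to the concrete partition class this source planned on the driver.
    RangeInputPartition p = (RangeInputPartition) partition;
    return new RangeReader(p.start, p.end);
  }

  private static class RangeReader implements PartitionReader<InternalRow> {
    private final long end;
    private long current;

    RangeReader(long start, long end) {
      this.current = start - 1;
      this.end = end;
    }

    @Override
    public boolean next() {
      current += 1;
      return current < end;
    }

    @Override
    public InternalRow get() {
      return new GenericInternalRow(new Object[]{current});
    }

    @Override
    public void close() throws IOException {
      // nothing to release for an in-memory range
    }
  }
}
```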


@ -39,14 +39,17 @@ public interface SupportsPushDownFilters extends ScanBuilder {
/**
* Returns the filters that are pushed to the data source via {@link #pushFilters(Filter[])}.
*
* <p>
* There are 3 kinds of filters:
* 1. pushable filters which don't need to be evaluated again after scanning.
* 2. pushable filters which still need to be evaluated after scanning, e.g. parquet
* row group filter.
* 3. non-pushable filters.
* <ol>
* <li>pushable filters which don't need to be evaluated again after scanning.</li>
* <li>pushable filters which still need to be evaluated after scanning, e.g. parquet row
* group filter.</li>
* <li>non-pushable filters.</li>
* </ol>
* <p>
* Both case 1 and 2 should be considered as pushed filters and should be returned by this method.
*
* <p>
* It's possible that there is no filters in the query and {@link #pushFilters(Filter[])}
* is never called, empty array should be returned for this case.
*/
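
A sketch of a scan builder that pushes simple comparison filters and hands everything else back to Spark (hypothetical `MyScanBuilder`; which filters a real source can evaluate depends on the source):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsPushDownFilters;
import org.apache.spark.sql.sources.EqualTo;
import org.apache.spark.sql.sources.Filter;
import org.apache.spark.sql.sources.GreaterThan;

public class MyScanBuilder implements SupportsPushDownFilters {
  private Filter[] pushed = new Filter[0];

  @Override
  public Filter[] pushFilters(Filter[] filters) {
    List<Filter> pushable = new ArrayList<>();
    List<Filter> postScan = new ArrayList<>();
    for (Filter f : filters) {
      // Pretend this source can fully evaluate simple comparisons on its own (case 1).
      if (f instanceof EqualTo || f instanceof GreaterThan) {
        pushable.add(f);
      } else {
        postScan.add(f); // non-pushable filters (case 3)
      }
    }
    pushed = pushable.toArray(new Filter[0]);
    // Return the filters Spark must still evaluate after the scan.
    return postScan.toArray(new Filter[0]);
  }

  @Override
  public Filter[] pushedFilters() {
    // Everything this source will apply itself, even if only partially (cases 1 and 2).
    return pushed;
  }

  @Override
  public Scan build() {
    throw new UnsupportedOperationException("scan construction omitted in this sketch");
  }
}
```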


@ -32,11 +32,11 @@ public interface SupportsPushDownRequiredColumns extends ScanBuilder {
/**
* Applies column pruning w.r.t. the given requiredSchema.
*
* <p>
* Implementation should try its best to prune the unnecessary columns or nested fields, but it's
* also OK to do the pruning partially, e.g., a data source may not be able to prune nested
* fields, and only prune top-level columns.
*
* <p>
* Note that, {@link Scan#readSchema()} implementation should take care of the column
* pruning applied here.
*/
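
A sketch of a scan builder that adopts the required schema and reports it back through `Scan#readSchema()` (hypothetical `PruningScanBuilder` with a hard-coded full schema):

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsPushDownRequiredColumns;
import org.apache.spark.sql.types.StructType;

public class PruningScanBuilder implements SupportsPushDownRequiredColumns {
  // Starts as the full table schema; becomes the schema the scan will actually read.
  private StructType readSchema =
      new StructType().add("id", "long").add("name", "string").add("payload", "string");

  @Override
  public void pruneColumns(StructType requiredSchema) {
    // Best effort: this toy source can prune top-level columns, so simply adopt
    // the required schema.
    this.readSchema = requiredSchema;
  }

  @Override
  public Scan build() {
    StructType schema = readSchema;
    // The built Scan must report the pruned schema back to Spark.
    return new Scan() {
      @Override
      public StructType readSchema() {
        return schema;
      }
    };
  }
}
```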


@ -23,7 +23,7 @@ import org.apache.spark.sql.connector.read.partitioning.Partitioning;
/**
* A mix in interface for {@link Scan}. Data sources can implement this interface to
* report data partitioning and try to avoid shuffle at Spark side.
*
* <p>
* Note that, when a {@link Scan} implementation creates exactly one {@link InputPartition},
* Spark may avoid adding a shuffle even if the reader does not implement this interface.
*


@ -22,7 +22,7 @@ import org.apache.spark.annotation.Evolving;
/**
* A mix in interface for {@link Scan}. Data sources can implement this interface to
* report statistics to Spark.
*
* <p>
* As of Spark 3.0, statistics are reported to the optimizer after operators are pushed to the
* data source. Implementations may return more accurate statistics based on pushed operators
* which may improve query performance by providing better information to the optimizer.
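
A minimal sketch of a scan reporting fixed statistics (hypothetical `MyScan`; a real source would derive the numbers from its metadata and any pushed operators):

```java
import java.util.OptionalLong;

import org.apache.spark.sql.connector.read.Statistics;
import org.apache.spark.sql.connector.read.SupportsReportStatistics;
import org.apache.spark.sql.types.StructType;

public class MyScan implements SupportsReportStatistics {

  @Override
  public StructType readSchema() {
    return new StructType().add("id", "long");
  }

  @Override
  public Statistics estimateStatistics() {
    // Called after operator pushdown, so pushed filters can be taken into account here.
    return new Statistics() {
      @Override
      public OptionalLong sizeInBytes() {
        return OptionalLong.of(10L * 1024 * 1024);
      }

      @Override
      public OptionalLong numRows() {
        return OptionalLong.of(100_000L);
      }
    };
  }
}
```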


@ -24,14 +24,15 @@ import org.apache.spark.sql.connector.read.PartitionReader;
* An interface to represent data distribution requirement, which specifies how the records should
* be distributed among the data partitions (one {@link PartitionReader} outputs data for one
* partition).
* <p>
* Note that this interface has nothing to do with the data ordering inside one
* partition(the output records of a single {@link PartitionReader}).
*
* <p>
* The instance of this interface is created and provided by Spark, then consumed by
* {@link Partitioning#satisfy(Distribution)}. This means data source developers don't need to
* implement this interface, but need to catch as more concrete implementations of this interface
* as possible in {@link Partitioning#satisfy(Distribution)}.
*
* <p>
* Concrete implementations until now:
* <ul>
* <li>{@link ClusteredDistribution}</li>


@ -40,7 +40,7 @@ public interface Partitioning {
/**
* Returns true if this partitioning can satisfy the given distribution, which means Spark does
* not need to shuffle the output data of this data source for some certain operations.
*
* <p>
* Note that, Spark may add new concrete implementations of {@link Distribution} in new releases.
* This method should be aware of it and always return false for unrecognized distributions. It's
* recommended to check every Spark new release and support new distributions if possible, to
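
A minimal sketch of a partitioning that recognizes only `ClusteredDistribution` and stays conservative about everything else (hypothetical `HashPartitioning` class):

```java
import org.apache.spark.sql.connector.read.partitioning.ClusteredDistribution;
import org.apache.spark.sql.connector.read.partitioning.Distribution;
import org.apache.spark.sql.connector.read.partitioning.Partitioning;

public class HashPartitioning implements Partitioning {
  private final int numPartitions;

  public HashPartitioning(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  @Override
  public boolean satisfy(Distribution distribution) {
    if (distribution instanceof ClusteredDistribution) {
      // A real implementation would also check that the clustering columns match
      // the columns this source hashes by.
      return true;
    }
    // Unknown distributions, possibly added in newer Spark releases: be conservative.
    return false;
  }
}
```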


@ -30,7 +30,7 @@ public interface ContinuousPartitionReader<T> extends PartitionReader<T> {
/**
* Get the offset of the current record, or the start offset if no records have been read.
*
* <p>
* The execution engine will call this method along with get() to keep track of the current
* offset. When an epoch ends, the offset of the previous record in each partition will be saved
* as a restart checkpoint.


@ -62,7 +62,7 @@ public interface ContinuousStream extends SparkDataStream {
* The execution engine will call this method in every epoch to determine if new input
* partitions need to be generated, which may be required if for example the underlying
* source system has had partitions added or removed.
*
* <p>
* If true, the Spark job to scan this continuous data stream will be interrupted and Spark will
* launch it again with a new list of {@link InputPartition input partitions}.
*/


@ -22,6 +22,7 @@ import org.apache.spark.annotation.Evolving;
/**
* An abstract representation of progress through a {@link MicroBatchStream} or
* {@link ContinuousStream}.
* <p>
* During execution, offsets provided by the data source implementation will be logged and used as
* restart checkpoints. Each source should provide an offset implementation which the source can use
* to reconstruct a position in the stream up to which data has been seen/processed.
@ -33,6 +34,7 @@ public abstract class Offset {
/**
* A JSON-serialized representation of an Offset that is
* used for saving offsets to the offset log.
* <p>
* Note: We assume that equivalent/equal offsets serialize to
* identical JSON strings.
*


@ -24,7 +24,7 @@ import org.apache.spark.annotation.Evolving;
/**
* Used for per-partition offsets in continuous processing. ContinuousReader implementations will
* provide a method to merge these into a global Offset.
*
* <p>
* These offsets must be serializable.
*
* @since 3.0.0


@ -22,7 +22,7 @@ import org.apache.spark.annotation.Evolving;
/**
* Represents a {@link ReadLimit} where the {@link MicroBatchStream} must scan all the data
* available at the streaming source. This is meant to be a hard specification as being able
* to return all available data is necessary for Trigger.Once() to work correctly.
* to return all available data is necessary for {@code Trigger.Once()} to work correctly.
* If a source is unable to scan all available data, then it must throw an error.
*
* @see SupportsAdmissionControl#latestOffset(Offset, ReadLimit)


@ -22,7 +22,7 @@ import org.apache.spark.annotation.Evolving;
/**
* The base interface representing a readable data stream in a Spark streaming query. It's
* responsible to manage the offsets of the streaming source in the streaming query.
*
* <p>
* Data sources should implement concrete data stream interfaces:
* {@link MicroBatchStream} and {@link ContinuousStream}.
*


@ -25,7 +25,7 @@ import org.apache.spark.annotation.Evolving;
* contract of triggers, e.g. Trigger.Once() requires that a micro-batch process all data
* available to the system at the start of the micro-batch. Alternatively, sources can decide to
* limit ingest through data source options.
*
* <p>
* Through this interface, a MicroBatchStream should be able to return the next offset that it will
* process until given a {@link ReadLimit}.
*
@ -44,11 +44,11 @@ public interface SupportsAdmissionControl extends SparkDataStream {
* Returns the most recent offset available given a read limit. The start offset can be used
* to figure out how much new data should be read given the limit. Users should implement this
* method instead of latestOffset for a MicroBatchStream or getOffset for Source.
*
* <p>
* When this method is called on a `Source`, the source can return `null` if there is no
* data to process. In addition, for the very first micro-batch, the `startOffset` will be
* null as well.
*
* <p>
* When this method is called on a MicroBatchStream, the `startOffset` will be `initialOffset`
* for the very first micro-batch. The source can return `null` if there is no data to process.
*/
@ -56,7 +56,7 @@ public interface SupportsAdmissionControl extends SparkDataStream {
/**
* Returns the most recent offset available.
*
* <p>
* The source can return `null`, if there is no data to process or the source does not support
* to this method.
*/
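
As a schematic illustration of the admission-control arithmetic, a plain-Java sketch with hypothetical names (not the real `SupportsAdmissionControl`/`ReadLimit` types): the end offset of the next micro-batch is the minimum of what is available and the start offset plus the configured limit.

```java
// Offsets are plain longs here; a real source would wrap them in its own
// Offset subclass and honor whichever ReadLimit subtype Spark passes in.
final class AdmissionControlSketch {
  private final long latestAvailable; // newest offset the source currently has

  AdmissionControlSketch(long latestAvailable) {
    this.latestAvailable = latestAvailable;
  }

  // Analogue of latestOffset(startOffset, limit) with a max-rows style limit:
  // never admit more than maxRows records into the next micro-batch.
  long latestOffset(Long startOffset, long maxRows) {
    long start = (startOffset == null) ? 0L : startOffset; // null start = very first micro-batch
    return Math.min(latestAvailable, start + maxRows);
  }
}
```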


@ -21,20 +21,22 @@ import org.apache.spark.annotation.Evolving;
/**
* An interface that defines how to write the data to data source for batch processing.
*
* <p>
* The writing procedure is:
* 1. Create a writer factory by {@link #createBatchWriterFactory(PhysicalWriteInfo)}, serialize
* and send it to all the partitions of the input data(RDD).
* 2. For each partition, create the data writer, and write the data of the partition with this
* writer. If all the data are written successfully, call {@link DataWriter#commit()}. If
* exception happens during the writing, call {@link DataWriter#abort()}.
* 3. If all writers are successfully committed, call {@link #commit(WriterCommitMessage[])}. If
* some writers are aborted, or the job failed with an unknown reason, call
* {@link #abort(WriterCommitMessage[])}.
*
* <ol>
* <li>Create a writer factory by {@link #createBatchWriterFactory(PhysicalWriteInfo)}, serialize
* and send it to all the partitions of the input data(RDD).</li>
* <li>For each partition, create the data writer, and write the data of the partition with this
* writer. If all the data are written successfully, call {@link DataWriter#commit()}. If
* exception happens during the writing, call {@link DataWriter#abort()}.</li>
* <li>If all writers are successfully committed, call {@link #commit(WriterCommitMessage[])}. If
* some writers are aborted, or the job failed with an unknown reason, call
* {@link #abort(WriterCommitMessage[])}.</li>
* </ol>
* <p>
* While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should
* do it manually in their Spark applications if they want to retry.
*
* <p>
* Please refer to the documentation of commit/abort methods for detailed specifications.
*
* @since 3.0.0
@ -44,7 +46,7 @@ public interface BatchWrite {
/**
* Creates a writer factory which will be serialized and sent to executors.
*
* <p>
* If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*


@ -25,17 +25,17 @@ import org.apache.spark.annotation.Evolving;
/**
* A data writer returned by {@link DataWriterFactory#createWriter(int, long)} and is
* responsible for writing data for an input RDD partition.
*
* <p>
* One Spark task has one exclusive data writer, so there is no thread-safe concern.
*
* <p>
* {@link #write(Object)} is called for each record in the input RDD partition. If one record fails
* the {@link #write(Object)}, {@link #abort()} is called afterwards and the remaining records will
* not be processed. If all records are successfully written, {@link #commit()} is called.
*
* <p>
* Once a data writer returns successfully from {@link #commit()} or {@link #abort()}, Spark will
* call {@link #close()} to let DataWriter doing resource cleanup. After calling {@link #close()},
* its lifecycle is over and Spark will not use it again.
*
* <p>
* If this data writer succeeds(all records are successfully written and {@link #commit()}
* succeeds), a {@link WriterCommitMessage} will be sent to the driver side and pass to
* {@link BatchWrite#commit(WriterCommitMessage[])} with commit messages from other data
@ -44,7 +44,7 @@ import org.apache.spark.annotation.Evolving;
* In each retry, {@link DataWriterFactory#createWriter(int, long)} will receive a
* different `taskId`. Spark will call {@link BatchWrite#abort(WriterCommitMessage[])}
* when the configured number of retries is exhausted.
*
* <p>
* Besides the retry mechanism, Spark may launch speculative tasks if the existing writing task
* takes too long to finish. Different from retried tasks, which are launched one by one after the
* previous one fails, speculative tasks are running simultaneously. It's possible that one input
@ -54,8 +54,9 @@ import org.apache.spark.annotation.Evolving;
* these data writers can commit successfully. Or implementations can allow all of them to commit
* successfully, and have a way to revert committed data writers without the commit message, because
* Spark only accepts the commit message that arrives first and ignore others.
*
* Note that, Currently the type `T` can only be {@link org.apache.spark.sql.catalyst.InternalRow}.
* <p>
* Note that, Currently the type {@code T} can only be
* {@link org.apache.spark.sql.catalyst.InternalRow}.
*
* @since 3.0.0
*/
@ -64,7 +65,7 @@ public interface DataWriter<T> extends Closeable {
/**
* Writes one record.
*
* <p>
* If this method fails (by throwing an exception), {@link #abort()} will be called and this
* data writer is considered to have been failed.
*
@ -76,12 +77,12 @@ public interface DataWriter<T> extends Closeable {
* Commits this writer after all records are written successfully, returns a commit message which
* will be sent back to driver side and passed to
* {@link BatchWrite#commit(WriterCommitMessage[])}.
*
* <p>
* The written data should only be visible to data source readers after
* {@link BatchWrite#commit(WriterCommitMessage[])} succeeds, which means this method
* should still "hide" the written data and ask the {@link BatchWrite} at driver side to
* do the final commit via {@link WriterCommitMessage}.
*
* <p>
* If this method fails (by throwing an exception), {@link #abort()} will be called and this
* data writer is considered to have been failed.
*
@ -92,10 +93,10 @@ public interface DataWriter<T> extends Closeable {
/**
* Aborts this writer if it is failed. Implementations should clean up the data for already
* written records.
*
* <p>
* This method will only be called if there is one record failed to write, or {@link #commit()}
* failed.
*
* <p>
* If this method fails(by throwing an exception), the underlying data source may have garbage
* that need to be cleaned by {@link BatchWrite#abort(WriterCommitMessage[])} or manually,
* but these garbage should not be visible to data source readers.
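
Putting the batch write procedure together, a minimal sketch (the class names are hypothetical; this writer just buffers rows and the job-level commit only logs, where a real sink would publish the staged data atomically):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.DataWriter;
import org.apache.spark.sql.connector.write.DataWriterFactory;
import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
import org.apache.spark.sql.connector.write.WriterCommitMessage;

public class LoggingBatchWrite implements BatchWrite {

  @Override
  public DataWriterFactory createBatchWriterFactory(PhysicalWriteInfo info) {
    // Serialized and sent to every input partition (step 1 of the procedure above).
    return new LoggingWriterFactory();
  }

  @Override
  public void commit(WriterCommitMessage[] messages) {
    // All tasks committed (step 3): publish the staged data here.
    System.out.println("job committed with " + messages.length + " task messages");
  }

  @Override
  public void abort(WriterCommitMessage[] messages) {
    // Some tasks failed or the job failed: clean up whatever the tasks staged.
    System.out.println("job aborted");
  }

  static class LoggingWriterFactory implements DataWriterFactory {
    @Override
    public DataWriter<InternalRow> createWriter(int partitionId, long taskId) {
      // Called on executors; retried or speculative tasks get a different taskId.
      return new BufferingWriter(partitionId, taskId);
    }
  }

  static class BufferingWriter implements DataWriter<InternalRow> {
    private final int partitionId;
    private final long taskId;
    private final List<InternalRow> buffer = new ArrayList<>();

    BufferingWriter(int partitionId, long taskId) {
      this.partitionId = partitionId;
      this.taskId = taskId;
    }

    @Override
    public void write(InternalRow record) {
      // Step 2: called once per record; copy because Spark may reuse the row instance.
      buffer.add(record.copy());
    }

    @Override
    public WriterCommitMessage commit() {
      // Data must stay invisible to readers until BatchWrite#commit runs on the driver.
      return new TaskCommit(partitionId, taskId, buffer.size());
    }

    @Override
    public void abort() {
      buffer.clear(); // drop anything already staged for this task
    }

    @Override
    public void close() throws IOException {
      // release buffers, file handles, connections, ...
    }
  }

  static class TaskCommit implements WriterCommitMessage {
    final int partitionId;
    final long taskId;
    final int rowCount;

    TaskCommit(int partitionId, long taskId, int rowCount) {
      this.partitionId = partitionId;
      this.taskId = taskId;
      this.rowCount = rowCount;
    }
  }
}
```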


@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.InternalRow;
* A factory of {@link DataWriter} returned by
* {@link BatchWrite#createBatchWriterFactory(PhysicalWriteInfo)}, which is responsible for
* creating and initializing the actual data writer at executor side.
*
* <p>
* Note that, the writer factory will be serialized and sent to executors, then the data writer
* will be created on executors and do the actual writing. So this interface must be
* serializable and {@link DataWriter} doesn't need to be.
@ -42,7 +42,7 @@ public interface DataWriterFactory extends Serializable {
* object instance when sending data to the data writer, for better performance. Data writers
* are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
* list.
*
* <p>
* If this method fails (by throwing an exception), the corresponding Spark write task would fail
* and get retried until hitting the maximum retry times.
*


@ -35,7 +35,7 @@ public interface LogicalWriteInfo {
CaseInsensitiveStringMap options();
/**
* `queryId` is a unique string of the query. It's possible that there are many queries
* {@code queryId} is a unique string of the query. It's possible that there are many queries
* running at the same time, or a query is restarted and resumed. {@link BatchWrite} can use
* this id to identify the query.
*/


@ -23,7 +23,7 @@ import org.apache.spark.sql.connector.write.streaming.StreamingWrite;
/**
* An interface for building the {@link Write}. Implementations can mix in some interfaces to
* support different ways to write data to data sources.
*
* <p>
* Unless modified by a mixin interface, the {@link Write} configured by this builder is to
* append data without affecting existing data.
*


@ -26,7 +26,7 @@ import org.apache.spark.sql.connector.write.streaming.StreamingWrite;
* A commit message returned by {@link DataWriter#commit()} and will be sent back to the driver side
* as the input parameter of {@link BatchWrite#commit(WriterCommitMessage[])} or
* {@link StreamingWrite#commit(long, WriterCommitMessage[])}.
*
* <p>
* This is an empty interface, data sources should define their own message class and use it when
* generating messages at executor side and handling the messages at driver side.
*


@ -29,7 +29,7 @@ import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
* A factory of {@link DataWriter} returned by
* {@link StreamingWrite#createStreamingWriterFactory(PhysicalWriteInfo)}, which is responsible for
* creating and initializing the actual data writer at executor side.
*
* <p>
* Note that, the writer factory will be serialized and sent to executors, then the data writer
* will be created on executors and do the actual writing. So this interface must be
* serializable and {@link DataWriter} doesn't need to be.
@ -44,7 +44,7 @@ public interface StreamingDataWriterFactory extends Serializable {
* object instance when sending data to the data writer, for better performance. Data writers
* are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
* list.
*
* <p>
* If this method fails (by throwing an exception), the corresponding Spark write task would fail
* and get retried until hitting the maximum retry times.
*


@ -26,19 +26,21 @@ import org.apache.spark.sql.connector.write.WriterCommitMessage;
* An interface that defines how to write the data to data source in streaming queries.
*
* The writing procedure is:
* 1. Create a writer factory by {@link #createStreamingWriterFactory(PhysicalWriteInfo)},
* serialize and send it to all the partitions of the input data(RDD).
* 2. For each epoch in each partition, create the data writer, and write the data of the epoch in
* the partition with this writer. If all the data are written successfully, call
* {@link DataWriter#commit()}. If exception happens during the writing, call
* {@link DataWriter#abort()}.
* 3. If writers in all partitions of one epoch are successfully committed, call
* {@link #commit(long, WriterCommitMessage[])}. If some writers are aborted, or the job failed
* with an unknown reason, call {@link #abort(long, WriterCommitMessage[])}.
*
* <ol>
* <li>Create a writer factory by {@link #createStreamingWriterFactory(PhysicalWriteInfo)},
* serialize and send it to all the partitions of the input data(RDD).</li>
* <li>For each epoch in each partition, create the data writer, and write the data of the
* epoch in the partition with this writer. If all the data are written successfully, call
* {@link DataWriter#commit()}. If exception happens during the writing, call
* {@link DataWriter#abort()}.</li>
* <li>If writers in all partitions of one epoch are successfully committed, call
* {@link #commit(long, WriterCommitMessage[])}. If some writers are aborted, or the job failed
* with an unknown reason, call {@link #abort(long, WriterCommitMessage[])}.</li>
* </ol>
* <p>
* While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should
* do it manually in their Spark applications if they want to retry.
*
* <p>
* Please refer to the documentation of commit/abort methods for detailed specifications.
*
* @since 3.0.0
@ -48,7 +50,7 @@ public interface StreamingWrite {
/**
* Creates a writer factory which will be serialized and sent to executors.
*
* <p>
* If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
@ -60,14 +62,14 @@ public interface StreamingWrite {
* Commits this writing job for the specified epoch with a list of commit messages. The commit
* messages are collected from successful data writers and are produced by
* {@link DataWriter#commit()}.
*
* <p>
* If this method fails (by throwing an exception), this writing job is considered to have been
* failed, and the execution engine will attempt to call
* {@link #abort(long, WriterCommitMessage[])}.
*
* The execution engine may call `commit` multiple times for the same epoch in some circumstances.
* To support exactly-once data semantics, implementations must ensure that multiple commits for
* the same epoch are idempotent.
* <p>
* The execution engine may call {@code commit} multiple times for the same epoch in some
* circumstances. To support exactly-once data semantics, implementations must ensure that
* multiple commits for the same epoch are idempotent.
*/
void commit(long epochId, WriterCommitMessage[] messages);
@ -75,10 +77,10 @@ public interface StreamingWrite {
* Aborts this writing job because some data writers are failed and keep failing when retried, or
* the Spark job fails with some unknown reasons, or {@link #commit(long, WriterCommitMessage[])}
* fails.
*
* <p>
* If this method fails (by throwing an exception), the underlying data source may require manual
* cleanup.
*
* <p>
* Unless the abort is triggered by the failure of commit, the given messages will have some
* null slots, as there may be only a few data writers that were committed before the abort
* happens, or some data writers were committed but their commit messages haven't reached the
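
A sketch of the idempotent-epoch-commit requirement (hypothetical `IdempotentStreamingWrite`; the writer factory is elided and the epoch bookkeeping is simplified to an in-memory set):

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
import org.apache.spark.sql.connector.write.WriterCommitMessage;
import org.apache.spark.sql.connector.write.streaming.StreamingDataWriterFactory;
import org.apache.spark.sql.connector.write.streaming.StreamingWrite;

public class IdempotentStreamingWrite implements StreamingWrite {
  // In practice this bookkeeping must survive driver restarts (e.g. live in the
  // sink itself); an in-memory set is only for illustration.
  private final Set<Long> committedEpochs = new HashSet<>();

  @Override
  public StreamingDataWriterFactory createStreamingWriterFactory(PhysicalWriteInfo info) {
    throw new UnsupportedOperationException("writer factory omitted in this sketch");
  }

  @Override
  public void commit(long epochId, WriterCommitMessage[] messages) {
    // Spark may call commit more than once for the same epoch; make the second
    // call a no-op so the epoch's data is not published twice.
    if (!committedEpochs.add(epochId)) {
      return;
    }
    publish(epochId, messages);
  }

  @Override
  public void abort(long epochId, WriterCommitMessage[] messages) {
    // messages may contain null slots for writers that never committed.
    discardStagedData(epochId);
  }

  private void publish(long epochId, WriterCommitMessage[] messages) { /* ... */ }

  private void discardStagedData(long epochId) { /* ... */ }
}
```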


@ -23,8 +23,9 @@ import org.apache.spark.sql.catalyst.plans.logical.*;
/**
* Represents the type of timeouts possible for the Dataset operations
* `mapGroupsWithState` and `flatMapGroupsWithState`. See documentation on
* `GroupState` for more details.
* {@code mapGroupsWithState} and {@code flatMapGroupsWithState}.
* <p>
* See documentation on {@code GroupState} for more details.
*
* @since 2.2.0
*/
@ -33,21 +34,29 @@ import org.apache.spark.sql.catalyst.plans.logical.*;
public class GroupStateTimeout {
/**
* Timeout based on processing time. The duration of timeout can be set for each group in
* `map/flatMapGroupsWithState` by calling `GroupState.setTimeoutDuration()`. See documentation
* on `GroupState` for more details.
* Timeout based on processing time.
* <p>
* The duration of timeout can be set for each group in
* {@code map/flatMapGroupsWithState} by calling {@code GroupState.setTimeoutDuration()}.
* <p>
* See documentation on {@code GroupState} for more details.
*/
public static GroupStateTimeout ProcessingTimeTimeout() {
return ProcessingTimeTimeout$.MODULE$;
}
/**
* Timeout based on event-time. The event-time timestamp for timeout can be set for each
* group in `map/flatMapGroupsWithState` by calling `GroupState.setTimeoutTimestamp()`.
* In addition, you have to define the watermark in the query using `Dataset.withWatermark`.
* Timeout based on event-time.
* <p>
* The event-time timestamp for timeout can be set for each
* group in {@code map/flatMapGroupsWithState} by calling
* {@code GroupState.setTimeoutTimestamp()}.
* In addition, you have to define the watermark in the query using
* {@code Dataset.withWatermark}.
* When the watermark advances beyond the set timestamp of a group and the group has not
* received any data, then the group times out. See documentation on
* `GroupState` for more details.
* received any data, then the group times out.
* <p>
* See documentation on {@code GroupState} for more details.
*/
public static GroupStateTimeout EventTimeTimeout() { return EventTimeTimeout$.MODULE$; }


@ -26,22 +26,23 @@ import org.apache.spark.unsafe.types.UTF8String;
* An interface representing in-memory columnar data in Spark. This interface defines the main APIs
* to access the data, as well as their batched versions. The batched versions are considered to be
* faster and preferable whenever possible.
*
* <p>
* Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values
* in this ColumnVector.
*
* Spark only calls specific `get` method according to the data type of this {@link ColumnVector},
* <p>
* Spark only calls specific {@code get} method according to the data type of this
* {@link ColumnVector},
* e.g. if it's int type, Spark is guaranteed to only call {@link #getInt(int)} or
* {@link #getInts(int, int)}.
*
* <p>
* ColumnVector supports all the data types including nested types. To handle nested types,
* ColumnVector can have children and is a tree structure. Please refer to {@link #getStruct(int)},
* {@link #getArray(int)} and {@link #getMap(int)} for the details about how to implement nested
* types.
*
* <p>
* ColumnVector is expected to be reused during the entire data loading process, to avoid allocating
* memory again and again.
*
* <p>
* ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint.
* Implementations should prefer computing efficiency over storage efficiency when design the
* format. Since it is expected to reuse the ColumnVector instance while loading data, the storage
@ -57,9 +58,10 @@ public abstract class ColumnVector implements AutoCloseable {
/**
* Cleans up memory for this column vector. The column vector is not usable after this.
*
* This overwrites `AutoCloseable.close` to remove the `throws` clause, as column vector is
* in-memory and we don't expect any exception to happen during closing.
* <p>
* This overwrites {@link AutoCloseable#close} to remove the
* {@code throws} clause, as column vector is in-memory and we don't expect any exception to
* happen during closing.
*/
@Override
public abstract void close();
@ -75,19 +77,19 @@ public abstract class ColumnVector implements AutoCloseable {
public abstract int numNulls();
/**
* Returns whether the value at rowId is NULL.
* Returns whether the value at {@code rowId} is NULL.
*/
public abstract boolean isNullAt(int rowId);
/**
* Returns the boolean type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the boolean type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract boolean getBoolean(int rowId);
/**
* Gets boolean type values from [rowId, rowId + count). The return values for the null slots
* are undefined and can be anything.
* Gets boolean type values from {@code [rowId, rowId + count)}. The return values for the null
* slots are undefined and can be anything.
*/
public boolean[] getBooleans(int rowId, int count) {
boolean[] res = new boolean[count];
@ -98,13 +100,13 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the byte type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the byte type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract byte getByte(int rowId);
/**
* Gets byte type values from [rowId, rowId + count). The return values for the null slots
* Gets byte type values from {@code [rowId, rowId + count)}. The return values for the null slots
* are undefined and can be anything.
*/
public byte[] getBytes(int rowId, int count) {
@ -116,14 +118,14 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the short type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the short type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract short getShort(int rowId);
/**
* Gets short type values from [rowId, rowId + count). The return values for the null slots
* are undefined and can be anything.
* Gets short type values from {@code [rowId, rowId + count)}. The return values for the null
* slots are undefined and can be anything.
*/
public short[] getShorts(int rowId, int count) {
short[] res = new short[count];
@ -134,13 +136,13 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the int type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the int type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract int getInt(int rowId);
/**
* Gets int type values from [rowId, rowId + count). The return values for the null slots
* Gets int type values from {@code [rowId, rowId + count)}. The return values for the null slots
* are undefined and can be anything.
*/
public int[] getInts(int rowId, int count) {
@ -152,13 +154,13 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the long type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the long type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract long getLong(int rowId);
/**
* Gets long type values from [rowId, rowId + count). The return values for the null slots
* Gets long type values from {@code [rowId, rowId + count)}. The return values for the null slots
* are undefined and can be anything.
*/
public long[] getLongs(int rowId, int count) {
@ -170,14 +172,14 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the float type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the float type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract float getFloat(int rowId);
/**
* Gets float type values from [rowId, rowId + count). The return values for the null slots
* are undefined and can be anything.
* Gets float type values from {@code [rowId, rowId + count)}. The return values for the null
* slots are undefined and can be anything.
*/
public float[] getFloats(int rowId, int count) {
float[] res = new float[count];
@ -188,14 +190,14 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the double type value for rowId. The return value is undefined and can be anything,
* if the slot for rowId is null.
* Returns the double type value for {@code rowId}. The return value is undefined and can be
* anything, if the slot for {@code rowId} is null.
*/
public abstract double getDouble(int rowId);
/**
* Gets double type values from [rowId, rowId + count). The return values for the null slots
* are undefined and can be anything.
* Gets double type values from {@code [rowId, rowId + count)}. The return values for the null
* slots are undefined and can be anything.
*/
public double[] getDoubles(int rowId, int count) {
double[] res = new double[count];
@ -206,8 +208,9 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the struct type value for rowId. If the slot for rowId is null, it should return null.
*
* Returns the struct type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
* <p>
* To support struct type, implementations must implement {@link #getChild(int)} and make this
* vector a tree structure. The number of child vectors must be same as the number of fields of
* the struct type, and each child vector is responsible to store the data for its corresponding
@ -219,8 +222,9 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* Returns the array type value for rowId. If the slot for rowId is null, it should return null.
*
* Returns the array type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
* <p>
* To support array type, implementations must construct an {@link ColumnarArray} and return it in
* this method. {@link ColumnarArray} requires a {@link ColumnVector} that stores the data of all
* the elements of all the arrays in this vector, and an offset and length which points to a range
@ -232,12 +236,13 @@ public abstract class ColumnVector implements AutoCloseable {
public abstract ColumnarArray getArray(int rowId);
/**
* Returns the map type value for rowId. If the slot for rowId is null, it should return null.
*
* Returns the map type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
* <p>
* In Spark, map type value is basically a key data array and a value data array. A key from the
* key array with a index and a value from the value array with the same index contribute to
* an entry of this map type value.
*
* <p>
* To support map type, implementations must construct a {@link ColumnarMap} and return it in
* this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all
* the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data
@ -247,31 +252,35 @@ public abstract class ColumnVector implements AutoCloseable {
public abstract ColumnarMap getMap(int ordinal);
/**
* Returns the decimal type value for rowId. If the slot for rowId is null, it should return null.
* Returns the decimal type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
*/
public abstract Decimal getDecimal(int rowId, int precision, int scale);
/**
* Returns the string type value for rowId. If the slot for rowId is null, it should return null.
* Note that the returned UTF8String may point to the data of this column vector, please copy it
* if you want to keep it after this column vector is freed.
* Returns the string type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
* <p>
* Note that the returned {@link UTF8String} may point to the data of this column vector,
* please copy it if you want to keep it after this column vector is freed.
*/
public abstract UTF8String getUTF8String(int rowId);
/**
* Returns the binary type value for rowId. If the slot for rowId is null, it should return null.
* Returns the binary type value for {@code rowId}. If the slot for {@code rowId} is null, it
* should return null.
*/
public abstract byte[] getBinary(int rowId);
/**
* Returns the calendar interval type value for rowId. If the slot for rowId is null, it should
* return null.
*
* Returns the calendar interval type value for {@code rowId}. If the slot for
* {@code rowId} is null, it should return null.
* <p>
* In Spark, calendar interval type value is basically two integer values representing the number
* of months and days in this interval, and a long value representing the number of microseconds
* in this interval. An interval type vector is the same as a struct type vector with 3 fields:
* `months`, `days` and `microseconds`.
*
* {@code months}, {@code days} and {@code microseconds}.
* <p>
* To support interval type, implementations must implement {@link #getChild(int)} and define 3
* child vectors: the first child vector is an int type vector, containing all the month values of
* all the interval values in this vector. The second child vector is an int type vector,
@ -288,7 +297,7 @@ public abstract class ColumnVector implements AutoCloseable {
}
/**
* @return child [[ColumnVector]] at the given ordinal.
* @return child {@link ColumnVector} at the given ordinal.
*/
public abstract ColumnVector getChild(int ordinal);
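
To illustrate the null-slot contract of the typed getters, a small consumer-side sketch (hypothetical helper class; it relies only on `ColumnarBatch#numRows`, `ColumnVector#isNullAt` and `ColumnVector#getInt`):

```java
import org.apache.spark.sql.vectorized.ColumnVector;
import org.apache.spark.sql.vectorized.ColumnarBatch;

public final class ColumnVectorExample {

  // Sums an int column, checking isNullAt before each typed get because the
  // typed getters return undefined values for null slots.
  public static long sumIntColumn(ColumnarBatch batch, int columnIndex) {
    ColumnVector col = batch.column(columnIndex);
    long sum = 0;
    for (int rowId = 0; rowId < batch.numRows(); rowId++) {
      if (!col.isNullAt(rowId)) {
        sum += col.getInt(rowId);
      }
    }
    return sum;
  }

  private ColumnVectorExample() {}
}
```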