diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md
index e4ce3e938b..f99b064949 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -88,17 +88,17 @@ creating table, you can create a table using storage handler at Hive side, and u
       inputFormat, outputFormat
-      These 2 options specify the name of a corresponding `InputFormat` and `OutputFormat` class as a string literal,
-      e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in a pair, and you can not
-      specify them if you already specified the `fileFormat` option.
+      These 2 options specify the name of a corresponding InputFormat and OutputFormat class as a string literal,
+      e.g. org.apache.hadoop.hive.ql.io.orc.OrcInputFormat. These 2 options must appear as a pair, and you cannot
+      specify them if you have already specified the fileFormat option.
       serde
-      This option specifies the name of a serde class. When the `fileFormat` option is specified, do not specify this option
-      if the given `fileFormat` already include the information of serde. Currently "sequencefile", "textfile" and "rcfile"
+      This option specifies the name of a serde class. When the fileFormat option is specified, do not specify this option
+      if the given fileFormat already includes the serde information. Currently "sequencefile", "textfile" and "rcfile"
       don't include the serde information and you can use this option with these 3 fileFormats.
diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index c3502cbdea..b0d37b11c7 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -60,7 +60,7 @@ the following case-insensitive options:
       The JDBC table that should be read from or written into. Note that when using it in the read path anything
       that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could
       also use a subquery in parentheses. It is not
-      allowed to specify `dbtable` and `query` options at the same time.
+      allowed to specify dbtable and query options at the same time.
@@ -72,10 +72,10 @@ the following case-insensitive options:
       SELECT <columns> FROM (<user_specified_query>) spark_gen_alias

       Below are a couple of restrictions while using this option.
-      1. It is not allowed to specify `dbtable` and `query` options at the same time.
-      2. It is not allowed to specify `query` and `partitionColumn` options at the same time. When specifying
-         `partitionColumn` option is required, the subquery can be specified using `dbtable` option instead and
-         partition columns can be qualified using the subquery alias provided as part of `dbtable`.
+      1. It is not allowed to specify dbtable and query options at the same time.
+      2. It is not allowed to specify query and partitionColumn options at the same time. When specifying
+         the partitionColumn option is required, the subquery can be specified using the dbtable option instead,
+         and partition columns can be qualified using the subquery alias provided as part of dbtable.
       Example:
       spark.read.format("jdbc")
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index b5309870f4..53a1111cd8 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -280,12 +280,12 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
       spark.sql.parquet.compression.codec
       snappy
-      Sets the compression codec used when writing Parquet files. If either `compression` or
-      `parquet.compression` is specified in the table-specific options/properties, the precedence would be
-      `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
+      Sets the compression codec used when writing Parquet files. If either compression or
+      parquet.compression is specified in the table-specific options/properties, the precedence would be
+      compression, parquet.compression, spark.sql.parquet.compression.codec. Acceptable values include:
       none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
-      Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
-      `BrotliCodec` to be installed.
+      Note that zstd requires ZStandardCodec to be installed before Hadoop 2.9.0, and brotli requires
+      BrotliCodec to be installed.
diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md
index badf042954..8c17de92f3 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -473,8 +473,8 @@ The following configurations are optional:
       Desired minimum number of partitions to read from Kafka.
       By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka.
       If you set this option to a value greater than your topicPartitions, Spark will divvy up large
-      Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the
-      number of Spark tasks will be **approximately** `minPartitions`. It can be less or more depending on
+      Kafka partitions to smaller pieces. Please note that this configuration is like a hint: the
+      number of Spark tasks will be approximately minPartitions. It can be less or more depending on
       rounding errors or Kafka partitions that didn't receive any new data.
@@ -482,7 +482,7 @@ The following configurations are optional:
       string
       spark-kafka-source
       streaming and batch
-      Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming
+      Prefix of consumer group identifiers (group.id) that are generated by structured streaming
       queries. If "kafka.group.id" is set, this option will be ignored.
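
As a reading aid for the two Kafka options touched above, here is a minimal sketch of how minPartitions and groupIdPrefix are passed to the Kafka source. The bootstrap servers and topic are placeholders, and a SparkSession named spark is assumed.

```scala
// Sketch: Kafka source with minPartitions and groupIdPrefix.
// minPartitions asks for roughly 64 Spark partitions even if the topic has fewer
// Kafka partitions; groupIdPrefix replaces the default "spark-kafka-source" prefix
// of the consumer group ids that Spark generates.
val kafkaStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  // placeholder brokers
  .option("subscribe", "events")                                   // placeholder topic
  .option("minPartitions", "64")
  .option("groupIdPrefix", "my-pipeline")
  .load()
```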
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 2a405f36fd..4abdf2afcb 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1717,7 +1717,7 @@ Here is the compatibility matrix.
       Append, Update, Complete
       Append mode uses watermark to drop old aggregation state. But the output of a
-      windowed aggregation is delayed the late threshold specified in `withWatermark()` as by
+      windowed aggregation is delayed by the late threshold specified in withWatermark(), as by
       the modes semantics, rows can be added to the Result Table only once after they are
       finalized (i.e. after watermark is crossed). See the Late Data section for more details.
@@ -2324,7 +2324,7 @@ Here are the different kinds of triggers that are supported.
       One-time micro-batch
-      The query will execute *only one* micro-batch to process all the available data and then
+      The query will execute only one micro-batch to process all the available data and then
       stop on its own. This is useful in scenarios you want to periodically spin up a cluster,
       process everything that is available since the last period, and then shutdown the cluster.
       In some case, this may lead to significant cost savings.
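
For the one-time micro-batch trigger described in the last hunk, here is a minimal sketch of how such a query is typically started. The output and checkpoint paths are placeholders, and a streaming DataFrame named streamingDF is assumed.

```scala
import org.apache.spark.sql.streaming.Trigger

// Sketch: Trigger.Once() processes whatever data is available as a single
// micro-batch; the query then stops on its own, so the job or cluster that
// launched it can be shut down afterwards.
val query = streamingDF.writeStream
  .format("parquet")
  .option("path", "/data/output")                          // placeholder output path
  .option("checkpointLocation", "/data/checkpoints/once")  // placeholder checkpoint path
  .trigger(Trigger.Once())
  .start()

query.awaitTermination()  // returns once the single micro-batch has completed
```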