diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md
index e4ce3e938b..f99b064949 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -88,17 +88,17 @@ creating table, you can create a table using storage handler at Hive side, and u
    inputFormat, outputFormat
+     These 2 options specify the name of a corresponding <code>InputFormat</code> and <code>OutputFormat</code> class as a string literal,
+     e.g. <code>org.apache.hadoop.hive.ql.io.orc.OrcInputFormat</code>. These 2 options must appear as a pair, and you cannot
+     specify them if you have already specified the <code>fileFormat</code> option.
    serde
+     When the <code>fileFormat</code> option is specified, do not specify this option
+     if the given <code>fileFormat</code> already includes the serde information. Currently "sequencefile", "textfile" and "rcfile"
      don't include the serde information, and you can use this option with these 3 fileFormats.
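For context on these options, a minimal sketch of how they might be exercised; it assumes a SparkSession with Hive support, the table names are placeholders, and whether an explicit serde is also needed depends on the chosen format:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-storage-format-options")
  .enableHiveSupport()
  .getOrCreate()

// High-level option: fileFormat picks the InputFormat/OutputFormat/serde for you.
spark.sql("CREATE TABLE hive_parquet_tbl(key INT, value STRING) USING hive OPTIONS(fileFormat 'parquet')")

// Low-level options: inputFormat and outputFormat must be given as a pair,
// and must not be combined with fileFormat.
spark.sql(
  """CREATE TABLE hive_orc_tbl(key INT, value STRING) USING hive
    |OPTIONS(
    |  inputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat',
    |  outputFormat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    |)""".stripMargin)
```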
      Anything that is valid in a <code>FROM</code> clause of a SQL query can be used.
      For example, instead of a full table you could also use a subquery in parentheses. It is not
-     allowed to specify `dbtable` and `query` options at the same time.
+     allowed to specify <code>dbtable</code> and <code>query</code> options at the same time.
      SELECT <columns> FROM (<user_specified_query>) spark_gen_alias
      It is not allowed to specify <code>dbtable</code> and <code>query</code> options at the same time.
      It is not allowed to specify <code>query</code> and <code>partitionColumn</code> options at the same time. When the
+     <code>partitionColumn</code> option is required, the subquery can be specified using the <code>dbtable</code> option instead, and
+     partition columns can be qualified using the subquery alias provided as part of <code>dbtable</code>.
spark.read.format("jdbc")
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index b5309870f4..53a1111cd8 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -280,12 +280,12 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
spark.sql.parquet.compression.codec
snappy
- Sets the compression codec used when writing Parquet files. If either `compression` or
- `parquet.compression` is specified in the table-specific options/properties, the precedence would be
- `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
+     Sets the compression codec used when writing Parquet files. If either <code>compression</code> or
+     <code>parquet.compression</code> is specified in the table-specific options/properties, the precedence would be
+     <code>compression</code>, <code>parquet.compression</code>, <code>spark.sql.parquet.compression.codec</code>. Acceptable values include:
none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
- Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires
- `BrotliCodec` to be installed.
+     Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, <code>brotli</code> requires
+     <code>BrotliCodec</code> to be installed.
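As a quick illustration of the precedence chain described in this hunk (the write-time `compression` option over `parquet.compression` over the session-wide config); the DataFrame and output path are placeholders:

```scala
// Session-wide default codec.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

val df = spark.range(1000).toDF("id")

// The per-write option wins over the session config; if both "compression" and
// "parquet.compression" were set here, "compression" would take precedence.
df.write
  .option("compression", "zstd")      // needs ZStandardCodec before Hadoop 2.9.0
  .parquet("/tmp/ids_zstd")           // placeholder path
```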
      Please note that this configuration is like a <code>hint</code>: the
+     number of Spark tasks will be approximately <code>minPartitions</code>. It can be fewer or more depending on
      rounding errors or Kafka partitions that didn't receive any new data.
      Prefix of consumer group identifiers (<code>group.id</code>) that are generated by structured streaming
      queries. If "kafka.group.id" is set, this option will be ignored.
      The output of a windowed aggregation is delayed until the late threshold specified in <code>withWatermark()</code>, as by
      the mode's semantics rows can be added to the Result Table only once after they are
      finalized (i.e. after the watermark is crossed). See the
      Late Data section for more details.
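A sketch of the append-mode behaviour described above, reusing the hypothetical `kafkaStream` from the previous snippet (the Kafka source exposes a `timestamp` column):

```scala
import org.apache.spark.sql.functions.{col, window}

// Windowed count with a 10-minute watermark on the event-time column.
val counts = kafkaStream
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "5 minutes"))
  .count()

// In append mode a window's row reaches the Result Table only once,
// after the watermark moves past the end of that window.
val query = counts.writeStream
  .outputMode("append")
  .format("console")
  .start()
```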
@@ -2324,7 +2324,7 @@ Here are the different kinds of triggers that are supported.