spark-instrumented-optimizer/python/docs/source/reference/pyspark.sql.rst
Dominik Gehl e7a210e5ed [SPARK-36178][PYTHON] List pyspark.sql.catalog APIs in documentation
### What changes were proposed in this pull request?
The pyspark.sql.catalog APIs were missing from the documentation. PR fixes this omission.

### Why are the changes needed?
Documentation consistency

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Documentation change only.

Closes #33392 from dominikgehl/feature/SPARK-36178.

Authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fe4db74da4)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-19 19:49:22 +09:00

634 lines
11 KiB
ReStructuredText

.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
=========
Spark SQL
=========
Core Classes
------------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
SparkSession
Catalog
DataFrame
Column
Row
GroupedData
PandasCogroupedOps
DataFrameNaFunctions
DataFrameStatFunctions
Window
Spark Session APIs
------------------
.. currentmodule:: pyspark.sql
The entry point to programming Spark with the Dataset and DataFrame API.
To create a Spark session, you should use ``SparkSession.builder`` attribute.
See also :class:`SparkSession`.
.. autosummary::
:toctree: api/
SparkSession.builder.appName
SparkSession.builder.config
SparkSession.builder.enableHiveSupport
SparkSession.builder.getOrCreate
SparkSession.builder.master
SparkSession.catalog
SparkSession.conf
SparkSession.createDataFrame
SparkSession.getActiveSession
SparkSession.newSession
SparkSession.range
SparkSession.read
SparkSession.readStream
SparkSession.sparkContext
SparkSession.sql
SparkSession.stop
SparkSession.streams
SparkSession.table
SparkSession.udf
SparkSession.version
Configuration
-------------
.. currentmodule:: pyspark.sql.conf
.. autosummary::
:toctree: api/
RuntimeConfig
Input and Output
----------------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
DataFrameReader.csv
DataFrameReader.format
DataFrameReader.jdbc
DataFrameReader.json
DataFrameReader.load
DataFrameReader.option
DataFrameReader.options
DataFrameReader.orc
DataFrameReader.parquet
DataFrameReader.schema
DataFrameReader.table
DataFrameWriter.bucketBy
DataFrameWriter.csv
DataFrameWriter.format
DataFrameWriter.insertInto
DataFrameWriter.jdbc
DataFrameWriter.json
DataFrameWriter.mode
DataFrameWriter.option
DataFrameWriter.options
DataFrameWriter.orc
DataFrameWriter.parquet
DataFrameWriter.partitionBy
DataFrameWriter.save
DataFrameWriter.saveAsTable
DataFrameWriter.sortBy
DataFrameWriter.text
DataFrame APIs
--------------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
DataFrame.agg
DataFrame.alias
DataFrame.approxQuantile
DataFrame.cache
DataFrame.checkpoint
DataFrame.coalesce
DataFrame.colRegex
DataFrame.collect
DataFrame.columns
DataFrame.corr
DataFrame.count
DataFrame.cov
DataFrame.createGlobalTempView
DataFrame.createOrReplaceGlobalTempView
DataFrame.createOrReplaceTempView
DataFrame.createTempView
DataFrame.crossJoin
DataFrame.crosstab
DataFrame.cube
DataFrame.describe
DataFrame.distinct
DataFrame.drop
DataFrame.dropDuplicates
DataFrame.drop_duplicates
DataFrame.dropna
DataFrame.dtypes
DataFrame.exceptAll
DataFrame.explain
DataFrame.fillna
DataFrame.filter
DataFrame.first
DataFrame.foreach
DataFrame.foreachPartition
DataFrame.freqItems
DataFrame.groupBy
DataFrame.head
DataFrame.hint
DataFrame.inputFiles
DataFrame.intersect
DataFrame.intersectAll
DataFrame.isLocal
DataFrame.isStreaming
DataFrame.join
DataFrame.limit
DataFrame.localCheckpoint
DataFrame.mapInPandas
DataFrame.na
DataFrame.orderBy
DataFrame.persist
DataFrame.printSchema
DataFrame.randomSplit
DataFrame.rdd
DataFrame.registerTempTable
DataFrame.repartition
DataFrame.repartitionByRange
DataFrame.replace
DataFrame.rollup
DataFrame.sameSemantics
DataFrame.sample
DataFrame.sampleBy
DataFrame.schema
DataFrame.select
DataFrame.selectExpr
DataFrame.semanticHash
DataFrame.show
DataFrame.sort
DataFrame.sortWithinPartitions
DataFrame.stat
DataFrame.storageLevel
DataFrame.subtract
DataFrame.summary
DataFrame.tail
DataFrame.take
DataFrame.toDF
DataFrame.toJSON
DataFrame.toLocalIterator
DataFrame.toPandas
DataFrame.transform
DataFrame.union
DataFrame.unionAll
DataFrame.unionByName
DataFrame.unpersist
DataFrame.where
DataFrame.withColumn
DataFrame.withColumnRenamed
DataFrame.withWatermark
DataFrame.write
DataFrame.writeStream
DataFrame.writeTo
DataFrame.to_pandas_on_spark
DataFrameNaFunctions.drop
DataFrameNaFunctions.fill
DataFrameNaFunctions.replace
DataFrameStatFunctions.approxQuantile
DataFrameStatFunctions.corr
DataFrameStatFunctions.cov
DataFrameStatFunctions.crosstab
DataFrameStatFunctions.freqItems
DataFrameStatFunctions.sampleBy
Column APIs
-----------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
Column.alias
Column.asc
Column.asc_nulls_first
Column.asc_nulls_last
Column.astype
Column.between
Column.bitwiseAND
Column.bitwiseOR
Column.bitwiseXOR
Column.cast
Column.contains
Column.desc
Column.desc_nulls_first
Column.desc_nulls_last
Column.dropFields
Column.endswith
Column.eqNullSafe
Column.getField
Column.getItem
Column.isNotNull
Column.isNull
Column.isin
Column.like
Column.name
Column.otherwise
Column.over
Column.rlike
Column.startswith
Column.substr
Column.when
Column.withField
Data Types
----------
.. currentmodule:: pyspark.sql.types
.. autosummary::
:template: autosummary/class_with_docs.rst
:toctree: api/
ArrayType
BinaryType
BooleanType
ByteType
DataType
DateType
DecimalType
DoubleType
FloatType
IntegerType
LongType
MapType
NullType
ShortType
StringType
StructField
StructType
TimestampType
Row
---
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
Row.asDict
Functions
---------
.. currentmodule:: pyspark.sql.functions
.. autosummary::
:toctree: api/
abs
acos
acosh
add_months
aggregate
approxCountDistinct
approx_count_distinct
array
array_contains
array_distinct
array_except
array_intersect
array_join
array_max
array_min
array_position
array_remove
array_repeat
array_sort
array_union
arrays_overlap
arrays_zip
asc
asc_nulls_first
asc_nulls_last
ascii
asin
asinh
assert_true
atan
atanh
atan2
avg
base64
bin
bitwise_not
bitwiseNOT
broadcast
bround
bucket
cbrt
ceil
coalesce
col
collect_list
collect_set
column
concat
concat_ws
conv
corr
cos
cosh
count
count_distinct
countDistinct
covar_pop
covar_samp
crc32
create_map
cume_dist
current_date
current_timestamp
date_add
date_format
date_sub
date_trunc
datediff
dayofmonth
dayofweek
dayofyear
days
decode
degrees
dense_rank
desc
desc_nulls_first
desc_nulls_last
element_at
encode
exists
exp
explode
explode_outer
expm1
expr
factorial
filter
first
flatten
floor
forall
format_number
format_string
from_csv
from_json
from_unixtime
from_utc_timestamp
get_json_object
greatest
grouping
grouping_id
hash
hex
hour
hours
hypot
initcap
input_file_name
instr
isnan
isnull
json_tuple
kurtosis
lag
last
last_day
lead
least
length
levenshtein
lit
locate
log
log10
log1p
log2
lower
lpad
ltrim
map_concat
map_entries
map_filter
map_from_arrays
map_from_entries
map_keys
map_values
map_zip_with
max
md5
mean
min
minute
monotonically_increasing_id
month
months
months_between
nanvl
next_day
nth_value
ntile
overlay
pandas_udf
percent_rank
percentile_approx
posexplode
posexplode_outer
pow
product
quarter
radians
raise_error
rand
randn
rank
regexp_extract
regexp_replace
repeat
reverse
rint
round
row_number
rpad
rtrim
schema_of_csv
schema_of_json
second
sentences
sequence
sha1
sha2
shiftleft
shiftright
shiftrightunsigned
shuffle
signum
sin
sinh
size
skewness
slice
sort_array
soundex
spark_partition_id
split
sqrt
stddev
stddev_pop
stddev_samp
struct
substring
substring_index
sum
sum_distinct
sumDistinct
tan
tanh
timestamp_seconds
toDegrees
toRadians
to_csv
to_date
to_json
to_timestamp
to_utc_timestamp
transform
transform_keys
transform_values
translate
trim
trunc
udf
unbase64
unhex
unix_timestamp
upper
var_pop
var_samp
variance
weekofyear
when
window
xxhash64
year
years
zip_with
.. currentmodule:: pyspark.sql.avro.functions
.. autosummary::
:toctree: api/
from_avro
to_avro
Window
------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
Window.currentRow
Window.orderBy
Window.partitionBy
Window.rangeBetween
Window.rowsBetween
Window.unboundedFollowing
Window.unboundedPreceding
WindowSpec.orderBy
WindowSpec.partitionBy
WindowSpec.rangeBetween
WindowSpec.rowsBetween
Grouping
--------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
GroupedData.agg
GroupedData.apply
GroupedData.applyInPandas
GroupedData.avg
GroupedData.cogroup
GroupedData.count
GroupedData.max
GroupedData.mean
GroupedData.min
GroupedData.pivot
GroupedData.sum
PandasCogroupedOps.applyInPandas
Catalog APIs
------------
.. currentmodule:: pyspark.sql
.. autosummary::
:toctree: api/
Catalog.cacheTable
Catalog.clearCache
Catalog.createExternalTable
Catalog.createTable
Catalog.currentDatabase
Catalog.dropGlobalTempView
Catalog.dropTempView
Catalog.isCached
Catalog.listColumns
Catalog.listDatabases
Catalog.listFunctions
Catalog.listTables
Catalog.recoverPartitions
Catalog.refreshByPath
Catalog.refreshTable
Catalog.registerFunction
Catalog.setCurrentDatabase
Catalog.uncacheTable