spark-instrumented-optimizer/python/pyspark/sql
Huon Wilson b67d369572 [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
## What changes were proposed in this pull request?

This introduces a new SQL function 'xxhash64' for getting a 64-bit hash of an arbitrary number of columns.

This is designed to exactly mimic the 32-bit `hash`, which uses
MurmurHash3. The name is designed to be more future-proof than the
'hash', by indicating the exact algorithm used, similar to md5 and the
sha hashes.

## How was this patch tested?

The tests for the existing `hash` function were duplicated to run with `xxhash64`.

Closes #24019 from huonw/hash64.

Authored-by: Huon Wilson <Huon.Wilson@data61.csiro.au>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2019-03-20 16:34:34 +08:00
..
avro [SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs 2019-03-11 10:15:07 +09:00
tests [SPARK-26979][PYTHON][FOLLOW-UP] Make binary math/string functions take string as columns as well 2019-03-20 08:06:10 +09:00
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs 2018-08-17 10:18:08 +08:00
column.py [SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark 2018-04-08 12:09:06 +08:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
dataframe.py [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r 2019-03-16 13:04:54 +09:00
functions.py [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long 2019-03-20 16:34:34 +08:00
group.py [SPARK-24722][SQL] pivot() with Column type argument 2018-08-04 14:17:32 +08:00
readwriter.py [SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8 2019-03-05 08:03:39 +09:00
session.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
streaming.py [SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8 2019-03-05 08:03:39 +09:00
types.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
udf.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
utils.py [SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy 2018-08-28 10:57:13 +08:00
window.py [SPARK-26860][PYSPARK][SPARKR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation 2019-03-11 08:53:09 -05:00