[SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs

### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes #28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <huaxing@us.ibm.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-04-09 13:28:01 -05:00 · 2020-04-09 13:28:01 -05:00 · 61f903fa7a
parent 1354d2d0de
commit 61f903fa7a
1 changed files with 87 additions and 1 deletions
--- a/docs/sql-ref-functions-udf-hive.md
+++ b/docs/sql-ref-functions-udf-hive.md
@@ -19,4 +19,90 @@ license: |
  limitations under the License.
 ---
-Integration with Hive UDFs/UDAFs/UDTFs
+### Description
 Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs. Similar to Spark UDFs and UDAFs, Hive UDFs work on a single row as input and generate a single row as output, while Hive UDAFs operate on multiple rows and return a single aggregated row as a result. Hive also supports UDTFs (User-Defined Table-Generating Functions), which take one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UDTFs, register them in Spark and then call them in Spark SQL queries.
 ### Examples
 Hive has two UDF interfaces: [UDF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDF.java) and [GenericUDF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
 The example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java), which is derived from `GenericUDF`.
 {% highlight sql %}
 -- Register `GenericUDFAbs` and use it in Spark SQL.
 -- Note: to use a UDF you wrote yourself, first add the JAR containing it
 -- to the classpath,
 -- e.g., ADD JAR yourHiveUDF.jar;
 CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
 SELECT * FROM t;
  +-----+
  |value|
  +-----+
  | -1.0|
  |  2.0|
  | -3.0|
  +-----+
 SELECT testUDF(value) FROM t;
  +--------------+
  |testUDF(value)|
  +--------------+
  |           1.0|
  |           2.0|
  |           3.0|
  +--------------+
 {% endhighlight %}
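The comment in the example above mentions adding a JAR for a UDF you wrote yourself. A minimal sketch of that flow follows. The JAR path `/tmp/yourHiveUDF.jar`, the class name `com.example.hive.MyHiveUDF`, and the function names `myUDF` and `myPermanentUDF` are all hypothetical placeholders, not part of Hive or Spark.

```sql
-- Hypothetical JAR path and class name; replace them with your own.
-- Step 1: make the JAR visible to the current session.
ADD JAR /tmp/yourHiveUDF.jar;

-- Step 2: register the class under a session-scoped function name.
CREATE TEMPORARY FUNCTION myUDF AS 'com.example.hive.MyHiveUDF';

-- Step 3: call it like any built-in function.
SELECT myUDF(value) FROM t;

-- Alternative: attach the JAR to the function definition itself,
-- so that no separate ADD JAR statement is needed.
CREATE FUNCTION myPermanentUDF AS 'com.example.hive.MyHiveUDF'
    USING JAR '/tmp/yourHiveUDF.jar';
```

A temporary function disappears when the session ends, whereas a function created without `TEMPORARY` is stored in the catalog and should remain available to later sessions.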
 The example below uses [GenericUDTFExplode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java), which is derived from [GenericUDTF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java).
 {% highlight sql %}
 -- Register `GenericUDTFExplode` and use it in Spark SQL
 CREATE TEMPORARY FUNCTION hiveUDTF
    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
 SELECT * FROM t;
  +------+
  | value|
  +------+
  |[1, 2]|
  |[3, 4]|
  +------+
 SELECT hiveUDTF(value) FROM t;
  +---+
  |col|
  +---+
  |  1|
  |  2|
  |  3|
  |  4|
  +---+
 {% endhighlight %}
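To pair each generated row with columns from its source row, Spark SQL's `LATERAL VIEW` clause is one option. The sketch below assumes the `hiveUDTF` function and the table `t` from the example above; the aliases `exploded` and `elem` are arbitrary names chosen for illustration.

```sql
-- For every row of t, hiveUDTF(value) produces one row per array element;
-- LATERAL VIEW joins those generated rows back to the originating row.
SELECT t.value, exploded.elem
FROM t
LATERAL VIEW hiveUDTF(value) exploded AS elem;
```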
 Hive has two UDAF interfaces: [UDAF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java) and [GenericUDAFResolver](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java).
 The example below uses [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java), which is derived from `GenericUDAFResolver`.
 {% highlight sql %}
 -- Register `GenericUDAFSum` and use it in Spark SQL
 CREATE TEMPORARY FUNCTION hiveUDAF
    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
 SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|    1|
  |  a|    2|
  |  b|    3|
  +---+-----+
 SELECT key, hiveUDAF(value) FROM t GROUP BY key;
  +---+---------------+
  |key|hiveUDAF(value)|
  +---+---------------+
  |  b|              3|
  |  a|              3|
  +---+---------------+
 {% endhighlight %}
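Once registered, the functions can be inspected and removed with ordinary Spark SQL statements. A short sketch, assuming the function names used in the examples above:

```sql
-- List user-defined functions visible in the current session.
SHOW USER FUNCTIONS;

-- Show the class backing a registered function.
DESCRIBE FUNCTION hiveUDAF;

-- Remove the temporary functions when they are no longer needed.
DROP TEMPORARY FUNCTION IF EXISTS testUDF;
DROP TEMPORARY FUNCTION IF EXISTS hiveUDTF;
DROP TEMPORARY FUNCTION IF EXISTS hiveUDAF;
```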