fa47b7faf7
### What changes were proposed in this pull request? This PR update document for make Hive 2.3 dependency by default. ### Why are the changes needed? The documentation is incorrect. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #26919 from wangyum/SPARK-30280. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
102 lines
4.2 KiB
Markdown
---
layout: global
title: Distributed SQL Engine
displayTitle: Distributed SQL Engine
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

* Table of contents
{:toc}

Spark SQL can also act as a distributed query engine using its JDBC/ODBC or command-line interface.
In this mode, end-users or applications can interact with Spark SQL directly to run SQL queries,
without the need to write any code.

## Running the Thrift JDBC/ODBC server

The Thrift JDBC/ODBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2)
in the built-in Hive. You can test the JDBC server with the beeline script that comes with either Spark or a compatible Hive.

To start the JDBC/ODBC server, run the following in the Spark directory:

    ./sbin/start-thriftserver.sh

This script accepts all `bin/spark-submit` command line options, plus a `--hiveconf` option to
specify Hive properties. You may run `./sbin/start-thriftserver.sh --help` for a complete list of
all available options. By default, the server listens on localhost:10000. You may override this
behaviour via either environment variables, e.g.:

{% highlight bash %}
export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...
{% endhighlight %}

or system properties:

{% highlight bash %}
./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=<listening-port> \
  --hiveconf hive.server2.thrift.bind.host=<listening-host> \
  --master <master-uri> \
  ...
{% endhighlight %}

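As a minimal sketch of the `--hiveconf` form above, the following shell function prints the two flags for a given port and bind host. The function name `thrift_hiveconf_args` is hypothetical and not part of Spark; the defaults are the `localhost:10000` values stated above.

```bash
# Hypothetical helper (not part of Spark): prints the --hiveconf flags for the
# Thrift server's listening port and bind host, one argument per line.
# The body runs in a subshell so the variables stay local to the function.
thrift_hiveconf_args() (
  port="${1:-10000}"       # default port stated in the text above
  host="${2:-localhost}"   # default bind host stated in the text above
  printf '%s\n' \
    --hiveconf "hive.server2.thrift.port=${port}" \
    --hiveconf "hive.server2.thrift.bind.host=${host}"
)
```

Because the values contain no whitespace, the output could be spliced into the start command with an unquoted command substitution, for example `./sbin/start-thriftserver.sh $(thrift_hiveconf_args 10015 0.0.0.0) --master <master-uri>`.
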
Now you can use beeline to test the Thrift JDBC/ODBC server:

    ./bin/beeline

Connect to the JDBC/ODBC server in beeline with:

    beeline> !connect jdbc:hive2://localhost:10000

Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
your machine and a blank password. For secure mode, please follow the instructions given in the
[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).

Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.

You may also use the beeline script that comes with Hive.

The Thrift JDBC server also supports sending Thrift RPC messages over HTTP transport.
Use the following settings to enable HTTP mode, either as system properties or in the `hive-site.xml` file in `conf/`:

    hive.server2.transport.mode - Set this to value: http
    hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001
    hive.server2.http.endpoint - HTTP endpoint; default is cliservice

To test, use beeline to connect to the JDBC/ODBC server in HTTP mode with:

    beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>

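The HTTP-mode connection URL is long and easy to mistype, so here is a minimal sketch of a shell function that assembles it from its parts. The function name `thrift_http_jdbc_url` is hypothetical; the port and endpoint defaults come from the settings listed above, and the `default` database name is an assumption.

```bash
# Hypothetical helper: builds the HTTP-mode JDBC URL in the form shown above.
# The body runs in a subshell so the variables stay local to the function.
thrift_http_jdbc_url() (
  host="$1"
  port="${2:-10001}"           # default HTTP port from the settings above
  database="${3:-default}"     # assumption: connect to the "default" database
  endpoint="${4:-cliservice}"  # default HTTP endpoint from the settings above
  printf 'jdbc:hive2://%s:%s/%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=%s\n' \
    "$host" "$port" "$database" "$endpoint"
)
```

The URL contains `?` and `;`, which the shell would otherwise interpret, so it should be quoted when passed on a command line, for example `./bin/beeline -u "$(thrift_http_jdbc_url localhost)"`.
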
If you close a session and then run a CTAS (`CREATE TABLE ... AS SELECT`) statement, you must set `fs.%s.impl.disable.cache` to `true` in `hive-site.xml`.
See more details in [[SPARK-21067]](https://issues.apache.org/jira/browse/SPARK-21067).

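In `fs.%s.impl.disable.cache`, the `%s` is read here as a placeholder for the filesystem URI scheme (for example `hdfs`), following Hadoop's naming of this property; that reading is an assumption, as the text above does not spell it out. A minimal sketch that expands the placeholder and prints the corresponding `hive-site.xml` entry, using a hypothetical function name:

```bash
# Hypothetical helper: prints a hive-site.xml <property> element that disables
# the FileSystem cache for one URI scheme (assumed meaning of the %s placeholder).
fs_cache_property_xml() (
  scheme="${1:-hdfs}"   # assumption: hdfs is the filesystem in use
  printf '<property>\n  <name>fs.%s.impl.disable.cache</name>\n  <value>true</value>\n</property>\n' \
    "$scheme"
)
```

The printed element would go inside the `<configuration>` root of `hive-site.xml` in `conf/`.
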
## Running the Spark SQL CLI

The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute
queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.

To start the Spark SQL CLI, run the following in the Spark directory:

    ./bin/spark-sql

Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
You may run `./bin/spark-sql --help` for a complete list of all available options.