[SPARK-7758] [SQL] Override more configs to avoid failure when connect to a postgre sql

https://issues.apache.org/jira/browse/SPARK-7758

When initializing `executionHive`, we only masks
`javax.jdo.option.ConnectionURL` to override metastore location.  However,
other properties that relates to the actual Hive metastore data source are not
masked.  For example, when using Spark SQL with a PostgreSQL backed Hive
metastore, `executionHive` actually tries to use settings read from
`hive-site.xml`, which talks about PostgreSQL, to connect to the temporary
Derby metastore, thus causes error.

To fix this, we need to mask all metastore data source properties.
Specifically, according to the code of [Hive `ObjectStore.getDataSourceProps()`
method] [1], all properties whose name mentions "jdo" and "datanucleus" must be
included.

[1]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L288

Have tested using postgre sql as metastore, it worked fine.

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #6314 from WangTaoTheTonic/SPARK-7758 and squashes the following commits:

ca7ae7c [WangTaoTheTonic] add comments
86caf2c [WangTaoTheTonic] delete unused import
e4f0feb [WangTaoTheTonic] block more data source related property
92a81fa [WangTaoTheTonic] fix style check
e3e683d [WangTaoTheTonic] override more configs to avoid failuer connecting to postgre sql

(cherry picked from commit 31d5d463e7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This commit is contained in:
WangTaoTheTonic 2015-05-22 14:43:16 -07:00 committed by Michael Armbrust
parent 2904d3f8bd
commit 40989cea0d
2 changed files with 16 additions and 4 deletions

View file

@ -1884,7 +1884,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
*
* @param f the closure to clean
* @param checkSerializable whether or not to immediately check <tt>f</tt> for serializability
* @throws <tt>SparkException<tt> if <tt>checkSerializable</tt> is set but <tt>f</tt> is not
* @throws SparkException if <tt>checkSerializable</tt> is set but <tt>f</tt> is not
* serializable
*/
private[spark] def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = {

View file

@ -25,6 +25,7 @@ import org.apache.hadoop.hive.ql.parse.VariableSubstitution
import org.apache.spark.sql.catalyst.ParserDialect
import scala.collection.JavaConversions._
import scala.collection.mutable.HashMap
import scala.language.implicitConversions
import org.apache.hadoop.fs.{FileSystem, Path}
@ -153,7 +154,7 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
* Hive 13 as this is the version of Hive that is packaged with Spark SQL. This copy of the
* client is used for execution related tasks like registering temporary functions or ensuring
* that the ThreadLocal SessionState is correctly populated. This copy of Hive is *not* used
* for storing peristent metadata, and only point to a dummy metastore in a temporary directory.
* for storing persistent metadata, and only point to a dummy metastore in a temporary directory.
*/
@transient
protected[hive] lazy val executionHive: ClientWrapper = {
@ -507,8 +508,19 @@ private[hive] object HiveContext {
def newTemporaryConfiguration(): Map[String, String] = {
val tempDir = Utils.createTempDir()
val localMetastore = new File(tempDir, "metastore").getAbsolutePath
Map(
"javax.jdo.option.ConnectionURL" -> s"jdbc:derby:;databaseName=$localMetastore;create=true")
val propMap: HashMap[String, String] = HashMap()
// We have to mask all properties in hive-site.xml that relates to metastore data source
// as we used a local metastore here.
HiveConf.ConfVars.values().foreach { confvar =>
if (confvar.varname.contains("datanucleus") || confvar.varname.contains("jdo")) {
propMap.put(confvar.varname, confvar.defaultVal)
}
}
propMap.put("javax.jdo.option.ConnectionURL",
s"jdbc:derby:;databaseName=$localMetastore;create=true")
propMap.put("datanucleus.rdbms.datastoreAdapterClassName",
"org.datanucleus.store.rdbms.adapter.DerbyAdapter")
propMap.toMap
}
protected val primitiveTypes =