spark-instrumented-optimizer

Wing Yew Poon c72f88b0ba	[SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table	2019-12-20 10:39:26 -08:00

### What changes were proposed in this pull request?

When querying a partitioned table with format `org.apache.hive.hcatalog.data.JsonSerDe` and more than one task runs in each executor concurrently, the following exception is encountered:

`java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord`

The exception occurs in `HadoopTableReader.fillObject`.

`org.apache.hive.hcatalog.data.JsonSerDe#initialize` populates a `cachedObjectInspector` field by calling `HCatRecordObjectInspectorFactory.getHCatRecordObjectInspector`, which is not thread-safe; this `cachedObjectInspector` is returned by `JsonSerDe#getObjectInspector`.

We protect against this Hive bug by synchronizing on an object when we need to call `initialize` on `org.apache.hadoop.hive.serde2.Deserializer` instances (which may be `JsonSerDe` instances). By doing so, the `ObjectInspector` for the `Deserializer` of the partitions of the JSON table and that of the table `SerDe` are the same cached `ObjectInspector` and `HadoopTableReader.fillObject` then works correctly. (If the `ObjectInspector`s are different, then a bug in `HCatRecordObjectInspector` causes an `ArrayList` to be created instead of an `HCatRecord`, resulting in the `ClassCastException` that is seen.)

### Why are the changes needed?

To avoid HIVE-15773 / HIVE-21752.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tested manually on a cluster with a partitioned JSON table and running a query using more than one core per executor. Before this change, the ClassCastException happens consistently. With this change it does not happen.

Closes #26895 from wypoon/SPARK-17398.

Authored-by: Wing Yew Poon <wypoon@cloudera.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
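
The locking idea described in the commit message above can be illustrated with a small, self-contained Java sketch. It is not the Spark patch and does not use Hive: `Inspector`, `InspectorFactory`, `Deserializer`, `INIT_LOCK`, and `safeInitialize` are hypothetical stand-ins, assumed only for illustration. The factory caches its result with an unsynchronized check-then-act, loosely modelling the kind of non-thread-safe caching the message describes, and every call to `initialize` goes through one shared lock so that all deserializers end up with the same cached instance.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedInitSketch {

    /** Hypothetical stand-in for an object inspector. */
    static final class Inspector {}

    /** Hypothetical factory with an unsynchronized check-then-act cache. */
    static final class InspectorFactory {
        private static Inspector cached; // deliberately unguarded

        static Inspector get() {
            if (cached == null) {
                cached = new Inspector(); // two racing threads could both get here
            }
            return cached;
        }

        static void reset() {
            cached = null;
        }
    }

    /** Hypothetical deserializer whose initialize() consults the factory. */
    static final class Deserializer {
        private Inspector inspector;

        void initialize() {
            inspector = InspectorFactory.get();
        }

        Inspector getObjectInspector() {
            return inspector;
        }
    }

    /** Single lock shared by every caller that initializes a deserializer. */
    private static final Object INIT_LOCK = new Object();

    /** Serializes initialize() so the factory cache is only touched under the lock. */
    static void safeInitialize(Deserializer d) {
        synchronized (INIT_LOCK) {
            d.initialize();
        }
    }

    /** Initializes one deserializer per thread and counts the distinct inspectors seen. */
    public static int distinctInspectors(int threads) {
        InspectorFactory.reset();
        Set<Inspector> identitySet =
            Collections.newSetFromMap(new IdentityHashMap<Inspector, Boolean>());
        Set<Inspector> seen = Collections.synchronizedSet(identitySet);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at once to provoke contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Deserializer d = new Deserializer();
                safeInitialize(d);
                seen.add(d.getObjectInspector());
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct inspectors: " + distinctInspectors(8));
    }
}
```

Because every access to the cache happens while holding `INIT_LOCK`, only one `Inspector` is ever created in this sketch. Calling `d.initialize()` directly from the worker threads would remove that guarantee, which is the situation the commit message attributes to the Hive factory.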
benchmarks	[SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks	2019-09-18 17:52:23 -07:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution	[SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax	2019-12-07 02:15:25 +08:00
src	[SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table	2019-12-20 10:39:26 -08:00
pom.xml	[INFRA] Reverts commit `56dcd79` and `c216ef1`	2019-12-16 19:57:44 -07:00