From abfc267f0cc38d792f68923946a83877d07dee27 Mon Sep 17 00:00:00 2001
From: chenliang
Date: Wed, 18 Dec 2019 15:12:32 -0800
Subject: [PATCH] [SPARK-30262][SQL] Avoid NumberFormatException when totalSize is empty
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What changes were proposed in this pull request?

We can read partition statistics from the metastore, but in some special cases the values of `totalSize`, `rawDataSize`, or `rowCount` may be empty. When we run DDL such as `desc formatted ... partition`, a `NumberFormatException` is thrown:

```
spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23');
19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')]
java.lang.NumberFormatException: Zero length BigInteger
	at java.math.BigInteger.<init>(BigInteger.java:411)
	at java.math.BigInteger.<init>(BigInteger.java:597)
	at scala.math.BigInt$.apply(BigInt.scala:77)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
```

Although we can use `ANALYZE TABLE ... PARTITION` to update `totalSize`, `rawDataSize`, or `rowCount`, it is unreasonable for normal SQL to throw `NumberFormatException` for an empty `totalSize`. We should handle the empty case in `readHiveStats`.

### Why are the changes needed?

This improves the robustness of the code, which could otherwise fail with an unexpected exception in some unpredictable situations.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manual.

Closes #26892 from southernriver/SPARK-30262.
Authored-by: chenliang
Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/hive/client/HiveClientImpl.scala | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 700c0884dd..f196e94a83 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -1189,9 +1189,10 @@ private[hive] object HiveClientImpl {
    * Note that this statistics could be overridden by Spark's statistics if that's available.
    */
   private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
-    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
-    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
-    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
+    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).filter(_.nonEmpty).map(BigInt(_))
+    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).filter(_.nonEmpty)
+      .map(BigInt(_))
+    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).filter(_.nonEmpty).map(BigInt(_))
     // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
     // relatively cheap if parameters for the table are populated into the metastore.
     // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats`
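The pattern the patch applies can be sketched in isolation: a minimal, standalone example (not the actual `HiveClientImpl` code) showing why `BigInt("")` throws and how a `.filter(_.nonEmpty)` guard turns an empty statistic into `None`. `parseStat` and the plain `Map` of parameters are hypothetical stand-ins for the real metastore properties.

```scala
object StatsParsing {
  // Mirrors the fixed readHiveStats logic: an absent OR empty property yields
  // None instead of throwing "NumberFormatException: Zero length BigInteger".
  def parseStat(properties: Map[String, String], key: String): Option[BigInt] =
    properties.get(key).filter(_.nonEmpty).map(BigInt(_))

  def main(args: Array[String]): Unit = {
    // Simulated partition parameters where totalSize came back empty.
    val params = Map("totalSize" -> "", "rawDataSize" -> "1024", "rowCount" -> "42")

    assert(parseStat(params, "totalSize").isEmpty)   // empty string: no exception
    assert(parseStat(params, "missingKey").isEmpty)  // absent key: also None
    assert(parseStat(params, "rawDataSize").contains(BigInt(1024)))
    assert(parseStat(params, "rowCount").contains(BigInt(42)))
  }
}
```

Without the `.filter(_.nonEmpty)` step, `params.get("totalSize").map(BigInt(_))` would throw the same `NumberFormatException` shown in the stack trace above.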