[SPARK-30262][SQL] Avoid NumberFormatException when totalSize is empty

### What changes were proposed in this pull request?

We could get the Partitions Statistics Info.But in some specail case, The Info  like  totalSize,rawDataSize,rowCount maybe empty. When we do some ddls like
`desc formatted partition` ,the NumberFormatException is showed as below:
```
spark-sql> desc formatted table1 partition(year='2019', month='10', day='17', hour='23');
19/10/19 00:02:40 ERROR SparkSQLDriver: Failed in [desc formatted table1 partition(year='2019', month='10', day='17', hour='23')]
java.lang.NumberFormatException: Zero length BigInteger
at java.math.BigInteger.(BigInteger.java:411)
at java.math.BigInteger.(BigInteger.java:597)
at scala.math.BigInt$.apply(BigInt.scala:77)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$31.apply(HiveClientImpl.scala:1056)
```
Although we can use 'Analyze table partition ' to update the totalSize,rawDataSize or rowCount, it's unresonable for normal SQL to throw NumberFormatException for Empty totalSize.We should fix the empty case when readHiveStats.

### Why are the changes needed?

This is a related to the robustness of the code and may lead to unexpected exception in some unpredictable situation.Here is the case:
<img width="981" alt="image" src="https://user-images.githubusercontent.com/20614350/70845771-7b88b400-1e8d-11ea-95b0-df5c58097d7d.png">

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

manual

Closes #26892 from southernriver/SPARK-30262.

Authored-by: chenliang <southernriver@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This commit is contained in:
chenliang 2019-12-18 15:12:32 -08:00 committed by Dongjoon Hyun
parent 0945633844
commit abfc267f0c

View file

@ -1189,9 +1189,10 @@ private[hive] object HiveClientImpl {
* Note that this statistics could be overridden by Spark's statistics if that's available. * Note that this statistics could be overridden by Spark's statistics if that's available.
*/ */
private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = { private def readHiveStats(properties: Map[String, String]): Option[CatalogStatistics] = {
val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_)) val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).filter(_.nonEmpty).map(BigInt(_))
val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_)) val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).filter(_.nonEmpty)
val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) .map(BigInt(_))
val rowCount = properties.get(StatsSetupConst.ROW_COUNT).filter(_.nonEmpty).map(BigInt(_))
// NOTE: getting `totalSize` directly from params is kind of hacky, but this should be // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
// relatively cheap if parameters for the table are populated into the metastore. // relatively cheap if parameters for the table are populated into the metastore.
// Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats` // Currently, only totalSize, rawDataSize, and rowCount are used to build the field `stats`