## What changes were proposed in this pull request?
As reported on the jira, insert overwrite statement runs much slower in Spark, compared with hive-client.
It seems there is a patch [HIVE-11940](ba21806b77) which largely improves insert overwrite performance on Hive. HIVE-11940 is patched after Hive 2.0.0.
Because Spark SQL uses older Hive library, we can not benefit from such improvement.
The reporter verified that there is also a big performance gap between Hive 1.2.1 (520.037 secs) and Hive 2.0.1 (35.975 secs) on insert overwrite execution.
Instead of upgrading to Hive 2.0 in Spark SQL, which might not be a trivial task, this patch provides an approach to delete the partition before asking Hive to load data files into the partition.
Note: The case reported on the jira is insert overwrite to partition. Since `Hive.loadTable` also uses the function to replace files, insert overwrite to table should has the same issue. We can take the same approach to delete the table first. I will upgrade this to include this.
## How was this patch tested?
Jenkins tests.
There are existing tests using insert overwrite statement. Those tests should be passed. I added a new test to specially test insert overwrite into partition.
For performance issue, as I don't have Hive 2.0 environment, this needs the reporter to verify it. Please refer to the jira.
Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#15667 from viirya/improve-hive-insertoverwrite.