spark-instrumented-optimizer/sql/catalyst/src
Richard Chen 6c6291b3f6 [SPARK-36836][SQL] Fix incorrect result in sha2 expression
### What changes were proposed in this pull request?

`sha2(input, bit_length)` returns incorrect results when `bit_length == 224` for all inputs.
For example, running `spark.sql("SELECT sha2('abc', 224)").show()` in spark-shell reproduces the error.

Spark currently returns
```
#\t}"4�"�B�w��U�*��你���l��
```
while the expected result is
```
23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
```

This happens because `MessageDigest.digest()` returns the raw digest as an array of bytes, not as a hex-encoded string. Decoding those bytes directly as a string produces the unreadable output above. The digest must instead be converted to a hex string, for example by interpreting the bytes as an unsigned `BigInt` and formatting that value in base 16.
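The difference between the two conversions can be sketched with the JDK's `MessageDigest` alone. This is not Spark's actual `Sha2` code; the object name `Sha224Sketch` and its methods are hypothetical, chosen only for illustration. One assumption worth flagging: a `BigInteger` drops leading zero bytes, so this sketch zero-pads the result to 56 hex digits (224 bits / 4 bits per digit).

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical illustration, not Spark's implementation: contrasts decoding the
// raw SHA-224 digest as text with hex-encoding it.
object Sha224Sketch {

  // Raw SHA-224 digest: 28 bytes of binary data, not printable text.
  def digest(input: String): Array[Byte] = {
    val md = MessageDigest.getInstance("SHA-224")
    md.digest(input.getBytes(StandardCharsets.UTF_8))
  }

  // Buggy approach: decode the digest bytes as if they were UTF-8 text.
  // Invalid byte sequences become replacement characters, giving mojibake.
  def buggy(input: String): String =
    new String(digest(input), StandardCharsets.UTF_8)

  // Fixed approach: read the bytes as an unsigned (signum = 1) BigInteger and
  // print it in base 16. "%056x" left-pads with zeros to 56 hex digits so a
  // digest that starts with zero bytes keeps its full length.
  def fixed(input: String): String =
    String.format("%056x", new BigInteger(1, digest(input)))
}
```

With this sketch, `Sha224Sketch.fixed("abc")` yields `23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7`, the expected result quoted above, while `Sha224Sketch.buggy("abc")` returns an unreadable string of similar character to Spark's incorrect output.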

### Why are the changes needed?

`sha2(input, bit_length)` with a `bit_length` of `224` previously returned an incorrect result.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a new test to `HashExpressionsSuite.scala` that failed before this change and now passes.

Closes #34086 from richardc-db/sha224.

Authored-by: Richard Chen <r.chen@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
2021-09-28 18:38:20 +08:00
main [SPARK-36836][SQL] Fix incorrect result in sha2 expression 2021-09-28 18:38:20 +08:00
test [SPARK-36836][SQL] Fix incorrect result in sha2 expression 2021-09-28 18:38:20 +08:00