spark-instrumented-optimizer/python/pyspark/pandas/tests/test_utils.py
Yikun Jiang 44b7931936 [SPARK-35176][PYTHON] Standardize input validation error type
### What changes were proposed in this pull request?
This PR corrects the exception type raised when a function's input parameters fail validation because they have an inappropriate type: such failures now raise `TypeError`.
To make review easier, this PR is split into 3 commits:
- Standardize input validation error type on sql
- Standardize input validation error type on ml
- Standardize input validation error type on pandas

### Why are the changes needed?
As the Python exception documentation [1] says, `TypeError` is "Raised when an operation or function is applied to an object of inappropriate type." However, a number of PySpark code paths raise `ValueError` in this situation; this patch fixes them.

[1] https://docs.python.org/3/library/exceptions.html#TypeError
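The rule the patch applies can be illustrated with a small, self-contained sketch. `set_num_partitions` is a hypothetical function, not a PySpark API: it raises `TypeError` when the argument has the wrong type and keeps `ValueError` for an argument of the right type whose value is unacceptable.

```python
def set_num_partitions(num_partitions):
    """Hypothetical validator (not a PySpark API) showing which error to raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_partitions, bool) or not isinstance(num_partitions, int):
        # Wrong type: the Python docs call for TypeError here.
        raise TypeError(
            "num_partitions should be an int, got {}".format(type(num_partitions).__name__)
        )
    if num_partitions <= 0:
        # Correct type but unacceptable value: ValueError is still appropriate.
        raise ValueError("num_partitions should be positive, got {}".format(num_partitions))
    return num_partitions
```

With this sketch, `set_num_partitions("8")` raises `TypeError`, while `set_num_partitions(0)` raises `ValueError`.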

Note that this patch only fixes existing places that raise the wrong exception type during input validation. The input validation decorator/framework mentioned in [SPARK-35176](https://issues.apache.org/jira/browse/SPARK-35176) will be submitted in a separate patch.
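The decorator/framework itself is not part of this patch. Purely to illustrate the idea, and not as the design proposed in SPARK-35176, a type-checking decorator could look like this sketch. `validate_types` and `write_csv` are hypothetical names; the error message format copies the one `validate_bool_kwarg` produces in the test file, and `None` is accepted because `validate_bool_kwarg` accepts it too.

```python
import functools
import inspect


def validate_types(**expected_types):
    """Hypothetical decorator: raise TypeError when an argument has the wrong type.

    Each keyword maps an argument name to a single class. None is always
    accepted, as validate_bool_kwarg does.
    """

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = signature.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                expected = expected_types.get(name)
                if expected is None or value is None:
                    continue  # Unchecked argument, or an explicit None.
                if not isinstance(value, expected):
                    raise TypeError(
                        'For argument "{}" expected type {}, received type {}.'.format(
                            name, expected.__name__, type(value).__name__
                        )
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@validate_types(sep=str, header=bool)
def write_csv(path, sep=",", header=True):
    """Hypothetical function used only to exercise the decorator."""
    return path, sep, header
```

Centralizing the check in a decorator keeps the error type and message consistent across functions, which is the kind of standardization this patch does by hand.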

### Does this PR introduce _any_ user-facing change?
Yes. Affected functions now raise `TypeError` instead of `ValueError` when an input argument has the wrong type.
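Caller code that caught `ValueError` for a wrongly typed argument may need updating. A minimal sketch, assuming the caller must run against PySpark versions from both before and after this change, is to catch both exception types. `call_or_default`, `old_check`, and `new_check` are hypothetical names used only for illustration.

```python
def call_or_default(func, *args, default=None, **kwargs):
    """Hypothetical helper: return `default` when `func` rejects its input.

    Before this change some type checks raised ValueError; after it they
    raise TypeError. Catching both keeps the caller working on either side.
    """
    try:
        return func(*args, **kwargs)
    except (TypeError, ValueError):
        return default


def old_check(flag):
    # Hypothetical pre-patch behavior: wrong type reported as ValueError.
    if not isinstance(flag, bool):
        raise ValueError("expected bool")
    return flag


def new_check(flag):
    # Hypothetical post-patch behavior: wrong type reported as TypeError.
    if not isinstance(flag, bool):
        raise TypeError("expected bool")
    return flag
```

Catching `TypeError` broadly can also hide unrelated bugs, such as a call with the wrong number of arguments, so keeping the `try` block as narrow as possible is preferable.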

### How was this patch tested?
Existing test cases and unit tests.

Closes #32368 from Yikun/SPARK-35176.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-05-03 15:34:24 +09:00


#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import pandas as pd

from pyspark.pandas.utils import (
    lazy_property,
    validate_arguments_and_invoke_function,
    validate_bool_kwarg,
)
from pyspark.testing.pandasutils import PandasOnSparkTestCase
from pyspark.testing.sqlutils import SQLTestUtils


some_global_variable = 0


class UtilsTest(PandasOnSparkTestCase, SQLTestUtils):

    # a dummy to_html version with an extra parameter that pandas does not support
    # used in test_validate_arguments_and_invoke_function
    def to_html(self, max_rows=None, unsupported_param=None):
        args = locals()

        pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 3])
        validate_arguments_and_invoke_function(pdf, self.to_html, pd.DataFrame.to_html, args)

    def to_clipboard(self, sep=",", **kwargs):
        args = locals()

        pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 3])
        validate_arguments_and_invoke_function(
            pdf, self.to_clipboard, pd.DataFrame.to_clipboard, args
        )

        # Support for **kwargs
        self.to_clipboard(sep=",", index=False)

    def test_validate_arguments_and_invoke_function(self):
        # This should pass and run fine
        self.to_html()
        self.to_html(unsupported_param=None)
        self.to_html(max_rows=5)

        # This should fail because we are explicitly setting an unsupported param
        # to a non-default value
        with self.assertRaises(TypeError):
            self.to_html(unsupported_param=1)

    def test_lazy_property(self):
        obj = TestClassForLazyProp()
        # If lazy prop is not working, the second test would fail (because it'd be 2)
        self.assert_eq(obj.lazy_prop, 1)
        self.assert_eq(obj.lazy_prop, 1)

    def test_validate_bool_kwarg(self):
        # This should pass and run fine
        koalas = True
        self.assert_eq(validate_bool_kwarg(koalas, "koalas"), True)
        koalas = False
        self.assert_eq(validate_bool_kwarg(koalas, "koalas"), False)
        koalas = None
        self.assert_eq(validate_bool_kwarg(koalas, "koalas"), None)

        # This should fail because we are explicitly setting a non-boolean value
        koalas = "true"
        with self.assertRaisesRegex(
            TypeError, 'For argument "koalas" expected type bool, received type str.'
        ):
            validate_bool_kwarg(koalas, "koalas")


class TestClassForLazyProp:
    def __init__(self):
        self.some_variable = 0

    @lazy_property
    def lazy_prop(self):
        self.some_variable += 1
        return self.some_variable


if __name__ == "__main__":
    import unittest
    from pyspark.pandas.tests.test_utils import *  # noqa: F401

    try:
        import xmlrunner  # type: ignore[import]
        testRunner = xmlrunner.XMLTestRunner(output='target/test-reports', verbosity=2)
    except ImportError:
        testRunner = None
    unittest.main(testRunner=testRunner, verbosity=2)