spark-instrumented-optimizer/python/pyspark/pandas
itholic 3a18864c5f [SPARK-35809][PYTHON] Add index_col argument for ps.sql
### What changes were proposed in this pull request?

This PR proposes adding an `index_col` argument to the `ps.sql` function so that users can preserve the index when they want to.

Note that `reset_index()` has to be called before using `ps.sql` with `index_col`, so that the index is available to the query as a regular column.

```python
>>> psdf
   A  B
a  1  4
b  2  5
c  3  6
>>> psdf_reset_index = psdf.reset_index()
>>> ps.sql("SELECT * from {psdf_reset_index} WHERE A > 1", index_col="index")
       A  B
index
b      2  5
c      3  6
```

Without `index_col`, the index is always lost and replaced by a default index.

```python
>>> ps.sql("SELECT * from {psdf} WHERE A > 1")
   A  B
0  2  5
1  3  6
```

### Why are the changes needed?

The index is one of the key objects for existing pandas users, so we should provide a way to keep the index in the result of `ps.sql`.

### Does this PR introduce _any_ user-facing change?

Yes, a new `index_col` argument is added to `ps.sql`.

### How was this patch tested?

Added a unit test and manually checked that the build passes.

Closes #33450 from itholic/SPARK-35809.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 6578f0b135)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-22 17:08:42 +09:00
data_type_ops [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements 2021-07-20 09:30:35 +09:00
indexes [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:25 +09:00
missing [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:25 +09:00
plot [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
spark [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
tests [SPARK-35809][PYTHON] Add index_col argument for ps.sql 2021-07-22 17:08:42 +09:00
typedef [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-16 11:41:53 +09:00
usage_logging [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
__init__.py [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package 2021-07-22 14:21:53 +09:00
_typing.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
accessors.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
base.py [SPARK-35615][PYTHON] Make unary and comparison operators data-type-based 2021-07-07 13:47:04 -07:00
categorical.py [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex 2021-07-22 17:06:25 +09:00
config.py [SPARK-35499][PYTHON] Apply black to pandas API on Spark codes 2021-06-06 17:30:07 -07:00
datetimes.py [SPARK-35453][PYTHON] Move Koalas accessor to pandas_on_spark accessor 2021-06-01 10:33:10 +09:00
exceptions.py [SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module 2021-05-21 11:03:35 -07:00
extensions.py [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00
frame.py [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements 2021-07-20 09:30:35 +09:00
generic.py [SPARK-35806][PYTHON] Mapping the mode argument to pandas in DataFrame.to_csv 2021-07-19 19:58:19 +09:00
groupby.py [SPARK-35944][PYTHON] Introduce Name and Label type aliases 2021-07-01 09:40:07 +09:00
indexing.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-16 11:41:53 +09:00
internal.py [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements 2021-07-20 09:30:35 +09:00
ml.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-16 11:41:53 +09:00
mlflow.py [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-16 11:41:53 +09:00
namespace.py [SPARK-35810][PYTHON] Deprecate ps.broadcast API 2021-07-19 10:45:16 +09:00
numpy_compat.py [SPARK-35344][PYTHON] Support creating a Column of numpy literals in pandas API on Spark 2021-06-28 19:03:42 -07:00
series.py [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements 2021-07-20 09:30:35 +09:00
sql_processor.py [SPARK-35809][PYTHON] Add index_col argument for ps.sql 2021-07-22 17:08:42 +09:00
strings.py [SPARK-35761][PYTHON] Use type-annotation based pandas_udf or avoid specifying udf types to suppress warnings 2021-06-15 11:17:56 +09:00
utils.py [SPARK-35806][PYTHON] Mapping the mode argument to pandas in DataFrame.to_csv 2021-07-19 19:58:19 +09:00
window.py [SPARK-35859][PYTHON] Cleanup type hints in pandas-on-Spark 2021-06-29 10:52:24 -07:00