Implement `CategoricalIndex.map` and `DatetimeIndex.map`
`MultiIndex.map` cannot be implemented in the same way as the `map` of the other indexes; it should be handled separately if needed.
Mapping values using an input correspondence (a dict, a `Series`, or a callable) is a common operation supported by pandas, so pandas API on Spark should support it as well.
Yes, this introduces a user-facing change: `CategoricalIndex.map` and `DatetimeIndex.map` can now be used.
- CategoricalIndex.map
```py
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> idx = ps.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=False, dtype='category')
>>> pser = pd.Series([1, 2, 3], index=pd.CategoricalIndex(['a', 'b', 'c'], ordered=True))
>>> idx.map(pser)
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=True, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first', 'second', 'third'], ordered=False, dtype='category')
```
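For readers without a Spark session at hand, the same semantics can be checked against plain pandas, whose behavior `pyspark.pandas` mirrors (a minimal sketch using only pandas, not the new pandas-on-Spark code path):

```python
import pandas as pd

# Plain-pandas counterpart of the CategoricalIndex.map examples above.
idx = pd.CategoricalIndex(['a', 'b', 'c'])

# A callable is applied to each category.
upper = idx.map(lambda x: x.upper())
print(list(upper))  # ['A', 'B', 'C']

# A dict maps each category through key lookup.
named = idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
print(list(named))  # ['first', 'second', 'third']
```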
- DatetimeIndex.map
```py
>>> import datetime
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> pidx = pd.date_range(start="2020-08-08", end="2020-08-10")
>>> psidx = ps.from_pandas(pidx)
>>> mapper_dict = {
... datetime.datetime(2020, 8, 8): datetime.datetime(2021, 8, 8),
... datetime.datetime(2020, 8, 9): datetime.datetime(2021, 8, 9),
... }
>>> psidx.map(mapper_dict)
DatetimeIndex(['2021-08-08', '2021-08-09', 'NaT'], dtype='datetime64[ns]', freq=None)
>>> mapper_pser = pd.Series([1, 2, 3], index=pidx)
>>> psidx.map(mapper_pser)
Int64Index([1, 2, 3], dtype='int64')
>>> psidx
DatetimeIndex(['2020-08-08', '2020-08-09', '2020-08-10'], dtype='datetime64[ns]', freq=None)
>>> psidx.map(lambda x: x.strftime("%B %d, %Y, %r"))
Index(['August 08, 2020, 12:00:00 AM', 'August 09, 2020, 12:00:00 AM',
'August 10, 2020, 12:00:00 AM'],
dtype='object')
```
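As above, the dict and callable cases can be verified against plain pandas, including the `NaT` result for a date with no entry in the mapper (a sketch using only pandas):

```python
import datetime
import pandas as pd

# Plain-pandas counterpart of the DatetimeIndex.map examples above.
pidx = pd.date_range(start="2020-08-08", end="2020-08-10")

# Keys missing from the dict map to NaT, matching the output shown above.
mapper_dict = {
    datetime.datetime(2020, 8, 8): datetime.datetime(2021, 8, 8),
    datetime.datetime(2020, 8, 9): datetime.datetime(2021, 8, 9),
}
shifted = pidx.map(mapper_dict)
print(pd.isna(shifted[2]))  # True: 2020-08-10 has no mapping

# A callable is applied to each timestamp.
labels = pidx.map(lambda x: x.strftime("%B %d, %Y"))
print(list(labels))  # ['August 08, 2020', 'August 09, 2020', 'August 10, 2020']
```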
Tested with unit tests.
Closes #33756 from xinrong-databricks/other_indexes_map.
Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 56c211bd6a)