DOC: Document that astype() for Series and DataFrame can accept a Series of dtypes

### Pandas version checks

- [X] I have checked that the issue still exists on the latest versions of the docs on `main` [here](https://pandas.pydata.org/docs/dev/)


### Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html

https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html

### Documentation problem

The documentation for `DataFrame.astype` states that `dtype` can be a single dtype or a `dict` mapping column labels to new dtypes. However, `dtype` can also be a `pd.Series` whose index is a subset of the dataframe's column labels (in any order) and whose values are dtypes. For example:

```python
import pandas as pd

df = pd.DataFrame([[1, 2]])
print(f"original dtypes:\n{df.dtypes}")
new_df = df.astype(pd.Series(["string", "object"], index=df.columns[::-1]))
print(f"new dtypes:\n{new_df.dtypes}")
```
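For comparison with the documented `dict` form, here is a minimal sketch that passes the same mapping both ways on a frame with unique column labels. The column names `a` and `b` are placeholders chosen for illustration, and the equivalence is observed behavior rather than something the docs currently promise.

```python
import pandas as pd

# Hypothetical frame with unique column labels, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Documented form: a dict mapping a column label to a dtype.
by_dict = df.astype({"b": "float64"})

# Undocumented form discussed in this issue: a Series indexed by column label.
by_series = df.astype(pd.Series({"b": "float64"}))

# Both calls are expected to leave "a" alone and convert "b".
print(by_dict.dtypes.equals(by_series.dtypes))
```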

If the frame has duplicate column labels, the index of the dtype series may still be a subset of the column labels. However, it appears that the dtype series can only have a non-unique index if that index is exactly equal to the dataframe's `columns`. For example, this works:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=[0, 0, 1])
print(f"original dtypes:\n{df.dtypes}")
new_df = df.astype(pd.Series(["string", "object", "float"], index=df.columns))
print(f"new dtypes:\n{new_df.dtypes}")
```

but giving the series an index with all the columns in a different order

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=[0, 0, 1])
print(f"original dtypes:\n{df.dtypes}")
new_df = df.astype(pd.Series(["string", "object", "float"], index=df.columns[::-1]))
print(f"new dtypes:\n{new_df.dtypes}")
```

raises `ValueError: cannot reindex on an axis with duplicate labels`:

<details>
<summary>Show stack trace</summary>

```
ValueError                                Traceback (most recent call last)
Input In [14], in <module>
      3 df = pd.DataFrame([[1, 2, 3]], columns=[0, 0, 1])
      4 print(f"original dtypes:\n{df.dtypes}")
----> 5 new_df = df.astype(pd.Series(["string", "object", "float"], index=df.columns[::-1]))
      6 print(f"new dtypes:\n{new_df.dtypes}")

File /usr/local/lib/python3.9/site-packages/pandas/core/generic.py:5898, in NDFrame.astype(self, dtype, copy, errors)
   5895 from pandas import Series
   5897 dtype_ser = Series(dtype, dtype=object)
-> 5898 dtype_ser = dtype_ser.reindex(self.columns, fill_value=None, copy=False)
   5900 results = []
   5901 for i, (col_name, col) in enumerate(self.items()):

File /usr/local/lib/python3.9/site-packages/pandas/core/series.py:4672, in Series.reindex(self, *args, **kwargs)
   4668         raise TypeError(
   4669             "'index' passed as both positional and keyword argument"
   4670         )
   4671     kwargs.update({"index": index})
-> 4672 return super().reindex(**kwargs)

File /usr/local/lib/python3.9/site-packages/pandas/core/generic.py:4974, in NDFrame.reindex(self, *args, **kwargs)
   4971     return self._reindex_multi(axes, copy, fill_value)
   4973 # perform the reindex on the axes
-> 4974 return self._reindex_axes(
   4975     axes, level, limit, tolerance, method, fill_value, copy
   4976 ).__finalize__(self, method="reindex")

File /usr/local/lib/python3.9/site-packages/pandas/core/generic.py:4994, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4989 new_index, indexer = ax.reindex(
   4990     labels, level=level, limit=limit, tolerance=tolerance, method=method
   4991 )
   4993 axis = self._get_axis_number(a)
-> 4994 obj = obj._reindex_with_indexers(
   4995     {axis: [new_index, indexer]},
   4996     fill_value=fill_value,
   4997     copy=copy,
   4998     allow_dups=False,
   4999 )
   5000 # If we've made a copy once, no need to make another one
   5001 copy = False

File /usr/local/lib/python3.9/site-packages/pandas/core/generic.py:5040, in NDFrame._reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   5037     indexer = ensure_platform_int(indexer)
   5039 # TODO: speed up on homogeneous DataFrame objects (see _reindex_multi)
-> 5040 new_data = new_data.reindex_indexer(
   5041     index,
   5042     indexer,
   5043     axis=baxis,
   5044     fill_value=fill_value,
   5045     allow_dups=allow_dups,
   5046     copy=copy,
   5047 )
   5048 # If we've made a copy once, no need to make another one
   5049 copy = False

File /usr/local/lib/python3.9/site-packages/pandas/core/internals/managers.py:679, in BaseBlockManager.reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice, use_na_proxy)
    677 # some axes don't allow reindexing with dups
    678 if not allow_dups:
--> 679     self.axes[axis]._validate_can_reindex(indexer)
    681 if axis >= self.ndim:
    682     raise IndexError("Requested axis not found in manager")

File /usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py:4107, in Index._validate_can_reindex(self, indexer)
   4105 # trying to reindex on an axis with duplicates
   4106 if not self._index_as_unique and len(indexer):
-> 4107     raise ValueError("cannot reindex on an axis with duplicate labels")

ValueError: cannot reindex on an axis with duplicate labels
```
</details>

and so does passing a series of dtypes for just the first two columns:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=[0, 0, 1])
print(f"original dtypes:\n{df.dtypes}")
new_df = df.astype(pd.Series(["string", "object"], index=df.columns[:2]))
print(f"new dtypes:\n{new_df.dtypes}")
```

although passing a dtype for just the duplicate column label `0` converts the first two columns to the same new type:


```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=[0, 0, 1])
print(f"original dtypes:\n{df.dtypes}")
new_df = df.astype(pd.Series(["string"], index=[0]))
print(f"new dtypes:\n{new_df.dtypes}")
```
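Judging from the stack trace above, the duplicate-label cases seem to come down to how `Series.reindex` treats a non-unique index. The sketch below calls `reindex` directly rather than going through `astype`, so it only approximates what `astype` does internally; the variable names are illustrative.

```python
import pandas as pd

columns = pd.Index([0, 0, 1])
dtypes = ["string", "object", "float"]

# Index identical to the target (even though non-unique): reindex succeeds.
same_order = pd.Series(dtypes, index=columns).reindex(columns)

# Same labels in a different order: reindex rejects the duplicate labels.
try:
    pd.Series(dtypes, index=columns[::-1]).reindex(columns)
    reversed_error = None
except ValueError as exc:
    reversed_error = str(exc)

# A unique index is expanded to every column that carries the label;
# the label 1 has no entry, so its slot comes back as missing.
single = pd.Series(["string"], index=[0]).reindex(columns)

print(same_order.tolist())
print(reversed_error)
print(single.tolist())
```

This would explain why a dtype series with a unique index can target duplicate column labels, while one with a non-unique index works only when it matches `columns` exactly.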

Finally, it's possible to pass a series containing a single dtype to `Series.astype`. In that case, the only label in the dtype series' index must be the name of the series being converted, e.g.:

```python
import pandas as pd
s = pd.Series(1)
s.astype(pd.Series(["int64"], index=[s.name]))
```
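The same rule can be seen more clearly with a named series. This is a small sketch, assuming a hypothetical series name `x`; a dtype series keyed by any other label is expected to raise `KeyError`.

```python
import pandas as pd

# Hypothetical named series, used only for illustration.
s = pd.Series([1, 2], name="x")

# The single index label matches the series name, so the cast goes through.
converted = s.astype(pd.Series(["float64"], index=["x"]))
print(converted.dtype)

# A label other than the series name is rejected.
try:
    s.astype(pd.Series(["float64"], index=["y"]))
    mismatch_error = None
except KeyError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```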



### Suggested fix for documentation

I don't know whether all of this is an intended feature, or just a side effect of converting dict-like dtypes to `Series` [here](https://github.com/pandas-dev/pandas/blob/48d515958d5805f0e62e34b7424097e5575089a8/pandas/core/generic.py#L5901). If it's a feature, it should be described in the `astype` documentation for both `DataFrame` and `Series`.
