df.append should retain columns type if same type #18359

Closed
@topper-123

Description

@topper-123

Currently, df.append loses the type of the columns index if the columns are a CategoricalIndex:

>>> import pandas as pd
>>> idx = pd.CategoricalIndex('a b'.split())
>>> df = pd.DataFrame([[1, 2]], columns=idx)
>>> ser = pd.Series([3, 4], index=idx, name=1)
>>> df.append(ser).columns
Index(['a', 'b'], dtype='object')

df.append(ser).columns should return a CategoricalIndex equal to idx.

pandas 0.21 introduced CategoricalDtype, which makes it easy to compare CategoricalIndex instances for strict dtype equality (same categories and same ordered flag). This issue should therefore be much easier to solve than before.
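
A minimal sketch of such a strict check, assuming only the public CategoricalIndex and CategoricalDtype API. The helper name same_categorical_dtype is hypothetical, not part of pandas. It checks the index classes first because comparing a CategoricalDtype with the string 'category' is True regardless of the categories:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def same_categorical_dtype(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: True only for two CategoricalIndex with equal dtypes."""
    # A plain object Index never counts as matching a CategoricalIndex.
    if not (isinstance(left, pd.CategoricalIndex)
            and isinstance(right, pd.CategoricalIndex)):
        return False
    # CategoricalDtype equality compares the categories and the ordered flag.
    return left.dtype == right.dtype


idx = pd.CategoricalIndex(['a', 'b'])
wider = pd.CategoricalIndex(['a', 'b'], categories=['a', 'b', 'c'])
plain = pd.Index(['a', 'b'])

print(same_categorical_dtype(idx, pd.CategoricalIndex(['a', 'b'])))  # True
print(same_categorical_dtype(idx, wider))  # False: different categories
print(same_categorical_dtype(idx, plain))  # False: plain object Index

# Pitfall: comparing with the string 'category' ignores the categories.
print(idx.dtype == 'category', wider.dtype == 'category')  # True True

# For unordered dtypes, the order of the categories does not matter.
print(CategoricalDtype(['a', 'b'], ordered=False)
      == CategoricalDtype(['b', 'a'], ordered=False))  # True
```

Comparing the dtype objects, rather than the dtype name, is what makes the check strict.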

Solution proposal

In frame.py::DataFrame.append there is this line:

combined_columns = self.columns.tolist() + self.columns.union(
                    other.index).difference(self.columns).tolist()

This line converts CategoricalIndex columns to a plain object Index: tolist() turns the labels into a Python list, so the categorical dtype is lost. By checking the index types and dtypes first, it should be easy to return the correct index. If the line above were replaced with something like this:

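same_types = type(self.columns) is type(other.index)
# Compare dtypes only when the index classes match (short-circuits otherwise).
same_dtypes = same_types and self.columns.dtype == other.index.dtype
if same_dtypes:
    # Same class and dtype (for CategoricalIndex: same categories and
    # ordered flag), so the union is expected to keep the index type.
    # Caveat: union() may sort the labels; the fallback below keeps
    # self.columns in its original order.
    combined_columns = self.columns.union(other.index)
else:
    # Different types or dtypes: keep the existing list-based behaviour.
    combined_columns = self.columns.tolist() + self.columns.union(
        other.index).difference(self.columns).tolist()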

then I think this issue can be solved (I haven't checked all the details yet, so some adjustments may be needed). I'd appreciate comments on whether this approach is OK.
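
A hedged alternative sketch of the combining step, written as a standalone function so it can be tried outside pandas internals. combine_columns is a hypothetical name, not pandas API. Instead of relying on union(), which may sort labels, it keeps the existing columns in order, appends unseen labels in the order they appear in the other index, and re-applies the categorical dtype only when both sides are CategoricalIndex with equal dtypes:

```python
import pandas as pd


def combine_columns(left: pd.Index, right: pd.Index) -> pd.Index:
    """Hypothetical helper: left's labels in order, then unseen labels of right.

    The categorical dtype is kept only when both sides are CategoricalIndex
    with equal dtypes; otherwise the result is a plain Index.
    """
    # Labels of `right` that are not already in `left`, in their original order.
    extra = [label for label in right if label not in left]
    # Going through Python lists drops the categorical dtype, just like the
    # tolist()-based line in DataFrame.append ...
    combined = pd.Index(list(left) + extra)
    # ... so re-apply it when both sides share the same CategoricalDtype.
    if (isinstance(left, pd.CategoricalIndex)
            and isinstance(right, pd.CategoricalIndex)
            and left.dtype == right.dtype):
        combined = pd.CategoricalIndex(combined, dtype=left.dtype)
    return combined


idx = pd.CategoricalIndex(['a', 'b'])
print(type(combine_columns(idx, idx)).__name__)  # CategoricalIndex
print(list(combine_columns(pd.Index(['b', 'a']), pd.Index(['a', 'c']))))  # ['b', 'a', 'c']
```

Inside DataFrame.append this would amount to something like combined_columns = combine_columns(self.columns, other.index); that wiring is an assumption and has not been checked against the rest of the method.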

Metadata

Metadata

Assignees

No one assigned

    Labels

    Compat (pandas objects compatibility with Numpy or Python functions), Dtype Conversions (Unexpected or buggy dtype conversions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions