Description
Currently df.append loses the columns index type if the columns are a CategoricalIndex:
>>> idx = pd.CategoricalIndex('a b'.split())
>>> df = pd.DataFrame([[1, 2]], columns=idx)
>>> ser = pd.Series([3, 4], index=idx, name=1)
>>> df.append(ser).columns
Index(['a', 'b'], dtype='object')
df.append(ser).columns should return a CategoricalIndex equal to idx.
pandas 0.21 has the new CategoricalDtype, so it's now easy to compare CategoricalIndex instances for strict type equality. Hence this issue should be much easier to solve than previously.
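As a rough illustration of the dtype comparison mentioned above (the index values and variable names here are made up for the example, not taken from pandas internals):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative indexes; only `idx` mirrors the example in this issue.
idx = pd.CategoricalIndex(["a", "b"])
same = pd.CategoricalIndex(["b", "a", "a"], categories=["a", "b"])
other = pd.CategoricalIndex(["a", "c"])
plain = pd.Index(["a", "b"])

expected_dtype = CategoricalDtype(categories=["a", "b"], ordered=False)

# The dtype carries the categories and orderedness, so comparing dtypes
# checks more than "both are categorical".
matches_expected = idx.dtype == expected_dtype  # same categories, unordered
matches_same = idx.dtype == same.dtype          # values differ, dtype does not
matches_other = idx.dtype == other.dtype        # different categories
matches_plain = idx.dtype == plain.dtype        # categorical vs. non-categorical

print(matches_expected, matches_same, matches_other, matches_plain)
```

The point is that two CategoricalIndex instances can hold different values and still share a dtype, which is the property a check in append could rely on.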
Solution proposal
In frame.py::DataFrame.append there is this line:
combined_columns = self.columns.tolist() + self.columns.union(
    other.index).difference(self.columns).tolist()
This line converts CategoricalIndex columns to a normal Index, because it goes through plain lists. By checking the types and dtypes first, it should be easy to return the correct index. So if the above were something like this instead:
same_types = type(self.columns) == type(other.index)
same_dtypes = self.columns.dtype == other.index.dtype
if same_types and same_dtypes:
    combined_columns = self.columns.union(other.index)
else:
    combined_columns = self.columns.tolist() + self.columns.union(
        other.index).difference(self.columns).tolist()
then I think this issue can be solved (I haven't checked all the details yet, so some adjustments may be needed). I'd appreciate comments on whether this approach is OK.
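To make the proposal easier to try out, here is a minimal standalone sketch of the same logic. The helper name combine_columns is hypothetical and is not part of pandas; it only operates on the two indexes, so it does not depend on DataFrame.append itself.

```python
import pandas as pd


def combine_columns(columns, other_index):
    """Hypothetical helper mirroring the proposed check (not pandas API)."""
    same_types = type(columns) is type(other_index)
    same_dtypes = columns.dtype == other_index.dtype
    if same_types and same_dtypes:
        # Stay in Index-land so the index type and dtype are kept.
        return columns.union(other_index)
    # Fallback: the existing list-based expression, which drops the type.
    return (columns.tolist()
            + columns.union(other_index).difference(columns).tolist())


idx = pd.CategoricalIndex(["a", "b"])
ser = pd.Series([3, 4], index=idx, name=1)

# Existing expression: the round trip through lists yields a plain Index.
old_style = pd.Index(
    idx.tolist() + idx.union(ser.index).difference(idx).tolist())

# Proposed branch: both sides are CategoricalIndex with equal dtypes.
new_style = combine_columns(idx, ser.index)

# Mixed types take the fallback and return a list, as before.
mixed = combine_columns(idx, pd.Index(["b", "c"]))

print(type(old_style).__name__, type(new_style).__name__, mixed)
```

One thing this makes visible: the two branches return different kinds of objects (an Index versus a list), so the surrounding code in append would probably need to accept both. How union orders the result when the two indexes differ is also something I have not checked.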