Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
"""
Metadata wipeout example.
"""
import pandas as pd


class S(pd.Series):
    _metadata = ['foo']

    @property
    def _constructor(self):
        return S

    @property
    def _constructor_expanddim(self):
        return DF


class DF(pd.DataFrame):
    def __repr__(self):
        foos = {k: getattr(v, 'foo', None) for k, v in self.items()}
        return super().__repr__() + f'\n{foos=}'

    @property
    def _constructor(self):
        return DF

    @property
    def _constructor_sliced(self):
        return S


if __name__ == '__main__':
    df = DF({'a': [1]})
    df['a'].foo = 'bar'
    df.copy()  # ``DataFrame.copy`` mutates the data!
    assert hasattr(df['a'], 'foo'), "Dataframe mutated by ``copy`` method"
Issue Description
This behavior is a regression. In version 1.0.x, pandas supported `Series` extension by listing attribute names in `_metadata`. These attributes were preserved after `DataFrame.copy`. Currently, `DataFrame.copy` will discard the extended `Series` attributes, not only on the new dataframe returned, but on the original as well!
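For comparison, here is a minimal sketch of the Series-level propagation that `_metadata` is meant to provide. `TaggedSeries` is a hypothetical name used only for illustration, and the example exercises `Series.copy` alone; it does not touch the `DataFrame.copy` path that this report is about.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    """Hypothetical Series subclass carrying one extra attribute, ``foo``."""

    # Attribute names listed in ``_metadata`` are carried over to results
    # of pandas operations via ``__finalize__``.
    _metadata = ['foo']

    @property
    def _constructor(self):
        # Make pandas operations return TaggedSeries rather than pd.Series.
        return TaggedSeries


s = TaggedSeries([1, 2, 3])
s.foo = 'bar'

# Copying a single Series keeps both the subclass and the attribute.
copied = s.copy()
print(type(copied).__name__, copied.foo)
```

The report expects the same kind of preservation for each column when the containing dataframe is copied.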
Expected Behavior
`DataFrame.copy` should preserve the metadata attributes of each member `Series`.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 21.1.0
Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:24 PDT 2021; root:xnu-8019.41.5~1/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.4
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.5.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None