
BUG: Subclassed DataFrame doesn't persist _metadata properties across binary operations #34177

Open
@clausmith

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


When subclassing a DataFrame, fields added to the _metadata property are only persisted across some operations (such as slicing) and not others (such as any arithmetic operation).

I would expect any properties defined on the subclass to persist whenever the result of an operation is an instance of the subclass.

The following example is taken from the "Extending Pandas" docs: https://pandas.pydata.org/pandas-docs/stable/development/extending.html

import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):

    # temporary properties
    _internal_names = pd.DataFrame._internal_names + ['internal_cache']
    _internal_names_set = set(_internal_names)

    # normal properties
    _metadata = ['added_property']

    @property
    def _constructor(self):
        return SubclassedDataFrame2


df = SubclassedDataFrame2({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df.internal_cache = "cached"
df.added_property = "property"

With the above setup, here's how to reproduce the problem:

>>> df.added_property
'property'

>>> df[["A", "B"]].added_property # this works as expected
'property'

>>> (df * 2).added_property # I would expect this to work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/clausmith/Developer/pandas-bug/pandas/pandas/core/generic.py", line 5220, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'SubclassedDataFrame2' object has no attribute 'added_property'
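To check which operations keep added_property on whatever pandas version is installed, a small survey script along these lines may help. The survey helper and the list of operations are my own illustration, not part of the docs example, and the results depend on the pandas version, so no expected output is shown. It also reports internal_cache, which, being a temporary property, is not expected to carry over to derived objects.

```python
import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):
    # temporary properties: not carried over to derived objects
    _internal_names = pd.DataFrame._internal_names + ["internal_cache"]
    _internal_names_set = set(_internal_names)

    # normal properties: expected to be propagated to derived objects
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame2


def survey(frame):
    """Hypothetical helper: for each operation, report whether the result
    is still a SubclassedDataFrame2 and which custom attributes it kept.

    The operation list is illustrative, not exhaustive.
    """
    operations = {
        "slice": lambda d: d[["A", "B"]],
        "multiply": lambda d: d * 2,
        "add_frame": lambda d: d + d,
        "compare": lambda d: d > 1,
    }
    report = {}
    for name, op in operations.items():
        result = op(frame)
        report[name] = (
            isinstance(result, SubclassedDataFrame2),
            # getattr with a default avoids the AttributeError shown above
            getattr(result, "added_property", None) == "property",
            getattr(result, "internal_cache", None) == "cached",
        )
    return report


df = SubclassedDataFrame2({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df.internal_cache = "cached"
df.added_property = "property"

for op_name, (is_subclass, kept_prop, kept_cache) in survey(df).items():
    print(
        f"{op_name:10} subclass={is_subclass} "
        f"added_property={kept_prop} internal_cache={kept_cache}"
    )
```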

Problem description

Because arithmetic operations drop them, you can almost never rely on custom properties to persist on a subclassed DataFrame. This substantially reduces the usefulness of these custom properties.

Expected Output

I would expect the added_property attribute in the example above to persist after performing the arithmetic operation on the DataFrame, especially because the result of (df * 2) is still an instance of SubclassedDataFrame2.
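Until operations propagate _metadata on their own, one possible workaround is to copy the fields back explicitly after each operation. restore_metadata below is a hypothetical helper, not a pandas API; it copies each name listed in the source's _metadata, and only when the result is still an instance of the source's class, mirroring the expectation stated above.

```python
import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):
    _internal_names = pd.DataFrame._internal_names + ["internal_cache"]
    _internal_names_set = set(_internal_names)
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame2


def restore_metadata(source, result):
    """Hypothetical helper: copy every _metadata field from source to result.

    Only applied when result is an instance of source's class; any other
    object is returned unchanged.
    """
    if isinstance(result, type(source)):
        for name in source._metadata:
            # names listed in _metadata are assigned as plain attributes
            # by DataFrame.__setattr__, not as new columns
            setattr(result, name, getattr(source, name, None))
    return result


df = SubclassedDataFrame2({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df.added_property = "property"

doubled = restore_metadata(df, df * 2)
print(type(doubled).__name__, doubled.added_property)
# SubclassedDataFrame2 property
```

The drawback is that every call site has to remember to wrap the operation, which is exactly the burden this issue asks pandas to remove.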

Output of pd.show_versions()

commit           : 507cb1548d36bbf48c3084a78d59af2fed78a9d1
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
Version          : Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0.dev0+1576.g507cb1548
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.0.3
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Metadata


Assignees

No one assigned

    Labels

    Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations), Subclassing (Subclassing pandas objects), metadata (_metadata, .attrs)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
