Transfer _metadata from Subclassed DataFrame to Subclassed Series #19850

Open
@jaumebonet

Description

Code Sample, a copy-pastable example if possible

Let's assume the following subclassing case:

import pandas as pd

# Define a subclass of Series
class ExtendedSeries(pd.Series):
    # Attributes listed here are propagated by pandas to derived objects
    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedSeries, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedSeries

# Define a subclass of DataFrame that slices into the ExtendedSeries class
class ExtendedFrame(pd.DataFrame):

    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedFrame, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedFrame

    @property
    def _constructor_sliced(self):
        return ExtendedSeries

Problem description

This works fine, but it does not transfer the extended metadata from the ExtendedFrame to the ExtendedSeries.
As far as I understand, writing _constructor_sliced as follows should work:

import pandas as pd

# Define a subclass of DataFrame that slices into the ExtendedSeries class
class ExtendedFrame(pd.DataFrame):

    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedFrame, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedFrame

    @property
    def _constructor_sliced(self):
        # Build an ExtendedSeries that already carries the metadata,
        # then hand back its bound initializer
        a = ExtendedSeries([], reference=self._reference)
        return a.__init__

This would first set the metadata and then return the object's initializer so that its data can be filled in afterwards, wouldn't it?
However, defining it this way raises errors at core/frame:2166 and core/frame:2563. In both cases the cause is the call self._constructor_sliced._from_array().

Given that Series.from_array has been labeled as deprecated and that Series._from_array calls the class constructor, would it be possible to change the two instances of self._constructor_sliced._from_array() to self._constructor_sliced()?
If I'm seeing this correctly, this change would allow this level of flexibility in subclassing without affecting the regular functionality.
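
As a side note on the a.__init__ idea, here is a minimal sketch in plain Python of what a bound __init__ does when it is called like a constructor. FakeSeries is a hypothetical stand-in, not pd.Series, and functools.partial is just one possible way to pre-bind the metadata; whether pandas accepts such a callable as _constructor_sliced depends on the pandas version and is not shown here.

```python
import functools


class FakeSeries:
    """Hypothetical stand-in for pd.Series, used only for illustration."""

    def __init__(self, data=None, reference=None):
        self.data = list(data) if data is not None else []
        self._reference = {} if reference is None else reference

    @classmethod
    def _from_array(cls, data):
        # Alternate constructor that lives on the class, like Series._from_array
        return cls(data)


reference = {"source": "frame"}

# Approach from the issue: create an instance and return its bound __init__.
bound_init = FakeSeries([], reference=reference).__init__
result_from_init = bound_init([1, 2, 3])  # re-initialises the instance, returns None
has_from_array = hasattr(bound_init, "_from_array")  # attribute is not reachable here

# Assumed alternative: a factory that pre-binds the metadata and returns a new object.
factory = functools.partial(FakeSeries, reference=reference)
series = factory([1, 2, 3])
partial_has_from_array = hasattr(factory, "_from_array")
```

In this sketch the bound __init__ returns None rather than an object, and calling it again without reference also resets _reference to the default. The partial-based factory returns a populated object that keeps the metadata, but neither callable exposes _from_array, which is consistent with the errors reported above when the code calls self._constructor_sliced._from_array().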

Output of pd.show_versions()

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.1
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.6.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
