Description
Code Sample, a copy-pastable example if possible
Let's assume the following subclassing case:
import pandas as pd


# Define a subclass of Series
class ExtendedSeries(pd.Series):
    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedSeries, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedSeries


# Define a subclass of DataFrame that slices into the ExtendedSeries class
class ExtendedFrame(pd.DataFrame):
    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedFrame, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedFrame

    @property
    def _constructor_sliced(self):
        return ExtendedSeries
Problem description
This works fine, but it does not allow extended metadata to be transferred from the ExtendedFrame to the ExtendedSeries.
As far as I understand, writing the _constructor_sliced as follows should work:
import pandas as pd


# Define a subclass of DataFrame that slices into the ExtendedSeries class
class ExtendedFrame(pd.DataFrame):
    _metadata = ['_reference']

    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        super(ExtendedFrame, self).__init__(*args, **kwargs)
        self._reference = reference

    @property
    def _constructor(self):
        return ExtendedFrame

    @property
    def _constructor_sliced(self):
        a = ExtendedSeries([], reference=self._reference)
        return a.__init__
This would first set the metadata and then return the object's initializer so that its data can be filled in, wouldn't it?
But defining it this way gives errors in core/frame:2166 and core/frame:2563. In both cases this is due to the call self._constructor_sliced._from_array().
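Apart from the _from_array errors, returning the bound a.__init__ may run into a second obstacle that plain Python can illustrate: __init__ returns None rather than the instance, and calling it again pops 'reference' with its default. A minimal sketch, assuming a hypothetical stand-in class Holder (not a pandas type) that only mirrors the kwargs.pop('reference', {}) pattern used above:

```python
# Hypothetical stand-in for ExtendedSeries; it only mirrors the
# kwargs.pop('reference', {}) pattern and is not a pandas class.
class Holder(object):
    def __init__(self, *args, **kwargs):
        reference = kwargs.pop('reference', {})
        self.data = list(args[0]) if args else []
        self._reference = reference


a = Holder([], reference={'unit': 'm'})
constructor = a.__init__          # what _constructor_sliced would return

result = constructor([1, 2, 3])   # re-initialises `a` in place

print(result)        # None: __init__ never returns the instance
print(a.data)        # [1, 2, 3]
print(a._reference)  # {}: the second call popped the default again
```

So even if the caller invoked the returned callable directly, it would likely receive None, and the metadata set in the first call would be overwritten by the second.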
Series.from_array has been labeled as deprecated, and Series._from_array calls the class's constructor. Would it be possible to just change the two instances of self._constructor_sliced._from_array() to self._constructor_sliced()?
If I'm seeing this correctly, wouldn't this change allow for this level of flexibility in subclassing without affecting the regular functionality?
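The difference between the two call styles can be sketched without pandas internals. The example below is only an illustration under stated assumptions: FakeSeries, slice_via_from_array and slice_via_call are hypothetical names, and the stand-in _from_array is assumed to forward to the constructor, as described above. functools.partial is used as one possible callable that carries the metadata along.

```python
import functools


# Hypothetical stand-in for a Series subclass; not a pandas class.
class FakeSeries(object):
    def __init__(self, data=None, reference=None):
        self.data = data
        self._reference = reference if reference is not None else {}

    @classmethod
    def _from_array(cls, data):
        # Assumed to forward to the constructor, as described above.
        return cls(data)


def slice_via_from_array(constructor_sliced, data):
    # Mimics the current call style: self._constructor_sliced._from_array(...)
    return constructor_sliced._from_array(data)


def slice_via_call(constructor_sliced, data):
    # Mimics the proposed call style: self._constructor_sliced(...)
    return constructor_sliced(data)


# A plain class works with both call styles.
plain = slice_via_from_array(FakeSeries, [1, 2])
direct = slice_via_call(FakeSeries, [1, 2])

# A callable that carries metadata only works with the direct call.
with_meta = functools.partial(FakeSeries, reference={'unit': 'm'})
series = slice_via_call(with_meta, [1, 2])

try:
    slice_via_from_array(with_meta, [1, 2])
    from_array_failed = False
except AttributeError:
    # functools.partial objects have no _from_array attribute
    from_array_failed = True
```

With the direct call, _constructor_sliced could return any callable that produces the sliced object, whereas the ._from_array() style requires it to be the class itself (or something exposing that method).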
Output of pd.show_versions()
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.21.1
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.6.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None