Description
Pandas provides a mechanism for subclasses to customize how metadata propagation is handled, through _metadata and __finalize__. However, this got more complicated with the introduction of attrs and flags, which are also handled in __finalize__.
Concrete example: GeoPandas implements a custom GeoDataFrame.__finalize__ so that it can apply custom logic to how metadata is propagated in methods.
See here for the current implementation: https://github.com/geopandas/geopandas/blob/7044aa479dbae72c14dfb647a5a782a01cff81d1/geopandas/geodataframe.py#L1148-L1162 (it specifically checks for method="merge"|"concat" to ensure we pass through _metadata of the first (concat) or left (merge) object).
For a long time, pandas only had a very simple __finalize__ implementation (which just propagated _metadata unconditionally):
def __finalize__(self, other, method=None, **kwargs):
    """
    Propagate metadata from other to self.

    Parameters
    ----------
    other : the object from which to get the attributes that we are going
        to propagate
    method : optional, a passed method name ; possibly to take different
        types of propagation actions based on this
    """
    if isinstance(other, NDFrame):
        for name in self._metadata:
            object.__setattr__(self, name, getattr(other, name, None))
    return self
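To make the mechanism concrete, here is a minimal sketch of a subclass relying on that default propagation. The class name TaggedFrame and the metadata name crs are illustrative assumptions, not taken from pandas or GeoPandas.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical subclass; "crs" is just an illustrative metadata name.
    _metadata = ["crs"]
    crs = None  # class-level default so the attribute always exists

    @property
    def _constructor(self):
        # Make pandas methods return TaggedFrame instead of DataFrame.
        return TaggedFrame


df = TaggedFrame({"a": [1, 2, 3]})
df.crs = "EPSG:4326"

# copy() goes through __finalize__, which copies every name in _metadata.
copied = df.copy()
print(type(copied).__name__, copied.crs)
```

Here the subclass does not touch __finalize__ at all, so whatever the default implementation propagates is what the result gets.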
However, with the recent changes, the default __finalize__ now also does the following (code link to current finalize impl):
- propagate attrs
- propagate flags.allows_duplicate_labels, with custom logic for method="concat"
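As a small illustration of the two additions listed above, the following sketch uses a plain DataFrame and copy(), which goes through __finalize__. The attrs key and value are made up for the example, and it assumes a pandas version that has both attrs and flags.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.attrs["source"] = "sensor-1"  # illustrative key and value

# set_flags returns a new object with the flag applied.
df = df.set_flags(allows_duplicate_labels=False)

# Both attrs and the allows_duplicate_labels flag are carried over
# by the default __finalize__.
result = df.copy()
print(result.attrs, result.flags.allows_duplicate_labels)
```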
As a result of GeoPandas overriding __finalize__, the attrs and flags features currently don't work for GeoDataFrames, even though we certainly want to enable those features for GeoDataFrames as well. See geopandas/geopandas#1654 for the bug report on the GeoPandas side (cc @martinfleis).
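The effect can be reproduced without GeoPandas. The sketch below uses a hypothetical subclass (MetaOnlyFrame, with an illustrative crs metadata name) whose __finalize__ handles only _metadata and never calls the parent implementation; it is not the GeoPandas code, only a reduced stand-in for the same pattern.

```python
import pandas as pd


class MetaOnlyFrame(pd.DataFrame):
    # Hypothetical subclass, not the GeoPandas implementation.
    _metadata = ["crs"]
    crs = None

    @property
    def _constructor(self):
        return MetaOnlyFrame

    def __finalize__(self, other, method=None, **kwargs):
        # Only _metadata is handled here. super().__finalize__ is never
        # called, so the attrs/flags handling in pandas is bypassed.
        for name in self._metadata:
            object.__setattr__(self, name, getattr(other, name, None))
        return self


df = MetaOnlyFrame({"a": [1, 2, 3]})
df.crs = "EPSG:4326"
df.attrs["source"] = "sensor-1"  # illustrative key and value

result = df.copy()
print(result.crs)    # the custom metadata survives
print(result.attrs)  # attrs were not propagated
```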
So the main question is: given this additional logic in the default __finalize__, how can we enable subclasses to still override only a part of that logic (e.g. only _metadata)?
GeoPandas could in principle copy the pandas implementation (adding its own customizations to it) and keep up to date with it. But that doesn't feel like the ideal solution, certainly not if pandas is going to add more functionality to it (e.g. more flags).
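For comparison, one possible direction that avoids copying the pandas code is to delegate to the parent __finalize__ first and only then adjust _metadata for specific methods. The following is a rough sketch under stated assumptions, not a proposal for the final design: DelegatingFrame and crs are hypothetical names, and the .left attribute on the object passed for method="merge" is an assumption about pandas internals (hence the defensive getattr). Concat would need analogous handling.

```python
import pandas as pd


class DelegatingFrame(pd.DataFrame):
    # Hypothetical subclass used only for illustration.
    _metadata = ["crs"]
    crs = None

    @property
    def _constructor(self):
        return DelegatingFrame

    def __finalize__(self, other, method=None, **kwargs):
        # Let pandas handle attrs, flags and the default _metadata copy.
        result = super().__finalize__(other, method=method, **kwargs)

        # Assumption: for merge, `other` exposes the left input as `.left`.
        source = getattr(other, "left", None) if method == "merge" else None
        if source is not None:
            for name in self._metadata:
                object.__setattr__(result, name, getattr(source, name, None))
        return result


df = DelegatingFrame({"a": [1, 2, 3]})
df.crs = "EPSG:4326"
df.attrs["source"] = "sensor-1"  # illustrative key and value

result = df.copy()
print(result.crs, result.attrs)
```

This keeps attrs and flags working, but it still lets the default _metadata handling run before the subclass overrides it, so it does not by itself give subclasses a way to replace only that part of the logic.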