
API: interaction between subclasses (implementing __finalize__) and attrs/flags handling in __finalize__ #37099

Open
@jorisvandenbossche

Description


Pandas provides a mechanism for subclasses to customize metadata propagation through _metadata and __finalize__. However, this became more complicated with the introduction of attrs and flags, which are also handled in __finalize__.

Concrete example: GeoPandas implements a custom GeoDataFrame.__finalize__ so that it can control how metadata is propagated by individual methods.
See here for the current implementation: https://github.com/geopandas/geopandas/blob/7044aa479dbae72c14dfb647a5a782a01cff81d1/geopandas/geodataframe.py#L1148-L1162 (it specifically checks for method="merge" or method="concat" to ensure that the _metadata of the left object (merge) or of the first object (concat) is passed through).
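To make that pattern concrete, here is a minimal pure-Python sketch of a subclass-style __finalize__ that picks its metadata source based on `method`. It deliberately does not use pandas, and it is not the GeoPandas code: `Frame`, `GeoLikeFrame`, `FakeConcat` and `FakeMerge` are hypothetical stand-ins, and the `objs` / `left` attribute names are assumptions modeled loosely on the objects pandas passes as `other` for concat and merge.

```python
class Frame:
    """Hypothetical stand-in for a pandas NDFrame (old-style default)."""
    _metadata = []

    def __finalize__(self, other, method=None, **kwargs):
        # Old default: copy every _metadata attribute unconditionally.
        if isinstance(other, Frame):
            for name in self._metadata:
                object.__setattr__(self, name, getattr(other, name, None))
        return self


class FakeConcat:
    """Hypothetical stand-in for the `other` object passed on concat."""
    def __init__(self, objs):
        self.objs = list(objs)


class FakeMerge:
    """Hypothetical stand-in for the `other` object passed on merge."""
    def __init__(self, left, right):
        self.left = left
        self.right = right


class GeoLikeFrame(Frame):
    """Subclass that chooses where _metadata comes from, per method."""
    _metadata = ["crs"]

    def __init__(self, crs=None):
        self.crs = crs

    def __finalize__(self, other, method=None, **kwargs):
        if method == "merge":
            source = other.left       # metadata of the left object
        elif method == "concat":
            source = other.objs[0]    # metadata of the first object
        else:
            source = other
        for name in self._metadata:
            object.__setattr__(self, name, getattr(source, name, None))
        return self


a = GeoLikeFrame(crs="EPSG:4326")
b = GeoLikeFrame(crs="EPSG:3857")
merged = GeoLikeFrame().__finalize__(FakeMerge(a, b), method="merge")
concatenated = GeoLikeFrame().__finalize__(FakeConcat([b, a]), method="concat")
```

Here `merged.crs` comes from `a` and `concatenated.crs` from `b`. Note that the override never calls the base class, which is exactly why any extra logic added to the base __finalize__ would be skipped.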

For a long time, pandas only had a very simple __finalize__ implementation (which just propagated _metadata unconditionally):

    def __finalize__(self, other, method=None, **kwargs):
        """
        Propagate metadata from other to self.
        Parameters
        ----------
        other : the object from which to get the attributes that we are going
            to propagate
        method : optional, a passed method name ; possibly to take different
            types of propagation actions based on this
        """
        if isinstance(other, NDFrame):
            for name in self._metadata:
                object.__setattr__(self, name, getattr(other, name, None))
        return self

However, with recent changes, the default __finalize__ now also does the following (code link to the current __finalize__ implementation):

  • propagates attrs
  • propagates flags.allows_duplicate_labels, with custom logic for method="concat"
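The extra responsibilities listed above can be modeled with a toy base class. This is a simplified sketch, not the pandas source: `Frame`, `Flags` and `FakeConcat` are hypothetical stand-ins, and the concat rule used here (the result allows duplicate labels only if every input does) is an assumption about what the custom concat logic does.

```python
class Flags:
    """Hypothetical stand-in for the pandas Flags object."""
    def __init__(self, allows_duplicate_labels=True):
        self.allows_duplicate_labels = allows_duplicate_labels


class FakeConcat:
    """Hypothetical stand-in for the `other` object passed on concat."""
    def __init__(self, objs):
        self.objs = list(objs)


class Frame:
    """Toy model of an NDFrame carrying _metadata, attrs and flags."""
    _metadata = []

    def __init__(self, attrs=None, allows_duplicate_labels=True):
        self.attrs = dict(attrs or {})
        self.flags = Flags(allows_duplicate_labels)

    def __finalize__(self, other, method=None, **kwargs):
        if isinstance(other, Frame):
            # 1. propagate attrs
            self.attrs.update(other.attrs)
            # 2. propagate flags
            self.flags.allows_duplicate_labels = other.flags.allows_duplicate_labels
            # 3. propagate subclass _metadata (the part subclasses customize)
            for name in self._metadata:
                object.__setattr__(self, name, getattr(other, name, None))
        if method == "concat":
            # Assumed concat rule: duplicates allowed only if all inputs allow them.
            self.flags.allows_duplicate_labels = all(
                obj.flags.allows_duplicate_labels for obj in other.objs
            )
        return self


src = Frame(attrs={"source": "survey"}, allows_duplicate_labels=False)
result = Frame().__finalize__(src)

strict = Frame(allows_duplicate_labels=False)
loose = Frame()
combined = Frame().__finalize__(FakeConcat([loose, strict]), method="concat")
```

With all three steps living in one method, a subclass that overrides __finalize__ to change step 3 silently drops steps 1 and 2 as well.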

Because GeoPandas overrides __finalize__, the attrs and flags features currently don't work for GeoDataFrames. See geopandas/geopandas#1654 for the bug report on the GeoPandas side (cc @martinfleis). We certainly want to enable those features for GeoDataFrames as well.


So the main question is: given this additional logic in the default __finalize__, how can we allow subclasses to still override only part of that logic (e.g. only the _metadata handling)?

GeoPandas could in principle copy the pandas implementation (adding its own customizations to it) and keep it up to date. But that doesn't feel like an ideal solution, especially if pandas is going to add more functionality to it (e.g. more flags).
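As an illustration of one possible alternative to copying, here is a pure-Python sketch in which the base __finalize__ keeps ownership of attrs (flags are omitted for brevity) and delegates only the _metadata step to an overridable hook. The hook name `_finalize_metadata` is hypothetical and is not an existing pandas API; `Frame`, `FakeConcat`, `WholesaleOverride` and `HookOverride` are toy stand-ins. The sketch also shows a wholesale override losing attrs, mirroring the GeoPandas symptom.

```python
class FakeConcat:
    """Hypothetical stand-in for the `other` object passed on concat."""
    def __init__(self, objs):
        self.objs = list(objs)


class Frame:
    """Toy base class; not the pandas implementation."""
    _metadata = []

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def __finalize__(self, other, method=None, **kwargs):
        # The base class always handles attrs itself ...
        if isinstance(other, Frame):
            self.attrs.update(other.attrs)
        # ... and delegates only the _metadata step to a hook.
        self._finalize_metadata(other, method=method)
        return self

    def _finalize_metadata(self, other, method=None):
        """Hypothetical hook: the default copies _metadata from `other`."""
        if isinstance(other, Frame):
            for name in self._metadata:
                object.__setattr__(self, name, getattr(other, name, None))


class WholesaleOverride(Frame):
    """Overrides __finalize__ entirely, so attrs handling is skipped."""
    _metadata = ["crs"]

    def __init__(self, crs=None, attrs=None):
        super().__init__(attrs)
        self.crs = crs

    def __finalize__(self, other, method=None, **kwargs):
        source = other.objs[0] if method == "concat" else other
        for name in self._metadata:
            object.__setattr__(self, name, getattr(source, name, None))
        return self


class HookOverride(Frame):
    """Overrides only the _metadata hook; attrs still propagate."""
    _metadata = ["crs"]

    def __init__(self, crs=None, attrs=None):
        super().__init__(attrs)
        self.crs = crs

    def _finalize_metadata(self, other, method=None):
        source = other.objs[0] if method == "concat" else other
        for name in self._metadata:
            object.__setattr__(self, name, getattr(source, name, None))


out_w = WholesaleOverride().__finalize__(
    WholesaleOverride(crs="EPSG:4326", attrs={"name": "roads"})
)
out_h = HookOverride().__finalize__(
    HookOverride(crs="EPSG:4326", attrs={"name": "roads"})
)
out_c = HookOverride().__finalize__(
    FakeConcat([HookOverride(crs="EPSG:3857"), HookOverride(crs="EPSG:4326")]),
    method="concat",
)
```

Both subclasses get the right `crs`, but only `HookOverride` keeps `attrs`. The trade-off is that pandas would have to commit to the hook as a stable extension point.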

cc @TomAugspurger
