DataFrame.__finalize__ not called in pd.concat #6927

Closed
@wcbeard

When I assign metadata to a DataFrame

import numpy as np
import pandas as pd
np.random.seed(10)

# register 'filenames' so pandas treats it as metadata rather than a column
pd.DataFrame._metadata = ['filenames']
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df1.filenames = {'a': 'f1', 'b': 'f2'}
df1
       a  b
    0  1  1
    1  0  3
    2  0  1

and define a __finalize__ that prints a message whenever it's called

def finalize_df(self, other, method=None, **kwargs):
    print('finalize called')
    # copy every metadata attribute from the source object onto the result
    for name in self._metadata:
        object.__setattr__(self, name, getattr(other, name, None))
    return self

pd.DataFrame.__finalize__ = finalize_df

then pd.concat never calls __finalize__, so the metadata is not preserved:

stacked = pd.concat([df1, df1])  # Nothing printed
stacked
       a  b
    0  1  1
    1  0  3
    2  0  1
    0  1  1
    1  0  3
    2  0  1
stacked.filenames  # => AttributeError (metadata was not carried over)

In this specific case it seems reasonable for concat to call __finalize__, since all of the pieces come from the same DataFrame. I'm less sure about the general case, because concat also accepts objects other than DataFrames (e.g. Series), and the inputs may not share the same metadata. Should there be, or is there already, a way to stack DataFrames that preserves metadata?
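As a possible workaround until concat propagates metadata itself, here is a minimal sketch that stacks DataFrames and then copies the metadata explicitly. concat_with_metadata is a hypothetical helper, not a pandas API, and it assumes the metadata of the first frame should be used for the result, with the attribute names taken from _metadata as in the example above.

```python
import numpy as np
import pandas as pd


def concat_with_metadata(frames, **concat_kwargs):
    """Hypothetical helper (not a pandas API): pd.concat plus a metadata copy.

    Assumption: every attribute named in the first frame's _metadata should be
    copied from that first frame onto the concatenated result.
    """
    frames = list(frames)
    result = pd.concat(frames, **concat_kwargs)
    first = frames[0]
    for name in getattr(first, "_metadata", []):
        # object.__setattr__ stores a plain attribute and bypasses
        # DataFrame.__setattr__, which could otherwise treat the name as a column.
        object.__setattr__(result, name, getattr(first, name, None))
    return result


if __name__ == "__main__":
    np.random.seed(10)
    pd.DataFrame._metadata = ["filenames"]
    df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list("ab"))
    df1.filenames = {"a": "f1", "b": "f2"}
    stacked = concat_with_metadata([df1, df1])
    print(stacked.filenames)  # {'a': 'f1', 'b': 'f2'}
```

This side-steps __finalize__ entirely, so it works whether or not pd.concat calls it. The trade-off is that metadata on the second and later frames is silently ignored, which is only safe when all inputs share the same metadata, as in this report.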

Similar to #6923.


Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
