REGR: concat on index with duplicate labels fails #35238

Closed
@jorisvandenbossche

Description

Noticed from failing geopandas tests:

import pandas as pd

df1 = pd.DataFrame({'a': [0, 1]})
df2 = pd.DataFrame({'b': [0, 1, 2]}, index=[0, 0, 1])
pd.concat([df1, df2], axis=1)

This started failing recently on master:

In [3]: pd.concat([df1, df2], axis=1) 
...
~/scipy/pandas/pandas/core/internals/managers.py in _verify_integrity(self)
    314         for block in self.blocks:
    315             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 316                 raise construction_error(tot_items, block.shape[1:], self.axes)
    317         if len(self.items) != tot_items:
    318             raise AssertionError(

ValueError: Shape of passed values is (3, 2), indices imply (2, 2)

The expected result, with df1 automatically reindexed because concat aligns on the index, is:

In [2]: pd.concat([df1, df2], axis=1)
Out[2]: 
   a  b
0  0  0
0  0  1
1  1  2
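The alignment described above can be reproduced by hand, which may help when checking the expected output independently of `concat`. This is only a sketch of the intended semantics, not how pandas implements it: it assumes that the target index is simply `df2.index` (true here because every label of `df1` already appears in it), and the names `aligned` and `expected` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1]})
df2 = pd.DataFrame({'b': [0, 1, 2]}, index=[0, 0, 1])

# Reindex df1 onto df2's index. df1's own index is unique, so each
# duplicated target label simply repeats the matching row of df1.
aligned = df1.reindex(df2.index)

# Assemble the columns positionally on the shared (non-unique) index.
# Assumption: this mirrors the result the issue expects from
# pd.concat([df1, df2], axis=1).
expected = pd.DataFrame(
    {'a': aligned['a'].to_numpy(), 'b': df2['b'].to_numpy()},
    index=df2.index,
)
print(expected)
```

On a pandas version without the regression, `pd.concat([df1, df2], axis=1)` should match `expected`.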

Metadata

Assignees

No one assigned

    Labels

    Blocker: Blocking issue or pull request for an upcoming release
    Regression: Functionality that used to work in a prior pandas version
    Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
