
REGR: concat of unaligned empty DataFrames failing #39037

Closed
@jorisvandenbossche

Description

While trying to get dask's CI passing (dask/dask#6996), I noticed another error related to concat. Dask concatenates the empty "meta" DataFrames to determine the shape and dtypes of the resulting DataFrame, and something in that code path now fails.
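
To illustrate the "meta" pattern described above, here is a minimal sketch of inferring the output schema of a concat from zero-row slices. The helper name `infer_concat_meta` is hypothetical and is not dask's actual implementation. It only assumes that slicing a DataFrame with `iloc[:0]` keeps its columns and dtypes.

```python
import pandas as pd


def infer_concat_meta(frames):
    """Hypothetical helper: concatenate zero-row slices of each frame.

    The result has no rows, but its columns and dtypes show what
    concatenating the full frames would produce.
    """
    empties = [frame.iloc[:0] for frame in frames]
    return pd.concat(empties)


left = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
right = pd.DataFrame({"a": [7, 8], "b": [9, 10]})

meta = infer_concat_meta([left, right])
print(meta.shape)  # (0, 2)
print(meta.dtypes)
```

With aligned inputs like these the empty result matches the full concat's schema; the bug reported here shows up once the inputs are not aligned.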

Small reproducer without dask:

>>> import pandas as pd
>>> df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df2 = pd.DataFrame({'b': [1, 2, 3], 'c': [4, 5, 6]})

>>> pd.concat([df1[0:0], df2[0:0], df1[0:0]])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-151-b1206da866d9> in <module>
----> 1 pd.concat([df1[0:0], df2[0:0], df1[0:0]])

~/scipy/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    297     )
    298 
--> 299     return op.get_result()
    300 
    301 

~/scipy/pandas/pandas/core/reshape/concat.py in get_result(self)
    518 
    519             new_data = concatenate_block_managers(
--> 520                 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    521             )
    522             if not self.copy:

~/scipy/pandas/pandas/core/internals/concat.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
     89         else:
     90             b = make_block(
---> 91                 _concatenate_join_units(join_units, concat_axis, copy=copy),
     92                 placement=placement,
     93                 ndim=len(axes),

~/scipy/pandas/pandas/core/internals/concat.py in _concatenate_join_units(join_units, concat_axis, copy)
    325         join_units = nonempties
    326 
--> 327     empty_dtype, upcasted_na = _get_empty_dtype_and_na(join_units)
    328 
    329     to_concat = [

~/scipy/pandas/pandas/core/internals/concat.py in _get_empty_dtype_and_na(join_units)
    436 
    437     msg = "invalid dtype determination in get_concat_dtype"
--> 438     raise AssertionError(msg)
    439 
    440 

AssertionError: invalid dtype determination in get_concat_dtype

It occurs when reindexing is needed (the DataFrames' columns are not fully aligned), and apparently at least 3 DataFrames are needed to trigger it: the same example with only 2 DataFrames passed to concat doesn't fail.
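
A regression test along these lines could cover both cases described above. It is only a sketch: the test name is hypothetical and not taken from the pandas test suite, and it checks just the column union and the row count, not the dtypes, since the expected dtypes of the all-missing columns in an empty result are not stated in this issue.

```python
import pandas as pd


def test_concat_unaligned_empty_frames():
    df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})

    # Two unaligned empty frames: reported above as not failing.
    two = pd.concat([df1[0:0], df2[0:0]])
    # Three unaligned empty frames: the case that raised AssertionError.
    three = pd.concat([df1[0:0], df2[0:0], df1[0:0]])

    for result in (two, three):
        # Default outer join on columns: the union of a, b and c, with no rows.
        assert set(result.columns) == {"a", "b", "c"}
        assert len(result) == 0


test_concat_unaligned_empty_frames()
```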

I suppose this might be related to #38843 (a change only on master, so this is not a target for 1.2.1). cc @jbrockmendel

Metadata

Assignees

No one assigned

    Labels

    Bug, Needs Tests (Unit test(s) needed to prevent regressions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests