BUG: df fails when columns arg is a list containing dupes #2079

Closed
@ghost

Description

In [1]: DataFrame(data,columns=["a","a"])

...
pandas/pandas/core/internals.pyc in _stack_dict(dct, ref_items, dtype)
1344 stacked = np.empty(shape, dtype=dtype)
1345 for i, item in enumerate(items):
-> 1346 stacked[i] = _asarray_compat(dct[item])
1347
1348 # stacked = np.vstack([_asarray_compat(dct[k]) for k in items])

IndexError: index out of bounds

5e6db32 is a failing test for this.
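
For reference, a self-contained way to hit this. The report doesn't show what `data` is, so a list of dicts is assumed here as one kind of input that takes the _to_sdict path:

    from pandas import DataFrame

    # `data` is an assumption -- any input routed through _to_sdict /
    # _convert_object_array should do; a list of dicts is used here.
    data = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

    # Duplicate names in `columns` get squashed internally and the
    # constructor blows up with the IndexError shown in the traceback above.
    DataFrame(data, columns=["a", "a"])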

It looks like _to_sdict threads down to a call to _convert_object_array, which builds a dict
keyed on column names, so duplicate columns get squashed and you end up with a mismatch
between the length of the columns argument to DataFrame.__init__ and the data.
_to_sdict isn't used for ndarrays, so this doesn't happen there; I was able to reuse
_init_ndarray for the case where columns is a flat list and have things work as expected.
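
A minimal sketch of the squashing in plain Python/NumPy (not the actual internals; the exact shape computation in _stack_dict may differ, but the mismatch is of this form):

    import numpy as np

    columns = ["a", "a"]            # what was passed to DataFrame.__init__
    column_data = [[1, 2], [3, 4]]  # two columns' worth of data

    # Keying a dict on the column names collapses the duplicate "a":
    dct = dict(zip(columns, column_data))   # {"a": [3, 4]} -- only one entry left

    # The stacked array ends up sized from the squashed data (1 row) while
    # the requested columns still list 2 items, so the loop overruns it:
    stacked = np.empty((len(dct), 2))
    for i, item in enumerate(columns):
        stacked[i] = np.asarray(dct[item])  # i == 1 -> IndexError: index out of bounds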

Still, too much code touches this path; better left to the core devs to decide how to handle it.
