Closed
Description
Splitting up #22236.
Let's have
df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))
The error handling of df.set_index can be improved in at least three cases:
- df.set_index(['A', 'A'], drop=False)
works, while
df.set_index(['A', 'A'], drop=True)
yields
KeyError: 'A'
- Objects of unknown type yield KeyError instead of TypeError:
df.set_index(map(str, df.A))
KeyError: "None of [Index([...], dtype='object')] are in the [columns]"
- df.set_index(['foo', 'bar', 'baz'])
only shows one missing key:
KeyError: 'foo'
(in a huge stack trace)
Better would be:
- Gracefully handle duplicate column names when drop=True
- Raise a better error message, e.g.
TypeError: only allowed types are: ...
- Show all missing keys:
KeyError: "['foo', 'bar', 'baz']"
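
To make the three proposals concrete, here is a minimal sketch of the kind of pre-check that could run before the actual indexing. It is not pandas' implementation: the helper name check_set_index_keys, the set of accepted array types, the choice to reject iterators, and the wording of the messages are all assumptions made for illustration.

```python
from collections.abc import Iterator

import numpy as np
import pandas as pd


def check_set_index_keys(df, keys):
    """Hypothetical pre-check for the keys passed to df.set_index.

    Returns the unique column labels found in ``keys`` (in order), so each
    column would only need to be dropped once. Raises TypeError for
    unsupported key types and one KeyError naming every missing label.
    """
    if not isinstance(keys, list):
        keys = [keys]

    # Assumption: these array-likes are used as index values directly.
    array_types = (pd.Series, pd.Index, np.ndarray, list)

    labels, missing = [], []
    for key in keys:
        if isinstance(key, array_types):
            continue  # not a column label, nothing to look up
        if isinstance(key, Iterator):
            # Assumption for this sketch: iterators such as map objects
            # are rejected up front instead of being looked up as labels.
            raise TypeError(
                f"unsupported key of type {type(key).__name__}; "
                "expected a column label or a one-dimensional array"
            )
        try:
            hash(key)
        except TypeError:
            raise TypeError(
                f"unsupported key of type {type(key).__name__}; "
                "column labels must be hashable"
            ) from None
        if key in df.columns:
            labels.append(key)
        else:
            missing.append(key)

    if missing:
        # Report every missing label at once, not just the first one.
        raise KeyError(f"{missing}")

    # De-duplicate while preserving order (handles ['A', 'A']).
    return list(dict.fromkeys(labels))


df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))
print(check_set_index_keys(df, ['A', 'A']))

for bad_keys in (['foo', 'bar', 'baz'], map(str, df.A)):
    try:
        check_set_index_keys(df, bad_keys)
    except (KeyError, TypeError) as err:
        print(type(err).__name__, err)
```

Collecting the missing labels before raising is what allows a single KeyError to name all of them, and returning de-duplicated labels is one possible way to make drop=True safe when the same column is listed twice.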