Description
As part of #30588, we now raise when trying to create a 2D index. This introduces a behavior change when you call DataFrame.set_index with a label that is duplicated in the columns.
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3]], columns=['a', 'a', 'b'])
In [3]: result = df.set_index('a')
On pandas 0.25.3, that gives back a DataFrame with a broken Index. Some DataFrame operations will work, but even things like printing the repr will fail.
# 0.25.3
In [17]: type(result)
Out[17]: pandas.core.frame.DataFrame
In [18]: result.shape
Out[18]: (1, 1)
With 1.0.0rc0, that raises:
~/sandbox/pandas/pandas/core/indexes/numeric.py in __new__(cls, data, dtype, copy, name)
76 if subarr.ndim > 1:
77 # GH#13601, GH#20285, GH#27125
---> 78 raise ValueError("Index data must be 1-dimensional")
79
80 name = maybe_extract_name(name, data, cls)
ValueError: Index data must be 1-dimensional
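For context, here is a minimal sketch of why the index data ends up 2-dimensional in this case: selecting a duplicated column label gives back a DataFrame rather than a Series. It only inspects the selection and does not call set_index, so it should not depend on which of the two versions above is installed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

# A duplicated label selects every matching column, so the result is 2D.
selected = df["a"]
print(type(selected).__name__)  # DataFrame
print(selected.ndim)            # 2
print(selected.shape)           # (1, 2)
```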
Problem description
The old output is clearly broken, so I wouldn't consider this a (major) regression. And I don't think people should be doing this in the first place. But I wanted to ask, should DataFrame.set_index(scalar) return a MultiIndex when scalar is a duplicate label?
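To make the question concrete, here is a rough sketch of what "return a MultiIndex" could look like if done by hand today. The helper `set_index_duplicate_label` and the level names `a_0`, `a_1` are hypothetical, made up for illustration; this is not how pandas behaves or a proposed implementation.

```python
import pandas as pd


def set_index_duplicate_label(frame, label):
    """Hypothetical helper: move every column named `label` into a MultiIndex."""
    # Boolean mask over the columns, so duplicated labels are all picked up.
    is_key = frame.columns == label
    keys = frame.loc[:, is_key]
    arrays = [keys.iloc[:, i] for i in range(keys.shape[1])]
    # Assumed naming scheme: one level per duplicated column, suffixed by position.
    names = [f"{label}_{i}" for i in range(len(arrays))]
    result = frame.loc[:, ~is_key].copy()
    result.index = pd.MultiIndex.from_arrays(arrays, names=names)
    return result


df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])
result = set_index_duplicate_label(df, "a")
print(result.index.nlevels)   # 2
print(list(result.columns))   # ['b']
```

Whether the levels should share the original name or get distinct names is exactly the kind of detail that would need deciding if set_index supported this.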