Closed
Description
When not specifying a dtype (i.e. letting pandas infer the type), constructing an Index or Series from dict keys works fine:
>>> pd.options.future.infer_string = True
>>> d = {"a": 1, "b": 2}
>>> pd.Index(d.keys())
Index(['a', 'b'], dtype='str')
But if you explicitly specify the dtype, then it fails:
>>> pd.Index(d.keys(), dtype="str")
...
File ~/scipy/repos/pandas/pandas/core/arrays/string_arrow.py:206, in ArrowStringArray._from_sequence(cls, scalars, dtype, copy)
203 return cls(pc.cast(scalars, pa.large_string()))
205 # convert non-na-likes to str
--> 206 result = lib.ensure_string_array(scalars, copy=copy)
207 return cls(pa.array(result, type=pa.large_string(), from_pandas=True))
File lib.pyx:727, in pandas._libs.lib.ensure_string_array()
File lib.pyx:822, in pandas._libs.lib.ensure_string_array()
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
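On affected versions, one possible workaround is to materialize the keys view into a plain list before passing the explicit dtype. The constructor then receives an ordinary list instead of a dict_keys object. This is a sketch, not verified against every pandas release. It is written to run standalone and leaves future.infer_string at its default, so run it after the option line above to reproduce the original setting.

```python
import pandas as pd

d = {"a": 1, "b": 2}

# Workaround sketch: turn the dict_keys view into a list first, so the
# explicit-dtype path receives a regular list rather than a dict view.
idx = pd.Index(list(d.keys()), dtype="str")
ser = pd.Series(list(d.keys()), dtype="str")

print(idx)
print(ser)
```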
The reason for this failure is that, at that point, we pass the data directly to the dtype's array _from_sequence instead of first pre-processing it into a numpy array. _from_sequence then calls ensure_string_array, which apparently cannot handle dict keys. We do call np.asarray(..) inside ensure_string_array, though, so I am not entirely sure what is going wrong.
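A likely explanation for the "expected 1, got 0" message is NumPy's general coercion behaviour; this report does not confirm it. dict_keys supports len() and iteration, but it is not a sequence because it has no indexing. As a result, np.asarray treats the whole view as a single object scalar and returns a 0-d array, and code expecting a one-dimensional buffer then rejects that array. The sketch below shows this with NumPy alone. to_1d_object_array is a hypothetical helper, not a pandas function. It illustrates the kind of pre-processing step described in the paragraph above.

```python
import numpy as np

d = {"a": 1, "b": 2}
keys = d.keys()

# dict_keys is not a sequence (no indexing), so NumPy wraps the whole view
# as one object scalar: the result is a 0-d array.
arr_view = np.asarray(keys, dtype=object)
print(arr_view.ndim, arr_view.shape)  # 0 ()

# Materializing the view into a list first gives the expected 1-d array.
arr_list = np.asarray(list(keys), dtype=object)
print(arr_list.ndim, arr_list.shape)  # 1 (2,)


def to_1d_object_array(values):
    """Hypothetical pre-processing helper (not part of pandas).

    Assumes `values` is list-like (not a scalar string). Anything that is
    not already an ndarray, list or tuple (dict views, generators, ...) is
    consumed into a list before conversion to a 1-d object array.
    """
    if not isinstance(values, (np.ndarray, list, tuple)):
        values = list(values)
    return np.asarray(values, dtype=object)


print(to_1d_object_array(d.keys()).shape)  # (2,)
```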