BUG (string): construction of Series / Index fails from dict keys when "str" dtype is specified explicitly #60343

Closed
@jorisvandenbossche

When not specifying a dtype (inferring the type), construction of Index or Series from dict keys goes fine:

>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> d = {"a": 1, "b": 2}
>>> pd.Index(d.keys())
Index(['a', 'b'], dtype='str')

But if you explicitly specify the dtype, then it fails:

>>> pd.Index(d.keys(), dtype="str")
...

File ~/scipy/repos/pandas/pandas/core/arrays/string_arrow.py:206, in ArrowStringArray._from_sequence(cls, scalars, dtype, copy)
    203     return cls(pc.cast(scalars, pa.large_string()))
    205 # convert non-na-likes to str
--> 206 result = lib.ensure_string_array(scalars, copy=copy)
    207 return cls(pa.array(result, type=pa.large_string(), from_pandas=True))

File lib.pyx:727, in pandas._libs.lib.ensure_string_array()

File lib.pyx:822, in pandas._libs.lib.ensure_string_array()

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

The reason is that, at that point, we pass the data directly to the dtype's array _from_sequence instead of first pre-processing it into a numpy array. _from_sequence then calls ensure_string_array directly, which doesn't seem to be able to handle dict keys (although we do call np.asarray(..) inside ensure_string_array, so I'm not entirely sure what is going wrong).
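
One plausible explanation for the "expected 1, got 0" message, offered here as an assumption rather than a confirmed diagnosis of ensure_string_array: a dict keys view is not a sequence, so np.asarray does not iterate over it and instead wraps the whole view in a 0-dimensional object array. A minimal sketch using only NumPy (the variable names are illustrative):

```python
import numpy as np

d = {"a": 1, "b": 2}

# dict_keys is a set-like view without __getitem__, so NumPy does not
# treat it as a sequence: the whole view becomes one 0-d object scalar.
as_view = np.asarray(d.keys())
ndim_view = as_view.ndim

# Materializing the keys into a list first yields a 1-D array of the keys.
as_list = np.asarray(list(d.keys()), dtype=object)
ndim_list = as_list.ndim

print(ndim_view, as_view.dtype)
print(ndim_list, as_list.shape)
```

If this is indeed the cause, materializing the keys with list(d.keys()) before passing them to the constructor would likely sidestep the error until the constructor handles views itself.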

Labels: Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Strings (String extension data type and string data)
