
DataFrame.to_dict returning numpy scalars in certain cases #23753

Closed
@jorisvandenbossche


I think in general we try to return python scalars instead of numpy scalars in to_dict (similar to what tolist or iteration does).

For example:

In [27]: df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})

In [28]: df.to_dict()
Out[28]: {'a': {0: 1, 1: 2}, 'b': {0: 0.1, 1: 0.2}}

In [29]: type(df.to_dict()['a'][0])
Out[29]: int

However, this is not consistent. For example, with orient='records' you get numpy scalars:

In [31]: df.to_dict(orient='records')
Out[31]: [{'a': 1.0, 'b': 0.10000000000000001}, {'a': 2.0, 'b': 0.20000000000000001}]

In [32]: type(df.to_dict(orient='records')[0]['a'])
Out[32]: numpy.float64

In this case, that is because the 'records' implementation iterates over self.values. Here self.values is a single float64 array, so the integer column 'a' is upcast and every element comes back as numpy.float64. (It also means that if you have a string column, self.values will be object dtype, and you actually get python scalars.)

There are a bunch of other issues related to iteration (eg #20791, #13468), but I didn't see one specifically related to to_dict.
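To illustrate the mechanism independently of to_dict, here is a minimal sketch that only looks at the underlying array. It assumes a reasonably recent pandas and numpy; the variable names (`values`, `first_row`, `df_obj`) are made up for the illustration and are not taken from the pandas source.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})

# Mixed int/float columns are combined into a single float64 array,
# so the integer column 'a' is upcast.
values = df.values
first_row = values[0]

# Iterating over a numeric numpy array yields numpy scalars.
numpy_types = [type(v) for v in first_row]

# tolist() converts each element to the corresponding Python scalar.
python_types = [type(v) for v in first_row.tolist()]

# With a string column, the combined array is object dtype
# and its elements are plain Python objects.
df_obj = pd.DataFrame({'a': [1, 2], 's': ['x', 'y']})
object_types = [type(v) for v in df_obj.values[0]]

print(values.dtype, numpy_types, python_types, object_types)
```

So whether you get numpy or python scalars from iterating over the values depends on the dtype of the combined array, not on the dtype of the individual columns.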

Labels: Bug; DataFrame (DataFrame data structure); Dtype Conversions (Unexpected or buggy dtype conversions); Numeric Operations (Arithmetic, Comparison, and Logical operations)
