BUG: Interchange protocol uses u for string format code but offsets are 8 bytes #56754

Closed
@WillAyd

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

When inspecting the offsets buffer of a string column sent via the interchange protocol, the buffer uses 8 bytes per entry to store each offset even though the format code is "u".

An 8-byte offset is the LargeString layout, which has a different format code, "U".

Issue Description

See above.

Expected Behavior

Format code "u" should come with 4-byte offsets; format code "U" should come with 8-byte offsets.
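
The expected pairing of format code and offset width can be sketched with the standard library alone. This is not the pandas implementation: `build_offsets` and `bytes_per_offset` are hypothetical helpers, and the only assumption taken from the Arrow C data interface is that "u" denotes a UTF-8 string with 32-bit offsets and "U" a large UTF-8 string with 64-bit offsets.

```python
import struct

# Offset width in bytes implied by each Arrow string format code.
OFFSET_WIDTH = {"u": 4, "U": 8}
# struct type codes for signed 32-bit and 64-bit integers.
STRUCT_CODE = {4: "i", 8: "q"}


def build_offsets(strings, format_code):
    """Pack an Arrow-style offsets buffer (hypothetical helper, not pandas API)."""
    width = OFFSET_WIDTH[format_code]
    offsets = [0]
    for s in strings:
        # Each offset marks where the next string starts in the UTF-8 data buffer.
        offsets.append(offsets[-1] + len(s.encode("utf-8")))
    # "<" selects little-endian with standard sizes (i = 4 bytes, q = 8 bytes).
    return struct.pack(f"<{len(offsets)}{STRUCT_CODE[width]}", *offsets)


def bytes_per_offset(buffer, n_strings):
    """Infer the offset width from the buffer size (n_strings + 1 entries)."""
    return len(buffer) // (n_strings + 1)


strings = ["foo", "bar", "quux"]
small = build_offsets(strings, "u")
large = build_offsets(strings, "U")

print(bytes_per_offset(small, len(strings)))  # 4
print(bytes_per_offset(large, len(strings)))  # 8
```

A consumer that trusts the "u" format code would read the offsets buffer as 4-byte entries, so an 8-byte-per-entry buffer labelled "u" would be misread.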

Installed Versions

main

Metadata

Labels

Bug; Interchange (Dataframe Interchange Protocol); Strings (String extension data type and string data)