BUG: merging on int32 platforms with large blocks #13193

Open
@randomgambit

Description

Hello everyone,

I am trying to merge a very large DataFrame with a much smaller one, and I get:

df=df.merge(slave,left_on='buyer',right_on='NAME',how='left')
OverflowError: Python int too large to convert to C long

RAM usage is at 56% before the merge. Am I hitting some limitation here?

master dataframe

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 80162624 entries, 0 to 90320839
Data columns (total 38 columns):
index                      int64

dtypes: datetime64[ns](2), float32(1), int64(3), object(32)
memory usage: 23.0+ GB

dataframe I would like to merge to the master

slave.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55394 entries, 0 to 55393
Data columns (total 6 columns):
dtypes: object(6)
memory usage: 2.5+ MB
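
A rough size check may help narrow this down. The sketch below rests on an assumption that is not confirmed anywhere in this report: that the 32 object columns of the master frame are stored together as one 2-D block, and that some step of the merge converts that block's element count to a C long. The row and column counts are copied from the `df.info()` output above; `ctypes.sizeof(ctypes.c_long)` reports how wide a C long is for the running interpreter (4 bytes on Windows, typically 8 on 64-bit Linux and macOS).

```python
import ctypes

# Row and object-column counts copied from the df.info() output above.
n_rows = 80_162_624
n_object_cols = 32

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Number of elements if all object columns sit in a single 2-D block
# (an assumption about the internal layout, not something the report states).
n_elements = n_rows * n_object_cols

# Width in bytes of a C long for this interpreter.
c_long_bytes = ctypes.sizeof(ctypes.c_long)

print("elements in object block:", n_elements)
print("exceeds int32 range:", n_elements > INT32_MAX)
print("C long width in bytes:", c_long_bytes)
```

If the last line prints 4, any count above 2,147,483,647 cannot be represented as a C long on this platform, which would be consistent with the error message.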

I am using the latest Anaconda distribution (that is, with pandas 0.18.0).
Thanks for your help!
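
A possible workaround, sketched here on toy data and not tested on frames of this size: because the second frame is a small lookup table, its columns can be attached with `Series.map` instead of `merge`, which leaves the existing columns of the large frame in place. This assumes `NAME` is unique in `slave`; the `city` and `amount` columns are hypothetical stand-ins, since the real column names are not shown.

```python
import pandas as pd

# Toy stand-ins for the real frames; "amount" and "city" are hypothetical columns.
df = pd.DataFrame({"buyer": ["a", "b", "z", "a"], "amount": [1, 2, 3, 4]})
slave = pd.DataFrame({"NAME": ["a", "b", "c"], "city": ["X", "Y", "Z"]})

# Index the small frame by its key so each column becomes a lookup Series.
lookup = slave.set_index("NAME")
if not lookup.index.is_unique:
    raise ValueError("NAME must be unique for a map-based lookup")

# Attach each lookup column by mapping the key column of the large frame.
# Keys with no match (here "z") get NaN, as in a left merge.
for col in lookup.columns:
    df[col] = df["buyer"].map(lookup[col])

print(df)
```

Unlike `merge`, this does not add the `NAME` column itself, and it is only equivalent to a left merge when each key appears at most once in the lookup table.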

Labels: 32bit (32-bit systems), Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
