col.replace(dict) takes too much memory #6697

Open
@jankatins

I have code that ran fine about 10 months ago and now fails because it runs out of memory. The code basically does this:

groups = [[1, 2, 3], [222, 333, 444, 555], ...]  # from a 1.4MB json file
# about 30k such groups, results in ~70k replacements
replace_dict = {}
for replace_list in groups:
    # every id in a group is replaced by the group's smallest id
    replacement = min(replace_list)
    for replace_id in replace_list:
        if replace_id == replacement:
            continue
        replace_dict[replace_id] = replacement
# len(_col) == 974757
_col = df[column].astype("int64")
_col.replace(replace_dict, inplace=True)

I've now split replace_dict into chunks of 2k entries, and this works (each chunk takes about 20 seconds).
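
For anyone hitting the same problem, the chunked workaround could look roughly like the sketch below. The helper names `replace_in_chunks` and `replace_via_map` are made up for illustration and are not part of pandas; the second helper is an alternative I'd expect to be lighter on memory, since it does one dictionary lookup per value, but I have not measured it on the data from this report.

```python
from itertools import islice

import pandas as pd


def replace_in_chunks(col, mapping, chunk_size=2000):
    """Call Series.replace repeatedly, passing at most chunk_size dict entries each time.

    Hypothetical helper. It assumes no replacement value is itself a key of
    the mapping; otherwise a later chunk could replace an already replaced value.
    """
    items = iter(mapping.items())
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            break
        col = col.replace(chunk)
    return col


def replace_via_map(col, mapping):
    """Look each value up once; values that are not keys of mapping stay unchanged.

    Hypothetical helper, shown as an alternative to Series.replace.
    """
    return col.map(lambda value: mapping.get(value, value))


# Toy data in the same shape as the groups in the report
col = pd.Series([1, 2, 3, 222, 333, 444, 555, 7], dtype="int64")
mapping = {2: 1, 3: 1, 333: 222, 444: 222, 555: 222}

chunked = replace_in_chunks(col, mapping, chunk_size=2)
mapped = replace_via_map(col, mapping)
```

Both helpers return a new Series instead of modifying the column in place, so the result has to be assigned back to the DataFrame column.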

Labels: Performance (memory or execution speed performance), replace (replace method)
