Description
I have code that ran fine about 10 months ago and now fails because it runs out of memory. The code basically does this:
groups = [[1, 2, 3], [222, 333, 444, 555], ...]  # from a 1.4MB json file
# about 30k such groups, results in ~70k replacements
replace_dict = {}
for replace_list in groups:
    # map every id in a group to the smallest id of that group
    replacement = sorted(replace_list)[0]
    for replace_id in replace_list:
        if replace_id == replacement:
            continue
        replace_dict[replace_id] = replacement
# len(_col) == 974757
_col = df[column].astype("int64")
_col.replace(replace_dict, inplace=True)
I've now split replace_dict into 2k chunks, and this works (each chunk takes about 20 seconds).
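For reference, the chunked workaround looks roughly like the sketch below. The helper names `iter_chunks` and `replace_in_chunks` and the chunk size of 2000 entries are only illustrative, and the small Series stands in for the real column. Note that applying the chunks one after another should only match a single `replace` call if no replacement value is itself a key in a later chunk, which I assume holds here because the groups do not overlap.

```python
from itertools import islice

import pandas as pd


def iter_chunks(mapping, size):
    """Yield successive sub-dicts of `mapping` with at most `size` items."""
    items = iter(mapping.items())
    while True:
        chunk = dict(islice(items, size))
        if not chunk:
            return
        yield chunk


def replace_in_chunks(col, mapping, size=2000):
    """Apply `mapping` to `col` one chunk at a time and return a new Series."""
    for chunk in iter_chunks(mapping, size):
        col = col.replace(chunk)
    return col


# Tiny stand-in for the real data (hypothetical values)
groups = [[1, 2, 3], [222, 333, 444, 555]]
replace_dict = {}
for replace_list in groups:
    replacement = min(replace_list)
    for replace_id in replace_list:
        if replace_id != replacement:
            replace_dict[replace_id] = replacement

col = pd.Series([3, 2, 1, 555, 7], dtype="int64")
result = replace_in_chunks(col, replace_dict, size=2)
print(result.tolist())
```

With the real data this would be called as `replace_in_chunks(_col, replace_dict)`; values that are not keys in the dict (like 7 above) are left unchanged.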