DataFrame.copy(), at least, should be threadsafe #2728

Open
@bshanks

DataFrame.copy() should be atomic (thread-safe): it should return a consistent DataFrame even if another thread is deleting entries at the moment .copy() is called, or calls a deletion method while the copy is still in progress. In other words, .copy() should probably acquire a lock that prevents mutation for the duration of the copy. The following code crashes in 0.7.3 but should succeed:

import pandas
import threading

df = pandas.DataFrame()

def mutateDf(df):
    # Writer: repeatedly add and then delete column 0.
    while True:
        df[0] = pandas.Series([1,2,3])
        del df[0]

def readDf(df):
    # Reader: copy the frame while the writer mutates it, then read from the copy.
    while True:
        dfCopy = df.copy()
        if 0 in dfCopy and 1 in dfCopy[0]:
            a = dfCopy[0][1]

t1 = threading.Thread(target=mutateDf, args=(df,))
t2 = threading.Thread(target=readDf, args=(df,))

t1.start()
t2.start()

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "<ipython-input-5-8aef72c7f1b4>", line 4, in readDf
    if 0 in dfCopy and 1 in dfCopy[0]:
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.7.3-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1458, in __getitem__
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.7.3-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 294, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.7.3-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 625, in get
    _, block = self._find_block(item)
TypeError: 'NoneType' object is not iterable
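
For comparison, here is a minimal sketch of the locking behaviour described above, done on the caller's side rather than inside pandas. The `LockedFrame` wrapper, its method names, and `run_demo` are hypothetical and not part of pandas. The sketch assumes every thread goes through the wrapper, and the loops are bounded so the script terminates.

```python
import threading

import pandas


class LockedFrame:
    """Hypothetical wrapper (not a pandas API): one lock guards all access."""

    def __init__(self, df):
        self._df = df
        self._lock = threading.Lock()

    def set_column(self, key, values):
        # Mutations hold the lock, so they cannot interleave with a copy.
        with self._lock:
            self._df[key] = values

    def delete_column(self, key):
        with self._lock:
            del self._df[key]

    def snapshot(self):
        # The copy is taken while no mutation can be in progress.
        with self._lock:
            return self._df.copy()


def mutate(frame, iterations):
    # Writer: same add/delete cycle as mutateDf, but through the wrapper.
    for _ in range(iterations):
        frame.set_column(0, pandas.Series([1, 2, 3]))
        frame.delete_column(0)


def read(frame, iterations, errors):
    # Reader: same checks as readDf, run against a private snapshot.
    for _ in range(iterations):
        try:
            snap = frame.snapshot()
            if 0 in snap and 1 in snap[0]:
                _ = snap[0][1]
        except Exception as exc:  # record any failure instead of killing the thread
            errors.append(exc)


def run_demo(iterations=200):
    frame = LockedFrame(pandas.DataFrame())
    errors = []
    threads = [
        threading.Thread(target=mutate, args=(frame, iterations)),
        threading.Thread(target=read, args=(frame, iterations, errors)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    print("errors recorded:", len(run_demo()))
```

This only helps when all code touching the frame cooperates by using the same lock, which is why having .copy() handle it internally would be preferable.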
