BUG: Setting column on empty DataFrame with loc / avoiding SettingWithCopyWarning for potentially empty DataFrames/copies/views #41891

Closed
@klieret

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd


df = pd.DataFrame()

# Option 1: works on empty dataframes (adding an empty column)
# but shows the SettingWithCopyWarning for non-empty views/copies
df["a"] = 1

# Option 2: works without warning for views/copies but raises ValueError on empty dataframe
df.loc[:, "a"] = 1

Problem description

Let's consider a function add_column that adds a column.

  • If we use df[column] = value (Option 1), the function emits the SettingWithCopyWarning whenever it is called on a copy/view, even if we don't care about propagating the change to the original dataframe.
  • The recommended workaround for this warning is to use df.loc[:, column] = value (Option 2). However, this raises a ValueError as soon as the dataframe is empty, i.e. doesn't contain any rows.
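To check how the two options behave on a given pandas installation, something like the following sketch can be used. The helper names (probe, with_setitem, with_loc) are made up for illustration and are not part of pandas. The printed result depends on the installed version, so no expected output is shown.

```python
import warnings

import pandas as pd


def probe(assign, frame):
    """Run assign(frame) and report any warnings or exception it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            assign(frame)
        except Exception as exc:  # report the exception type instead of propagating
            return f"raised {type(exc).__name__}"
    names = sorted({w.category.__name__ for w in caught})
    return "warned: " + ", ".join(names) if names else "ok"


def with_setitem(frame):
    # Option 1 from the code sample
    frame["a"] = 1


def with_loc(frame):
    # Option 2 from the code sample
    frame.loc[:, "a"] = 1


base = pd.DataFrame({"x": [1, 2, 3]})
cases = {
    "empty frame": lambda: pd.DataFrame(),
    "slice of a frame": lambda: base[base["x"] > 1],
}
for label, make in cases.items():
    for assign in (with_setitem, with_loc):
        print(f"{label:18} {assign.__name__:13} {probe(assign, make())}")
```

Each case builds a fresh frame, so one assignment cannot affect the next.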

This then requires ugly solutions like the following

def add_column(df):
    if df.empty:
        # Still want to make sure to add the column to avoid KeyErrors later
        df["column"] = 1  # doesn't show SettingWithCopyWarning
        return
    df.loc[:, "column"] = 1

whenever we might be dealing with dataframes or their copies/views that are possibly empty.
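One possible way to avoid the branch, assuming the caller can work with a returned dataframe rather than an in-place modification, is DataFrame.assign. This is only a sketch: the name and value parameters are illustrative, and it changes the contract of add_column from mutating its argument to returning a new object.

```python
import pandas as pd


def add_column(df, name="column", value=1):
    """Return a new DataFrame with `name` set to `value`; `df` is left unchanged."""
    # assign() adds the column to a fresh copy of df, so the column is
    # created by name whether or not the frame has any rows.
    return df.assign(**{name: value})


# Empty frame: the column exists afterwards, with no rows.
empty = add_column(pd.DataFrame())

# Slice of another frame: the slice's parent is not modified.
base = pd.DataFrame({"x": [1, 2, 3]})
subset = add_column(base[base["x"] > 1])
```

Because the result is built on a copy, the assignment never targets the slice that was passed in. The cost is an extra copy of the data, and callers have to use the return value.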

INSTALLED VERSIONS

commit : 2cb9652
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-64-generic
Version : #58-Ubuntu SMP Fri Jul 10 19:33:51 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.18.1
pytz : 2019.2
dateutil : 2.7.3
pip : 20.3.3
setuptools : 41.1.0
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : 3.0.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 5.8.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.17.0
xlrd : 2.0.1
xlwt : None
numba : 0.51.2

Metadata

    Labels

    Bug
    Copy / view semantics
    Indexing (Related to indexing on series/frames, not to indexes themselves)
    Needs Discussion (Requires discussion from core team before further action)
    Warnings (Warnings that appear or should be added to pandas)
