
BUG: DataFrame with a DatetimeIndex interprets some string column labels as datetimes when indexing columns #47006

Open
@JamesHowse

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
index = pd.DatetimeIndex([pd.to_datetime("2035-01-01 01:00:00"), pd.to_datetime("2036-01-01 00:00:00")])
df = pd.DataFrame(index=index)
df.loc[:, "110735"] = 0
print(df)

Issue Description

Under certain conditions, assigning a new column fails:

  • The DataFrame has a DatetimeIndex as its index.
  • You assign to an entire column, either with .loc or directly with [].
  • The column name is a string that can be parsed as a datetime falling within the range of the DatetimeIndex. In the example above, "110735" is parsed as 2035-11-07.
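To see why "110735" qualifies, here is a minimal sketch using dateutil's parser, which we assume approximates the fallback pandas uses for free-form date strings (the exact parsing path inside pandas differs between versions). dateutil reads a bare six-digit token as three two-digit fields and, with its default month-day-year ordering, maps 11 to the month, 07 to the day and 35 to a two-digit year, which it expands to 2035. That matches the 2035-11-07 interpretation described above.

```python
from datetime import datetime

from dateutil import parser

# Bounds of the DatetimeIndex from the reproducible example.
index_start = datetime(2035, 1, 1, 1, 0, 0)
index_end = datetime(2036, 1, 1, 0, 0, 0)

# A bare six-digit token is split into three two-digit fields and read
# month-day-year by default: 11 -> month, 07 -> day, 35 -> year 2035.
parsed = parser.parse("110735")
print(parsed)

# The parsed date lies inside the index range, so treating the label as a
# row selector is plausible rather than an obvious miss.
print(index_start <= parsed <= index_end)
```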

When all three conditions are met, the assignment fails and the column is not created. pandas parses the string as a datetime and appears to treat it as a row selector, as if you were trying to access the rows for "110735".
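The row interpretation comes from partial-string indexing on a DatetimeIndex: a string used as a row key is parsed as a date, possibly at a coarser resolution than the index, and turned into a slice of matching rows. The sketch below shows the intended use of that feature with an unambiguous year string; the dummy column "x" is an illustrative assumption added so the selected rows are visible.

```python
import pandas as pd

# Same index as the reproducible example, plus a dummy column "x".
index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
df = pd.DataFrame({"x": [1, 2]}, index=index)

# As a row key, "2035" is parsed at year resolution and selects every row
# whose timestamp falls in 2035, which here is only the first row.
rows_2035 = df.loc["2035"]
print(rows_2035)
```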

In recent versions, a FutureWarning is emitted but the script does not crash. The warning implies that pandas thinks we used row indexing like frame[string], even though we used frame.loc[:, string], which should not be affected.
The warning is also misleading: the assignment fails completely and the DataFrame is left unchanged.

FutureWarning: Indexing a DataFrame with a datetimelike index using a single string to slice the rows, like frame[string], is deprecated and will be removed in a future version. Use frame.loc[string] instead.
self.obj[key] = value
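Because the failure is otherwise silent apart from this warning, one defensive pattern is to escalate FutureWarning to an error around the assignment and to confirm afterwards that the column exists. The helper below is hypothetical (assign_column_checked is not a pandas API) and is a minimal sketch assuming a misrouted assignment either emits the FutureWarning quoted above or leaves the column missing. Note that the filter also turns any unrelated FutureWarning raised during the assignment into an error.

```python
import warnings

import pandas as pd


def assign_column_checked(df: pd.DataFrame, label: str, value) -> None:
    """Hypothetical helper: set df.loc[:, label] = value and raise instead
    of letting a misrouted assignment pass unnoticed."""
    with warnings.catch_warnings():
        # Turn FutureWarnings, such as the row-slicing deprecation warning,
        # into exceptions for the duration of this assignment only.
        warnings.simplefilter("error", category=FutureWarning)
        df.loc[:, label] = value
    # Second guard, in case a version drops the assignment without warning:
    # check that the column was actually created.
    if label not in df.columns:
        raise KeyError(f"column {label!r} was not created")


index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
df = pd.DataFrame({"x": [1, 2]}, index=index)
assign_column_checked(df, "y", 0)  # an ordinary label works as usual
print(df)
```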

In recent versions, running df.loc[df.index, "110735"] = 0 instead crashes with this error:

pandas/core/indexing.py:1684: FutureWarning: Indexing a DataFrame with a datetimelike index using a single string to slice the rows, like frame[string], is deprecated and will be removed in a future version. Use frame.loc[string] instead.
self.obj[key] = infer_fill_value(value)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '110735'

In older pandas versions, df.loc[:, "110735"] = 0 itself crashes with a KeyError:

Traceback (most recent call last):
File "base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '110735'

Expected Behavior

Expected behaviour is that a column "110735" is created and filled with 0 in every row.
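Until the underlying behaviour is fixed, one possible workaround is DataFrame.insert, which takes the column label as an explicit argument and, as far as we can tell from the pandas 1.x implementation, does not go through the row-slicing check that DataFrame.__setitem__ applies to string keys. This is a minimal sketch under that assumption, using the index from the reproducible example; it has not been checked against every affected version.

```python
import pandas as pd

index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
df = pd.DataFrame(index=index)

# insert(loc, column, value) places the new column at position loc; the
# scalar 0 is broadcast to every row of the index.
df.insert(len(df.columns), "110735", 0)
print(df)
```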

Installed Versions

Remote test with a more recent pandas version:

INSTALLED VERSIONS

commit : 06d2301
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-1017-aws
Version : #19~20.04.1-Ubuntu SMP Mon Mar 7 12:53:12 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 45.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.1.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Local test with an older pandas version:

INSTALLED VERSIONS

commit : b5958ee
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.5
numpy : 1.19.1
pytz : 2018.6
dateutil : 2.7.3
pip : 21.3.1
setuptools : 40.4.3
Cython : 0.29
pytest : 6.1.1
hypothesis : 3.79.3
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : None
fsspec : 2022.01.0
fastparquet : 0.8.0
gcsfs : None
matplotlib : 3.0.0
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.4.21
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : None

Metadata

    Labels

    Bug, Datetime, Indexing, Regression