Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
index = pd.DatetimeIndex([pd.to_datetime("2035-01-01 01:00:00"), pd.to_datetime("2036-01-01 00:00:00")])
df = pd.DataFrame(index=index)
df.loc[:, "110735"] = 0
print(df)
Issue Description
Assignment to a new column silently fails when all of the following hold:
- The DataFrame has a DatetimeIndex as its index.
- You assign to an entire column (either with .loc or directly with [ ]).
- The column name is a string that can be interpreted as a datetime within the range of values in the DatetimeIndex. In the example above, "110735" is interpreted as 2035-11-07.
If these conditions are met, the assignment fails and the column is not populated. Pandas interprets the string as a datetime and appears to treat the assignment as an attempt to access the row "110735".
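To make the three conditions concrete, here is a minimal sketch of a check for whether a given column label is at risk. The helper name label_may_collide is hypothetical, and pd.Timestamp is only a stand-in for whatever parser DatetimeIndex applies to string keys, so the two may disagree on edge cases. It also assumes a non-empty, timezone-naive index.

```python
import pandas as pd


def label_may_collide(index, label):
    """Return True if `label` parses as a timestamp inside the index's range.

    Hypothetical helper, not part of pandas. pd.Timestamp approximates the
    string parsing that a DatetimeIndex performs on string keys.
    """
    try:
        ts = pd.Timestamp(label)
    except (ValueError, TypeError):
        return False  # not parseable as a datetime, so no collision
    if ts is pd.NaT:
        return False  # strings such as "nan" parse to NaT
    return bool(index.min() <= ts <= index.max())


index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
for label in ["110735", "2035-11-07", "price"]:
    print(label, label_may_collide(index, label))
```

How a bare digit string like "110735" is parsed may vary between pandas and dateutil versions, so the result for that label is best checked on the installed version rather than assumed.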
In recent versions a warning is produced, but the script does not crash. The warning indicates that pandas thinks we indexed with frame[string]; however, we used frame.loc[:, string], which should not have this issue.
The FutureWarning is misleading, because the assignment fails completely and no changes are made to the DataFrame.
FutureWarning: Indexing a DataFrame with a datetimelike index using a single string to slice the rows, like `frame[string]`, is deprecated and will be removed in a future version. Use `frame.loc[string]` instead.
  self.obj[key] = value
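Because the outcome differs by version (a warning with no assignment, a KeyError, or a successful assignment), a small diagnostic can record which one occurs. This is only a sketch: try_assign is a hypothetical helper, and it catches only KeyError, the exception reported in this issue.

```python
import warnings

import pandas as pd


def try_assign(df, label, value=0):
    """Attempt df.loc[:, label] = value on a copy and report what happened.

    Hypothetical diagnostic helper, not part of pandas.
    """
    out = df.copy()
    error = None
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        try:
            out.loc[:, label] = value
        except KeyError as exc:  # the older versions in this report raise here
            error = exc
    return {
        "column_created": bool(label in out.columns),
        "warnings": [type(w.message).__name__ for w in caught],
        "error": error,
    }


index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
report = try_assign(pd.DataFrame(index=index), "110735")
print(report)
```

On an affected version, column_created should come back False for "110735", whereas a label that cannot be read as a date, such as "price", should create the column normally.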
In recent versions, running df.loc[df.index, "110735"] = 0 instead crashes with this error:
pandas/core/indexing.py:1684: FutureWarning: Indexing a DataFrame with a datetimelike index using a single string to slice the rows, like `frame[string]`, is deprecated and will be removed in a future version. Use `frame.loc[string]` instead.
  self.obj[key] = infer_fill_value(value)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '110735'
In older pandas versions, running df.loc[:, "110735"] = 0 crashes with a KeyError:
Traceback (most recent call last):
  File "base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '110735'
Expected Behavior
Expected behaviour is that a column "110735" is populated with 0 in all rows.
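As a possible interim workaround, DataFrame.insert may give the expected result, since it takes the column label as an explicit argument and, as far as I can tell, does not look the label up against the row index. This is a sketch under that assumption and has not been verified on the two versions listed below.

```python
import pandas as pd

index = pd.DatetimeIndex(["2035-01-01 01:00:00", "2036-01-01 00:00:00"])
df = pd.DataFrame(index=index)

# Append the column at the last position; the label is passed explicitly
# as a column name rather than as an indexing key.
df.insert(len(df.columns), "110735", 0)
print(df)
```

Note that insert modifies the DataFrame in place and raises a ValueError if the column already exists, unless allow_duplicates=True is passed.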
Installed Versions
Remote test with a more recent pandas version:
INSTALLED VERSIONS
commit : 06d2301
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-1017-aws
Version : #19~20.04.1-Ubuntu SMP Mon Mar 7 12:53:12 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.1
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 22.0.3
setuptools : 45.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.1.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
Local test with an older pandas version:
INSTALLED VERSIONS
commit : b5958ee
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.5
numpy : 1.19.1
pytz : 2018.6
dateutil : 2.7.3
pip : 21.3.1
setuptools : 40.4.3
Cython : 0.29
pytest : 6.1.1
hypothesis : 3.79.3
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : None
fsspec : 2022.01.0
fastparquet : 0.8.0
gcsfs : None
matplotlib : 3.0.0
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.4.21
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : None