Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from datetime import datetime

import pandas as pd


def main():
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    # df.info() prints its report itself and returns None, hence "Before: None".
    print("Before:", df.info())
    print(df.head(2))
    print()

    # Assign the selected rows back to themselves; this should be a no-op.
    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']

    print("After:", df.info())
    print(df.head(2))


if __name__ == "__main__":
    main()
Issue Description
When doing a partial assignment through .loc[predicate, [column]] on a column with dtype datetime64[ns] (naive datetime), the column dtype changes to object and the assigned datetimes are stored as raw numbers (the output shows 1642636800000000000, the timestamp in nanoseconds since the epoch) rather than as datetimes.
With timezone-aware datetimes it works as expected, keeping the dtype as datetime64[ns, timezone].
The reproducible example above prints the following:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns](1)
memory usage: 146.0 bytes
Before: None
       field  update
0 2022-01-20    True
1 2022-01-22   False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
 0   field   2 non-null      object
 1   update  2 non-null      bool
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes
After: None
                 field  update
0  1642636800000000000    True
1  2022-01-22 00:00:00   False
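The same check can be made without reading through the df.info() reports. The following is a minimal sketch that compares the column dtype before and after the assignment; the helper name dtype_before_and_after is hypothetical and not part of pandas, and the result of the comparison depends on the installed pandas version.

```python
from datetime import datetime

import pandas as pd


def dtype_before_and_after():
    """Run the partial assignment from the report and return both dtypes."""
    df = pd.DataFrame({
        "field": pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        "update": [True, False],
    })
    before = df["field"].dtype

    # Same assignment as in the report: boolean mask plus a list of columns.
    df_to_update = df[df["update"]]
    df.loc[df["update"], ["field"]] = df_to_update["field"]

    return before, df["field"].dtype


before, after = dtype_before_and_after()
print("before:", before)
print("after: ", after)
# False here means the column dtype was changed by the assignment.
print("dtype preserved:", before == after)
```

On a version affected by this bug the last line should report False, matching the object dtype shown in the output above.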
Expected Behavior
The assignment should change neither the values nor the dtype of the column.
As an example, here is what the reproducible example prints when we add tzinfo=timezone.utc to the values:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
Before: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
After: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False
As you can see, with a timezone the column keeps its dtype and the values are still interpreted as datetimes, not as raw numbers.
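For reference, the timezone-aware variant described above can be sketched as follows. The function name run_partial_assignment and its tzinfo parameter are hypothetical, introduced only for illustration; passing tzinfo=None gives the naive case from the reproducible example.

```python
from datetime import datetime, timezone

import pandas as pd


def run_partial_assignment(tzinfo=None):
    """Return the 'field' column before and after the .loc assignment."""
    values = [
        datetime(2022, 1, 20, tzinfo=tzinfo),
        datetime(2022, 1, 22, tzinfo=tzinfo),
    ]
    df = pd.DataFrame({"field": pd.to_datetime(values), "update": [True, False]})
    original = df["field"].copy()

    # Assign the selected rows back to themselves, as in the report.
    df_to_update = df[df["update"]]
    df.loc[df["update"], ["field"]] = df_to_update["field"]

    return original, df["field"]


original, result = run_partial_assignment(tzinfo=timezone.utc)
# Expected behavior: both the dtype and the values survive the assignment.
print("dtype preserved: ", result.dtype == original.dtype)
print("values preserved:", result.equals(original))
```

Calling run_partial_assignment() without a timezone and running the same two comparisons is one way to express the expected behavior for the naive case as a check.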
Installed Versions
❯ python
Python 3.8.14 (default, Oct 10 2022, 16:44:50)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()
INSTALLED VERSIONS
commit : 91111fd
python : 3.8.14.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.13-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Tue, 04 Oct 2022 14:36:58 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8
pandas : 1.5.1
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 56.0.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None