BUG: Assignment to a .loc view of a naive datetime column changes its dtype to object #49837

Closed
@Terseus

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import datetime

import pandas as pd


def main():
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    print("Before:", df.info())
    print(df.head(2))
    print()
    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']
    print("After:", df.info())
    print(df.head(2))


if __name__ == "__main__":
    main()

Issue Description

When doing a partial assignment through .loc[predicate, [column]] on a column with dtype datetime64[ns] (naive datetime), the column dtype changes to object and the assigned datetimes are stored as raw epoch values in nanoseconds (1642636800000000000 in the output below) instead of as datetimes.

With non-naive datetimes it works as expected, maintaining the dtype as datetime64[ns, timezone].

The reproducible example above prints the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns](1)
memory usage: 146.0 bytes
Before: None
       field  update
0 2022-01-20    True
1 2022-01-22   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      object
 1   update  2 non-null      bool
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes
After: None
                 field  update
0  1642636800000000000    True
1  2022-01-22 00:00:00   False
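
A possible way to sidestep the dtype change shown above is to avoid the .loc[predicate, [column]] assignment and rebuild the column with Series.where instead. This is only a sketch and is not taken from the report: the variable names and the one-day shift (added so the update is visible) are assumptions, and it has not been confirmed as a recommended workaround for this bug.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    "field": pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
    "update": [True, False],
})
original_dtype = df["field"].dtype

# Hypothetical new values for the flagged rows: shift them by one day
# so the effect of the update is visible.
new_values = df.loc[df["update"], "field"] + pd.Timedelta(days=1)

# Series.where keeps the existing value where the condition is True and
# takes the (index-aligned) value from `new_values` where it is False.
df["field"] = df["field"].where(~df["update"], new_values)

print(df["field"].dtype == original_dtype)
print(df)
```

Because both sides of the where call are datetime Series, no object-dtype intermediate is involved, which is why the column is expected to keep its dtype here.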

Expected Behavior

The assignment should change neither the values nor the dtype of the column.

For comparison, this is what the reproducible example prints when tzinfo=timezone.utc is added to the datetime values:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
Before: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
After: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

As you can see, with a timezone the column keeps its dtype and the values remain datetimes instead of being turned into raw numbers.
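
The naive and timezone-aware cases can be compared side by side with a small helper. This is a condensed sketch of the reproducible example, not part of the original report: the function name dtype_before_and_after and its tz parameter are hypothetical. The dtype reported for the naive case depends on the pandas version in use, so no output is shown.

```python
from datetime import datetime, timezone

import pandas as pd


def dtype_before_and_after(tz=None):
    """Run the assignment from the report; return (dtype_before, dtype_after)."""
    df = pd.DataFrame({
        "field": pd.to_datetime([
            datetime(2022, 1, 20, tzinfo=tz),
            datetime(2022, 1, 22, tzinfo=tz),
        ]),
        "update": [True, False],
    })
    before = df["field"].dtype
    df_to_update = df[df["update"]]
    # Same assignment pattern as in the reproducible example.
    df.loc[df["update"], ["field"]] = df_to_update["field"]
    return before, df["field"].dtype


naive_before, naive_after = dtype_before_and_after()
aware_before, aware_after = dtype_before_and_after(timezone.utc)

print("naive:", naive_before, "->", naive_after)
print("aware:", aware_before, "->", aware_after)
```

On an affected version the naive line should show a change to object, while the aware line should show the same dtype on both sides.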

Installed Versions

❯ python
Python 3.8.14 (default, Oct 10 2022, 16:44:50)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS

commit : 91111fd
python : 3.8.14.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.13-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Tue, 04 Oct 2022 14:36:58 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8

pandas : 1.5.1
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 56.0.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Labels

  • Dtype Conversions: Unexpected or buggy dtype conversions

  • Indexing: Related to indexing on series/frames, not to indexes themselves

  • Needs Tests: Unit test(s) needed to prevent regressions
