Skip to content

DataArray.dt.seconds returns incorrect value for negative timedelta64[ns] #5937

Closed
@leifdenby

Description

@leifdenby

What happened:

For a negative timedelta64[ns] of 42 nanoseconds DataArray.dt.seconds returned a non-zero value (the returned value was 86399). When I pass in a positive 42 nanosecond timedelta64[ns] with the the TimeDeltaAccessor correctly returns zero. I would have expected both assertions in the example below to have passed, but the second fails. This seems to be a general issue with negative timedelta64[ns].

<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([86399])
Dimensions without coordinates: dim_0
Traceback (most recent call last):
  File "bug_dt_seconds.py", line 15, in <module>
    assert da.dt.seconds == 0
AssertionError

What you expected to happen:

<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0

Minimal Complete Verifiable Example:

# coding: utf-8

import xarray as xr
import numpy as np

# number of nanoseconds
value = 42

da = xr.DataArray([np.timedelta64(value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

da = xr.DataArray([np.timedelta64(-value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

Anything else we need to know?:

I've narrowed this down to the call to pd.Series(values.ravel()) in xarray.core.accessor_dt._access_through_series:

ipdb> pd.Series(values.ravel())
0   -1 days +23:59:59.999999958
dtype: timedelta64[ns]

I think the issue arises because pandas turns the numpy timedelta64 into a "minus one day plus a time". This actually does have a number of "seconds" in it, but the "total_seconds" has the expected value:

ipdb> pd.Series(values.ravel()).dt.total_seconds()
0   -4.200000e-08
dtype: float64

Which would correctly round to zero.

I don't think the issue is in pandas, although the output from pandas is counter-intuitive:

ipdb> pd.Series(values.ravel()).dt.seconds
0    86399
dtype: int64

Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)?

Environment:

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, May  6 2020, 04:59:01)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.18.2
pandas: 1.3.4
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.4.2
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.09.1
distributed: 2021.09.1
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
fsspec: 2021.06.1
cupy: None
pint: 0.18
sparse: None
setuptools: 46.4.0.post20200518
pip: 21.1.2
conda: None
pytest: 6.0.1
IPython: 7.16.1
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions