Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import datetime as dt
import numpy as np
# Sorry for the convoluted definition of the index, but I didn't find a better way
# to initialize a DatetimeIndex *with* nanosecond precision *and* timezone information.
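index = [
    dt.datetime(2023, 1, 1, 0, 0, 0, tzinfo=dt.timezone.utc),
    dt.datetime(2023, 1, 1, 0, 0, 1, tzinfo=dt.timezone.utc),
]
d = pd.DataFrame({'testcol': pd.Series([12, 13], index=index)})
# Shift every timestamp by a third of the 1 s spacing (333333333 ns),
# so the index carries non-zero nanoseconds.
d.index = d.index + np.diff(d.index) / 3
# Serialize with ISO dates and nanosecond precision requested.
d.to_json(date_format='iso', date_unit='ns')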
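A shorter way to build the same index may be to parse ISO strings with nine fractional digits and pass `tz` to the `DatetimeIndex` constructor. The following is only a sketch under that assumption; the string values are chosen to match the index produced above, and it is not part of the original reproduction.

```python
import pandas as pd

# Build a timezone-aware index directly from ISO strings that already
# carry nanosecond digits; tz="UTC" localizes the parsed values to UTC.
idx = pd.DatetimeIndex(
    ["2023-01-01T00:00:00.333333333", "2023-01-01T00:00:01.333333333"],
    tz="UTC",
)
d = pd.DataFrame({"testcol": [12, 13]}, index=idx)

# Inspect the nanosecond component of the first timestamp.
print(d.index[0].nanosecond)
```

The bug report itself does not depend on how the index is constructed; any `datetime64[ns, UTC]` index with non-zero nanoseconds should trigger the same behavior.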
Issue Description
I have a DataFrame with a timezone-aware DatetimeIndex of nanosecond precision (the application I use produces them this way). I want to save the DataFrame as JSON and later load it back into a DataFrame that is ideally identical to the original. To preserve the fact that the index has timezone information, I have to specify date_format='iso'. However, the resulting JSON string only has microsecond precision: the nanosecond digits are written as zeros.
In the submitted example, the result is:
>>> d.to_json(date_format='iso', date_unit='ns')
'{"testcol":{"2023-01-01T00:00:00.333333000Z":12,"2023-01-01T00:00:01.333333000Z":13}}'
This causes the following round-trip test to fail:
>>> pd.testing.assert_frame_equal(d, pd.read_json(d.to_json(date_unit='ns', orient='columns', date_format='iso')))
AssertionError: DataFrame.index are different
DataFrame.index values are different (100.0 %)
[left]: DatetimeIndex(['2023-01-01 00:00:00.333333333+00:00', '2023-01-01 00:00:01.333333333+00:00'], dtype='datetime64[ns, UTC]', freq=None)
[right]: DatetimeIndex(['2023-01-01 00:00:00.333333+00:00', '2023-01-01 00:00:01.333333+00:00'], dtype='datetime64[ns, UTC]', freq=None)
At positional index 0, first diff: 2023-01-01T00:00:00.333333333 != 2023-01-01T00:00:00.333333000
If the DatetimeIndex is initialized without timezone info, the resulting json string is correct, and the test passes.
>>> d = pd.DataFrame({'testcol': pd.Series([12, 13], index=[np.datetime64('2023-01-01T00:00:00.333333333'), np.datetime64('2023-01-01T00:00:01.333333333')])})
>>> d.to_json(date_format='iso', date_unit='ns')
'{"testcol":{"2023-01-01T00:00:00.333333333":12,"2023-01-01T00:00:01.333333333":13}}'
>>> pd.testing.assert_frame_equal(d, pd.read_json(d.to_json(date_unit='ns', orient='columns', date_format='iso')))
Expected Behavior
The resulting JSON string should retain all fractional digits down to nanoseconds, as requested by date_unit='ns'. The index of the example is:
>>> d.index
DatetimeIndex(['2023-01-01 00:00:00.333333333+00:00', '2023-01-01 00:00:01.333333333+00:00'], dtype='datetime64[ns, UTC]', freq=None)
and thus the json string should be:
>>> d.to_json(date_format='iso', date_unit='ns')
'{"testcol":{"2023-01-01T00:00:00.333333333Z":12,"2023-01-01T00:00:01.333333333Z":13}}'
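Until the ISO output keeps the nanoseconds, one possible workaround is to bypass `to_json` for the index and store it as integer nanoseconds since the epoch. The sketch below assumes the index is `datetime64[ns, UTC]` and that the timezone is known to be UTC when loading. The payload layout and the `index_ns` key are hypothetical names chosen for illustration, not a pandas format.

```python
import json

import pandas as pd

idx = pd.DatetimeIndex(
    ["2023-01-01T00:00:00.333333333", "2023-01-01T00:00:01.333333333"],
    tz="UTC",
)
d = pd.DataFrame({"testcol": [12, 13]}, index=idx)

# Serialize: asi8 exposes the underlying int64 values, which are
# nanoseconds since the epoch (in UTC) for a datetime64[ns, UTC] index.
payload = json.dumps({
    "index_ns": d.index.asi8.tolist(),   # hypothetical key name
    "testcol": d["testcol"].tolist(),
})

# Deserialize: rebuild the timezone-aware index from the integers.
raw = json.loads(payload)
restored = pd.DataFrame(
    {"testcol": raw["testcol"]},
    index=pd.to_datetime(raw["index_ns"], unit="ns", utc=True),
)

# Raises AssertionError if the round trip changed the frame.
pd.testing.assert_frame_equal(d, restored)
```

This loses the convenience of a single `to_json`/`read_json` call and hard-codes UTC on load, so it is a stopgap rather than a replacement for the expected behavior described above.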
Installed Versions
INSTALLED VERSIONS
commit : 965ceca
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-195-default
Version : #1 SMP Tue May 7 10:55:11 UTC 2019 (8fba516)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.2
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 7.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None