Description
```python
import pandas as pd
import pandas._testing as tm
from io import StringIO

df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2013-01-01", "2013-01-02"]).as_unit("s"),
        "B": [3.5, 3.5],
    }
)

written = df.to_json(orient="split")
>>> written
'{"columns":["A","B"],"index":[0,1],"data":[[1356,3.5],[1357,3.5]]}'

result = pd.read_json(StringIO(written), orient="split", convert_dates=["A"])
>>> result
      A    B
0  1356  3.5
1  1357  3.5

tm.assert_frame_equal(result, df)  # <- fails
```
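For contrast, a minimal sketch (not part of the report; assumes default pandas behavior): the same round-trip succeeds when the column keeps the default `M8[ns]` dtype, since `to_json`'s epoch conversion assumes nanosecond values.

```python
# Sketch: the same frame without .as_unit("s") (so M8[ns]) should survive
# the to_json/read_json round trip, because to_json's nanosecond assumption
# matches the data.
df_ns = pd.DataFrame(
    {
        "A": pd.to_datetime(["2013-01-01", "2013-01-02"]),  # M8[ns]
        "B": [3.5, 3.5],
    }
)
roundtripped = pd.read_json(
    StringIO(df_ns.to_json(orient="split")),
    orient="split",
    convert_dates=["A"],
)
tm.assert_frame_equal(roundtripped, df_ns)  # passes
```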
The example here is based on `test_frame_non_unique_columns`, altered by 1) making the columns into `["A", "B"]` and 2) changing the dtype for the first column from `M8[ns]` to `M8[s]`.
This goes through a check in `_try_convert_to_date`:
```python
# ignore numbers that are out of range
if issubclass(new_data.dtype.type, np.number):
    in_range = (
        isna(new_data._values)
        | (new_data > self.min_stamp)
        | (new_data._values == iNaT)
    )
    if not in_range.all():
        return data, False
```
When the JSON is produced from `M8[s]` (or `M8[ms]`) data, these values are all under `self.min_stamp`, so this check short-circuits and we never reach the `pd.to_datetime` conversion that comes just after (which itself looks sketchy, but that can wait for another day).
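To make the short-circuit concrete, a sketch of the arithmetic as I read it (assumptions: `to_json` treats the raw int64 values as if they were nanoseconds, and the parser's default `min_stamp` floor is roughly one year past the epoch in its unit):

```python
# Writer side: 2013-01-01 stored as M8[s] has a raw int64 value in seconds.
epoch_s = 1_356_998_400
# to_json's default date_unit="ms" divides the (assumed-nanosecond) raw
# value by 10**6, producing the tiny number seen in the JSON above.
written_value = epoch_s // 10**6
print(written_value)  # 1356

# Reader side: 1356 sits far below the out-of-range floor, so
# _try_convert_to_date returns (data, False) and no conversion happens.
min_stamp = 31_536_000  # assumption: Parser._MIN_STAMPS["s"], i.e. 1971-01-01
print(written_value > min_stamp)  # False
```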
cc @WillAyd. My best guess is that there is nothing we can do at the reading stage, and that we should either convert non-nano to nano at the writing stage or just warn users that they are doing something that doesn't round-trip? A sketch of the first option follows.
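A hypothetical workaround illustrating the writing-stage option (sketch only; `to_json_nano_safe` is not a pandas API): upcast any non-nano datetime columns to `M8[ns]` before serializing.

```python
def to_json_nano_safe(frame: pd.DataFrame, **kwargs) -> str:
    # Upcast every datetime64 column (any unit, tz-aware or naive) to
    # nanoseconds so to_json's nanosecond assumption holds.
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype.kind == "M":
            out[col] = out[col].dt.as_unit("ns")
    return out.to_json(**kwargs)

written_ok = to_json_nano_safe(df, orient="split")
result_ok = pd.read_json(StringIO(written_ok), orient="split", convert_dates=["A"])
```

Note that the values then round-trip, but the reader still hands back `M8[ns]`, so an exact `assert_frame_equal` against the original `M8[s]` frame would additionally need an `as_unit("s")` on the way back.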
Surfaced while implementing #55564 (which will cause users to get non-nano in many cases where they currently get nano).