Description of the problem
Came across this today when annotating some artefacts in data using the current dev branch. I generated the annotations in the raw data browser and saved them, but when reading them back in and applying them to the raw data, the annotations had been shifted towards the start of the recording.

It turns out the annotations I'd read in had `orig_time=None`, even though this information was present in the annotations before saving and was present in the csv file that had been saved. In short, the datetime format saved in the csv files by `Annotations.save()` is sometimes not compatible with the datetime format expected when initialising an `Annotations` object, causing it to default to `None`.

Did some digging and found no problems in <=v1.6. This seems to have been introduced in v1.7 with #12289.
The `orig_time` saved in the csv file is read fine by `read_annotations()` as the onset time of the first annotation. This gets passed to an `Annotations` object, where `_handle_meas_date()` attempts to convert the `orig_time` string into a datetime object according to ISO8601 using `datetime.strptime` (here `meas_date=orig_time`). If the `orig_time` string does not conform to this format, a `ValueError` is raised and `orig_time` gets set to `None` (L1006-1011).

Lines 997 to 1029 in e4cc4e2
What happens since v1.7 is that the datetime format of annotations saved in the csv files can sometimes fail to conform to ISO8601. Specifically, ISO8601 expects at most 6 decimal places for the sub-second info (e.g. `2025-02-10 10:50:20.123456`), but the times saved in the csv files can have >6 decimal places (e.g. `2025-02-10 10:50:20.123456789`). When `_handle_meas_date()` checks whether `orig_time` complies with ISO8601, the extra decimal places prevent a regex match in `datetime.strptime`, the `ValueError` is raised, and `orig_time` gets incorrectly set to `None`.
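The parsing failure can be shown in isolation with the standard library alone. This is a minimal sketch assuming the ISO8601-style format string `"%Y-%m-%d %H:%M:%S.%f"` (the exact format used inside `_handle_meas_date()` may differ), where `%f` only consumes up to 6 fractional digits:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # %f matches 1-6 fractional digits

# 6 decimal places parses fine
ok = datetime.strptime("2025-02-10 10:50:20.123456", FMT)
print(ok)  # 2025-02-10 10:50:20.123456

# 9 decimal places leaves characters unconsumed -> ValueError
try:
    datetime.strptime("2025-02-10 10:50:20.123456789", FMT)
except ValueError as err:
    print(err)  # "unconverted data remains: 789"
```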
These extra decimal places get written to the csv file because in #12289, `Annotations.to_data_frame()` was updated to allow for different time formats, introducing a call to `mne.utils.dataframe._convert_times()` and `pd.to_timedelta` (L49). If the onset times of annotations have <=6 decimal places, the times returned from `pd.to_timedelta` have 6 decimal places, in line with ISO8601. However, if the onset times have >6 decimal places, `pd.to_timedelta` returns times with >6 decimal places. This dataframe of times is what gets saved to csv.

mne-python/mne/utils/dataframe.py
Lines 39 to 50 in e4cc4e2
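The precision behaviour can be seen directly with `pd.to_timedelta` (a standalone sketch, not the exact MNE code path — pandas timedeltas have nanosecond resolution, so sub-microsecond input survives into the string representation):

```python
import pandas as pd

# <=6 decimal places: microsecond precision, compatible with ISO8601 / %f
td_us = pd.to_timedelta(5.012345, unit="s")
print(td_us)  # 0 days 00:00:05.012345

# >6 decimal places: pandas keeps nanosecond precision, so the string
# representation carries 9 fractional digits
td_ns = pd.to_timedelta(5.0123456789, unit="s")
print(td_ns)
```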
I can't see a case in the unit tests where onset times have >6 decimal places, so this could easily have slipped under the radar.

Below is some code to reproduce the problem. It runs fine on v1.6, but on v1.7+ it fails when the number of decimal places in the onset time is >6.
Steps to reproduce
```python
import mne
import numpy as np
from datetime import datetime, timezone

# Create orig_time for annotations
orig_time = datetime.now()
orig_time = orig_time.replace(tzinfo=timezone.utc)


def check_orig_time_roundtrip(onset: float):
    # Create annotations object
    annotations = mne.Annotations(
        onset=[onset], description=["bad"], duration=[1.0], orig_time=orig_time
    )
    # Save and reload annotations
    annotations.save("test_annotations.csv", overwrite=True)
    annotations_read = mne.read_annotations("test_annotations.csv")

    print(f"Time stored in original annotations object : {annotations.orig_time}")
    print(f"Time stored in loaded annotations object   : {annotations_read.orig_time}")

    if annotations_read.orig_time is None:
        raise TypeError(f"Bad onset = {onset}")


onset = 5.0123456789  # >6 decimal places
for n in range(10):
    check_orig_time_roundtrip(onset=np.round(onset, decimals=n))
```
Link to data
No response
Expected results
`Annotations.orig_time` should always get read from the csv file and not be assigned `None`.
Actual results
`Annotations.orig_time` gets assigned `None` in v1.7+ when the onset times of annotations have >6 decimal places (which causes >6 decimal places to be saved in the onset times in the csv file).
Additional information
What is the best way to handle this?
- Sanitise the onset times from `Annotations.to_data_frame("datetime")` to have at most 6 decimal places?
- Only sanitise in dataframes being written to a csv?
- Sanitise the datetime formats when reading from the csv?
- Or should `orig_time` handling in `Annotations` initialisation be more lenient to >6 decimal places? (but then it wouldn't be ISO8601 compliant)
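For the first option, one possible sketch is to drop sub-microsecond precision from the timedeltas before formatting (using `Timedelta.floor`; whether to floor or round is a design choice, and this is not what MNE currently does):

```python
import pandas as pd

# Nanosecond-precision timedelta, as produced by pd.to_timedelta
td = pd.to_timedelta(5.0123456789, unit="s")
print(td)  # >6 fractional digits

# Floor to microseconds so the formatted string stays ISO8601-parseable
td_sanitised = td.floor("us")
print(td_sanitised)  # 0 days 00:00:05.012345
```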
In any case, perhaps a `RuntimeWarning` could be raised when an `orig_time` string is set to `None`, to reduce the chances of people accidentally misaligning the annotations in their data.
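For the lenient-parsing option, a hypothetical helper (not MNE API, and the format string is an assumption) could truncate the fractional seconds to 6 digits before handing the string to `datetime.strptime`:

```python
import re
from datetime import datetime


def parse_orig_time_lenient(orig_time: str) -> datetime:
    """Parse an ISO8601-like string, truncating >6 fractional digits.

    Hypothetical helper for illustration only.
    """
    # Keep at most 6 digits after the decimal point
    sanitised = re.sub(r"\.(\d{6})\d+", r".\1", orig_time)
    return datetime.strptime(sanitised, "%Y-%m-%d %H:%M:%S.%f")


print(parse_orig_time_lenient("2025-02-10 10:50:20.123456789"))
# 2025-02-10 10:50:20.123456
```

Note this truncates rather than rounds, so sub-microsecond information is silently discarded.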