
ENH: Add unixtime accessor to Timestamp and DatetimeIndex objects #43975

Closed
@fergalm

Description


Pain Point

I feel that extracting unixtimes from date strings is less graceful than it could be, and relies on the user knowing more about the internal format of datetime objects than they should have to.

Currently (at least as of version 1.3.3), a user needs to read the integer stored inside a Timestamp object, which counts nanoseconds since the epoch, and convert it to seconds by hand:

import pandas as pd
import numpy as np

x = pd.to_datetime("2020")
unixtime_sec = x.value / 1e9  # .value is the integer number of nanoseconds since the epoch

This makes sense if you understand how a Timestamp object stores its information internally. However, I would argue that this requires the end user to understand an implementation detail that they shouldn't need to. I shouldn't need to care whether the time is represented internally as unixtime, TAI, GPS time, etc. I should be able to ask for unixtime without relying on knowledge of the implementation details.

I would also argue that the current approach makes code harder to read. While other attributes of time can be obtained by asking for them directly, unixtime requires a more indirect request:

hour_of_day = x.hour  # Easy to read
unixtime_ns = x.value  # Harder to read: the name states neither the epoch nor the unit
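For comparison, here is a minimal sketch of three ways to get seconds since the epoch from a single Timestamp with existing pandas calls: `Timestamp.value` (integer nanoseconds, so it relies on knowing the unit), subtracting an explicit epoch and dividing by a one-second `Timedelta` (independent of the internal unit), and the existing `Timestamp.timestamp()` method. It assumes a tz-naive input, which pandas treats as UTC in `timestamp()`.

```python
import pandas as pd

x = pd.to_datetime("2020")  # tz-naive Timestamp for 2020-01-01 00:00:00

# Approach 1: relies on the internal unit (integer nanoseconds since the epoch)
via_value = x.value / 1e9

# Approach 2: unit-agnostic; Timestamp - Timestamp gives a Timedelta,
# and Timedelta / Timedelta gives a plain float
epoch = pd.Timestamp("1970-01-01")
via_epoch = (x - epoch) / pd.Timedelta(seconds=1)

# Approach 3: the existing POSIX-timestamp method (scalar Timestamp only)
via_method = x.timestamp()

print(via_value, via_epoch, via_method)  # 1577836800.0 1577836800.0 1577836800.0
```

Only the first approach depends on the storage unit, which is the implementation detail the issue objects to.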

To add to the confusion, a DatetimeIndex requires the user to use view instead of astype (see #38544). I don't quite understand why the interfaces for Timestamps and DatetimeIndices need to be different.

x = pd.date_range("2020", "2021")
unixtime_sec = x.view(np.int64) / 1e9
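A sketch of the same unit-agnostic epoch arithmetic applied to a DatetimeIndex, assuming tz-naive dates. It avoids reinterpreting the raw int64 values, and each element agrees with the scalar `Timestamp.timestamp()`:

```python
import pandas as pd

x = pd.date_range("2020", "2021")  # daily, 2020-01-01 through 2021-01-01 inclusive

# DatetimeIndex - Timestamp -> TimedeltaIndex; dividing by one second -> float Index
unixtime_sec = (x - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)

# Element-wise agreement with the scalar Timestamp method
assert unixtime_sec[0] == x[0].timestamp()

print(len(unixtime_sec), unixtime_sec[0], unixtime_sec[-1])
# 367 1577836800.0 1609459200.0
```

The expression is the same one used for a single Timestamp, which hints at how a shared accessor could behave for both types.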

Proposed solution

Timestamps and DatetimeIndex objects should have a unixtime accessor consistent with the interface used to access hour of day, day of month, etc.

x = pd.to_datetime("2020")
hour_of_day = x.hour  # Currently exists
day_of_month = x.day   # Currently exists
unixtime_sec = x.unixtime  # This proposal

x = pd.date_range("2020", "2021")
unixtime_sec = x.unixtime  # This proposal

The accessor should return a floating-point number for a Timestamp, or an iterable (e.g. a Series) of floating-point numbers for a DatetimeIndex. This float should represent the number of seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC). The unit should be seconds, not nanoseconds, to be consistent with the definition of unixtime (https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap04.html#tag_21_04_16).
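The proposed behaviour can be approximated today with a small helper. This is a sketch only: the function name `unixtime` is hypothetical and not part of pandas. It returns a float for a Timestamp and a float Index (not a Series) for a DatetimeIndex, converts tz-aware input to UTC first, and treats tz-naive input as UTC.

```python
import pandas as pd


def unixtime(x):
    """Hypothetical stand-in for the proposed ``.unixtime`` accessor.

    Returns seconds since 1970-01-01 00:00:00 UTC: a float for a Timestamp,
    or a float Index for a DatetimeIndex.
    """
    if not isinstance(x, (pd.Timestamp, pd.DatetimeIndex)):
        raise TypeError(f"expected Timestamp or DatetimeIndex, got {type(x).__name__}")
    if x.tz is not None:
        # pandas will not subtract naive from aware values, so work in UTC
        x = x.tz_convert("UTC")
        epoch = pd.Timestamp("1970-01-01", tz="UTC")
    else:
        # Assumption: tz-naive values are interpreted as UTC
        epoch = pd.Timestamp("1970-01-01")
    # Dividing by a one-second Timedelta yields seconds whatever the storage unit
    return (x - epoch) / pd.Timedelta(seconds=1)


print(unixtime(pd.Timestamp("2020")))  # 1577836800.0
print(unixtime(pd.Timestamp("2020-01-01", tz="America/New_York")))  # 1577854800.0
print(unixtime(pd.date_range("2020", periods=3, freq="D")).tolist())
# [1577836800.0, 1577923200.0, 1578009600.0]
```

One function covering both types is the consistency the proposal asks for; whether the index case should return an Index or a Series is left open here.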

API breaking implications

This will not break any other features of the API, although deprecating the use of the astype() and view() methods for this purpose could be considered. If they were deprecated, it would be easier to change the internal representation of a Timestamp object in future.
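The point about the internal representation can be illustrated with NumPy's datetime64 alone, without relying on pandas internals. Casting the same instant to int64 gives different numbers depending on the storage unit, while dividing by a one-second timedelta does not. (pandas 2.0 and later can, for example, hold datetimes at second, millisecond or microsecond resolution as well as nanoseconds.) This is an illustration, not a description of how pandas stores Timestamps.

```python
import numpy as np

# The same instant stored at two different resolutions
t_s = np.datetime64("2020-01-01T00:00:00", "s")
t_ns = np.datetime64("2020-01-01T00:00:00", "ns")

print(t_s == t_ns)  # True: NumPy compares them as the same instant

# Casting to int64 exposes the storage unit, so the raw number changes
raw_s = t_s.astype(np.int64)    # 1577836800
raw_ns = t_ns.astype(np.int64)  # 1577836800000000000

# Dividing by a one-second timedelta gives seconds regardless of the unit
epoch = np.datetime64("1970-01-01T00:00:00", "s")
secs_s = (t_s - epoch) / np.timedelta64(1, "s")    # 1577836800.0
secs_ns = (t_ns - epoch) / np.timedelta64(1, "s")  # 1577836800.0
```

Code that relies on the raw int64 has to know the unit; code that divides by a duration does not.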

Describe alternatives you've considered

The current methods work, but they are unsatisfactory because:

  1. They require the end user to know about what is an implementation detail
  2. They result in code that is harder to read
  3. It is cumbersome to write code that works for both Timestamps and DatetimeIndex objects

Metadata


Assignees: No one assigned
Labels: Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests