Description
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage-related question on StackOverflow.
Question about pandas
I posted my question with a bit more context on Stack Overflow here and got an impressive answer, but I doubt (or want to doubt) that it is the simplest way to set the `freq` attribute of a DatetimeIndex that's part of a MultiIndex.
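For reference, `freq` is an attribute of a plain DatetimeIndex, and timestamps parsed from text usually come back without one. A minimal sketch with three synthetic hourly timestamps: passing `freq="infer"` to the `pd.DatetimeIndex` constructor lets pandas infer the frequency, while passing an explicit alias such as `"h"` asserts it and raises `ValueError` if the values don't fit. (`"h"` is the hourly alias in recent pandas; older code often uses `"H"`, which recent versions deprecate.)

```python
import pandas as pd

# Synthetic hourly timestamps; to_datetime on strings does not attach a freq.
stamps = pd.to_datetime(["2020-02-01 00:00", "2020-02-01 01:00", "2020-02-01 02:00"])
print(stamps.freq)  # None

# freq="infer": pandas infers the frequency from the values.
inferred = pd.DatetimeIndex(stamps, freq="infer")
# freq="h": the frequency is asserted and validated against the values.
explicit = pd.DatetimeIndex(stamps, freq="h")
print(inferred.freq, explicit.freq)

# Asserting a frequency the values don't follow raises ValueError.
try:
    pd.DatetimeIndex(stamps[[0, 2]], freq="h")
except ValueError as err:
    print("rejected:", err)
```

Both constructor calls only work on a single-level DatetimeIndex; they say nothing about how a level inside a MultiIndex behaves.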
Here's a MultiIndexed Series loaded from a CSV:
```python
import numpy as np
import pandas as pd

# generate example data ("h" is the hourly alias; "H" is deprecated in recent pandas)
users = ['A', 'B', 'C', 'D']
#dates = pd.date_range("2020-02-01 00:00:00", "2020-04-04 20:00:00", freq="h")
dates = pd.date_range("2020-02-01 00:00:00", "2020-02-04 20:00:00", freq="h")
idx = pd.MultiIndex.from_product([users, dates])
idx.names = ["user", "datehour"]
y = pd.Series(np.random.choice(a=[0, 1], size=len(idx)), index=idx).rename('y')

# write to csv and reload (turns out this matters)
y.to_csv('reprod_example.csv')
y = pd.read_csv('reprod_example.csv', parse_dates=['datehour'])
y = y.set_index(['user', 'datehour']).y
```
```
>>> y.head()
user  datehour
A     2020-02-01 00:00:00    0
      2020-02-01 01:00:00    0
      2020-02-01 02:00:00    1
      2020-02-01 03:00:00    0
      2020-02-01 04:00:00    0
Name: y, dtype: int64
```
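Since every user repeats the same timestamps here, a frequency only makes sense per user. One hedged way to check whether each user's `datehour` values already form a gap-free hourly sequence is to run `pd.infer_freq` group by group. This is a sketch on a small synthetic frame with the same level names; `hourly_per_user` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Small synthetic frame with the same level names as `y` in the question.
users = ["A", "B"]
dates = pd.date_range("2020-02-01 00:00:00", periods=6, freq="h")
idx = pd.MultiIndex.from_product([users, dates], names=["user", "datehour"])
y = pd.Series(np.zeros(len(idx), dtype="int64"), index=idx, name="y")


def hourly_per_user(s: pd.Series) -> dict:
    """Hypothetical helper: map each user to whether its timestamps are gap-free hourly."""
    result = {}
    for user, grp in s.groupby(level="user"):
        stamps = pd.DatetimeIndex(grp.index.get_level_values("datehour"))
        # infer_freq returns an alias string, or None if the spacing is irregular;
        # it needs at least 3 timestamps per user.
        alias = pd.infer_freq(stamps)
        # Compare offsets rather than strings, since the alias is "H" or "h"
        # depending on the pandas version.
        result[user] = alias is not None and to_offset(alias) == pd.offsets.Hour()
    return result


print(hourly_per_user(y))
# Drop B's 02:00 row to create a gap.
print(hourly_per_user(y.drop(("B", dates[2]))))
```

A user reported as `False` is one where `asfreq("h")` would insert NaN rows rather than just attach a frequency.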
Is there a way to set the `freq` attribute of the DatetimeIndex level that's simpler or more intuitive than this answer I received:
```python
y = pd.read_csv('reprod_example.csv', parse_dates=['datehour'])
# For each user: index by datehour, then asfreq('h') conforms it to an hourly grid
y = y.groupby('user').apply(lambda df: df.set_index('datehour').asfreq('h')).y
```
Setting an index inside a groupby/apply/lambda just to update the `freq` attribute is not what I expected this would take. It certainly wasn't the first thing I (or the answerer) tried.
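For context on why the answer works: inside the lambda, each user's rows become a single-level frame indexed by `datehour`, and `asfreq("h")` conforms that index to a regular hourly grid. That sets `freq`, but it is a reindex, not just a metadata change: any missing hour becomes a new row of NaN, which also upcasts integer columns to float. A minimal sketch on a synthetic single-user series with one hour missing:

```python
import pandas as pd

# Synthetic single-user series with the 02:00 observation missing.
s = pd.Series(
    [0, 1, 0],
    index=pd.to_datetime(["2020-02-01 00:00", "2020-02-01 01:00", "2020-02-01 03:00"]),
    name="y",
)
print(s.index.freq)  # None: the spacing is irregular

# asfreq reindexes onto a regular hourly grid and sets freq on the new index.
filled = s.asfreq("h")
print(filled)  # 4 rows; 02:00 is NaN, so the dtype is now float64
print(filled.index.freq)
```

When a user's data has no gaps, the reindex adds no rows and the values are unchanged; only the `freq` attribute differs.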
Thanks! Love pandas. I just thought this might be a good opportunity to check whether there's a more idiomatic way to do this, since the use case seems fairly common.
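Two possible workarounds, offered as a hedged sketch rather than an official recommendation: both move `datehour` out of the MultiIndex into an ordinary DatetimeIndex, where `freq` is well-defined, instead of setting it on a level. The data is synthetic with the same level names as above; the seeded `np.random.default_rng` replaces `np.random.choice` only so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like `y` in the question (user x hourly datehour).
users = ["A", "B", "C", "D"]
dates = pd.date_range("2020-02-01 00:00:00", "2020-02-04 20:00:00", freq="h")
idx = pd.MultiIndex.from_product([users, dates], names=["user", "datehour"])
y = pd.Series(np.random.default_rng(0).integers(0, 2, size=len(idx)), index=idx, name="y")

# Option 1: pivot users into columns so `datehour` becomes a plain DatetimeIndex,
# then asfreq sets the hourly freq on that index.
wide = y.unstack("user").asfreq("h")
print(wide.index.freq)

# Option 2: work on one user at a time; xs drops the `user` level.
a = y.xs("A", level="user").asfreq("h")
print(a.index.freq)
```

`unstack` is convenient when every user shares the same hours; if their ranges differ, the non-overlapping hours come back as NaN. `xs` avoids that padding but has to be repeated for each user.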