Skip to content

QST: Grouping at fixed time intervals (hours and minutes) regardeless the date and the first row time #57642

Open
@fede72bari

Description

@fede72bari

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/78062460/grouping-in-minutes-intervals-starting-at-fixed-time-regardless-the-first-row-ti

Question about pandas

I have this code intended to count occurrences in 30-minute intervals; the requirement is to have these intervals at fixed starting points, minute 00 and minute 30 of each hour. Regretfully, despite every attempt of mine, the second group is aligned to minute 03 and minute 33.

I suspect that both the groups are aligned with the first time row and that the first one is correct just by chance. How can I tell the grouper to force the alignment to minutes 00 and 30?

# load and prepare data
df_long_forecast = pd.read_csv('df_long_forecast - reduced.csv')
df_long_forecast['after_12_max_datetime'] = pd.to_datetime(df_long_forecast['after_12_max_datetime'])
df_long_forecast['after_12_max_time'] = (df_long_forecast['after_12_max_datetime'] - df_long_forecast['after_12_max_datetime'].dt.normalize())  # timedelta64[ns]

# count the number of maxes happening after 12am (absolute and percentage)
hist_max = df_long_forecast.groupby(pd.Grouper(key='after_12_max_time', freq='30T',  offset='0T', origin=origin))['Date'].count()
percent_max = round(hist_max / hist_max.sum() * 100, 2)
display(hist_max)

# count the number of maxes after 12 that result in a profit trade (cannot be MORE than the previous ones)
df_long_forecast_profit = df_long_forecast[ df_long_forecast['after_12_max_>_9_to_12_high'] > 0 ]
profit_long = df_long_forecast_profit.groupby(pd.Grouper(key='after_12_max_time', freq='30T',  offset='0T', origin=origin))['Date'].count()
percent_profit_long = round(profit_long / hist_max.sum() * 100, 2)
display(profit_long)

Here is the print of the hist_max df

after_12_max_time
0 days 12:00:00    24
0 days 12:30:00     5
0 days 13:00:00     7
0 days 13:30:00     5
0 days 14:00:00     5
0 days 14:30:00     4
0 days 15:00:00     4
0 days 15:30:00     1
0 days 16:00:00     5
0 days 16:30:00     7
0 days 17:00:00     1
0 days 17:30:00     6
0 days 18:00:00     1
0 days 18:30:00     1
0 days 19:00:00     1
0 days 19:30:00     6
0 days 20:00:00     3
0 days 20:30:00     0
0 days 21:00:00     6
0 days 21:30:00    19
0 days 22:00:00     8
Freq: 30T, Name: Date, dtype: int64

and this is profit_long

after_12_max_time
0 days 12:03:00     8
0 days 12:33:00     4
0 days 13:03:00     5
0 days 13:33:00     4
0 days 14:03:00     5
0 days 14:33:00     4
0 days 15:03:00     3
0 days 15:33:00     2
0 days 16:03:00     5
0 days 16:33:00     6
0 days 17:03:00     2
0 days 17:33:00     5
0 days 18:03:00     1
0 days 18:33:00     2
0 days 19:03:00     0
0 days 19:33:00     5
0 days 20:03:00     3
0 days 20:33:00     0
0 days 21:03:00     6
0 days 21:33:00    21
0 days 22:03:00     3
Freq: 30T, Name: Date, dtype: int64

The CSV file with just the pertinent columns can be downloaded from this link.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Needs TriageIssue that has not been reviewed by a pandas team memberUsage Question

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions