Skip to content

ENH: merge_asof threshold minimum #61164

Open
@Lituchy

Description

@Lituchy

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I often find myself using the merge_asof function on time series data. The tolerance and allow_exact_matches fields are very useful in filtering data, but it would be very useful to have more granular control over this tolerance. Being able to supply a minimum tolerance in addition to the currently existing maximum tolerance would be very beneficial in giving the user more control over this function.

Feature Description

A current example for this function is the following:

We only asof within 10ms between the quote time and the trade time
and we exclude exact matches on time. However prior data will
propagate forward

>>> pd.merge_asof(
...     trades,
...     quotes,
...     on="time",
...     by="ticker",
...     tolerance=pd.Timedelta("10ms"),
...     allow_exact_matches=False
... )
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75     NaN     NaN
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100     NaN     NaN
3 2016-05-25 13:30:00.048   GOOG  720.92       100     NaN     NaN
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

I am envisining a version where we could have

We only asof within 10ms between the quote time and the trade time but more than 2ms between the quote time and the trade time and we exclude exact matches on time. However prior data will propagate forward

>>> pd.merge_asof(
...     trades,
...     quotes,
...     on="time",
...     by="ticker",
...     tolerance=pd.Timedelta("10ms"),
...     mininum_tolerance=pd.Timedelta("2ms"),
...     allow_exact_matches=False
... )
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75     NaN     NaN
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100     NaN     NaN
3 2016-05-25 13:30:00.048   GOOG  720.92       100     NaN     NaN
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

Alternative Solutions

Another solution to this problem to augment the currently existing tolerance argument to accept a single datetimelike object, or a tuple of datetmelike objects which could act as a lower and upper bound, respectively.

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementNeeds DiscussionRequires discussion from core team before further actionNeeds InfoClarification about behavior needed to assess issueReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions