perf(aci): Load slow conditions for all DataConditionGroups at once #92494


Open · wants to merge 4 commits into master

Conversation

@kcons (Member) commented May 29, 2025

Add a new slow-condition helper that runs one query instead of len(DataConditionGroups) queries, and use it in delayed_workflow.
This saves (2*num_dcgs - 2) fast (~2 ms) queries per delayed_workflow run. Not a lot, but easy to trim.
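
For context, a minimal sketch of the shape of that helper (the import path and the is_slow_condition check are assumptions for illustration, not the merged code):

from collections import defaultdict

from sentry.workflow_engine.models import DataCondition  # assumed import path


def is_slow_condition(condition: DataCondition) -> bool:
    # Placeholder: stands in for the workflow engine's real slow-condition check.
    raise NotImplementedError


def get_slow_conditions_for_groups(dcg_ids: list[int]) -> dict[int, list[DataCondition]]:
    # One query across all groups (mirroring the condition_group__in filter reviewed below),
    # instead of one query per DataConditionGroup.
    conditions_by_group: dict[int, list[DataCondition]] = defaultdict(list)
    for condition in DataCondition.objects.filter(condition_group__in=dcg_ids):
        if is_slow_condition(condition):
            conditions_by_group[condition.condition_group_id].append(condition)
    return dict(conditions_by_group)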

@kcons requested a review from a team as a code owner May 29, 2025 18:46
@github-actions bot added the Scope: Backend label May 29, 2025
@cathteng (Member) left a comment


ty 🙏

@saponifi3d (Contributor) left a comment


lgtm - but it might be better to have a caching layer / manager plus a batch fetcher for all conditions. (Then we could reuse that a lot in the processing methods -- we could probably really reduce the number of queries there 😓)

Comment on lines 49 to 51
conditions = DataCondition.objects.filter(
    condition_group__in=dcg_ids,
)
Contributor

should we add a caching layer to this to reduce the number of fetches? i was thinking it'd be cool to have a method like this to fetch all the condition groups -- then we could reuse that method here and just have the bits from 53 onward.

Member Author (@kcons)

I think that'd be great, but I'll first have to learn how to do that.
A custom Manager for DataCondition that offers a DataCondition.objects.get_slow_conditions_for_groups(Sequence[DataConditionGroup]) with caching based on DCG ID?
Does that work with non-PK ids? Seems hard to invalidate.

Oh, I think you're saying a @cache_func_for_models(get_conditions_for_dcgs)... that seems easier, and it'd hit once per run as the code currently is. I just don't have any sense of how many cache hits we'd see outside of a single task. We could also potentially use get_data_conditions_for_group and trade off more queries on cache miss for more generality?

@wedamija (Member) commented May 29, 2025

I'm wondering if you could extend cache_func_for_models to provide a way to pass batched parameters to it.

Basically, at the moment it caches for a single param, like
get_data_conditions_for_group(data_condition_group_id: int)

The decorator could potentially add a batch call function to this, kind of like how we have task.delay(<args>).

So here, you'd end up calling get_data_conditions_for_group.batch([(1,), (503,), ...]). Then it would batch fetch values from the cache, and for any misses it would bulk fetch them and fill the individual caches using the function call. This still means the db calls are serial, but I think it should be ok, since most of the time they should actually be cached.

You might be able to change how it works so that we can also batch fetch using the query, but that might be a bit more work...
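
For illustration, the batching pattern described here might look roughly like the sketch below; batch_cached, the key scheme, and fetch_many are made-up names for this example, not the existing cache_func_for_models API:

from collections.abc import Callable, Sequence
from typing import TypeVar

from django.core.cache import cache

R = TypeVar("R")


def batch_cached(
    key_prefix: str,
    fetch_many: Callable[[Sequence[int]], dict[int, R]],
    ttl_seconds: int = 300,
) -> Callable[[Sequence[int]], dict[int, R]]:
    # Returns a batch getter: read per-id cache entries first, bulk fetch only
    # the misses, then backfill the individual cache entries.
    def get_batch(ids: Sequence[int]) -> dict[int, R]:
        keys = {obj_id: f"{key_prefix}:{obj_id}" for obj_id in ids}
        cached = cache.get_many(list(keys.values()))  # one cache round trip for all ids
        results: dict[int, R] = {
            obj_id: cached[key] for obj_id, key in keys.items() if key in cached
        }
        missing = [obj_id for obj_id in ids if obj_id not in results]
        if missing:
            fetched = fetch_many(missing)  # one bulk query for everything the cache lacked
            cache.set_many(
                {keys[obj_id]: value for obj_id, value in fetched.items()},
                timeout=ttl_seconds,
            )
            results.update(fetched)
        return results

    return get_batch

A caller could wrap a single bulk fetch (e.g. the condition_group__in query above) and get per-id cache hits on later calls, matching the hit-the-db-only-on-misses behavior described here.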

Member Author (@kcons)

Ok, I've made a batched variant. The way the original does arg/kwarg abstraction doesn't really play well with batching, so I had to simplify the types there to make it reasonable (I think it's better that way).
This now unifies our DataCondition caching, and I think it has basically the caching properties we're looking for.

Member

I was thinking of something more like #92593, so we don't have to decorate things separately

Member Author (@kcons)

I started with something like that, but I wanted to avoid N queries in a batch interface if I could, and from there it ultimately seemed easier to just have a batch thing than to try to support both in the same decorator. I was doing the callable-also-with-a-method thing, but pared it down to just the callable to keep it as simple as possible.
But if cache hit rates are high enough (I don't know), extending the existing type seems more than good enough and avoids a New Thing.
I must admit I didn't try to support batch param tuple invocations: some weeks ago I wasted a few hours on a hole in mypy's analysis of that sort of thing that led to our code typing it as Any in some cases, and I couldn't remember the details well enough to avoid it, so I preferred to sidestep it.

    cache_invalidators: list[tuple[type[S], Callable[[S], Key]]],
    cache_ttl: timedelta | None = None,
    recalculate: bool = True,
) -> Callable[[Callable[[list[Key]], list[R]]], Callable[[list[Key]], list[R]]]:
Member Author (@kcons)

On further reflection, it should probably be a Callable[[set[Key]], dict[Key, R]].
That makes the singleton case maybe a little weirder, but it avoids oddness with duplicates, is easier to understand, and realistically anyone calling this is going to have to build a mapping anyway, so it's probably also more convenient.
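
For illustration, that revised factory signature would look roughly like this (batch_cache_func_for_models and the TypeVar definitions are assumed names for this sketch, not the merged code):

from collections.abc import Callable
from datetime import timedelta
from typing import TypeVar

from django.db.models import Model

S = TypeVar("S", bound=Model)  # model types whose saves/deletes invalidate the cache
Key = TypeVar("Key")           # cache key derived from each model instance
R = TypeVar("R")               # cached result type


def batch_cache_func_for_models(
    cache_invalidators: list[tuple[type[S], Callable[[S], Key]]],
    cache_ttl: timedelta | None = None,
    recalculate: bool = True,
) -> Callable[[Callable[[set[Key]], dict[Key, R]]], Callable[[set[Key]], dict[Key, R]]]:
    # The wrapped function takes a set of keys and returns key -> result, so
    # duplicates are impossible and callers get the mapping they need directly.
    ...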

@kcons (Member, Author) commented May 30, 2025

ok then, I'm going to wait for #92593 and use that to make this change trivial.

codecov bot commented May 30, 2025

Codecov Report

Attention: Patch coverage is 96.15385% with 5 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines             Patch %    Lines
src/sentry/utils/function_cache.py   88.37%     5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #92494       +/-   ##
===========================================
+ Coverage   46.25%   87.91%   +41.65%     
===========================================
  Files       10216    10239       +23     
  Lines      585495   587401     +1906     
  Branches    22785    22785               
===========================================
+ Hits       270820   516394   +245574     
+ Misses     314245    70577   -243668     
  Partials      430      430               

Labels
Scope: Backend (automatically applied to PRs that change backend components)

4 participants