
CachedSupplier should cache exceptions briefly to limit load on downstream systems #5690

Open
@dbottini

Description


Describe the feature

CachedSupplier is used by various other AWS components to cache the results of expensive AWS operations, such as fetching STS credentials. It should provide some amount of "negative caching" of exceptions to limit how much load can be directed at systems like STS. Otherwise, under high load, there will effectively be an unending torrent of requests, because the lock queues up requests that are each attempted immediately after the previous request fails.
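To make the queuing behavior concrete, here is a minimal sketch, assuming a simplified supplier that refreshes under a blocking ReentrantLock and caches only successful results. `LockedRefreshSupplier` and `QueuedRetryDemo` are hypothetical names for illustration; this is not the SDK's CachedSupplier. With an always-failing downstream, every caller ends up making its own downstream call, one after another:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical, simplified model of a lock-guarded caching supplier.
// It is NOT the SDK's CachedSupplier; it only caches successful results.
class LockedRefreshSupplier<T> implements Supplier<T> {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private final Supplier<T> downstream;
    private volatile T cachedValue;

    LockedRefreshSupplier(Supplier<T> downstream) {
        this.downstream = downstream;
    }

    @Override
    public T get() {
        T value = cachedValue;
        if (value != null) {
            return value; // fast path: a successful result is already cached
        }
        refreshLock.lock();
        try {
            // Re-check after waiting: another thread may have succeeded meanwhile.
            if (cachedValue == null) {
                // If this throws, nothing is cached, so the next queued thread
                // acquires the lock and calls downstream again right away.
                cachedValue = downstream.get();
            }
            return cachedValue;
        } finally {
            refreshLock.unlock();
        }
    }
}

class QueuedRetryDemo {
    // Starts `callers` threads at once against an always-failing downstream
    // and returns how many downstream calls were made in total.
    static int downstreamCallsWhenAlwaysFailing(int callers) {
        AtomicInteger calls = new AtomicInteger();
        LockedRefreshSupplier<String> supplier = new LockedRefreshSupplier<>(() -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated 403 from the downstream service");
        });
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                    supplier.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (RuntimeException expected) {
                    // every caller observes the failure
                }
            });
        }
        start.countDown(); // release all callers together
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(downstreamCallsWhenAlwaysFailing(20)); // prints 20
    }
}
```

The lock only serializes the failing calls; it never reduces their number, because a failure leaves nothing in the cache for the next waiter to reuse.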

Use Case

In our situation (a large enterprise AWS account), it wraps STS:AssumeRole. On the happy path with 2xx responses it works great, but we've found that it is over-eager to hammer APIs on exceptions. In our case, the exceptions were 403s caused by a misconfigured role.
Typical use of STS:AssumeRole is through StsAssumeRoleCredentialsProvider, plugged into a given AWS client (in our case Kinesis). When there is no cached value, or only a stale one, every request is passed through to the CachedSupplier. The existing lock is most effective when there are only a few slow requests and the total time spent waiting on the request, including backoffs, is less than five seconds. If the cumulative time per request multiplied by the number of waiting requests exceeds five seconds, the lock effectively becomes a no-op and allows unrestricted hammering of the APIs.
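The five-second threshold can be illustrated with a back-of-the-envelope model. This sketch assumes, as described above, that a caller waits at most five seconds for the refresh lock and then calls the downstream service without holding it; it further assumes that all callers arrive at once and that every call fails after a fixed time. `LockTimeoutModel` is a hypothetical helper, not SDK code.

```java
// Hypothetical back-of-the-envelope model, not SDK code. It assumes a caller
// waits at most maxWaitMillis for the refresh lock and, on timeout, calls the
// downstream service anyway without holding the lock.
final class LockTimeoutModel {
    private LockTimeoutModel() {
    }

    /**
     * All callers arrive at the same instant and every downstream call fails
     * after callMillis while holding the lock. Caller k (0-based) would obtain
     * the lock after k * callMillis; if that exceeds maxWaitMillis, the caller
     * gives up waiting and calls downstream concurrently.
     *
     * @return the number of callers that bypass the lock
     */
    static int callersBypassingLock(int callers, long callMillis, long maxWaitMillis) {
        int bypassing = 0;
        for (int k = 0; k < callers; k++) {
            if (k * callMillis > maxWaitMillis) {
                bypassing++;
            }
        }
        return bypassing;
    }

    public static void main(String[] args) {
        long maxWaitMillis = 5_000; // the five-second wait described above
        System.out.println(callersBypassingLock(10, 400, maxWaitMillis));  // 0
        System.out.println(callersBypassingLock(100, 400, maxWaitMillis)); // 87
    }
}
```

With 400 ms per failed call, ten callers all fit inside the five-second window, but with 100 callers the 87 callers at positions 13 to 99 stop waiting and hit the service concurrently. In both cases every caller still makes a downstream call; the lock only changes whether those calls are spaced out or overlapping.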
As the CachedSupplier's concurrency controls break down under high, sustained request rates, common AWS failure-handling strategies such as retry with backoff and jitter also stop working, because each parallel request has no context about how many other of these expensive requests are failing elsewhere.
We could solve this with our own circuit breakers around every client that uses StsAssumeRoleCredentialsProvider, but I believe it would be more effective to stem the requests at a lower level, in CachedSupplier itself.
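For comparison, the caller-side workaround mentioned above could look roughly like the following minimal circuit breaker, wrapped around whatever supplies the credentials. `SimpleCircuitBreaker`, its threshold and open-duration parameters, and the injectable `Supplier<Instant>` clock are assumptions for illustration; the class is not part of the SDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical caller-side circuit breaker (not an SDK class). After
// `failureThreshold` consecutive failures it fails fast for `openDuration`
// without calling the delegate; the first call after that is a trial call.
final class SimpleCircuitBreaker<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final int failureThreshold;
    private final Duration openDuration;
    private final Supplier<Instant> clock;

    private int consecutiveFailures = 0;
    private Instant openUntil = Instant.MIN;

    SimpleCircuitBreaker(Supplier<T> delegate, int failureThreshold,
                         Duration openDuration, Supplier<Instant> clock) {
        this.delegate = delegate;
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
        this.clock = clock;
    }

    // synchronized keeps the sketch simple; it also serializes delegate calls.
    @Override
    public synchronized T get() {
        Instant now = clock.get();
        if (now.isBefore(openUntil)) {
            throw new IllegalStateException("circuit open: downstream call skipped");
        }
        try {
            T result = delegate.get();
            consecutiveFailures = 0; // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openUntil = now.plus(openDuration); // (re)open the circuit
            }
            throw e;
        }
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        AtomicReference<Instant> time = new AtomicReference<>(Instant.EPOCH);
        AtomicInteger calls = new AtomicInteger();
        SimpleCircuitBreaker<String> breaker = new SimpleCircuitBreaker<>(() -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated 403");
        }, 3, Duration.ofSeconds(30), time::get);

        for (int i = 0; i < 10; i++) {
            try { breaker.get(); } catch (RuntimeException ignored) { }
        }
        System.out.println(calls.get()); // 3: later calls fail fast

        time.set(Instant.EPOCH.plusSeconds(31));
        for (int i = 0; i < 10; i++) {
            try { breaker.get(); } catch (RuntimeException ignored) { }
        }
        System.out.println(calls.get()); // 4: one trial call, then open again
    }
}
```

The drawback noted above still applies: each client needs its own breaker, and the breaker sits outside the credentials provider rather than at the point where the expensive call is made.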

Proposed Solution

master...dbottini:aws-sdk-java-v2:dbottini/cached-supplier-caches-exceptions
I have created a branch with a proposed solution that will briefly cache exceptions when there is no non-stale value.
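The negative-caching idea behind the proposal can be sketched as follows. This is a minimal illustration, assuming one TTL for successful values and a shorter TTL for failures; `NegativeCachingSupplier`, `valueTtl` and `errorTtl` are hypothetical names, and the sketch is neither the code on the linked branch nor the SDK's CachedSupplier.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of negative caching (not SDK code). A successful value
// is cached for valueTtl; a failure is cached for errorTtl and rethrown to
// every caller in that window without contacting the downstream service.
final class NegativeCachingSupplier<T> implements Supplier<T> {
    private final Supplier<T> downstream;
    private final Duration valueTtl;
    private final Duration errorTtl;
    private final Supplier<Instant> clock;

    private T value;
    private RuntimeException error;           // non-null while a failure is cached
    private Instant expiresAt = Instant.MIN;  // expiry of whichever entry is cached

    NegativeCachingSupplier(Supplier<T> downstream, Duration valueTtl,
                            Duration errorTtl, Supplier<Instant> clock) {
        this.downstream = downstream;
        this.valueTtl = valueTtl;
        this.errorTtl = errorTtl;
        this.clock = clock;
    }

    @Override
    public synchronized T get() {
        Instant now = clock.get();
        if (now.isBefore(expiresAt)) {
            if (error != null) {
                throw error; // negative cache hit: rethrow the cached failure
            }
            return value;    // positive cache hit
        }
        try {
            value = downstream.get();
            error = null;
            expiresAt = now.plus(valueTtl);
            return value;
        } catch (RuntimeException e) {
            value = null;
            error = e;
            expiresAt = now.plus(errorTtl);
            throw e;
        }
    }
}

class NegativeCachingDemo {
    public static void main(String[] args) {
        AtomicReference<Instant> time = new AtomicReference<>(Instant.EPOCH);
        AtomicInteger calls = new AtomicInteger();
        NegativeCachingSupplier<String> supplier = new NegativeCachingSupplier<>(() -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated 403");
        }, Duration.ofMinutes(15), Duration.ofSeconds(2), time::get);

        for (int i = 0; i < 50; i++) {
            try { supplier.get(); } catch (RuntimeException ignored) { }
        }
        System.out.println(calls.get()); // 1: the other 49 calls hit the cached failure

        time.set(Instant.EPOCH.plusSeconds(2)); // error TTL has elapsed
        try { supplier.get(); } catch (RuntimeException ignored) { }
        System.out.println(calls.get()); // 2
    }
}
```

In this sketch the error TTL is the main tuning knob: during an outage it bounds downstream traffic to roughly one call per `errorTtl` per supplier instance, at the cost of delaying recovery by up to that long once the underlying problem (such as a misconfigured role) is fixed.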

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS Java SDK version used

2.25.7

JDK version used

Corretto-21.0.3.9.1 (build 21.0.3+9-LTS)

Operating System and version

macOS Sonoma 14.5 (23F79)

Metadata

Assignees

Labels

cross-sdk, feature-request (A feature should be added or improved.), p2 (This is a standard priority issue)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests