Description
Describe the feature
CachedSupplier is used by various other AWS components to cache the results of expensive AWS operations, such as fetching STS credentials. It should provide some amount of "negative caching" on exceptions, to limit how much load can be placed on systems like STS. Otherwise, under high load, there is effectively an unending torrent of requests, because the Lock queues up requests that are attempted immediately after the previous request fails.
Use Case
In our situation (a large enterprise AWS account), it wraps STS:AssumeRole. On the happy path with 2xx responses it works great, but we've found that it is over-eager to hammer APIs on exceptions. In our case, the exception was a 403 caused by a misconfigured role.
The typical way to use STS:AssumeRole is through StsAssumeRoleCredentialsProvider, plugged into a given AWS client (in our case Kinesis). When there is no cached entry, or only a stale one, every request is passed through to the CachedSupplier. The existing lock is most effective when there are only a few slow requests and the total time spent waiting on the request, including backoffs, is less than five seconds. If the cumulative time per request, multiplied by the number of requests, exceeds five seconds, the lock basically becomes a no-op and allows unfettered hammering of the APIs.
As the cached supplier's concurrency controls start to fail under high, continuous request rates, common AWS failure strategies like retry + backoff + jitter also start to fail, because each parallel request lacks context about how many other expensive requests are failing elsewhere.
We could solve this with our own circuit breakers around any client that incorporates StsAssumeRoleCredentialsProvider, but I believe it would be more effective to stem the requests at a lower level.
Proposed Solution
master...dbottini:aws-sdk-java-v2:dbottini/cached-supplier-caches-exceptions
I have created a branch with a proposed solution that will briefly cache exceptions when there is no non-stale value.
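As a rough illustration of the idea only (this is not the code on the branch above, and not the SDK's actual CachedSupplier), a minimal negative-caching wrapper might look like the sketch below. The class name `NegativeCachingSupplier`, the `failureTtl` parameter, and the injectable `timeSource` are all hypothetical names introduced for this example; staleness and prefetch handling are deliberately left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches a failure for a short time so that callers
 * arriving during that window get the cached exception instead of
 * triggering another expensive call (for example STS:AssumeRole).
 */
final class NegativeCachingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;          // the expensive operation
    private final Duration failureTtl;           // how long a failure is replayed
    private final Supplier<Instant> timeSource;  // injectable so it can be tested

    private T value;                             // last successful result, if any
    private RuntimeException cachedFailure;      // last failure, if any
    private Instant failureExpiry = Instant.MIN; // when the cached failure expires

    NegativeCachingSupplier(Supplier<T> delegate,
                            Duration failureTtl,
                            Supplier<Instant> timeSource) {
        this.delegate = delegate;
        this.failureTtl = failureTtl;
        this.timeSource = timeSource;
    }

    @Override
    public synchronized T get() {
        // A real cache would also check staleness here; omitted in this sketch.
        if (value != null) {
            return value;
        }

        Instant now = timeSource.get();

        // No usable value: if a recent failure is still cached, replay it
        // rather than calling the delegate again. Note that this rethrows
        // the same exception instance, so the stack trace is the original one.
        if (cachedFailure != null && now.isBefore(failureExpiry)) {
            throw cachedFailure;
        }

        try {
            value = delegate.get();
            cachedFailure = null;
            return value;
        } catch (RuntimeException e) {
            // Remember the failure briefly to shed load from the backend.
            cachedFailure = e;
            failureExpiry = now.plus(failureTtl);
            throw e;
        }
    }
}
```

With a `failureTtl` of one second, a burst of callers hitting a misconfigured role would produce roughly one delegate call per second instead of one call per caller. Whether the failure TTL should be fixed, jittered, or grow with consecutive failures is a design choice this sketch does not try to settle.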
Other Information
No response
Acknowledgements
- I may be able to implement this feature request
- This feature might incur a breaking change
AWS Java SDK version used
2.25.7
JDK version used
Corretto-21.0.3.9.1 (build 21.0.3+9-LTS)
Operating System and version
macOS Sonoma 14.5 (23F79)