Closed
Description
Describe the bug
NTH does not retry request over AWS SDK API to retrieve SQS queue message.
Steps to reproduce
Close firewall to SQS AWS endpoint and try to monitor for SQS events.
Expected outcome
The network layer cannot be guaranteed to be reliable so need to implement retry logic here.
Application Logs
WRN There was a problem monitoring for events error="RequestError: send request failed\ncaused by: Post \"https://sqs.us-west-2.amazonaws.com/\": read tcp 100.100.xx.xx:xxxx->10.xx.xx.xx:443: read: connection reset by peer" event_type=SQS_TERMINATE
Environment
- NTH App Version: v1.16.3
- NTH Mode (IMDS/Queue processor): Queue processor
- OS/Arch: Linux
- Kubernetes version: v1.21.14-eks
- Installation method: deployment
The check that denies retry is here
For V1 of AWS SDK the fix should be custom retryer which re-implements should retry, and custom retryer should be injected here, however upgrade to V2 of AWS SKD should fix this issue automatically, because it does not make distinction between different kinds of connection reset and retries them all which is desired behavior here.