No SQS retry on "read: connection reset by peer" #717

Closed
@eugenea

Description

Describe the bug
NTH does not retry the AWS SDK request that retrieves messages from the SQS queue when the connection is reset.

Steps to reproduce
Block traffic to the SQS AWS endpoint at the firewall while NTH is monitoring for SQS events.

Expected outcome
NTH retries the failed request. The network layer cannot be guaranteed to be reliable, so retry logic needs to be implemented here.

Application Logs

WRN There was a problem monitoring for events error="RequestError: send request failed\ncaused by: Post \"https://sqs.us-west-2.amazonaws.com/\": read tcp 100.100.xx.xx:xxxx->10.xx.xx.xx:443: read: connection reset by peer" event_type=SQS_TERMINATE

Environment

  • NTH App Version: v1.16.3
  • NTH Mode (IMDS/Queue processor): Queue processor
  • OS/Arch: Linux
  • Kubernetes version: v1.21.14-eks
  • Installation method: deployment

The check that denies the retry is here.
For v1 of the AWS SDK, the fix would be a custom retryer that re-implements ShouldRetry, injected here. However, upgrading to v2 of the AWS SDK should fix this issue automatically, because v2 does not distinguish between different kinds of connection reset and retries them all, which is the desired behavior here.
