Closed
Description
Currently the entire transaction is continually retried for 24 hours, according to #1009.
I think this could cause Dataflow jobs to get stuck or run for a long time. Also, if the transaction finally succeeds after 23 hours of retries, no error or other information is surfaced for the customer to debug the issue on their end.
For example, I have seen one case where transactions were retried again and again for 24 hours because of the following error, and the Dataflow jobs got stuck as a result:
Error message from worker: com.google.cloud.spanner.AbortedException: ABORTED: io.grpc.StatusRuntimeException: ABORTED: Transaction was aborted. Idle for over 10 seconds. retry_delay { seconds: 254 nanos: 433062132 }
Would it make sense to support custom retry and timeout settings for the transaction retry?
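To make the request concrete, here is a rough sketch of the behavior being asked for: a retry loop bounded by a caller-supplied total timeout and attempt cap, which fails loudly with the last error instead of retrying silently for 24 hours. It does not use the Spanner client API. `BoundedRetry`, `RetryExhaustedException`, and all parameter names are hypothetical, and the `Callable` stands in for the transaction body.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of any Spanner API): retries work within a caller-defined budget. */
public final class BoundedRetry {

    /** Thrown when the budget is exhausted; the last failure is kept as the cause for debugging. */
    public static final class RetryExhaustedException extends Exception {
        public final int attempts;

        public RetryExhaustedException(int attempts, Duration elapsed, Throwable lastFailure) {
            super("Gave up after " + attempts + " attempts in " + elapsed.toMillis() + " ms", lastFailure);
            this.attempts = attempts;
        }
    }

    private final Duration totalTimeout;
    private final int maxAttempts;
    private final Duration initialBackoff;

    public BoundedRetry(Duration totalTimeout, int maxAttempts, Duration initialBackoff) {
        this.totalTimeout = totalTimeout;
        this.maxAttempts = maxAttempts;
        this.initialBackoff = initialBackoff;
    }

    /**
     * Runs the work, retrying on any exception with exponential backoff.
     * A real caller would likely retry only on aborted transactions; this sketch retries everything.
     */
    public <T> T run(Callable<T> work) throws RetryExhaustedException, InterruptedException {
        long start = System.nanoTime();
        long backoffMs = initialBackoff.toMillis();
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return work.call();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    throw (InterruptedException) e; // never swallow interruption
                }
                Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
                // Stop if the next backoff would take us past the total budget.
                boolean outOfTime = elapsed.plusMillis(backoffMs).compareTo(totalTimeout) >= 0;
                if (attempt >= maxAttempts || outOfTime) {
                    throw new RetryExhaustedException(attempt, elapsed, e);
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 60_000L); // cap the backoff at one minute
            }
        }
    }
}
```

With something like this, a pipeline could configure, say, a 10-minute budget, and the customer would get an exception carrying the original `AbortedException` as its cause rather than a job that appears stuck.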