Description
I can't reopen #1900, so I'm creating a new issue.
The suggested workaround doesn't work well in combination with Apache Beam's SpannerIO.
I've always wondered why execution never reaches the aborted-transaction retry loops in SpannerIO. The answer is clear: the abort never gets thrown out to the caller.
This is very limiting when Spanner can't keep up with the write demand you put on it. It just keeps aborting transactions with longer and longer delays, which eventually causes Dataflow workers to die off, because they never exit the loop and so either don't respond to status update requests or never report the bundle processing result.
I want to be in control of how I handle transaction aborts, but I can't. It is quite surprising that I can't just disable the built-in retry and handle aborts however I wish. I still want to get Spanner's exceptions back, to see whether they're retryable or just to understand the root cause of the issue.
Instead I get a bunch of cryptic Dataflow errors like "worker stopped responding", and only deep in the thread dumps can I see that the workers were just looping indefinitely in SpannerRetryHelper.
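
To illustrate the behaviour I'm asking for, here is a minimal sketch of a bounded retry that gives up after a fixed number of attempts and rethrows the abort to the caller. It does not use the Spanner client API: `TransactionAbortedException`, `runWithBoundedRetries`, and the parameters `maxAttempts` and `initialBackoffMillis` are all hypothetical names made up for this example, and the doubling backoff is just one possible policy.

```java
import java.util.function.Supplier;

public class BoundedAbortRetry {

    /** Hypothetical stand-in for the client's aborted-transaction exception. */
    public static class TransactionAbortedException extends RuntimeException {
        public TransactionAbortedException(String message) {
            super(message);
        }
    }

    /**
     * Runs {@code work}, retrying only when it throws TransactionAbortedException.
     * At most {@code maxAttempts} attempts are made in total; after the last one
     * the abort is rethrown so the caller can decide what to do with it.
     */
    public static <T> T runWithBoundedRetries(
            Supplier<T> work, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMillis = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return work.get();
            } catch (TransactionAbortedException e) {
                if (attempt >= maxAttempts) {
                    // Give up and surface the abort instead of looping forever.
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
                backoffMillis *= 2; // simple exponential backoff (an assumption)
            }
        }
    }
}
```

With something like this, a Beam `DoFn` could fail the bundle with the original exception after a few attempts, rather than blocking the worker thread until Dataflow declares it unresponsive.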