
Workflow Stuck Waiting for Greylisted Event #722

Open
@dflor003

Description


Hi, we have an interesting scenario where we need to coordinate events across multiple workflows, and we are running into an issue where events get greylisted and sometimes are never processed.

First, let me describe our setup:

We have 2 different types of workflows; let's say the first is called CoordinatorWorkflow and the second SubTaskWorkflow. For a given task, exactly 1 CoordinatorWorkflow is spun up along with N SubTaskWorkflows.

When CoordinatorWorkflow is spun up, it knows how many SubTaskWorkflows correspond to it and has a unique identifier for each SubTaskWorkflow in the "set".

As an example, say we have a set of 2 SubTaskWorkflows identified by SubTaskA and SubTaskB that both start around the same time, and CoordinatorWorkflow is passed the set ["SubTaskA", "SubTaskB"].

The very first thing CoordinatorWorkflow does is loop over ["SubTaskA", "SubTaskB"] and, for each identifier, wait for an event of type SubTaskFinished whose key is that identifier.

Eventually, each SubTaskWorkflow reaches a certain point and publishes an event of type SubTaskFinished with its identifier (SubTaskA or SubTaskB) as the key. It then waits for an event of type CoordinationFinished.

Once CoordinatorWorkflow has received the events from every SubTaskWorkflow, it does some work, fires off a CoordinationFinished event, and completes.

Each SubTaskWorkflow then receives the CoordinationFinished event, does some more work, and completes.
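To make the flow above concrete, here is a minimal, hypothetical model of it in Python with asyncio. `EventBus`, `publish`, `wait_for`, and the `"task-1"` key used for CoordinationFinished are illustrative assumptions (the description doesn't say which key CoordinationFinished uses), not the workflow engine's API. In this toy model a published event is retained, so a subtask that finishes before the coordinator reaches its key in the loop is not lost; the engine's own delivery semantics may differ.

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory bus keyed by (event name, event key).

    An event stays set once published, so a waiter that arrives late
    still sees it (an assumption of this model only).
    """

    def __init__(self):
        self._events = defaultdict(asyncio.Event)

    def publish(self, name, key):
        self._events[(name, key)].set()

    async def wait_for(self, name, key):
        await self._events[(name, key)].wait()


async def coordinator(bus, subtask_ids, coordination_id, log):
    # Wait for SubTaskFinished from each subtask, one key at a time.
    for subtask_id in subtask_ids:
        await bus.wait_for("SubTaskFinished", subtask_id)
    log.append("coordinator: all subtasks finished")
    bus.publish("CoordinationFinished", coordination_id)


async def subtask(bus, subtask_id, coordination_id, log):
    await asyncio.sleep(0)  # stand-in for the first part of the subtask's work
    bus.publish("SubTaskFinished", subtask_id)
    await bus.wait_for("CoordinationFinished", coordination_id)
    log.append(f"{subtask_id}: completed")


async def run(subtask_ids):
    bus = EventBus()
    log = []
    coordination_id = "task-1"  # hypothetical key for CoordinationFinished
    await asyncio.gather(
        coordinator(bus, subtask_ids, coordination_id, log),
        *(subtask(bus, s, coordination_id, log) for s in subtask_ids),
    )
    return log


log = asyncio.run(run(["SubTaskA", "SubTaskB"]))
print(log)
```

The coordinator's entry is always first in `log`, because no subtask can complete until CoordinationFinished has been published.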

This is roughly the flow we are trying to achieve, and we have gotten most of the way there (note: these aren't the actual names of the workflows; I've tried to make them generic and domain-agnostic to simplify things).

The problem is that at some point in the process we get stuck waiting for events that never arrive. The events do get published, but we see many log messages like the following, and the workflows waiting for those events never get a chance to process them.

[16:06:44 DBG] Got greylisted event evt:{Id}

Any ideas as to what we may be doing wrong? Is this a known issue? If so, are there any workarounds?

Here are a few other observations my team made while troubleshooting this:

  • This happens sporadically (but fairly frequently) when running locally with 1 workflow node, but we do not see it in our development and production environments, where we run 2 and 4 workflow engine nodes respectively. I suspect this is because IGreyList is registered as an in-memory singleton, so the other workflow nodes don't have the event greylisted and are free to pick it up.
  • We recently added some integration tests around the process outlined above and ran into this issue there as well. Our workaround in the integration tests was to provide our own fake implementation of IGreyList that ignores any items prefixed with evt.
  • I did some digging in the code, and while something inserts events into the greylist, there doesn't seem to be anything that removes them.
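The hypothesis in these observations can be illustrated with a toy Python model. Everything here (`NeverExpiringGreyList`, `IgnoreEventsGreyList`, `poll`, `simulate`, and the idea that an event is greylisted while a node attempts it) is a hypothetical sketch of the reported behaviour, not the engine's actual implementation or the real IGreyList interface. It shows an event that arrives before any workflow is waiting for it: with a greylist that never removes entries, the second poll on the same node skips the event forever, while the test workaround (ignore IDs prefixed with evt) lets it be consumed.

```python
class NeverExpiringGreyList:
    """Hypothetical model of the observed behaviour: IDs are added, never removed."""

    def __init__(self):
        self._ids = set()

    def add(self, item_id):
        self._ids.add(item_id)

    def is_greylisted(self, item_id):
        return item_id in self._ids


class IgnoreEventsGreyList(NeverExpiringGreyList):
    """Model of the integration-test workaround: never greylist 'evt' IDs."""

    def is_greylisted(self, item_id):
        if item_id.startswith("evt"):
            return False
        return super().is_greylisted(item_id)


def poll(pending, greylist, try_handle):
    """Hypothetical single-node poll cycle; returns the IDs consumed."""
    consumed = []
    for item_id in list(pending):
        if greylist.is_greylisted(item_id):
            continue  # corresponds to "Got greylisted event ..." in the log
        greylist.add(item_id)  # mark as being worked on by this node
        if try_handle(item_id):
            pending.remove(item_id)
            consumed.append(item_id)
    return consumed


def simulate(greylist):
    """The event arrives before its waiter exists; then the waiter appears."""
    pending = ["evt:1"]
    waiting = set()
    first = poll(pending, greylist, lambda i: i in waiting)
    waiting.add("evt:1")  # the workflow now starts waiting for this event
    second = poll(pending, greylist, lambda i: i in waiting)
    return first, second, pending


print(simulate(NeverExpiringGreyList()))  # ([], [], ['evt:1']): event stuck
print(simulate(IgnoreEventsGreyList()))   # ([], ['evt:1'], []): event consumed
```

Because each node in this model would hold its own in-memory greylist, a different node polling the same pending event would not see it as greylisted, which is consistent with the problem only showing up with a single node.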
