Description
Describe the bug
EntityFrameworkPersistenceProvider class methods for retrieving WorkflowInstance do not generate optimal database queries.
As an example, this code generates a query, which leads to duplicate data from Workflow table.
Code:
public async Task<WorkflowInstance> GetWorkflowInstance(string Id, CancellationToken cancellationToken = default)
{
using (var db = ConstructDbContext())
{
var uid = new Guid(Id);
var raw = await db.Set<PersistedWorkflow>()
.Where(x => x.InstanceId == uid)
.Include(wf => wf.ExecutionPointers)
.ThenInclude(ep => ep.ExtensionAttributes)
.Include(wf => wf.ExecutionPointers)
.FirstAsync(cancellationToken);
if (raw == null)
return null;
return raw.ToWorkflowInstance();
}
}
Query:
SELECT
[t].[PersistenceId],
[t].[CompleteTime],
[t].[CreateTime],
[t].[Data], -- this is the cause of large dataflow (duplication)
[t].[Description],
[t].[InstanceId],
[t].[NextExecution],
[t].[Reference],
[t].[Status],
[t].[Version],
[t].[WorkflowDefinitionId],
[t0].[PersistenceId],
[t0].[Active],
[t0].[Children],
[t0].[ContextItem],
[t0].[EndTime],
[t0].[EventData],
[t0].[EventKey],
[t0].[EventName],
[t0].[EventPublished],
[t0].[Id],
[t0].[Outcome],
[t0].[PersistenceData],
[t0].[PredecessorId],
[t0].[RetryCount],
[t0].[Scope],
[t0].[SleepUntil],
[t0].[StartTime],
[t0].[Status],
[t0].[StepId],
[t0].[StepName],
[t0].[WorkflowId],
[t0].[PersistenceId0],
[t0].[AttributeKey],
[t0].[AttributeValue],
[t0].[ExecutionPointerId]
FROM (
SELECT TOP(1)
[w].[PersistenceId],
[w].[CompleteTime],
[w].[CreateTime],
[w].[Data],
[w].[Description],
[w].[InstanceId],
[w].[NextExecution],
[w].[Reference],
[w].[Status],
[w].[Version],
[w].[WorkflowDefinitionId]
FROM [wfc].[Workflow] AS [w]
WHERE [w].[InstanceId] = {uid}
) AS [t]
LEFT JOIN (
SELECT
[e].[PersistenceId],
[e].[Active],
[e].[Children],
[e].[ContextItem],
[e].[EndTime],
[e].[EventData],
[e].[EventKey],
[e].[EventName],
[e].[EventPublished],
[e].[Id],
[e].[Outcome],
[e].[PersistenceData],
[e].[PredecessorId],
[e].[RetryCount],
[e].[Scope],
[e].[SleepUntil],
[e].[StartTime],
[e].[Status],
[e].[StepId],
[e].[StepName],
[e].[WorkflowId],
[e0].[PersistenceId] AS [PersistenceId0],
[e0].[AttributeKey],
[e0].[AttributeValue],
[e0].[ExecutionPointerId]
FROM [wfc].[ExecutionPointer] AS [e]
LEFT JOIN [wfc].[ExtensionAttribute] AS [e0] ON [e].[PersistenceId] = [e0].[ExecutionPointerId]
) AS [t0] ON [t].[PersistenceId] = [t0].[WorkflowId]
ORDER BY [t].[PersistenceId], [t0].[PersistenceId], [t0].[PersistenceId0];
Our case contains a large amount of information in the Data column. It leads to performance degradation due to the large amount of duplicate data when the query is executed.
To avoid this side effect, you can split the query.
Code:
var raw = await db.Set<PersistedWorkflow>()
.Where(x => x.InstanceId == uid)
.Include(wf => wf.ExecutionPointers)
.ThenInclude(ep => ep.ExtensionAttributes)
.Include(wf => wf.ExecutionPointers)
.AsSplitQuery() // new line
.FirstOrDefaultAsync(cancellationToken);
Queries:
SELECT TOP(1)
[w].[PersistenceId],
[w].[CompleteTime],
[w].[CreateTime],
[w].[Data],
[w].[Description],
[w].[InstanceId],
[w].[NextExecution],
[w].[Reference],
[w].[Status],
[w].[Version],
[w].[WorkflowDefinitionId]
FROM [wfc].[Workflow] AS [w]
WHERE [w].[InstanceId] = {uid}
ORDER BY [w].[PersistenceId]
SELECT
[e0].[PersistenceId],
[e0].[AttributeKey],
[e0].[AttributeValue],
[e0].[ExecutionPointerId],
[t].[PersistenceId],
[e].[PersistenceId]
FROM (
SELECT TOP(1) [w].[PersistenceId]
FROM [wfc].[Workflow] AS [w]
WHERE [w].[InstanceId] = {uid}
) AS [t]
INNER JOIN [wfc].[ExecutionPointer] AS [e] ON [t].[PersistenceId] = [e].[WorkflowId]
INNER JOIN [wfc].[ExtensionAttribute] AS [e0] ON [e].[PersistenceId] = [e0].[ExecutionPointerId]
ORDER BY [t].[PersistenceId], [e].[PersistenceId]
SELECT
[e].[PersistenceId],
[e].[Active],
[e].[Children],
[e].[ContextItem],
[e].[EndTime],
[e].[EventData],
[e].[EventKey],
[e].[EventName],
[e].[EventPublished],
[e].[Id],
[e].[Outcome],
[e].[PersistenceData],
[e].[PredecessorId],
[e].[RetryCount],
[e].[Scope],
[e].[SleepUntil],
[e].[StartTime],
[e].[Status],
[e].[StepId],
[e].[StepName],
[e].[WorkflowId],
[t].[PersistenceId]
FROM (
SELECT TOP(1) [w].[PersistenceId]
FROM [wfc].[Workflow] AS [w]
WHERE [w].[InstanceId] = {uid}
) AS [t]
INNER JOIN [wfc].[ExecutionPointer] AS [e] ON [t].[PersistenceId] = [e].[WorkflowId]
ORDER BY [t].[PersistenceId], [e].[PersistenceId]
Test
From SSMS
[Data] size = 507kB;
Execution pointer count = 1000;
Impl | [Data] total size | Execution time |
---|---|---|
left join (current) | 507 000 kB | ~ 6m36s520ms |
split queries | 507kB | ~ 1s300ms |
From external source
Count - number of execution pointers;
DataSize - the size of the information in the Data column;
SingleQuery - single query (current impl);
SplitQuery - split queries;
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.2965/22H2/2022Update)
AMD Ryzen 7 4800HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
[Host] : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT AVX2
Job-DFXIRM : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT AVX2
IterationCount=3 LaunchCount=1 WarmupCount=1
Method | Id | Count | DataSize | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SingleQuery | 6A8F3(...)8476E [36] | 108 | 7.71 kB | 2,571.3 ms | 735.28 ms | 40.30 ms | 1.00 | 7000.0000 | - | - | 14.83 MB | 1.00 |
SplitQuery | 6A8F3(...)8476E [36] | 108 | 7.71 kB | 269.0 ms | 4.84 ms | 0.27 ms | 0.10 | 500.0000 | - | - | 2.25 MB | 0.15 |
SingleQuery | C0EDE(...)29FBD [36] | 210 | 14.81 kB | 9,141.0 ms | 579.62 ms | 31.77 ms | 1.00 | 34000.0000 | 2000.0000 | - | 72.47 MB | 1.00 |
SplitQuery | C0EDE(...)29FBD [36] | 210 | 14.81 kB | 408.3 ms | 34.98 ms | 1.92 ms | 0.04 | 1000.0000 | - | - | 4.42 MB | 0.06 |
SingleQuery | C10E8(...)F4DBE [36] | 333 | 23.38 kB | 22,677.3 ms | 5,308.82 ms | 290.99 ms | 1.00 | 69000.0000 | 68000.0000 | 67000.0000 | 242.66 MB | 1.00 |
SplitQuery | C10E8(...)F4DBE [36] | 333 | 23.38 kB | 633.7 ms | 34.22 ms | 1.88 ms | 0.03 | 1000.0000 | - | - | 7.12 MB | 0.03 |
SingleQuery | 4B444(...)B85E5 [36] | 503 | 211.38 kB | 308,729.0 ms | 67,444.26 ms | 3,696.85 ms | 1.00 | 4583000.0000 | 4528000.0000 | 4453000.0000 | 22366.06 MB | 1.000 |
SplitQuery | 4B444(...)B85E5 [36] | 503 | 211.38 kB | 4,273.6 ms | 480.21 ms | 26.32 ms | 0.01 | 19000.0000 | 15000.0000 | 9000.0000 | 69.83 MB | 0.003 |
Expected behavior
Fetch data without unnecessary degradation.