Skip to content

Nonoptimal database query #1169

Open
Open
@thefrozenfruit

Description

@thefrozenfruit

Describe the bug

EntityFrameworkPersistenceProvider class methods for retrieving WorkflowInstance do not generate optimal database queries.

As an example, this code generates a query, which leads to duplicate data from Workflow table.

Code:

public async Task<WorkflowInstance> GetWorkflowInstance(string Id, CancellationToken cancellationToken = default)
{
    using (var db = ConstructDbContext())
    {
        var uid = new Guid(Id);
        var raw = await db.Set<PersistedWorkflow>()
            .Where(x => x.InstanceId == uid)
            .Include(wf => wf.ExecutionPointers)
            .ThenInclude(ep => ep.ExtensionAttributes)
            .Include(wf => wf.ExecutionPointers)
            .FirstAsync(cancellationToken);

        if (raw == null)
            return null;

        return raw.ToWorkflowInstance();
    }
}

Query:

SELECT
  [t].[PersistenceId],
  [t].[CompleteTime],
  [t].[CreateTime],
  [t].[Data], -- this is the cause of large dataflow (duplication)
  [t].[Description],
  [t].[InstanceId],
  [t].[NextExecution],
  [t].[Reference],
  [t].[Status],
  [t].[Version],
  [t].[WorkflowDefinitionId],
  [t0].[PersistenceId],
  [t0].[Active],
  [t0].[Children],
  [t0].[ContextItem],
  [t0].[EndTime],
  [t0].[EventData],
  [t0].[EventKey],
  [t0].[EventName],
  [t0].[EventPublished],
  [t0].[Id],
  [t0].[Outcome],
  [t0].[PersistenceData],
  [t0].[PredecessorId],
  [t0].[RetryCount],
  [t0].[Scope],
  [t0].[SleepUntil],
  [t0].[StartTime],
  [t0].[Status],
  [t0].[StepId],
  [t0].[StepName],
  [t0].[WorkflowId],
  [t0].[PersistenceId0],
  [t0].[AttributeKey],
  [t0].[AttributeValue],
  [t0].[ExecutionPointerId]
FROM (
    SELECT TOP(1)
        [w].[PersistenceId],
        [w].[CompleteTime],
        [w].[CreateTime],
        [w].[Data],
        [w].[Description],
        [w].[InstanceId],
        [w].[NextExecution],
        [w].[Reference],
        [w].[Status],
        [w].[Version],
        [w].[WorkflowDefinitionId]
    FROM [wfc].[Workflow] AS [w]
    WHERE [w].[InstanceId] = {uid}
) AS [t]
LEFT JOIN (
    SELECT
        [e].[PersistenceId],
        [e].[Active],
        [e].[Children],
        [e].[ContextItem],
        [e].[EndTime],
        [e].[EventData],
        [e].[EventKey],
        [e].[EventName],
        [e].[EventPublished],
        [e].[Id],
        [e].[Outcome],
        [e].[PersistenceData],
        [e].[PredecessorId],
        [e].[RetryCount],
        [e].[Scope],
        [e].[SleepUntil],
        [e].[StartTime],
        [e].[Status],
        [e].[StepId],
        [e].[StepName],
        [e].[WorkflowId],
        [e0].[PersistenceId] AS [PersistenceId0],
        [e0].[AttributeKey],
        [e0].[AttributeValue],
        [e0].[ExecutionPointerId]
    FROM [wfc].[ExecutionPointer] AS [e]
    LEFT JOIN [wfc].[ExtensionAttribute] AS [e0] ON [e].[PersistenceId] = [e0].[ExecutionPointerId]
) AS [t0] ON [t].[PersistenceId] = [t0].[WorkflowId]
ORDER BY [t].[PersistenceId], [t0].[PersistenceId], [t0].[PersistenceId0];

Our case contains a large amount of information in the Data column. It leads to performance degradation due to the large amount of duplicate data when the query is executed.

To avoid this side effect, you can split the query.

Code:

var raw = await db.Set<PersistedWorkflow>()
    .Where(x => x.InstanceId == uid)
    .Include(wf => wf.ExecutionPointers)
    .ThenInclude(ep => ep.ExtensionAttributes)
    .Include(wf => wf.ExecutionPointers)
    .AsSplitQuery() // new line
    .FirstOrDefaultAsync(cancellationToken);

Queries:

SELECT TOP(1)
    [w].[PersistenceId],
    [w].[CompleteTime],
    [w].[CreateTime],
    [w].[Data],
    [w].[Description],
    [w].[InstanceId],
    [w].[NextExecution],
    [w].[Reference],
    [w].[Status],
    [w].[Version],
    [w].[WorkflowDefinitionId]
FROM [wfc].[Workflow] AS [w]
WHERE [w].[InstanceId] = {uid}
ORDER BY [w].[PersistenceId]
SELECT
    [e0].[PersistenceId],
    [e0].[AttributeKey],
    [e0].[AttributeValue],
    [e0].[ExecutionPointerId],
    [t].[PersistenceId],
    [e].[PersistenceId]
FROM (
    SELECT TOP(1) [w].[PersistenceId]
    FROM [wfc].[Workflow] AS [w]
    WHERE [w].[InstanceId] = {uid}
) AS [t]
INNER JOIN [wfc].[ExecutionPointer] AS [e] ON [t].[PersistenceId] = [e].[WorkflowId]
INNER JOIN [wfc].[ExtensionAttribute] AS [e0] ON [e].[PersistenceId] = [e0].[ExecutionPointerId]
ORDER BY [t].[PersistenceId], [e].[PersistenceId]
SELECT
    [e].[PersistenceId],
    [e].[Active],
    [e].[Children],
    [e].[ContextItem],
    [e].[EndTime],
    [e].[EventData],
    [e].[EventKey],
    [e].[EventName],
    [e].[EventPublished],
    [e].[Id],
    [e].[Outcome],
    [e].[PersistenceData],
    [e].[PredecessorId],
    [e].[RetryCount],
    [e].[Scope],
    [e].[SleepUntil],
    [e].[StartTime],
    [e].[Status],
    [e].[StepId],
    [e].[StepName],
    [e].[WorkflowId],
    [t].[PersistenceId]
FROM (
    SELECT TOP(1) [w].[PersistenceId]
    FROM [wfc].[Workflow] AS [w]
    WHERE [w].[InstanceId] = {uid}
) AS [t]
INNER JOIN [wfc].[ExecutionPointer] AS [e] ON [t].[PersistenceId] = [e].[WorkflowId]
ORDER BY [t].[PersistenceId], [e].[PersistenceId]

Test

From SSMS

[Data] size = 507kB;
Execution pointer count = 1000;

Impl [Data] total size Execution time
left join (current) 507 000 kB ~ 6m36s520ms
split queries 507kB ~ 1s300ms

From external source

Count - number of execution pointers;
DataSize - the size of the information in the Data column;

SingleQuery - single query (current impl);
SplitQuery - split queries;

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.2965/22H2/2022Update)
AMD Ryzen 7 4800HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT AVX2
  Job-DFXIRM : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1  
Method Id Count DataSize Mean Error StdDev Ratio Gen0 Gen1 Gen2 Allocated Alloc Ratio
SingleQuery 6A8F3(...)8476E [36] 108 7.71 kB 2,571.3 ms 735.28 ms 40.30 ms 1.00 7000.0000 - - 14.83 MB 1.00
SplitQuery 6A8F3(...)8476E [36] 108 7.71 kB 269.0 ms 4.84 ms 0.27 ms 0.10 500.0000 - - 2.25 MB 0.15
SingleQuery C0EDE(...)29FBD [36] 210 14.81 kB 9,141.0 ms 579.62 ms 31.77 ms 1.00 34000.0000 2000.0000 - 72.47 MB 1.00
SplitQuery C0EDE(...)29FBD [36] 210 14.81 kB 408.3 ms 34.98 ms 1.92 ms 0.04 1000.0000 - - 4.42 MB 0.06
SingleQuery C10E8(...)F4DBE [36] 333 23.38 kB 22,677.3 ms 5,308.82 ms 290.99 ms 1.00 69000.0000 68000.0000 67000.0000 242.66 MB 1.00
SplitQuery C10E8(...)F4DBE [36] 333 23.38 kB 633.7 ms 34.22 ms 1.88 ms 0.03 1000.0000 - - 7.12 MB 0.03
SingleQuery 4B444(...)B85E5 [36] 503 211.38 kB 308,729.0 ms 67,444.26 ms 3,696.85 ms 1.00 4583000.0000 4528000.0000 4453000.0000 22366.06 MB 1.000
SplitQuery 4B444(...)B85E5 [36] 503 211.38 kB 4,273.6 ms 480.21 ms 26.32 ms 0.01 19000.0000 15000.0000 9000.0000 69.83 MB 0.003

Expected behavior

Fetch data without unnecessary degradation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions