Skip to content

[sql39]: Actions schemas, queries and SQL store impl #1072

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 2, 2025

Conversation

ellemouton
Copy link
Member

@ellemouton ellemouton commented May 13, 2025

This is the final PR for part 1 of #917 🎉

In this PR, we introduce the schema for the actions table. See the diagram in #917 for how this fits in with regard
to our other tables (namely sessions & accounts).

We then implement the SQL version of the Actions store using the new queries.

Finally, our dev build is updated to use the SQL version of the actions table.

@ellemouton ellemouton self-assigned this May 13, 2025
@ellemouton ellemouton marked this pull request as draft May 13, 2025 12:59
@ellemouton ellemouton added the no-changelog This PR is does not require a release notes entry label May 13, 2025
@ellemouton ellemouton force-pushed the sql39 branch 2 times, most recently from 00a6ae4 to 7324726 Compare May 22, 2025 11:54
@ellemouton ellemouton marked this pull request as ready for review May 22, 2025 11:54
Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks great, really awesome work 🔥🚀!!

Leaving some comments below, where I think the most substantial comment is in regards to the actual usage of the indexes.

OFFSET @num_offset;

-- name: ListActions :many
SELECT a.*
Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I'm not sure that the indexes will end up getting with these queries, due to the "OR IS NULL" clauses for all of the index filter params. It requires some more investigation if it's the case or not, but if this is an issue, we should consider if it's worth creating separate queries that's being used when for example the sessionID is not nil at the call site. That way, we'd ensure that the indexes actually gets used.

Just for some reference, I copied the table + this query into GPT and asked if the indexes will gets used. I'm posting the response below:

gpt convo


Because each filter is written as

(column = :param OR :param IS NULL)

PostgreSQL’s planner can only turn that into an index‐based lookup when it can prove, at plan‐time, that :param is non‐NULL and that the condition simplifies to a simple column = constant. As soon as you wrap it in an OR … IS NULL, you introduce uncertainty:

  1. For a single invocation with literal parameters

    • If you prepare the statement and bind, say, session_id = 42, then

      (a.session_id = 42 OR 42 IS NULL)

      is semantically equivalent to a.session_id = 42, so in principle the actions_session_id_idx can be used for that clause.

  2. When the planner builds a “generic” plan

    • PostgreSQL distinguishes custom plans (where it knows your parameter values at plan‐time) from generic plans (where it has to pick one plan for all future binds).
    • By default, after a few executions it switches to a generic plan, where it doesn’t know whether each :param IS NULL will turn on or off the filter. In that case it almost always chooses a sequential scan, because it can’t assume you’ll eliminate most rows.
  3. Multiple optional filters multiply the uncertainty

    • You have six of these (… OR … IS NULL) filters all ANDed together. Even if one of them would by itself use an index, the planner sees the whole WHERE clause as “maybe trivial, maybe selective” and still tends to fall back to seq‐scan.
  4. Dynamic ordering makes it worse

    • The CASE WHEN :reversed … END DESC, CASE WHEN NOT :reversed … END ASC trick also prevents an “index‐only scan in index‐order” on created_at_idx; PostgreSQL won’t match that to a simple ORDER BY created_at ASC/DESC.

Bottom line

  • In custom plans (i.e. if the planner does re-optimize on each bind, or if you force plan_cache_mode = force_custom_plan), then when you supply non-NULL parameters each (col = constant OR constant IS NULL) really does reduce to col = constant, and the corresponding single-column index can be used.

  • In generic plans (the default after a few executions of a prepared statement), the planner assumes the worst (that your filters might all be turned off), so it chooses a sequential scan and then a regular sort.


How to guarantee index usage

  1. Generate the SQL dynamically so you only include WHERE session_id = … clauses for the parameters you actually passed. No ORs, no uncertainty — the planner sees plain equality and happily uses the index.

  2. Use a stored procedure with EXECUTE format(). Inside PL/pgSQL you can build the WHERE list piecewise and then EXECUTE the resulting text; the final plan is always custom.

  3. Split into two code‐paths:

    • One static SQL with all parameters NOT NULL (index usage guaranteed).
    • One static SQL with no filters (trivial SELECT * … ORDER BY …; index usage isn’t needed).
  4. Partial indexes or multicolumn indexes won’t help here unless you know in advance which combinations of columns you’ll filter on.


TL;DR

Will the indexes actually end up getting used?
Not reliably. With the (col = :param OR :param IS NULL) pattern inside a prepared statement, Postgres will usually fall back to a sequential scan and a regular sort—unless you force it to rebuild a custom plan each time or rewrite the query to eliminate the OR … IS NULL out of your WHERE clause.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited your comment to put the gpt response in a drop down fyi)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so turns out: postgres correctly uses the index here, but sqlite does not - so good catch!

went with creating these queries manually instead

Copy link
Member Author

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ViktorTigerstrom 🙏

OFFSET @num_offset;

-- name: ListActions :many
SELECT a.*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited your comment to put the gpt response in a drop down fyi)

OFFSET @num_offset;

-- name: ListActions :many
SELECT a.*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so turns out: postgres correctly uses the index here, but sqlite does not - so good catch!

went with creating these queries manually instead

Copy link
Contributor

@bitromortac bitromortac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool and impressive work! So happy to see all schemas implemented and running against unit tests 🎉.

Comment on lines +12 to +20
SessionID sql.NullInt64
AccountID sql.NullInt64
FeatureName sql.NullString
ActorName sql.NullString
RpcMethod sql.NullString
State sql.NullInt16
EndTime sql.NullTime
StartTime sql.NullTime
GroupID sql.NullInt64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something that could be cool would be to add a unit test that uses reflection and field tagging to check that field names and types are a subset of and consistent with models.Action, such that any changes in the schema could be detected. one could tag EndTime, StartTime, and GroupID as not being part of Action

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing unit tests would break no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure every field is covered in unit tests, sqlc does all the renaming automatically if a field's name changes, but we would have to manually change the representations in the query and would only fail once we add a unit test. Was just an idea, but not a blocker :)

@ellemouton ellemouton requested a review from bitromortac May 28, 2025 12:14
@ellemouton ellemouton changed the base branch from sql40 to master May 29, 2025 05:04
In this commit we define the schema for the `actions` table along with
various queries we will need for interacting with the table. NOTE: we
will also add some of our own queries manually in commits to follow.
Copy link
Contributor

@bitromortac bitromortac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Here, we manually define some queries for the actions store. We do this
so that we can manually build the "SELECT" and only add "WHERE" clauses
that are actually needed for the query and hence ensure that available
indexes are used.
In this commit we let the firewalldb.SQLDB implement the ActionsDB. We
also ensure that all the action unit tests now run against the SQL impl.
Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice changes!! Awesome job 🎉🔥

@@ -0,0 +1,210 @@
package sqlc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice approach with the addition of this file 🎉🔥!

Comment on lines +185 to +188
if errReason != "" && state != ActionStateError {
return fmt.Errorf("error reason should only be set for " +
"ActionStateError")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: As a follow-up to @bitromortac's feedback below, we could potentially also check that an errReason is actually set when state == ActionStateError, though that may be a bit dangerous to require that the middleware interceptor sets an error for all responses marked as "errors". So therefore I think it's probably safer to leave it as is.

@ellemouton ellemouton merged commit cad67fc into lightninglabs:master Jun 2, 2025
22 checks passed
@ellemouton ellemouton deleted the sql39 branch June 2, 2025 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog This PR is does not require a release notes entry sql-ize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants