-
Notifications
You must be signed in to change notification settings - Fork 102
[sql39]: Actions schemas, queries and SQL store impl #1072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
00a6ae4
to
7324726
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally this looks great, really awesome work 🔥🚀!!
Leaving some comments below, where I think the most substantial comment is in regards to the actual usage of the indexes.
db/sqlc/queries/actions.sql
Outdated
OFFSET @num_offset; | ||
|
||
-- name: ListActions :many | ||
SELECT a.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I'm not sure that the indexes will end up getting with these queries, due to the "OR IS NULL"
clauses for all of the index filter params. It requires some more investigation if it's the case or not, but if this is an issue, we should consider if it's worth creating separate queries that's being used when for example the sessionID
is not nil
at the call site. That way, we'd ensure that the indexes actually gets used.
Just for some reference, I copied the table + this query into GPT and asked if the indexes will gets used. I'm posting the response below:
gpt convo
Because each filter is written as
(column = :param OR :param IS NULL)
PostgreSQL’s planner can only turn that into an index‐based lookup when it can prove, at plan‐time, that :param
is non‐NULL and that the condition simplifies to a simple column = constant
. As soon as you wrap it in an OR … IS NULL
, you introduce uncertainty:
-
For a single invocation with literal parameters
-
If you prepare the statement and bind, say,
session_id = 42
, then(a.session_id = 42 OR 42 IS NULL)
is semantically equivalent to
a.session_id = 42
, so in principle theactions_session_id_idx
can be used for that clause.
-
-
When the planner builds a “generic” plan
- PostgreSQL distinguishes custom plans (where it knows your parameter values at plan‐time) from generic plans (where it has to pick one plan for all future binds).
- By default, after a few executions it switches to a generic plan, where it doesn’t know whether each
:param IS NULL
will turn on or off the filter. In that case it almost always chooses a sequential scan, because it can’t assume you’ll eliminate most rows.
-
Multiple optional filters multiply the uncertainty
- You have six of these
(… OR … IS NULL)
filters all ANDed together. Even if one of them would by itself use an index, the planner sees the whole WHERE clause as “maybe trivial, maybe selective” and still tends to fall back to seq‐scan.
- You have six of these
-
Dynamic ordering makes it worse
- The
CASE WHEN :reversed … END DESC, CASE WHEN NOT :reversed … END ASC
trick also prevents an “index‐only scan in index‐order” oncreated_at_idx
; PostgreSQL won’t match that to a simpleORDER BY created_at ASC/DESC
.
- The
Bottom line
-
In custom plans (i.e. if the planner does re-optimize on each bind, or if you force
plan_cache_mode = force_custom_plan
), then when you supply non-NULL parameters each(col = constant OR constant IS NULL)
really does reduce tocol = constant
, and the corresponding single-column index can be used. -
In generic plans (the default after a few executions of a prepared statement), the planner assumes the worst (that your filters might all be turned off), so it chooses a sequential scan and then a regular sort.
How to guarantee index usage
-
Generate the SQL dynamically so you only include
WHERE session_id = …
clauses for the parameters you actually passed. No ORs, no uncertainty — the planner sees plain equality and happily uses the index. -
Use a stored procedure with
EXECUTE format()
. Inside PL/pgSQL you can build the WHERE list piecewise and thenEXECUTE
the resulting text; the final plan is always custom. -
Split into two code‐paths:
- One static SQL with all parameters NOT NULL (index usage guaranteed).
- One static SQL with no filters (trivial
SELECT * … ORDER BY …
; index usage isn’t needed).
-
Partial indexes or multicolumn indexes won’t help here unless you know in advance which combinations of columns you’ll filter on.
TL;DR
Will the indexes actually end up getting used?
Not reliably. With the(col = :param OR :param IS NULL)
pattern inside a prepared statement, Postgres will usually fall back to a sequential scan and a regular sort—unless you force it to rebuild a custom plan each time or rewrite the query to eliminate theOR … IS NULL
out of your WHERE clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edited your comment to put the gpt response in a drop down fyi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so turns out: postgres correctly uses the index here, but sqlite does not - so good catch!
went with creating these queries manually instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ViktorTigerstrom 🙏
db/sqlc/queries/actions.sql
Outdated
OFFSET @num_offset; | ||
|
||
-- name: ListActions :many | ||
SELECT a.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edited your comment to put the gpt response in a drop down fyi)
db/sqlc/queries/actions.sql
Outdated
OFFSET @num_offset; | ||
|
||
-- name: ListActions :many | ||
SELECT a.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so turns out: postgres correctly uses the index here, but sqlite does not - so good catch!
went with creating these queries manually instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool and impressive work! So happy to see all schemas implemented and running against unit tests 🎉.
SessionID sql.NullInt64 | ||
AccountID sql.NullInt64 | ||
FeatureName sql.NullString | ||
ActorName sql.NullString | ||
RpcMethod sql.NullString | ||
State sql.NullInt16 | ||
EndTime sql.NullTime | ||
StartTime sql.NullTime | ||
GroupID sql.NullInt64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something that could be cool would be to add a unit test that uses reflection and field tagging to check that field names and types are a subset of and consistent with models.Action
, such that any changes in the schema could be detected. one could tag EndTime
, StartTime
, and GroupID
as not being part of Action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
existing unit tests would break no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure every field is covered in unit tests, sqlc does all the renaming automatically if a field's name changes, but we would have to manually change the representations in the query and would only fail once we add a unit test. Was just an idea, but not a blocker :)
In this commit we define the schema for the `actions` table along with various queries we will need for interacting with the table. NOTE: we will also add some of our own queries manually in commits to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Here, we manually define some queries for the actions store. We do this so that we can manually build the "SELECT" and only add "WHERE" clauses that are actually needed for the query and hence ensure that available indexes are used.
In this commit we let the firewalldb.SQLDB implement the ActionsDB. We also ensure that all the action unit tests now run against the SQL impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice changes!! Awesome job 🎉🔥
@@ -0,0 +1,210 @@ | |||
package sqlc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice approach with the addition of this file 🎉🔥!
if errReason != "" && state != ActionStateError { | ||
return fmt.Errorf("error reason should only be set for " + | ||
"ActionStateError") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: As a follow-up to @bitromortac's feedback below, we could potentially also check that an errReason
is actually set when state == ActionStateError
, though that may be a bit dangerous to require that the middleware interceptor sets an error for all responses marked as "errors". So therefore I think it's probably safer to leave it as is.
This is the final PR for part 1 of #917 🎉
In this PR, we introduce the schema for the
actions
table. See the diagram in #917 for how this fits in with regardto our other tables (namely sessions & accounts).
We then implement the SQL version of the Actions store using the new queries.
Finally, our
dev
build is updated to use the SQL version of the actions table.