Save performance metrics #137
Description
It would be useful to save performance metrics that we can use to understand changes. As mentioned in #36, these metrics can be saved in `Session` or in `Context`.
Some of the data we can save (see the sketch after this list):
- Time spent executing nodes
- Memory consumption before and after node execution
- Cache hit/miss
- Child nodes to pinpoint slower paths
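To make the shape of that data concrete, one possible record for a single node execution could look like the sketch below. The type and field names are purely illustrative, not part of the proposal:

```go
import "time"

// NodeMetrics is a hypothetical record for one node execution,
// covering the data points listed above.
type NodeMetrics struct {
	Node      string         // node name, e.g. "IsBinary"
	Arguments string         // stringified arguments of the node
	Duration  time.Duration  // time spent executing the node
	MemBefore uint64         // memory consumption before execution
	MemAfter  uint64         // memory consumption after execution
	Extra     map[string]int // e.g. cache hits and misses
	Children  []*NodeMetrics // child nodes, to pinpoint slower paths
}
```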
The proposal is to add a `Performance` interface to the base session or context that can be used to save this data. Making it an interface means we can have several implementations: some can collect more information than others, or collection can be disabled entirely in case it's too expensive.
```go
type Performance interface {
	// Reset is called once per query to reset the counters.
	Reset()
	// Start marks the start of one node execution. It returns an id
	// because several nodes can be running at the same time.
	Start(nodeName, arguments string) (id int)
	// End marks the end of one node execution.
	End(id int)
	// ExtraData saves other values, such as cache hit and miss counts.
	ExtraData(id int, name string, value int)
	// GetData returns the collected data in some structured format, such as JSON.
	GetData() string
}
```
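To show the interface is cheap to implement, here is a minimal in-memory sketch. All names are hypothetical, and it only tracks timings and extra values; a fuller implementation would also record memory usage and the parent/child structure:

```go
import (
	"encoding/json"
	"sync"
	"time"
)

// memPerformance is a hypothetical in-memory implementation of Performance.
type memPerformance struct {
	mu     sync.Mutex
	nextID int
	spans  map[int]*span
}

// span holds the data collected for a single node execution.
type span struct {
	Node      string         `json:"node"`
	Arguments string         `json:"arguments"`
	Start     time.Time      `json:"start"`
	End       time.Time      `json:"end"`
	Extra     map[string]int `json:"extra,omitempty"`
}

func newMemPerformance() *memPerformance {
	return &memPerformance{spans: map[int]*span{}}
}

// Reset discards the spans collected for the previous query.
func (p *memPerformance) Reset() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID = 0
	p.spans = map[int]*span{}
}

// Start records the start time of a node execution and hands back its id.
func (p *memPerformance) Start(nodeName, arguments string) (id int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	p.spans[p.nextID] = &span{
		Node:      nodeName,
		Arguments: arguments,
		Start:     time.Now(),
		Extra:     map[string]int{},
	}
	return p.nextID
}

// End records the end time of the node execution identified by id.
func (p *memPerformance) End(id int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if s, ok := p.spans[id]; ok {
		s.End = time.Now()
	}
}

// ExtraData attaches an arbitrary counter (e.g. cache hits) to a span.
func (p *memPerformance) ExtraData(id int, name string, value int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if s, ok := p.spans[id]; ok {
		s.Extra[name] = value
	}
}

// GetData serializes all collected spans as JSON.
func (p *memPerformance) GetData() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	data, _ := json.Marshal(p.spans)
	return string(data)
}
```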
Then all the nodes have to be instrumented to save this data:
```go
func (ib *IsBinary) Eval(
	ctx *sql.Context,
	row sql.Row,
) (interface{}, error) {
	// Start collecting data for this node execution.
	perfID := ctx.Performance.Start("IsBinary", ib.String())
	defer ctx.Performance.End(perfID)

	v, err := ib.Child.Eval(ctx, row)
	// [...]

	blob, err := sql.Blob.Convert(v)
	if err != nil {
		return nil, err
	}

	// Add extra information that may be useful.
	ctx.Performance.ExtraData(perfID, "blob size", len(blob.([]byte)))

	return isBinary(blob.([]byte)), nil
}
```
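Since the instrumentation calls are unconditional inside `Eval`, disabling collection (for the case where it is too expensive) can be a trivial no-op implementation; a sketch, with a made-up type name:

```go
// noopPerformance is a hypothetical implementation that collects nothing,
// so instrumented nodes pay near-zero overhead when metrics are disabled.
type noopPerformance struct{}

func (noopPerformance) Reset()                                    {}
func (noopPerformance) Start(nodeName, arguments string) (id int) { return 0 }
func (noopPerformance) End(id int)                                {}
func (noopPerformance) ExtraData(id int, name string, value int)  {}
func (noopPerformance) GetData() string                           { return "" }
```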
Extra tooling should be created to read and analyze this data. It would also be interesting to automate running tests with old and new code (for example, `master` and a PR branch) and comparing the data to find regressions.
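As a starting point for such tooling, a sketch that compares two `GetData` dumps (e.g. one from `master` and one from the PR) and reports per-node time deltas could look like the following. It assumes the hypothetical JSON span layout from the in-memory sketch above:

```go
import (
	"encoding/json"
	"fmt"
	"time"
)

// spanRecord mirrors the hypothetical JSON layout produced by GetData.
type spanRecord struct {
	Node  string    `json:"node"`
	Start time.Time `json:"start"`
	End   time.Time `json:"end"`
}

// totalsPerNode sums the time spent per node name in one GetData dump.
func totalsPerNode(data string) (map[string]time.Duration, error) {
	var spans map[int]*spanRecord
	if err := json.Unmarshal([]byte(data), &spans); err != nil {
		return nil, err
	}
	totals := map[string]time.Duration{}
	for _, s := range spans {
		totals[s.Node] += s.End.Sub(s.Start)
	}
	return totals, nil
}

// compareRuns prints the per-node time delta between two dumps,
// e.g. one collected on master and one on a PR branch.
func compareRuns(oldData, newData string) error {
	oldTotals, err := totalsPerNode(oldData)
	if err != nil {
		return err
	}
	newTotals, err := totalsPerNode(newData)
	if err != nil {
		return err
	}
	for node, newTotal := range newTotals {
		fmt.Printf("%s: %v (delta %v)\n", node, newTotal, newTotal-oldTotals[node])
	}
	return nil
}
```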
Thoughts? @ajnavarro, @erizocosmico, @mcarmonaa