Add a multi-model smoke test workflow to detect breaking changes to core GPTScript features.
The smoke test runner uses `gpt-4o` to perform a fuzzy equality check between existing golden files and event stream files generated at runtime.

To add a new test case, just create a new directory and the GPTScript you want to test; e.g. `pkg/tests/smoke/testdata/<test-case>/<script>.gpt`, set your `GPTSCRIPT_DEFAULT_MODEL` (and the respective auth environment variables), and run the `smoke` make target. This will generate the initial, model-specific golden file in the `pkg/tests/smoke/testdata/<test-case>` directory. Successive runs will then reference this file for comparison.

In order for the workflow to run for PRs from external contributors -- those who aren't members of the `gptscript-ai` org -- a member must first add the `run-smoke` label to the PR. This gate gives members a chance to review the PR first and ensure that it doesn't contain code that would compromise the org's GitHub secrets.

Note: The tests are sparse right now, but I'd like to get the framework in first to make sure it gels with folks and that all the labeling safeguards work in the real repo, since I've been testing against my fork. If that sounds good, I'll follow up with more test cases immediately.