Description
Expected Behavior:
The API function PrintVariables prints current parameters to a file, and ReadConfigFile reads parameters from a file. Intuitively, ReadConfigFile should be able to read the files that PrintVariables writes. This is explicitly assumed within the ProcessPage function, where these functions are used together to "Save current config variables before switching modes" and then "Restore saved config variables".
Lines 1293 to 1306 in a873553
Current Behavior:
Unfortunately, this does not currently work properly. The issue is that PrintVariables prints parameter descriptions alongside key/value pairs (e.g. chs_trailing_punct1 ).,;:?! 1st Trailing punctuation), and ReadConfigFile reads the description as part of the value (for string parameters). An example showing this is below.
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>

int main()
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    static const char *kOldVarsFile = "failed_vars.txt";

    // Print default value of chs_trailing_punct1
    printf("Initial value: %s\n", api->GetStringVariable("chs_trailing_punct1"));

    // Dump all parameters to a file, then read the same file back in
    FILE *fp = fopen(kOldVarsFile, "wb");
    if (fp == NULL) {
        fprintf(stderr, "Could not open %s for writing.\n", kOldVarsFile);
        exit(1);
    }
    api->PrintVariables(fp);
    fclose(fp);
    api->ReadConfigFile(kOldVarsFile);

    // The value should be unchanged, but the description is now appended to it
    printf("After PrintVariables/ReadConfigFile: %s\n", api->GetStringVariable("chs_trailing_punct1"));
    api->End();
    delete api;
    return 0;
}
This returns the following:
Initial value: ).,;:?!
After PrintVariables/ReadConfigFile: ).,;:?! 1st Trailing punctuation
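To illustrate why the description ends up in the value, here is a minimal sketch of a "name value" line parser. It is a hypothetical model, not Tesseract's actual code: ParseConfigLine is an invented name, and the rule it applies (the name ends at the first space or tab, and the rest of the line is the value) is an assumption that happens to reproduce the output above.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical model of a "name value" config-line parser (NOT Tesseract's code).
// Assumption: the name ends at the first space or tab, and everything after the
// following run of blanks is taken as the value, including any trailing text.
std::pair<std::string, std::string> ParseConfigLine(const std::string &line) {
  const std::string kBlank = " \t";
  const std::size_t name_end = line.find_first_of(kBlank);
  if (name_end == std::string::npos) {
    return {line, ""};  // A name with no value
  }
  const std::size_t value_start = line.find_first_not_of(kBlank, name_end);
  if (value_start == std::string::npos) {
    return {line.substr(0, name_end), ""};  // Only blanks after the name
  }
  return {line.substr(0, name_end), line.substr(value_start)};
}
```

Under this model, a line that carries a third field for the description yields a value that still contains that description, which matches the "After PrintVariables/ReadConfigFile" line above.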
The impact of this is:
- ProcessPage does not work correctly when used with retry_config
- There is no simple interface for generating a config file with the user's current settings
  - This is useful for saving/restoring configurations (as ProcessPage attempts to do)
Suggested Fix:
The simplest solution would be to remove the descriptions from the PrintVariables output (or at least hide that behavior behind an option). I can write a PR if others agree this makes sense. Editing ReadConfigFile to ignore the descriptions is likely also possible, but could be higher effort.
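Until something like that lands, a possible user-side workaround is to post-process the dump before reading it back. The sketch below assumes each dumped line has the form name, tab, value, tab, description; that layout is an assumption about the PrintVariables output, and StripDescriptions and StripDescriptionsFromText are hypothetical helper names, not part of the Tesseract API.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies lines from `in` to `out`, keeping only "name<TAB>value".
// Assumption: fields are tab-separated and the description is the third field.
// Lines with fewer than three fields are copied unchanged.
void StripDescriptions(std::istream &in, std::ostream &out) {
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t first_tab = line.find('\t');
    if (first_tab == std::string::npos) {
      out << line << '\n';
      continue;
    }
    // If there is no second tab, find() returns npos and substr() keeps the whole line.
    const std::size_t second_tab = line.find('\t', first_tab + 1);
    out << line.substr(0, second_tab) << '\n';
  }
}

// Convenience wrapper for in-memory text.
std::string StripDescriptionsFromText(const std::string &dump) {
  std::istringstream in(dump);
  std::ostringstream out;
  StripDescriptions(in, out);
  return out.str();
}
```

The cleaned file could then be passed to ReadConfigFile in place of the raw dump. One limitation of this sketch: a string value that itself contains a tab would be cut at that tab, so it is not a substitute for fixing the output format.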
Environment
Tesseract Version: 5.2.0
Commit Number: 15200c6
Platform: Linux ubuntu 5.15.0-43-generic