
PrintVariables produces config file that ReadConfigFile does not properly read #3943

@Balearica

Expected Behavior:

The API function PrintVariables writes the current parameters to a file, and ReadConfigFile reads parameters from a file. Intuitively, ReadConfigFile should be able to read back the files that PrintVariables writes. The ProcessPage function explicitly relies on this: it uses the two together to "Save current config variables before switching modes" and then "Restore saved config variables".

From tesseract/src/api/baseapi.cpp (lines 1293 to 1306 at commit a873553):

// Save current config variables before switching modes.
FILE *fp = fopen(kOldVarsFile, "wb");
if (fp == nullptr) {
  tprintf("Error, failed to open file \"%s\"\n", kOldVarsFile);
} else {
  PrintVariables(fp);
  fclose(fp);
}
// Switch to alternate mode for retry.
ReadConfigFile(retry_config);
SetImage(pix);
Recognize(nullptr);
// Restore saved config variables.
ReadConfigFile(kOldVarsFile);

Current Behavior:

Unfortunately, this does not currently work. PrintVariables writes each parameter's description after its name and value, separated by tabs (e.g. chs_trailing_punct1 [tab] ).,;:?! [tab] 1st Trailing punctuation), and for string parameters ReadConfigFile treats the description as part of the value. The example below demonstrates this.

#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>

int main()
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    if (api->Init(nullptr, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    static const char *kOldVarsFile = "failed_vars.txt";

    // Print default value of chs_trailing_punct1
    printf("Initial value: %s\n", api->GetStringVariable("chs_trailing_punct1"));

    // Write all current parameters to a file, then read the same file back
    FILE *fp = fopen(kOldVarsFile, "wb");
    if (fp == nullptr) {
        fprintf(stderr, "Could not open %s\n", kOldVarsFile);
        exit(1);
    }
    api->PrintVariables(fp);
    fclose(fp);
    api->ReadConfigFile(kOldVarsFile);
    printf("After PrintVariables/ReadConfigFile: %s\n", api->GetStringVariable("chs_trailing_punct1"));

    api->End();
    delete api;
    return 0;
}

Running it prints the following:

Initial value: ).,;:?!
After PrintVariables/ReadConfigFile: ).,;:?!	1st Trailing punctuation

The impact of this is:

  1. ProcessPage does not correctly restore the original settings after a retry with retry_config: string parameters come back with their descriptions appended to their values
  2. There is no simple interface for generating a config file from the user's current settings
    1. This is useful for saving and restoring configurations (as ProcessPage attempts to do)

Suggested Fix:

The simplest solution would be to remove the descriptions from the PrintVariables output (or at least make them optional). I can write a PR if others agree this makes sense. Editing ReadConfigFile to ignore the descriptions is likely also possible, but may take more effort.


Environment

Tesseract Version: 5.2.0
Commit Number: 15200c6
Platform: Linux ubuntu 5.15.0-43-generic
