Skip to content

Implement a proper shell tokenizer for the SDK #60341

Open
@osa1

Description

@osa1

In the SDK we have many places where we parse command line arguments, process them, and serialize them again.

Currently we don't have a proper shell argument tokenizer and serializer, and we use various incorrect and incomplete parsers and serializers.

Here are some of the places we parse or serialize shell arguments:

  • In the test runner, when parsing extra compiler options in tests like // dart2wasmOptions = ...:

    List<String> _splitWords(String s) =>
    s.split(' ')..removeWhere((s) => s.isEmpty);
    List<T> _parseOption<T>(
    String filePath, String contents, String name, T Function(String) convert,
    {bool allowMultiple = false}) {
    var matches = RegExp('^[ \t]*// $name=(.*)', multiLine: true)
    .allMatches(contents)
    .toList();
    if (!allowMultiple && matches.length > 1) {
    throw FormatException('More than one "// $name=" line in test $filePath');
    }
    var options = <T>[];
    for (var match in matches) {
    for (var option in _splitWords(match[1]!)) {
    options.add(convert(option));
    }
    }
    return options;
    }

    This code uses s.split(' ') to parse arguments, which splits arguments enclosed in double or single quotes too.

  • In the test runner, when writing command line arguments to a batch compiler's stdin, this code joins the arguments with arguments.join(' '):

    String _createArgumentsLine(List<String> arguments, int timeout) {
    if (_useJson) {
    return "${jsonEncode(arguments)}\n";
    } else {
    return '${arguments.join(' ')}\n';
    }
    }

    If I have a single argument with whitespaces like a b and then another argument c, this code turns these two arguments into three arguments a b c.

  • ddc then parses the arguments generated in the code above using line.split(RegExp(r'\s+')):

    var args = batchArgs.merge(line.split(RegExp(r'\s+')));

    So even if the test runner generates arguments properly as something like "a b" c, ddc doesn't parse them correctly.

    This means it's currently impossible to parse arguments with whitespace to ddc in the test runner.

  • Test runner, when showing the reproduction instructions when a test fails, uses another incorrect shell argument serializer:

    /// This function is pretty stupid and only puts quotes around an argument if
    /// it the argument contains a space.
    String escapeCommandLineArgument(String argument) {
    if (argument.contains(' ')) {
    return '"$argument"';
    }
    return argument;
    }

  • (dart2js does the same thing in the batch mode compiler, but it uses a fancier (more correct) parser: https://github.com/dart-lang/sdk/blob/391b9eeda43fa2c12692b504c9ab9be9ab25a0ad/pkg/compiler/lib/src/util/command_line.dart)

  • Not SDK, but pub also has its own shell argument serializer which is incorrect: https://github.com/dart-lang/pub/blob/d86e3c979a3889fed61b68dae9f9156d0891704d/lib/src/command.dart#L225-L231

    The shell tokenizer we implement could be used in pub as well.

We should implement a proper library for this, and reuse it everywhere.

Note: some of the linked code above will be changed/improved by https://dart-review.googlesource.com/c/sdk/+/415280.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-dart-cliUse area-dart-cli for issues related to the 'dart' command like tool.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions