Performance issue with long lines #8

Closed
@llimllib

Initial checklist

Affected packages and versions

at least the latest version

Link to runnable example

https://gist.github.com/llimllib/4b87fd4042359b4812350038562dba03

Steps to reproduce

  1. Run the test.mjs file from the gist above in the root of this repository. Its output on my machine is:
$ node test.mjs
parsing a file with breaks: 32.809ms
parsing a file without breaks: 10.127s

In this test case, the "file with breaks" is 8,000 lines of 100 'a' characters each. The "file without breaks" is a single line of 800,000 'a' characters, so it is actually slightly smaller because it contains no newline characters.
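For readers who do not want to open the gist, the two inputs described above could be built roughly like this. This is a hypothetical reconstruction from the description only; the names `withBreaks` and `withoutBreaks` are assumptions, and the gist's actual `test.mjs` may construct its inputs differently.

```javascript
// Hypothetical reconstruction of the two inputs described above;
// the gist's actual test.mjs may build them differently.
const LINES = 8000
const LINE_LENGTH = 100

// 8,000 lines of 100 'a' characters, joined by newlines.
const withBreaks = Array.from({length: LINES}, () => 'a'.repeat(LINE_LENGTH)).join('\n')

// A single line of 800,000 'a' characters.
const withoutBreaks = 'a'.repeat(LINES * LINE_LENGTH)

console.log(withBreaks.length) // 807999 (800,000 letters + 7,999 newlines)
console.log(withoutBreaks.length) // 800000
```

Both strings contain the same number of letters; only the length of the longest line differs.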

Here's an Observable notebook that demonstrates the behavior. You can experiment with the regular expression there and see how it affects performance.

Expected behavior

Parsing a file with a long line should not have such severe performance effects.

(In the terms that affect my team, our customers should not be able to easily DoS us by providing markdown files with long lines)

Actual behavior

The findEmail regular expression here takes a very long time to parse a long line.

(I think this is due to using nested quantifiers, but I am not an expert here)
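The suspected effect can be illustrated with a minimal sketch. It assumes a simplified email-like pattern (`emailLike`, hypothetical, and not necessarily the actual findEmail expression) and a hypothetical helper `timeScan`, and it uses scaled-down inputs. In a backtracking engine, one plausible explanation is that the leading character class consumes a whole run of word characters at every start position before failing to find an `@`. A line break ends the run, so short lines keep each attempt cheap, while one long line makes every attempt scan to the end of the input.

```javascript
// ASSUMPTION: a simplified email-like pattern, for illustration only. It is
// not necessarily the findEmail expression the package actually uses.
const emailLike = /([-.\w+]+)@([-\w]+(?:\.[-\w]+)+)/g

// Scan `text` with the pattern; return the match count and elapsed time.
function timeScan(text) {
  emailLike.lastIndex = 0 // global regexes keep state between calls
  const start = performance.now()
  let matches = 0
  while (emailLike.exec(text) !== null) matches++
  return {matches, ms: performance.now() - start}
}

// Scaled-down inputs: the same letters, but different line lengths.
const shortLines = Array.from({length: 200}, () => 'a'.repeat(100)).join('\n')
const oneLongLine = 'a'.repeat(20_000)

for (const [label, text] of [['with breaks', shortLines], ['without breaks', oneLongLine]]) {
  const {matches, ms} = timeScan(text)
  console.log(`${label}: ${matches} matches in ${ms.toFixed(2)}ms`)
}
```

Timings vary by machine and engine version, so no expected numbers are given here. Doubling the length of `oneLongLine` and rerunning is a quick way to see whether the time grows faster than linearly.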

I noticed this when investigating a slowdown in our app, developed a test case, and profiled it. This regular expression jumped out:

[image: profiler screenshot]

Affected runtime and version

node 20.16.0

Affected package manager and version

all

Affected OS and version

all

Build and bundle tools

No response
