Skip to content

Use slices instead of cursors #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 5, 2022
Merged

Use slices instead of cursors #83

merged 11 commits into from
Mar 5, 2022

Conversation

chtenb
Copy link
Member

@chtenb chtenb commented Feb 17, 2022

Proof of concept to fix #77. This change would be breaking because it changes the underlying representation.

  • Use slices instead of cursors
  • Do a startsWith check in the string parser instead of a call to indexOf
  • Make the CodePoint parsers anyDigit, anyLetter and derivatives use the CodeUnit version of anyChar for speed
  • Added the change to the changelog's "Unreleased" section with a link to this PR and your username
  • Linked any existing issues or proposals that this pull request should close
Benchmark before Benchmark after
StringParser.runParser parse23AnyCharPoints StringParser.runParser parse23AnyCharPoints
mean = 1.01 s mean = 16.14 ms
stddev = 51.48 ms stddev = 13.64 ms
min = 963.28 ms min = 8.55 ms
max = 1.16 s max = 53.29 ms
StringParser.runParser parse23AnyCharUnits StringParser.runParser parse23AnyCharUnits
mean = 8.53 ms mean = 8.73 ms
stddev = 2.77 ms stddev = 2.95 ms
min = 7.25 ms min = 7.24 ms
max = 38.72 ms max = 40.49 ms
StringParser.runParser parse23DigitPoints StringParser.runParser parse23DigitPoints
mean = 994.93 ms mean = 10.02 ms
stddev = 16.97 ms stddev = 1.61 ms
min = 974.33 ms min = 8.65 ms
max = 1.04 s max = 23.21 ms
StringParser.runParser parse23DigitUnits StringParser.runParser parse23DigitUnits
mean = 10.87 ms mean = 10.28 ms
stddev = 1.78 ms stddev = 1.56 ms
min = 9.51 ms min = 8.71 ms
max = 23.81 ms max = 22.51 ms
StringParser.runParser parse23StringPoints StringParser.runParser parse23StringPoints
mean = 1.06 s mean = 6.91 ms
stddev = 16.93 ms stddev = 954.29 μs
min = 1.01 s min = 5.80 ms
max = 1.09 s max = 15.04 ms
StringParser.runParser parse23StringUnits StringParser.runParser parse23StringUnits
mean = 4.52 ms mean = 4.40 ms
stddev = 958.15 μs stddev = 757.47 μs
min = 3.81 ms min = 3.61 ms
max = 13.09 ms max = 8.46 ms
StringParser.runParser parse23RegexPoints StringParser.runParser parse23RegexPoints
mean = 1.03 s mean = 11.77 ms
stddev = 44.87 ms stddev = 2.11 ms
min = 978.55 ms min = 9.76 ms
max = 1.15 s max = 28.49 ms
StringParser.runParser parse23RegexUnits StringParser.runParser parse23RegexUnits
mean = 5.90 ms mean = 5.88 ms
stddev = 900.33 μs stddev = 1.05 ms
min = 4.89 ms min = 4.93 ms
max = 12.99 ms max = 16.71 m

@chtenb
Copy link
Member Author

chtenb commented Feb 19, 2022

We can close purescript/purescript-strings#155 once this PR gets merged

@chtenb chtenb marked this pull request as ready for review February 26, 2022 14:31
@chtenb chtenb requested a review from jamesdbrock February 26, 2022 14:31
@jamesdbrock
Copy link
Member

Nice, I will look at this.

Copy link
Member

@jamesdbrock jamesdbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a breaking change, besides changing the names of the record fields in PosString?

Nothing -> Left { pos, error: "CodePoint " <> show cp <> " is not a character" }
Nothing -> Left { pos, error: "Unexpected EOF" }
Just chr -> Right { result: chr, suffix: { substring: SCP.drop 1 substring, position: position + 1 } }
Nothing -> Left { pos: position, error: "CodePoint " <> show cp <> " is not a character" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we're in the CodePoints parser, but I think this says that any astral CodePoint is not allowed? Is that what we want?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it's true that because this parser returns Char there is no way for it to parse an astral CodePoint.

But I think this will surprise and confuse people? Maybe the CodePoints parsing module should not include this parser?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you. I'm in favour of exporting an anyCodePoint instead or something in this module. I think the Char type is not a useful abstraction in PureScript (at least, it shouldn't be), but perhaps we should also keep supporting the anyChar parser as long as Char is a thing? I think it also depends on what direction purescript/purescript#3662 is taking.

@chtenb
Copy link
Member Author

chtenb commented Feb 27, 2022

Why is this a breaking change, besides changing the names of the record fields in PosString?

Because the Parser type has changed and it is part of the public API of this package. If people have written a parser in terms of this definition, their code will have to be adjusted. Or does that not qualify as a breaking change?

@jamesdbrock
Copy link
Member

Because the Parser type has changed and it is part of the public API of this package.

Sure, that makes sense. But the only thing that has changed about the type of the Parser is the record field names in PosString, right?

If it were up to me, I think I would do the work you've already done, and then also

  1. add anyCodePoint :: Parser CodePoint
  2. Document that anyChar will not always succeed, like here https://pursuit.purescript.org/packages/purescript-parsing/8.2.0/docs/Text.Parsing.Parser.String#v:anyChar

@chtenb
Copy link
Member Author

chtenb commented Feb 28, 2022

But the only thing that has changed about the type of the Parser is the record field names in PosString, right?

Yes

I'll make an issue for anyCodePoint

@chtenb chtenb merged commit 04fdd95 into main Mar 5, 2022
@chtenb chtenb deleted the slicing branch March 5, 2022 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

StringParser.CodePoints quadratic scaling
2 participants