Skip to content

Overhaul quantification optimizations #710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 7, 2024

Conversation

milseman
Copy link
Member

  • Effectively one code implementation for all optimized (non-reluctant) quantifications
  • Specializations via inlining
  • Non-mutating methods on String (easier to test, optimize, and could be future API fodder)
  • Set up for further improvements

Supersedes #706

@milseman
Copy link
Member Author

=== Regressions ======================================================================
- CompilerMessages_All                    92.4ms	91.5ms	842µs		0.9%
- CompilerMessages_All_Scalar             75.6ms	74.8ms	815µs		1.1%
- BasicCCC_All_Scalar                     6.35ms	6.02ms	331µs		5.5%
- BasicCCC_All                            6.35ms	6.03ms	324µs		5.4%
- BasicRangeCCC_All_Scalar                6.8ms	6.5ms	301µs		4.6%
- CaseInsensitiveCCC_All                  6.75ms	6.46ms	297µs		4.6%
- BasicRangeCCC_All                       6.8ms	6.5ms	296µs		4.5%
- CaseInsensitiveCCC_All_Scalar           6.75ms	6.46ms	289µs		4.5%
- BasicBuiltinCharacterClass_All_Scalar   7.44ms	7.24ms	200µs		2.8%
- ReluctantQuant_Whole_Scalar             9.31ms	9.16ms	153µs		1.7%
- Lines_All                               1.79ms	1.74ms	53µs		3.0%
- Lines_All_Scalar                        1.75ms	1.7ms	50.3µs		3.0%
- MACAddress                              2.43ms	2.4ms	35.2µs		1.5%
- MACAddress_Scalar                       2.3ms	2.27ms	23µs		1.0%
- IPv4Address_Scalar                      2.07ms	2.06ms	14.1µs		0.7%
=== Improvements =====================================================================
- symDiffCCC_All                          56.9ms	58.2ms	-1.28ms		-2.2%
- symDiffCCC_All_Scalar                   56.9ms	58.2ms	-1.26ms		-2.2%
- SubtractionCCC_All_Scalar               20.9ms	22ms	-1.16ms		-5.3%
- IntersectionCCC_All_Scalar              21.8ms	22.9ms	-1.09ms		-4.8%
- SubtractionCCC_All                      20.9ms	21.9ms	-1.07ms		-4.9%
- IntersectionCCC_All                     21.8ms	22.8ms	-1.05ms		-4.6%
- EmailLookahead_All                      18.4ms	19.4ms	-1.02ms		-5.3%
- EmailLookahead_All_Scalar               16.2ms	17.2ms	-1ms		-5.8%
- AnchoredNotFound_All                    13.6ms	14.5ms	-909µs		-6.3%
- DiceRollsInText_All                     41ms	41.8ms	-746µs		-1.8%
- AnchoredNotFound_All_Scalar             8.96ms	9.67ms	-706µs		-7.3%
- AnchoredNotFound_First_Scalar           5.32ms	5.98ms	-656µs		-11.0%
- DiceRollsInText_All_Scalar              38.6ms	39.2ms	-652µs		-1.7%
- EmailRFC_All                            63.3ms	63.8ms	-478µs		-0.7%
- EagarQuantWithTerminal_Whole            441µs	779µs	-338µs		-43.4%
- EagarQuantWithTerminal_Whole_Scalar     442µs	779µs	-338µs		-43.3%
- EmailLookaheadList                      3.62ms	3.91ms	-293µs		-7.5%
- EmailLookaheadList_Scalar               3.39ms	3.68ms	-290µs		-7.9%
- InvertedCCC_All_Scalar                  16.6ms	16.8ms	-241µs		-1.4%
- AnchoredNotFound_First                  9.61ms	9.79ms	-182µs		-1.9%
- GraphemeBreakNoCap_All_Scalar           2.93ms	3.1ms	-174µs		-5.6%
- NotFound_All                            7.15ms	7.31ms	-160µs		-2.2%
- NotFound_All_Scalar                     6.17ms	6.33ms	-156µs		-2.5%
- LiteralSearch_All_Scalar                5.69ms	5.84ms	-154µs		-2.6%
- GraphemeBreakNoCap_All                  3.07ms	3.21ms	-144µs		-4.5%
- InvertedCCC_All                         16.6ms	16.7ms	-144µs		-0.9%
- BasicBuiltinCharacterClass_All          8.05ms	8.19ms	-141µs		-1.7%
- LiteralSearchNotFound_All_Scalar        5.5ms	5.64ms	-137µs		-2.4%
- LiteralSearch_All                       6.65ms	6.79ms	-133µs		-2.0%
- Words_All_Scalar                        12ms	12.2ms	-130µs		-1.1%
- EmailLookaheadNoMatches_All             22.8ms	22.9ms	-123µs		-0.5%
- IPv6Address_Scalar                      2.29ms	2.4ms	-110µs		-4.6%
- Words_All                               12.5ms	12.6ms	-108µs		-0.9%
- IPv6Address                             2.48ms	2.57ms	-92µs		-3.6%
- LiteralSearchNotFound_All               6.5ms	6.59ms	-90.1µs		-1.4%
- DiceNotation                            4.47ms	4.55ms	-80.5µs		-1.8%
- EmojiRegex_All_Scalar                   40.4ms	40.5ms	-75.8µs		-0.2%
- DiceNotation_Scalar                     4.23ms	4.3ms	-66.9µs		-1.6%
- Numbers_All_Scalar                      5.21ms	5.27ms	-61.6µs		-1.2%
- HangulSyllable_All_Scalar               5.97ms	6.03ms	-54.1µs		-0.9%
- Numbers_All                             5.94ms	5.99ms	-51.9µs		-0.9%
- HangulSyllable_All                      6.71ms	6.77ms	-50.5µs		-0.7%
- Css_All                                 3.23ms	3.27ms	-39.3µs		-1.2%
- HangulSyllable_First_Scalar             2.89ms	2.93ms	-30.1µs		-1.0%
- Css_All_Scalar                          2.88ms	2.91ms	-29.6µs		-1.0%
- EmailBuiltinCharacterClass_All          10.3ms	10.3ms	-18.3µs		-0.2%

@milseman
Copy link
Member Author

@swift-ci please test

Copy link
Member

@natecook1000 natecook1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions, otherwise looks good!

@@ -603,9 +602,9 @@ extension AST.Atom.CharacterProperty {
if p(input, bounds) != nil { return nil }

// TODO: bounds check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the story here — does the bounds check always happen before the consume function is called? Or is there a post-advancement check to make sure we haven't gone beyond the bounds of the search range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't change the behavior. Let's open an issue for that

limitedBy end: Index,
anyMatchesNewline: Bool,
isScalarSemantics: Bool
) -> Index? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still making distinctions between the . literal syntax and other "any" designations at this point in the engine? For example, CharacterClass.any and CharacterClass.anyNonNewline are unaffected by the (?s) option, and always do what they say they should. I thought that we had already handled those distinctions during code gen (e.g. by calling emitAny vs emitAnyNonNewline).

If this method is just for the . literal, does it need to account for the NSRE-compatible-dot mode? In that case, a full \r\n pair needs to be consumed even when using scalar semantics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the leaf function which provides the implementation. The anyMatchesNewline parameter dictates what the behavior is, and is at least part of the quantification payload.

signalFailure()
return false
}
if let savePointRange {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(!produceSavePointRange) here would make sure we don't accidentally return a save point range when instructed not to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Catfish-Man could you take over and see about that assertion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I figure out my build issues, yup :)


while true {
if maxExtraTrips == 0 { break }
return (currentPosition, rangeStart..<rangeEnd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the assertion above, should this be Range(uncheckedBounds:) instead of ..<?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would work.

@Catfish-Man Catfish-Man merged commit 50b2f91 into swiftlang:main Feb 7, 2024
@milseman milseman deleted the overhual_builtin_quant branch February 7, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants