Skip to content

Add a fast-path for equality checking when the backing is empty or pointer equal #1322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2025

Conversation

parkera
Copy link
Contributor

@parkera parkera commented May 28, 2025

let b1Address = b1.baseAddress!
let b2Address = b2.baseAddress!

if b1Address == b2Address {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For stylistic consistency with your edit above, could guard here and early exit :)

@parkera
Copy link
Contributor Author

parkera commented May 28, 2025

Simplified benchmark output:

DataEqualEmpty
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (G)    │      24 │ 1593814 │
╘═══════════════════════════╧═════════╧═════════╛

DataEqualInline
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (G)    │      24 │ 1620545 │
╘═══════════════════════════╧═════════╧═════════╛

DataEqualLarge
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (G)    │      24 │ 1678276 │
╘═══════════════════════════╧═════════╧═════════╛

DataEqualReallyLarge
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (G)    │      24 │ 1649187 │
╘═══════════════════════════╧═════════╧═════════╛

DataNotEqualInline
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (G)    │      24 │ 1623817 │
╘═══════════════════════════╧═════════╧═════════╛

DataNotEqualLarge
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (M)    │    8000 │ 1560961 │
╘═══════════════════════════╧═════════╧═════════╛

DataNotEqualReallyLarge
╒═══════════════════════════╤═════════╤═════════╕
│ Metric                    │      p0 │ Samples │
╞═══════════════════════════╪═════════╪═════════╡
│ Throughput (# / s) (K)    │    7545 │   19471 │
╘═══════════════════════════╧═════════╧═════════╛

@parkera
Copy link
Contributor Author

parkera commented May 28, 2025

@swift-ci test

@parkera
Copy link
Contributor Author

parkera commented May 28, 2025

You can see from the benchmark output above that the time is about the same now for the three differently-sized datas that are equal, and slows for larger data that is not equal (note the difference in scaling).

@parkera parkera force-pushed the parkera/data_equal_efficiency branch from 962d303 to 7ea7b31 Compare May 28, 2025 18:01
@parkera
Copy link
Contributor Author

parkera commented May 28, 2025

@swift-ci test

Comment on lines +2736 to +2738
guard b1Address != b2Address else {
return true
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkera Ahh… very nice!

This looks like an impactful win and I don't want to block this from landing… but I did have some more "long tail" questions about what kind of runtime and space complexity we might be looking at from this approach.

Do what extent (if any) do we know with probability one that withUnsafeBytes can return in constant time and constant space? If withUnsafeBytes is ever performing linear work… this diff is not a "regression" (because we are currently shipping linear work in production anyway). But we are potentially losing a little bit of extra performance.

This comment from @lorentey might have some clues:

https://forums.swift.org/t/how-to-check-two-array-instances-for-identity-equality-in-constant-time/78792/61

[T]he withUnsafeBytes(of:) invocations […] often result in binaries that make several full copies of the input arguments. A clever enough optimizer might get rid of those, but I do not think ours is clever enough for that -- at least not in general.)

For a little extra context… I would eventually like to pitch a public method on Data that exposes "identity equality" on its own without falling back to a linear time comparison. That method should return in constant time… if it ever must perform a linear amount of work to determine identity equality then it's no longer impactful to have that extra method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm not sure about there is if withUnsafeBytes ever has to move bytes into some temporary storage. I don't believe that is currently the case, because Data is contiguous (although DataProtocol is not).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular use of withUnsafeBytes is from the type that owns its own implementation—advice about the use of such APIs in general are not so applicable because we're not solely relying on the API guarantee; rather, the function call (which is inlinable, unless I'm mistaken) is just code organization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. For Data, this is always O(1).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Data, this is always O(1).

Ahh… good to know! Thanks!

A documentation nit at that point would be to actually add a Complexity callout to Data.withUnsafeBytes reminding engineers this should operate in constant time and space.

@parkera
Copy link
Contributor Author

parkera commented May 28, 2025

@swift-ci test

@parkera parkera merged commit 5db7965 into swiftlang:main May 29, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants