Skip to content

Add a fast-path for equality checking when the backing is empty or pointer equal #1322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions Benchmarks/Benchmarks/Essentials/BenchmarkEssentials.swift
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,91 @@ let benchmarks = {
assert(u1 != u2)
}
}

// MARK: Data

func createSomeData(_ length: Int) -> Data {
var d = Data(repeating: 42, count: length)
// Set a byte to be another value just so we know we have a unique pointer to the backing
// For maximum inefficiency in the not equal case, set the last byte
d[length - 1] = UInt8.random(in: UInt8.min..<UInt8.max)
return d
}

/// A box `Data`. Intentionally turns the value type into a reference, so we can make a promise that the inner value is not copied due to mutation during a test of insertion or replacing.
class TwoDatasBox {
var d1: Data
var d2: Data

init(d1: Data, d2: Data) {
self.d1 = d1
self.d2 = d2
}
}

// MARK: -

Benchmark("DataEqualEmpty", closure: { benchmark, box in
blackHole(box.d1 == box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = Data()
let d2 = d1
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataEqualInline", closure: { benchmark, box in
blackHole(box.d1 == box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(12) // Less than size of InlineData.Buffer
let d2 = d1
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataNotEqualInline", closure: { benchmark, box in
blackHole(box.d1 != box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(12) // Less than size of InlineData.Buffer
let d2 = createSomeData(12)
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataEqualLarge", closure: { benchmark, box in
blackHole(box.d1 == box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(1024 * 8)
let d2 = d1
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataNotEqualLarge", closure: { benchmark, box in
blackHole(box.d1 != box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(1024 * 8)
let d2 = createSomeData(1024 * 8)
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataEqualReallyLarge", closure: { benchmark, box in
blackHole(box.d1 == box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(1024 * 1024 * 8)
let d2 = d1
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

Benchmark("DataNotEqualReallyLarge", closure: { benchmark, box in
blackHole(box.d1 != box.d2)
}, setup: { () -> TwoDatasBox in
let d1 = createSomeData(1024 * 1024 * 8)
let d2 = createSomeData(1024 * 1024 * 8)
let box = TwoDatasBox(d1: d1, d2: d2)
return box
})

}
26 changes: 24 additions & 2 deletions Sources/FoundationEssentials/Data/Data.swift
Original file line number Diff line number Diff line change
Expand Up @@ -2709,14 +2709,36 @@ public struct Data : Equatable, Hashable, RandomAccessCollection, MutableCollect
/// Returns `true` if the two `Data` arguments are equal.
@inlinable // This is @inlinable as emission into clients is safe -- the concept of equality on Data will not change.
public static func ==(d1 : Data, d2 : Data) -> Bool {
// See if both are empty
switch (d1._representation, d2._representation) {
case (.empty, .empty):
return true
default:
// Continue on to checks below
break
}

let length1 = d1.count
if length1 != d2.count {
let length2 = d2.count

// Unequal length data can never be equal
guard length1 == length2 else {
return false
}

if length1 > 0 {
return d1.withUnsafeBytes { (b1: UnsafeRawBufferPointer) in
return d2.withUnsafeBytes { (b2: UnsafeRawBufferPointer) in
return memcmp(b1.baseAddress!, b2.baseAddress!, b2.count) == 0
// If they have the same base address and same count, it is equal
let b1Address = b1.baseAddress!
let b2Address = b2.baseAddress!

guard b1Address != b2Address else {
return true
}
Comment on lines +2736 to +2738

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkera Ahh… very nice!

This looks like an impactful win and I don't want to block this from landing… but I did have some more "long tail" questions about what kind of runtime and space complexity we might be looking at from this approach.

Do what extent (if any) do we know with probability one that withUnsafeBytes can return in constant time and constant space? If withUnsafeBytes is ever performing linear work… this diff is not a "regression" (because we are currently shipping linear work in production anyway). But we are potentially losing a little bit of extra performance.

This comment from @lorentey might have some clues:

https://forums.swift.org/t/how-to-check-two-array-instances-for-identity-equality-in-constant-time/78792/61

[T]he withUnsafeBytes(of:) invocations […] often result in binaries that make several full copies of the input arguments. A clever enough optimizer might get rid of those, but I do not think ours is clever enough for that -- at least not in general.)

For a little extra context… I would eventually like to pitch a public method on Data that exposes "identity equality" on its own without falling back to a linear time comparison. That method should return in constant time… if it ever must perform a linear amount of work to determine identity equality then it's no longer impactful to have that extra method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm not sure about there is if withUnsafeBytes ever has to move bytes into some temporary storage. I don't believe that is currently the case, because Data is contiguous (although DataProtocol is not).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular use of withUnsafeBytes is from the type that owns its own implementation—advice about the use of such APIs in general are not so applicable because we're not solely relying on the API guarantee; rather, the function call (which is inlinable, unless I'm mistaken) is just code organization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. For Data, this is always O(1).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Data, this is always O(1).

Ahh… good to know! Thanks!

A documentation nit at that point would be to actually add a Complexity callout to Data.withUnsafeBytes reminding engineers this should operate in constant time and space.


// Compare the contents
return memcmp(b1Address, b2Address, b2.count) == 0
}
}
}
Expand Down