Skip to content

[ADT] Update hash function of uint64_t for DenseMap #95734

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 20, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion llvm/include/llvm/ADT/DenseMapInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ template<> struct DenseMapInfo<unsigned long long> {
static inline unsigned long long getTombstoneKey() { return ~0ULL - 1ULL; }

static unsigned getHashValue(const unsigned long long& Val) {
return (unsigned)(Val * 37ULL);
return DenseMapInfo<unsigned>(Val) ^ DenseMapInfo<unsigned>(Val >> 32);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use some code from ADT/Hashing.h? It seems to contain some well-mixing routines. Seems like something like llvm::hash_value(Val) would be enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good too. But I am not a hashing expert so I don't have an opinion here. Let's see what others propose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with the suggestion to use llvm::hash_value from ADT/Hashing.h. It implements a murmur-like algorithm, which mixes bits enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think that DenseMaps hash functions could use some overhaul, I don't think that switching to Hashing.h in this one place would be appropriate. If we want to do that, we should do so for all the DenseMapInfo hashes, after properly analyzing the cost of the more expensive hash vs the benefit of better hash distribution.

We may also have to refactor Hashing.h to reduce the build-time overhead for places that like this that don't need the full infrastructure and its build overhead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good too. But I am not a hashing expert so I don't have an opinion here. Let's see what others propose.

Here is the easy thing to reason about this just in case. ^ mixes very poor: single bit change of input changes only single bit of output. Ideally when we mix / combine hash values (or do similar things) should produce values that are not dependent on the original hash values being combined. This greatly reduce the possibility of collisions from "related" inputs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with using detail::combineHashValue but not use other functions from ADT/Hashing.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should switch all DenseMapInfo for all integer types to llvm::hash_value?
@chandlerc suggested to use it in his response and he does not expect this to cause any problems.
The code would be simple and we will get to reuse it in more places.

I am not sure how much worse the build times will get, but this should be easy to address by having declarations with functions hashing integers in a separate header.
I feel it should not be a blocker to using the hash function that we feel is better otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version should also relieve the immediate pain, but I wonder if other people also feel that using murmur-like hashing is a better long-term option anyway.

}

static bool isEqual(const unsigned long long& LHS,
Expand Down
Loading