[OpenMP] Fix endian bug in hierarchical barrier code #117073
Open
+23
−8
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The "oncore" flavor of the hierarchical barrier code uses a single uint64 to hold both a barrier state and a set of flags for the current node's "leaf kids" in the tree.
To make this work, the barrier state is placed in the least-significant byte of the uint64, and the flags for leaf kids are placed in other bytes of the uint64
starting from the most-significant byte to represent the first child.
At least, this is how it works on little-endian systems. On big-endian hosts, the current code unfortunately places the leaf kid flags starting from the least significant byte of the uint64, where they will overlap and clobber the barrier state byte, resulting in
corrupted barrier behavior.
To fix this, this PR changes the child flag offsets to always start with the MSB, on both big- and little- endian hosts. This was complicated a bit by the following issues:
The byte offset was stored "off-by-one" for some reason. This makes it impossible to represent the MSB on big- endian systems (i.e. byte 0, which would have to be represented by -1 if offset by one, which does not fit into the uint8 offset field).
Fixed by storing and using the "offset" field as the actual byte number, not off-by-one.
The "offset" field was never used for the master node, but it was still computed. That computed value was also invalid as it pointed outside the target uint64, but that didn't matter as it was unused. For big-endian systems, however, the computation would have resulted in a negative value, which again cannot be stored in the uint8 field.
Fixed by skipping the (unused) computation on the master node and setting the "offset" field to a dummy value.
This is intended to be NFC on little-endian machines.
Fixes: #116215