You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[OpenMP] Fix endian bug in hierarchical barrier code
The "oncore" flavor of the hierarchical barrier code uses a
single uint64 to hold both a barrier state *and* a set
of flags for the current node's "leaf kids" in the tree.
To make this work, the barrier state is placed in the
least-significant byte of the uint64, and the flags for
leaf kids are placed in other bytes of the uint64
starting from the most-significant byte to represent
the first child.
At least, this is how it works on little-endian systems.
On big-endian hosts, the current code unfortunately
places the leaf kid flags starting from the *least*
significant byte of the uint64, where they will overlap
and clobber the barrier state byte, resulting in
corrupted barrier behavior.
To fix this, this PR changes the child flag offsets to
always start with the MSB, on both big- and little-
endian hosts. This was complicated a bit by the following
issues:
- The byte offset was stored "off-by-one" for some reason.
This makes it impossible to represent the MSB on big-
endian systems (i.e. byte 0, which would have to be
represented by -1 if offset by one, which does not
fit into the uint8 offset field).
Fixed by storing and using the "offset" field as the
actual byte number, not off-by-one.
- The "offset" field was never used for the master node,
but it was still computed. That computed value was also
invalid as it pointed outside the target uint64, but that
didn't matter as it was unused. For big-endian systems,
however, the computation would have resulted in a
*negative* value, which again cannot be stored in the uint8
field.
Fixed by skipping the (unused) computation on the master
node and setting the "offset" field to a dummy value.
This is intended to be NFC on little-endian machines.
Fixes: #116215
0 commit comments