Skip to content

[OpenMP] Fix endian bug in hierarchical barrier code #117073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

uweigand
Copy link
Member

The "oncore" flavor of the hierarchical barrier code uses a single uint64 to hold both a barrier state and a set of flags for the current node's "leaf kids" in the tree.

To make this work, the barrier state is placed in the least-significant byte of the uint64, and the flags for leaf kids are placed in other bytes of the uint64
starting from the most-significant byte to represent the first child.

At least, this is how it works on little-endian systems. On big-endian hosts, the current code unfortunately places the leaf kid flags starting from the least significant byte of the uint64, where they will overlap and clobber the barrier state byte, resulting in
corrupted barrier behavior.

To fix this, this PR changes the child flag offsets to always start with the MSB, on both big- and little- endian hosts. This was complicated a bit by the following issues:

  • The byte offset was stored "off-by-one" for some reason. This makes it impossible to represent the MSB on big- endian systems (i.e. byte 0, which would have to be represented by -1 if offset by one, which does not fit into the uint8 offset field).

    Fixed by storing and using the "offset" field as the actual byte number, not off-by-one.

  • The "offset" field was never used for the master node, but it was still computed. That computed value was also invalid as it pointed outside the target uint64, but that didn't matter as it was unused. For big-endian systems, however, the computation would have resulted in a negative value, which again cannot be stored in the uint8 field.

    Fixed by skipping the (unused) computation on the master node and setting the "offset" field to a dummy value.

This is intended to be NFC on little-endian machines.

Fixes: #116215

@llvmbot llvmbot added the openmp:libomp OpenMP host runtime label Nov 20, 2024
Copy link

github-actions bot commented Nov 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

The "oncore" flavor of the hierarchical barrier code uses a
single uint64 to hold both a barrier state *and* a set
of flags for the current node's "leaf kids" in the tree.

To make this work, the barrier state is placed in the
least-significant byte of the uint64, and the flags for
leaf kids are placed in other bytes of the uint64
starting from the most-significant byte to represent
the first child.

At least, this is how it works on little-endian systems.
On big-endian hosts, the current code unfortunately
places the leaf kid flags starting from the *least*
significant byte of the uint64, where they will overlap
and clobber the barrier state byte, resulting in
corrupted barrier behavior.

To fix this, this PR changes the child flag offsets to
always start with the MSB, on both big- and little-
endian hosts.  This was complicated a bit by the following
issues:

- The byte offset was stored "off-by-one" for some reason.
  This makes it impossible to represent the MSB on big-
  endian systems (i.e. byte 0, which would have to be
  represented by -1 if offset by one, which does not
  fit into the uint8 offset field).

  Fixed by storing and using the "offset" field as the
  actual byte number, not off-by-one.

- The "offset" field was never used for the master node,
  but it was still computed.  That computed value was also
  invalid as it pointed outside the target uint64, but that
  didn't matter as it was unused.  For big-endian systems,
  however, the computation would have resulted in a
  *negative* value, which again cannot be stored in the uint8
  field.

  Fixed by skipping the (unused) computation on the master
  node and setting the "offset" field to a dummy value.

This is intended to be NFC on little-endian machines.

Fixes: llvm#116215
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomp OpenMP host runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libomp tests on s390x sometimes extremely slow
2 participants