Skip to content

A miscompilation bug in LICMPass (concurrency) #64188

Open
@sunghwanl

Description

@sunghwanl
A1:  int* bug(int N, atomic<int>& a) {
A2:    int* p = (int*)malloc(sizeof(int));
A3:    
A4:    for (int i = 1; i <= N; i++) {
A5:      *p = i;
A6:      foo(i, a);
A7:    }
A8:  
A9:    return p;
A10: }

LICMPass of Clang++ (13.0.0+) optimizes the above function into the following (C++ syntax is used for brevity, see godbolt for the IR code):

B1:  int* bug(int N, atomic<int>& a) {
B2:    int* p = (int*)malloc(sizeof(int));
B3:    
B4:    for (int i = 1; i <= N; i++) {
B5:      foo(i, a);
B6:    }
B7:  
B8:    *p = N;
B9:  
B10:   return p;
B11: }

This is a miscompilation bug. By compiling and running a concurrent program that involves the bug function on Armv8 processors Apple M1 and Cavium ThunderX2, we actually observed certain program behaviors forbidden by the original program before the optimization. The actual program we tested can be found here. (The forbidden behavior can be observed couple of times throughout millions~billions of program runs. The test code includes some intricate crafts, e.g., that helps repeat the test many times.)

In the following, we explain a simplified version of the concurrent program we used and show why the above LICM is a miscompilation bug.

To see a buggy execution, let's consider two threads where the first thread calls the bug function.

Thread 1:

T1.1: void thread1(int N, atomic<int>& a, atomic<int*>& X) {
T1.2:   int* p = bug(N, a);
T1.3:   X.store(p, memory_order_relaxed);
T1.4: }

Thread 2:

T2.1: void thread2(int N, atomic<int>& a, atomic<int*>& X) {
T2.2:   while (a.load(memory_order_acquire) < N);
T2.3:   int *p = X.load(memory_order_relaxed);
T2.4:   if (p != NULL) {
T2.5:     if (*p != N) {
T2.6:       cout << "bug!!!" << endl;
T2.7:     }
T2.8:   }
T2.9: }

Also suppose that the foo function is given as follows:

F1: void foo(int i, atomic<int>& a) {
F2:   a.store(i, memory_order_release);
F3: }

Now, if we create two threads as follows, the program should never print bug!!! before the problematic LICM in the bug function.

atomic<int> a;
atomic<int*> X;
int N = 1;
a.store(0, memory_order_relaxed);
X.store(NULL, memory_order_relaxed);
thread* th1 = new thread(thread1, N, ref(a), ref(X));
thread* th2 = new thread(thread2, N, ref(a), ref(X));

Indeed, once thread2 reads N from a in an acquire access mode (line T2.2), the last call to the foo function (line A6, writing N to a with a release mode) synchronizes with the acquire read. Therefore, the last write to p (line A5, *p = N) by thread1 happens-before the read from *p (line T2.5, *p != N) by thread2, and thread2 can only read N from *p. Note that there is no data race in this program (before the LICM).

However, after the above LICM, the program becomes racy on *p, and thread2 can read a value other than N from *p. We actually observed this program printing "bug!!!" by compiling and running it on Armv8 processors Apple M1 and ThunderX2!

Roughly, the above LICM is wrong because it moves a non-atomic write *p = i (line A5) after a release write (line A6). This is wrong even though p is never leaked before it is returned. Specifically, LLVM applies the above LICM and inlines the bug function in thread1, resulting in the following code:

    ...
O1: for (int i = 1; i <= N; i++) {
O2:   foo(i, a);
O3: }
O4:
O5: *p = N;
O6: X.store(p, memory_order_relaxed);
    ...

Now, the write to p (line O5) is not properly synchronized anymore, and thread2 can read an uninitialized value from p (line T2.5). In particular, such behavior is allowed by the Arm architecture because it can reorder the two independent stores O5 and O6. Therefore, this program can print "bug!!!", which is originally forbidden.

We conclude that the pointer p in the function bug should not be treated as completely local even before it is leaked (line A9).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions