Skip to content

[clang][analyzer] Improved PointerSubChecker #93676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 10, 2024
Merged

Conversation

balazske
Copy link
Collaborator

The checker is made more exact (only pointer into array is allowed) and more tests are added.

The checker is made more exact (only pointer into array is allowed)
and more tests are added.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels May 29, 2024
@llvmbot
Copy link
Member

llvmbot commented May 29, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Kéri (balazske)

Changes

The checker is made more exact (only pointer into array is allowed) and more tests are added.


Full diff: https://github.com/llvm/llvm-project/pull/93676.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp (+20-14)
  • (added) clang/test/Analysis/pointer-sub.c (+74)
  • (modified) clang/test/Analysis/ptr-arith.c (+2-12)
diff --git a/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
index 2438cf30b39b5..4caa9ded93ed4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
@@ -35,7 +35,7 @@ class PointerSubChecker
 void PointerSubChecker::checkPreStmt(const BinaryOperator *B,
                                      CheckerContext &C) const {
   // When doing pointer subtraction, if the two pointers do not point to the
-  // same memory chunk, emit a warning.
+  // same array, emit a warning.
   if (B->getOpcode() != BO_Sub)
     return;
 
@@ -44,24 +44,30 @@ void PointerSubChecker::checkPreStmt(const BinaryOperator *B,
 
   const MemRegion *LR = LV.getAsRegion();
   const MemRegion *RR = RV.getAsRegion();
-
-  if (!(LR && RR))
-    return;
-
-  const MemRegion *BaseLR = LR->getBaseRegion();
-  const MemRegion *BaseRR = RR->getBaseRegion();
-
-  if (BaseLR == BaseRR)
+  if (!LR || !RR)
     return;
 
-  // Allow arithmetic on different symbolic regions.
-  if (isa<SymbolicRegion>(BaseLR) || isa<SymbolicRegion>(BaseRR))
-    return;
+  const auto *ElemLR = dyn_cast<ElementRegion>(LR);
+  const auto *ElemRR = dyn_cast<ElementRegion>(RR);
+  // FIXME: We want to verify that these are elements of an array.
+  // Because behavior of ElementRegion it may be confused with a cast.
+  // There is not a simple way to distinguish it from array element (check the
+  // types?). Because this missing check a warning is missing in the rare case
+  // when two casted pointers to the same region (variable) are subtracted.
+  if (ElemLR && ElemRR) {
+    const MemRegion *SuperLR = ElemLR->getSuperRegion();
+    const MemRegion *SuperRR = ElemRR->getSuperRegion();
+    if (SuperLR == SuperRR)
+      return;
+    // Allow arithmetic on different symbolic regions.
+    if (isa<SymbolicRegion>(SuperLR) || isa<SymbolicRegion>(SuperRR))
+      return;
+  }
 
   if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
     constexpr llvm::StringLiteral Msg =
-        "Subtraction of two pointers that do not point to the same memory "
-        "chunk may cause incorrect result.";
+        "Subtraction of two pointers that do not point into the same array "
+        "is undefined behavior.";
     auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
     R->addRange(B->getSourceRange());
     C.emitReport(std::move(R));
diff --git a/clang/test/Analysis/pointer-sub.c b/clang/test/Analysis/pointer-sub.c
new file mode 100644
index 0000000000000..010c739d6ad10
--- /dev/null
+++ b/clang/test/Analysis/pointer-sub.c
@@ -0,0 +1,74 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.PointerSub -verify %s
+
+void f1(void) {
+  int x, y, z[10];
+  int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = (long*)&x - (long*)&x;
+}
+
+void f2(void) {
+  int a[10], b[10], c;
+  int *p = &a[2];
+  int *q = &a[8];
+  int d = q - p; // no-warning
+
+  q = &b[3];
+  d = q - p; // expected-warning{{Subtraction of two pointers that}}
+
+  q = a + 10;
+  d = q - p; // no warning (use of pointer to one after the end is allowed)
+  d = &a[4] - a; // no warning
+
+  q = a + 11;
+  d = q - a; // ?
+
+  d = &c - p; // expected-warning{{Subtraction of two pointers that}}
+}
+
+void f3(void) {
+  int a[3][4];
+  int d;
+
+  d = &(a[2]) - &(a[1]);
+  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
+  d = a[1] - a[1];
+  d = &(a[1][2]) - &(a[1][0]);
+  d = &(a[1][2]) - &(a[0][0]); // expected-warning{{Subtraction of two pointers that}}
+}
+
+void f4(void) {
+  int n = 4, m = 3;
+  int a[n][m];
+  int (*p)[m] = a; // p == &a[0]
+  p += 1; // p == &a[1]
+  int d = p - a; // d == 1 // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
+
+  d = &(a[2]) - &(a[1]); // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
+  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
+}
+
+typedef struct {
+  int a;
+  int b;
+  int c[10];
+  int d[10];
+} S;
+
+void f5(void) {
+  S s;
+  int y;
+  int d;
+
+  d = &s.b - &s.a; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.c[0] - &s.a; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.b - &y; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.c[3] - &s.c[2];
+  d = &s.d[3] - &s.c[2]; // expected-warning{{Subtraction of two pointers that}}
+  d = s.d - s.c; // expected-warning{{Subtraction of two pointers that}}
+
+  S sa[10];
+  d = &sa[2] - &sa[1];
+  d = &sa[2].a - &sa[1].b; // expected-warning{{Subtraction of two pointers that}}
+}
diff --git a/clang/test/Analysis/ptr-arith.c b/clang/test/Analysis/ptr-arith.c
index 40c8188704e81..f99dfabb07366 100644
--- a/clang/test/Analysis/ptr-arith.c
+++ b/clang/test/Analysis/ptr-arith.c
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
 
 void clang_analyzer_eval(int);
 void clang_analyzer_dump(int);
@@ -35,16 +35,6 @@ domain_port (const char *domain_b, const char *domain_e,
   return port;
 }
 
-void f3(void) {
-  int x, y;
-  int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point to the same memory chunk may cause incorrect result}}
-
-  int a[10];
-  int *p = &a[2];
-  int *q = &a[8];
-  d = q-p; // no-warning
-}
-
 void f4(void) {
   int *p;
   p = (int*) 0x10000; // expected-warning{{Using a fixed address is not portable because that address will probably not be valid in all environments or platforms}}

@llvmbot
Copy link
Member

llvmbot commented May 29, 2024

@llvm/pr-subscribers-clang

Author: Balázs Kéri (balazske)

Changes

The checker is made more exact (only pointer into array is allowed) and more tests are added.


Full diff: https://github.com/llvm/llvm-project/pull/93676.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp (+20-14)
  • (added) clang/test/Analysis/pointer-sub.c (+74)
  • (modified) clang/test/Analysis/ptr-arith.c (+2-12)
diff --git a/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
index 2438cf30b39b5..4caa9ded93ed4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
@@ -35,7 +35,7 @@ class PointerSubChecker
 void PointerSubChecker::checkPreStmt(const BinaryOperator *B,
                                      CheckerContext &C) const {
   // When doing pointer subtraction, if the two pointers do not point to the
-  // same memory chunk, emit a warning.
+  // same array, emit a warning.
   if (B->getOpcode() != BO_Sub)
     return;
 
@@ -44,24 +44,30 @@ void PointerSubChecker::checkPreStmt(const BinaryOperator *B,
 
   const MemRegion *LR = LV.getAsRegion();
   const MemRegion *RR = RV.getAsRegion();
-
-  if (!(LR && RR))
-    return;
-
-  const MemRegion *BaseLR = LR->getBaseRegion();
-  const MemRegion *BaseRR = RR->getBaseRegion();
-
-  if (BaseLR == BaseRR)
+  if (!LR || !RR)
     return;
 
-  // Allow arithmetic on different symbolic regions.
-  if (isa<SymbolicRegion>(BaseLR) || isa<SymbolicRegion>(BaseRR))
-    return;
+  const auto *ElemLR = dyn_cast<ElementRegion>(LR);
+  const auto *ElemRR = dyn_cast<ElementRegion>(RR);
+  // FIXME: We want to verify that these are elements of an array.
+  // Because behavior of ElementRegion it may be confused with a cast.
+  // There is not a simple way to distinguish it from array element (check the
+  // types?). Because this missing check a warning is missing in the rare case
+  // when two casted pointers to the same region (variable) are subtracted.
+  if (ElemLR && ElemRR) {
+    const MemRegion *SuperLR = ElemLR->getSuperRegion();
+    const MemRegion *SuperRR = ElemRR->getSuperRegion();
+    if (SuperLR == SuperRR)
+      return;
+    // Allow arithmetic on different symbolic regions.
+    if (isa<SymbolicRegion>(SuperLR) || isa<SymbolicRegion>(SuperRR))
+      return;
+  }
 
   if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
     constexpr llvm::StringLiteral Msg =
-        "Subtraction of two pointers that do not point to the same memory "
-        "chunk may cause incorrect result.";
+        "Subtraction of two pointers that do not point into the same array "
+        "is undefined behavior.";
     auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
     R->addRange(B->getSourceRange());
     C.emitReport(std::move(R));
diff --git a/clang/test/Analysis/pointer-sub.c b/clang/test/Analysis/pointer-sub.c
new file mode 100644
index 0000000000000..010c739d6ad10
--- /dev/null
+++ b/clang/test/Analysis/pointer-sub.c
@@ -0,0 +1,74 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.PointerSub -verify %s
+
+void f1(void) {
+  int x, y, z[10];
+  int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
+  d = (long*)&x - (long*)&x;
+}
+
+void f2(void) {
+  int a[10], b[10], c;
+  int *p = &a[2];
+  int *q = &a[8];
+  int d = q - p; // no-warning
+
+  q = &b[3];
+  d = q - p; // expected-warning{{Subtraction of two pointers that}}
+
+  q = a + 10;
+  d = q - p; // no warning (use of pointer to one after the end is allowed)
+  d = &a[4] - a; // no warning
+
+  q = a + 11;
+  d = q - a; // ?
+
+  d = &c - p; // expected-warning{{Subtraction of two pointers that}}
+}
+
+void f3(void) {
+  int a[3][4];
+  int d;
+
+  d = &(a[2]) - &(a[1]);
+  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
+  d = a[1] - a[1];
+  d = &(a[1][2]) - &(a[1][0]);
+  d = &(a[1][2]) - &(a[0][0]); // expected-warning{{Subtraction of two pointers that}}
+}
+
+void f4(void) {
+  int n = 4, m = 3;
+  int a[n][m];
+  int (*p)[m] = a; // p == &a[0]
+  p += 1; // p == &a[1]
+  int d = p - a; // d == 1 // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
+
+  d = &(a[2]) - &(a[1]); // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
+  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
+}
+
+typedef struct {
+  int a;
+  int b;
+  int c[10];
+  int d[10];
+} S;
+
+void f5(void) {
+  S s;
+  int y;
+  int d;
+
+  d = &s.b - &s.a; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.c[0] - &s.a; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.b - &y; // expected-warning{{Subtraction of two pointers that}}
+  d = &s.c[3] - &s.c[2];
+  d = &s.d[3] - &s.c[2]; // expected-warning{{Subtraction of two pointers that}}
+  d = s.d - s.c; // expected-warning{{Subtraction of two pointers that}}
+
+  S sa[10];
+  d = &sa[2] - &sa[1];
+  d = &sa[2].a - &sa[1].b; // expected-warning{{Subtraction of two pointers that}}
+}
diff --git a/clang/test/Analysis/ptr-arith.c b/clang/test/Analysis/ptr-arith.c
index 40c8188704e81..f99dfabb07366 100644
--- a/clang/test/Analysis/ptr-arith.c
+++ b/clang/test/Analysis/ptr-arith.c
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
 
 void clang_analyzer_eval(int);
 void clang_analyzer_dump(int);
@@ -35,16 +35,6 @@ domain_port (const char *domain_b, const char *domain_e,
   return port;
 }
 
-void f3(void) {
-  int x, y;
-  int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point to the same memory chunk may cause incorrect result}}
-
-  int a[10];
-  int *p = &a[2];
-  int *q = &a[8];
-  d = q-p; // no-warning
-}
-
 void f4(void) {
   int *p;
   p = (int*) 0x10000; // expected-warning{{Using a fixed address is not portable because that address will probably not be valid in all environments or platforms}}

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see that you're working on this checker; but unfortunately the language standard is very complicated in this area, so you'll need more complex code to cover it properly.

int x, y, z[10];
int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This corner case is explicitly allowed by the standard: non-array variables act as if they were single-element arrays and it's valid to do (trivial) pointer arithmetic on them.

Even calculating the past-the-end pointer of a non-array object (e.g. &x + 1) is a valid (but unusual) operation; however the description on cppreference suggests that it's UB to take the difference (&x+1)-&x. You should probably look this up in the standard itself if you want to be accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good, except for &x - &x, which should be valid.
Wrt. &x - 1, that should be valid as well, given that the resulting pointer is not dereferenced. However, clang's constrant interpreter disagrees with gcc's on that case: https://godbolt.org/z/dMd1rMazY
However, like I said, I believe gcc's right by accepting that code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt. &x - 1, that should be valid as well, given that the resulting pointer is not dereferenced.

The standard explicitly disallows this, see [expr.add] part 4.2 and 4.3 (in the most recent C++ draft standard, other versions may be different).

// Because behavior of ElementRegion it may be confused with a cast.
// There is not a simple way to distinguish it from array element (check the
// types?). Because this missing check a warning is missing in the rare case
// when two casted pointers to the same region (variable) are subtracted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two pointers point to the same variable, it's valid to subtract them.

By the way, it's also valid to declare a long long variable, point a char * pointer at it (char * may point anywhere) and use it to iterate over the memory region of the long long as if it was a char[8]. (By the way in this case you have ElementRegions and your code behaves correctly.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is not clear what rules to apply.

int a[3][3];
int *b = a;
int d = &(a[2][2]) - &(b[3]);
long long l;
char *a = &l;
short *b = &l;
int d = &a[3] - (char *)&l;
d = (char *)&b[1] - &a[1];
int a[3][3];
int d = &a[2][2] - &a[1][1];

I only found that the operands must point into the same "array object" but not what exactly an "array object" is. Are the sub-arrays of a multidimensional array separate "array objects"? If an address (of a variable) is converted to a char * is this the same array object as the original variable (for example l)? Invalid indexing like int a[3][3]; int x = a[0][4]; is undefined behavior, but why should then be allowed to use pointers to such an object and index it like an one-dimensional array, or convert an array pointer into an array with different type and index it?

We should allow anything that points into the same memory block (that is a variable or any array), or only allow pointers to the same array with same type and same size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed your examples and questions with @whisperity and Zoltán Porkoláb, and I'm trying to summarize what we determined. Note that I'm mostly working from the most recent C++ draft standard (http://eel.is/c++draft/), so some the conclusions may be invalid in C or older versions of C++ (but I tried to highlight the differences that I know about).

(1) In your first example the step int *b = a is invalid because the int[3][3] array a decays to an int (*)[3] (pointer to array of 3 ints) and that type is not interconvertible with a plain int * (see [basic.compound] note 5 in the C++ draft standard). (And if you define b as int (*b)[3] = a, then the pointer subtraction will become invalid.)

(2) In the second example using a long * to initialize char * or short * variables is usually an error, it's accepted under old C, but in C++ you need an explicit cast and e.g. modern GCC also produces an error (-Wincompatible-pointer-types a warning that's by default an error) when it compiles this.

I'd guess that the actual pointer arithmetic is standard-compliant, but the relevant parts of the standard ([expr.reinterpret.cast] 7, [expr.static.cast] 14)are too vague to give a confident answer.

Note that under C++ accessing the element short b[1] would be a violation of [basic.lval] part 11, but based on [defns.access] I'd say that the expression &b[1] is not an access of b[1].

(3) In your third example &a[2][2] - &a[1][1] is clearly working in practically all implementations, but we're fairly sure that it's not compliant with the C++ standard, because [expr.add] part 5.2 speaks about "array elements i and j of the same array object x" (and later "i - j" which shows that "i" and "j" are scalar numbers) -- while in your example a[2][2] and a[1][1] are not (numerically indexed) elements within the same array. (This is coherent with [dcl.array] where an "array" implicitly means a one-dimensional array structure (whose elements may be other arrays).)

Nevertheless, it may be reasonable to avoid emitting warnings on this kind of code, because that could be better for the users.

(4) The definition of "what exactly an "array object" is" and "what is an element of an array" primarily appears at [dcl.array] part 6:

An object of type “array of N U” consists of a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.

(This clearly shows that the element of an element of an array is not an element of the array. The notion of "multi-dimensional arrays" only appears in the Example [dcl.array] Example 4 with word choices that suggests that this is just a mathematical/intuitive notion and not something that's well-defined in the standard.)

This general definition is augmented by [basic.compound] part 3, sentence 11 which states that:

For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.

(5) Yes, the sub-arrays of a multidimensional array are separate "array objects" that have their own elements -- but they are also single elements within the same bigger array. This means that if we have int arr[3][3], then

  • int x = &(arr[2]) - &(arr[1]) is valid as the difference of two int (*)[3] pointers that point to elements of the same array arr
  • int y = &(arr[2][1]) - &(arr[2][2]) is valid as the difference of two int * pointers that point to elements of the same array arr[2]
  • int z = &(arr[1][1]) - &(arr[2][2]) is invalid, because one pointer points to an element of arr[1], while the other points to element of arr[2]
  • int w = arr[2] - arr[1] is also invalid, because these arrays decay to int * pointer and this difference is equivalent to &(arr[2][0]) - &(arr[1][0])

(6) "If an address (of a variable) is converted to a char * is this the same array object as the original variable (for example l)?" -- My intuition is that here we're dealing with an imagined char[] array object that "covers" the same memory region as the original variable, and any pointer trickery that start from the original object and produce char * pointers pointing into the original object essentially produces pointers to elements of this imagined char[] array. (We might say that this imagined char[] array is the object representation of the array, although that's probably not exactly accurate.)

(7)

Invalid indexing like int a[3][3]; int x = a[0][4]; is undefined behavior but why should then be allowed to use pointers to such an object and index it like an one-dimensional array, [...]

I'm not exactly sure what you're speaking about here. I completely agree that int a[3][3]; int x = a[0][4]; is undefined behavior, but there may be multiple ways to convert the multidimensional array into a single-dimensional one and some of them might be legitimate.

(8)

or convert an array pointer into an array with different type and index it

This seems to be forbidden by [expr.add] part 6. Note that the definition of the subscript operator is based on pointer addition.

(9)

We should allow anything that points into the same memory block (that is a variable or any array), or only allow pointers to the same array with same type and same size.
I don't understand this sentence; but note that [expr.add] part 2.2 demands that in pointer subtraction the two operands must be pointers to (cv-qualified or cv-unqualified versions of) the same completely-defined object type.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it looks pretty good. Objectively speaking it improves the TP rate.
I have compared these reports against the constant interpreters of clang and gcc to see if in constexpr context which of these expression would trigger "UB" there.
I found I think only 3 places where this implementation disagrees what I think we should report.

int x, y, z[10];
int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good, except for &x - &x, which should be valid.
Wrt. &x - 1, that should be valid as well, given that the resulting pointer is not dereferenced. However, clang's constrant interpreter disagrees with gcc's on that case: https://godbolt.org/z/dMd1rMazY
However, like I said, I believe gcc's right by accepting that code.

@balazske
Copy link
Collaborator Author

With the current version I have the following observations:

  • There is a warning for (&x + 1) - &x and (&x - 1) - &x. Should this be fixed?
  • The code (int *)((char *)(&a[4]) + sizeof(int)) - &a[4] produces no warning but (int *)((char *)(&a[4]) + 1) - &a[4] produces warning. For 2-dimensional arrays there is warning for all of these cases (lines 44-47 in the test file). Is this possible to fix (to get warning in all cases), or no warning is needed here?

@NagyDonat
Copy link
Contributor

With the current version I have the following observations:

* There is a warning for `(&x + 1) - &x` and `(&x - 1) - &x`. Should this be fixed?

The expression (&x + 1) - &x is valid and should not produce a warning. It could appear e.g. in code that's structured like

void log_observations(double *begin, double *end, /*...*/) {
  log_observation_count(end - begin);
  // also log the actual observations
}
void do_observations(/*...*/) {
  double array[1024], *p = array;
  //...
  while (can_observe()) {
    *p++ = /* ... */
  }
  log_observations(array, p);
}
void do_single_observation(/*...*/) {
  if (!can_observe())
    return;
  double result = /* ... */
  // ...
  log_observations(&result, &result + 1);
}

On the other hand (&x - 1) - &x is not standard-compliant, because the standard guarantees an "after-the-end" pointer (which is valid in calculations but must not be dereferenced) but it doesn't accept a "before-the-begin" pointer.

* The code `(int *)((char *)(&a[4]) + sizeof(int)) - &a[4]` produces no warning but `(int *)((char *)(&a[4]) + 1) - &a[4]` produces warning.

That's very nice, even a slightly less accurate solution would be acceptable.

For 2-dimensional arrays there is warning for all of these cases (lines 44-47 in the test file). Is this possible to fix (to get warning in all cases), or no warning is needed here?

I'd say that the current behavior (warning on all of 44-47) is OK here -- this is very unusual trickery and deserves a highlight. However I could also accept a situation where there was no warning for these complex multidimensional cases.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Minor nits.

@@ -53,10 +62,10 @@ void f4(void) {
int (*p)[m] = a; // p == &a[0]
p += 1; // p == &a[1]

// FIXME: This warning is not needed
// FIXME: This is a known problem with -Wpointer-arith
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put the issue link here and to the other place.

@balazske balazske merged commit 26224ca into llvm:main Jun 10, 2024
4 of 8 checks passed
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants