Skip to content

[clangd] Harden incomingCalls() against possible misinterpretation of a range as pertaining to the wrong file #111616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

HighCommander4
Copy link
Collaborator

I don't have a concrete motivating scenario here, just something I noticed during code reading:

CallHierarchyIncomingCall::fromRanges are interpreted as ranges in the same file as the CallHierarchyItem representing the caller (CallHierarchyIncomingCall::from).

In the server implementation, we populate them based on Ref::Location, taking only the range and discarding the file, and associate them with a CallHierarchyItem representing Ref::Container.

Now, logically it should be the case that the definition location of the symbol referred to by Ref::Container is in the same file as the Ref itself... but I don't think anything guarantees this, and if for some reason this doesn't hold, we are effectively taking ranges from one file and interpreting them as being in another file.

The patch adds a check for this and discards ranges which are not in fact in the same file.

@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2024

@llvm/pr-subscribers-clang-tools-extra

Author: Nathan Ridge (HighCommander4)

Changes

I don't have a concrete motivating scenario here, just something I noticed during code reading:

CallHierarchyIncomingCall::fromRanges are interpreted as ranges in the same file as the CallHierarchyItem representing the caller (CallHierarchyIncomingCall::from).

In the server implementation, we populate them based on Ref::Location, taking only the range and discarding the file, and associate them with a CallHierarchyItem representing Ref::Container.

Now, logically it should be the case that the definition location of the symbol referred to by Ref::Container is in the same file as the Ref itself... but I don't think anything guarantees this, and if for some reason this doesn't hold, we are effectively taking ranges from one file and interpreting them as being in another file.

The patch adds a check for this and discards ranges which are not in fact in the same file.


Full diff: https://github.com/llvm/llvm-project/pull/111616.diff

1 Files Affected:

  • (modified) clang-tools-extra/clangd/XRefs.cpp (+24-10)
diff --git a/clang-tools-extra/clangd/XRefs.cpp b/clang-tools-extra/clangd/XRefs.cpp
index f94cadeffaa298..cca5035fb74bd4 100644
--- a/clang-tools-extra/clangd/XRefs.cpp
+++ b/clang-tools-extra/clangd/XRefs.cpp
@@ -63,6 +63,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
@@ -2272,18 +2273,14 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
   // Initially store the ranges in a map keyed by SymbolID of the caller.
   // This allows us to group different calls with the same caller
   // into the same CallHierarchyIncomingCall.
-  llvm::DenseMap<SymbolID, std::vector<Range>> CallsIn;
+  llvm::DenseMap<SymbolID, std::vector<SymbolLocation>> CallsIn;
   // We can populate the ranges based on a refs request only. As we do so, we
   // also accumulate the container IDs into a lookup request.
   LookupRequest ContainerLookup;
   Index->refs(Request, [&](const Ref &R) {
-    auto Loc = indexToLSPLocation(R.Location, Item.uri.file());
-    if (!Loc) {
-      elog("incomingCalls failed to convert location: {0}", Loc.takeError());
-      return;
-    }
-    auto It = CallsIn.try_emplace(R.Container, std::vector<Range>{}).first;
-    It->second.push_back(Loc->range);
+    auto It =
+        CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first;
+    It->second.push_back(R.Location);
 
     ContainerLookup.IDs.insert(R.Container);
   });
@@ -2292,9 +2289,26 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
   Index->lookup(ContainerLookup, [&](const Symbol &Caller) {
     auto It = CallsIn.find(Caller.ID);
     assert(It != CallsIn.end());
-    if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file()))
+    if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file())) {
+      SymbolLocation CallerLoc =
+          Caller.Definition ? Caller.Definition : Caller.CanonicalDeclaration;
+      std::vector<Range> FromRanges;
+      for (const SymbolLocation &L : It->second) {
+        if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
+          elog("incomingCalls: call location not in same file as caller, this "
+               "is unexpected");
+          continue;
+        }
+        Range R;
+        R.start.line = L.Start.line();
+        R.start.character = L.Start.column();
+        R.end.line = L.End.line();
+        R.end.character = L.End.column();
+        FromRanges.push_back(R);
+      }
       Results.push_back(
-          CallHierarchyIncomingCall{std::move(*CHI), std::move(It->second)});
+          CallHierarchyIncomingCall{std::move(*CHI), std::move(FromRanges)});
+    }
   });
   // Sort results by name of container.
   llvm::sort(Results, [](const CallHierarchyIncomingCall &A,

@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2024

@llvm/pr-subscribers-clangd

Author: Nathan Ridge (HighCommander4)

Changes

I don't have a concrete motivating scenario here, just something I noticed during code reading:

CallHierarchyIncomingCall::fromRanges are interpreted as ranges in the same file as the CallHierarchyItem representing the caller (CallHierarchyIncomingCall::from).

In the server implementation, we populate them based on Ref::Location, taking only the range and discarding the file, and associate them with a CallHierarchyItem representing Ref::Container.

Now, logically it should be the case that the definition location of the symbol referred to by Ref::Container is in the same file as the Ref itself... but I don't think anything guarantees this, and if for some reason this doesn't hold, we are effectively taking ranges from one file and interpreting them as being in another file.

The patch adds a check for this and discards ranges which are not in fact in the same file.


Full diff: https://github.com/llvm/llvm-project/pull/111616.diff

1 Files Affected:

  • (modified) clang-tools-extra/clangd/XRefs.cpp (+24-10)
diff --git a/clang-tools-extra/clangd/XRefs.cpp b/clang-tools-extra/clangd/XRefs.cpp
index f94cadeffaa298..cca5035fb74bd4 100644
--- a/clang-tools-extra/clangd/XRefs.cpp
+++ b/clang-tools-extra/clangd/XRefs.cpp
@@ -63,6 +63,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
@@ -2272,18 +2273,14 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
   // Initially store the ranges in a map keyed by SymbolID of the caller.
   // This allows us to group different calls with the same caller
   // into the same CallHierarchyIncomingCall.
-  llvm::DenseMap<SymbolID, std::vector<Range>> CallsIn;
+  llvm::DenseMap<SymbolID, std::vector<SymbolLocation>> CallsIn;
   // We can populate the ranges based on a refs request only. As we do so, we
   // also accumulate the container IDs into a lookup request.
   LookupRequest ContainerLookup;
   Index->refs(Request, [&](const Ref &R) {
-    auto Loc = indexToLSPLocation(R.Location, Item.uri.file());
-    if (!Loc) {
-      elog("incomingCalls failed to convert location: {0}", Loc.takeError());
-      return;
-    }
-    auto It = CallsIn.try_emplace(R.Container, std::vector<Range>{}).first;
-    It->second.push_back(Loc->range);
+    auto It =
+        CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first;
+    It->second.push_back(R.Location);
 
     ContainerLookup.IDs.insert(R.Container);
   });
@@ -2292,9 +2289,26 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
   Index->lookup(ContainerLookup, [&](const Symbol &Caller) {
     auto It = CallsIn.find(Caller.ID);
     assert(It != CallsIn.end());
-    if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file()))
+    if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file())) {
+      SymbolLocation CallerLoc =
+          Caller.Definition ? Caller.Definition : Caller.CanonicalDeclaration;
+      std::vector<Range> FromRanges;
+      for (const SymbolLocation &L : It->second) {
+        if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
+          elog("incomingCalls: call location not in same file as caller, this "
+               "is unexpected");
+          continue;
+        }
+        Range R;
+        R.start.line = L.Start.line();
+        R.start.character = L.Start.column();
+        R.end.line = L.End.line();
+        R.end.character = L.End.column();
+        FromRanges.push_back(R);
+      }
       Results.push_back(
-          CallHierarchyIncomingCall{std::move(*CHI), std::move(It->second)});
+          CallHierarchyIncomingCall{std::move(*CHI), std::move(FromRanges)});
+    }
   });
   // Sort results by name of container.
   llvm::sort(Results, [](const CallHierarchyIncomingCall &A,

std::vector<Range> FromRanges;
for (const SymbolLocation &L : It->second) {
if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
elog("incomingCalls: call location not in same file as caller, this "
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is possible.

// t.h
#pragma  once
#define abc2 void test4() {

// t.cc
#include "./t.h"

void func();

abc 
  func();
}

calling call-hierarchy for func().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example! This is exactly the sort of case the hardening was intended to catch, though I didn't think of it.

The case raises an interesting question: what should our behaviour be for an example like this, where the function definition is spread across multiple files?

The protocol representation only allows specifying one file per caller:

export interface CallHierarchyIncomingCall {

	/**
	 * The item that makes the call.
	 */
	from: CallHierarchyItem;

	/**
	 * The ranges at which the calls appear. This is relative to the caller
	 * denoted by [`this.from`](#CallHierarchyIncomingCall.from).
	 */
	fromRanges: Range[];
}

Here, from.uri is the only thing that encodes a file; the ranges in fromRanges encode a pair of line/column positions only, and are assumed to be in the same file as from.uri.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good solution, it looks like the LSP protocol assumes that the ranges are in the same file as from.uri. I think it is worth to reporting it to LSP.

It is not a common case though, I think we could just drop these ranges in the current implementation.

"is unexpected");
continue;
}
Range R;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I didn't use indexToLSPLocation() here because the URI::resolve operation that it does seems like unnecessary work here. I could factor out the part that converts the range into a function to avoid duplicating it here.

std::vector<Range> FromRanges;
for (const SymbolLocation &L : It->second) {
if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
elog("incomingCalls: call location not in same file as caller, this "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is possible.

// t.h
#pragma  once
#define abc2 void test4() {

// t.cc
#include "./t.h"

void func();

abc 
  func();
}

calling call-hierarchy for func().

It->second.push_back(Loc->range);
auto It =
CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first;
It->second.push_back(R.Location);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we may introduce a use-after-free issue by storing the SymbolLocation, as the FileURI field is an unowned raw pointer. The callback result R is not guaranteed to remain valid outside of the callback context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. I figured FileURI would point to storage inside the ref slab which would stay alive for the lifetime of the index, but I do see now that the comment above of SymbolIndex::refs() says "The returned result must be deep-copied if it's used outside Callback", so while it may work for some index implementations we can't rely on it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it is fine for in-memory index, it is not safe for other implementations (e.g. remote index).

@HighCommander4 HighCommander4 force-pushed the incoming-calls-hardening branch 2 times, most recently from c72cc8d to 8b263d7 Compare November 17, 2024 06:38
@HighCommander4 HighCommander4 force-pushed the incoming-calls-hardening branch from 8b263d7 to d8f5962 Compare November 17, 2024 06:42
@HighCommander4
Copy link
Collaborator Author

Updated patch with the following changes:

  • Resolve the potential UAF by storing clangd::Location rather than SymbolLocation in the CallsIn map
  • Add a testcase for the scenario from this thread

Should be ready for another round of review.

Copy link
Collaborator

@hokein hokein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of the PR is stale and needs to be updated.

@HighCommander4 HighCommander4 merged commit 545917c into llvm:main Nov 19, 2024
8 checks passed
HighCommander4 added a commit that referenced this pull request Apr 24, 2025
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.

It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.

This is the same as the change made in #111616, but now for outgoing
calls.

Fixes clangd/clangd#2350

---------

Co-authored-by: Nathan Ridge <[email protected]>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.

It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.

This is the same as the change made in llvm#111616, but now for outgoing
calls.

Fixes clangd/clangd#2350

---------

Co-authored-by: Nathan Ridge <[email protected]>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.

It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.

This is the same as the change made in llvm#111616, but now for outgoing
calls.

Fixes clangd/clangd#2350

---------

Co-authored-by: Nathan Ridge <[email protected]>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.

It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.

This is the same as the change made in llvm#111616, but now for outgoing
calls.

Fixes clangd/clangd#2350

---------

Co-authored-by: Nathan Ridge <[email protected]>
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.

It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.

This is the same as the change made in llvm#111616, but now for outgoing
calls.

Fixes clangd/clangd#2350

---------

Co-authored-by: Nathan Ridge <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants