-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[clangd] Harden incomingCalls() against possible misinterpretation of a range as pertaining to the wrong file #111616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clangd] Harden incomingCalls() against possible misinterpretation of a range as pertaining to the wrong file #111616
Conversation
@llvm/pr-subscribers-clang-tools-extra Author: Nathan Ridge (HighCommander4) ChangesI don't have a concrete motivating scenario here, just something I noticed during code reading:
In the server implementation, we populate them based on Now, logically it should be the case that the definition location of the symbol referred to by The patch adds a check for this and discards ranges which are not in fact in the same file. Full diff: https://github.com/llvm/llvm-project/pull/111616.diff 1 Files Affected:
diff --git a/clang-tools-extra/clangd/XRefs.cpp b/clang-tools-extra/clangd/XRefs.cpp
index f94cadeffaa298..cca5035fb74bd4 100644
--- a/clang-tools-extra/clangd/XRefs.cpp
+++ b/clang-tools-extra/clangd/XRefs.cpp
@@ -63,6 +63,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/raw_ostream.h"
#include <optional>
@@ -2272,18 +2273,14 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
// Initially store the ranges in a map keyed by SymbolID of the caller.
// This allows us to group different calls with the same caller
// into the same CallHierarchyIncomingCall.
- llvm::DenseMap<SymbolID, std::vector<Range>> CallsIn;
+ llvm::DenseMap<SymbolID, std::vector<SymbolLocation>> CallsIn;
// We can populate the ranges based on a refs request only. As we do so, we
// also accumulate the container IDs into a lookup request.
LookupRequest ContainerLookup;
Index->refs(Request, [&](const Ref &R) {
- auto Loc = indexToLSPLocation(R.Location, Item.uri.file());
- if (!Loc) {
- elog("incomingCalls failed to convert location: {0}", Loc.takeError());
- return;
- }
- auto It = CallsIn.try_emplace(R.Container, std::vector<Range>{}).first;
- It->second.push_back(Loc->range);
+ auto It =
+ CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first;
+ It->second.push_back(R.Location);
ContainerLookup.IDs.insert(R.Container);
});
@@ -2292,9 +2289,26 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
Index->lookup(ContainerLookup, [&](const Symbol &Caller) {
auto It = CallsIn.find(Caller.ID);
assert(It != CallsIn.end());
- if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file()))
+ if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file())) {
+ SymbolLocation CallerLoc =
+ Caller.Definition ? Caller.Definition : Caller.CanonicalDeclaration;
+ std::vector<Range> FromRanges;
+ for (const SymbolLocation &L : It->second) {
+ if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
+ elog("incomingCalls: call location not in same file as caller, this "
+ "is unexpected");
+ continue;
+ }
+ Range R;
+ R.start.line = L.Start.line();
+ R.start.character = L.Start.column();
+ R.end.line = L.End.line();
+ R.end.character = L.End.column();
+ FromRanges.push_back(R);
+ }
Results.push_back(
- CallHierarchyIncomingCall{std::move(*CHI), std::move(It->second)});
+ CallHierarchyIncomingCall{std::move(*CHI), std::move(FromRanges)});
+ }
});
// Sort results by name of container.
llvm::sort(Results, [](const CallHierarchyIncomingCall &A,
|
@llvm/pr-subscribers-clangd Author: Nathan Ridge (HighCommander4) ChangesI don't have a concrete motivating scenario here, just something I noticed during code reading:
In the server implementation, we populate them based on Now, logically it should be the case that the definition location of the symbol referred to by The patch adds a check for this and discards ranges which are not in fact in the same file. Full diff: https://github.com/llvm/llvm-project/pull/111616.diff 1 Files Affected:
diff --git a/clang-tools-extra/clangd/XRefs.cpp b/clang-tools-extra/clangd/XRefs.cpp
index f94cadeffaa298..cca5035fb74bd4 100644
--- a/clang-tools-extra/clangd/XRefs.cpp
+++ b/clang-tools-extra/clangd/XRefs.cpp
@@ -63,6 +63,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/raw_ostream.h"
#include <optional>
@@ -2272,18 +2273,14 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
// Initially store the ranges in a map keyed by SymbolID of the caller.
// This allows us to group different calls with the same caller
// into the same CallHierarchyIncomingCall.
- llvm::DenseMap<SymbolID, std::vector<Range>> CallsIn;
+ llvm::DenseMap<SymbolID, std::vector<SymbolLocation>> CallsIn;
// We can populate the ranges based on a refs request only. As we do so, we
// also accumulate the container IDs into a lookup request.
LookupRequest ContainerLookup;
Index->refs(Request, [&](const Ref &R) {
- auto Loc = indexToLSPLocation(R.Location, Item.uri.file());
- if (!Loc) {
- elog("incomingCalls failed to convert location: {0}", Loc.takeError());
- return;
- }
- auto It = CallsIn.try_emplace(R.Container, std::vector<Range>{}).first;
- It->second.push_back(Loc->range);
+ auto It =
+ CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first;
+ It->second.push_back(R.Location);
ContainerLookup.IDs.insert(R.Container);
});
@@ -2292,9 +2289,26 @@ incomingCalls(const CallHierarchyItem &Item, const SymbolIndex *Index) {
Index->lookup(ContainerLookup, [&](const Symbol &Caller) {
auto It = CallsIn.find(Caller.ID);
assert(It != CallsIn.end());
- if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file()))
+ if (auto CHI = symbolToCallHierarchyItem(Caller, Item.uri.file())) {
+ SymbolLocation CallerLoc =
+ Caller.Definition ? Caller.Definition : Caller.CanonicalDeclaration;
+ std::vector<Range> FromRanges;
+ for (const SymbolLocation &L : It->second) {
+ if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) {
+ elog("incomingCalls: call location not in same file as caller, this "
+ "is unexpected");
+ continue;
+ }
+ Range R;
+ R.start.line = L.Start.line();
+ R.start.character = L.Start.column();
+ R.end.line = L.End.line();
+ R.end.character = L.End.column();
+ FromRanges.push_back(R);
+ }
Results.push_back(
- CallHierarchyIncomingCall{std::move(*CHI), std::move(It->second)});
+ CallHierarchyIncomingCall{std::move(*CHI), std::move(FromRanges)});
+ }
});
// Sort results by name of container.
llvm::sort(Results, [](const CallHierarchyIncomingCall &A,
|
clang-tools-extra/clangd/XRefs.cpp
Outdated
std::vector<Range> FromRanges; | ||
for (const SymbolLocation &L : It->second) { | ||
if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) { | ||
elog("incomingCalls: call location not in same file as caller, this " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we assert as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case is possible.
// t.h
#pragma once
#define abc2 void test4() {
// t.cc
#include "./t.h"
void func();
abc
func();
}
calling call-hierarchy for func()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the example! This is exactly the sort of case the hardening was intended to catch, though I didn't think of it.
The case raises an interesting question: what should our behaviour be for an example like this, where the function definition is spread across multiple files?
The protocol representation only allows specifying one file per caller:
export interface CallHierarchyIncomingCall {
/**
* The item that makes the call.
*/
from: CallHierarchyItem;
/**
* The ranges at which the calls appear. This is relative to the caller
* denoted by [`this.from`](#CallHierarchyIncomingCall.from).
*/
fromRanges: Range[];
}
Here, from.uri
is the only thing that encodes a file; the ranges in fromRanges
encode a pair of line/column positions only, and are assumed to be in the same file as from.uri
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good solution, it looks like the LSP protocol assumes that the ranges are in the same file as from.uri. I think it is worth to reporting it to LSP.
It is not a common case though, I think we could just drop these ranges in the current implementation.
clang-tools-extra/clangd/XRefs.cpp
Outdated
"is unexpected"); | ||
continue; | ||
} | ||
Range R; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I didn't use indexToLSPLocation()
here because the URI::resolve
operation that it does seems like unnecessary work here. I could factor out the part that converts the range into a function to avoid duplicating it here.
clang-tools-extra/clangd/XRefs.cpp
Outdated
std::vector<Range> FromRanges; | ||
for (const SymbolLocation &L : It->second) { | ||
if (StringRef{L.FileURI} != StringRef{CallerLoc.FileURI}) { | ||
elog("incomingCalls: call location not in same file as caller, this " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case is possible.
// t.h
#pragma once
#define abc2 void test4() {
// t.cc
#include "./t.h"
void func();
abc
func();
}
calling call-hierarchy for func()
.
clang-tools-extra/clangd/XRefs.cpp
Outdated
It->second.push_back(Loc->range); | ||
auto It = | ||
CallsIn.try_emplace(R.Container, std::vector<SymbolLocation>{}).first; | ||
It->second.push_back(R.Location); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we may introduce a use-after-free issue by storing the SymbolLocation
, as the FileURI
field is an unowned raw pointer. The callback result R
is not guaranteed to remain valid outside of the callback context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks. I figured FileURI
would point to storage inside the ref slab which would stay alive for the lifetime of the index, but I do see now that the comment above of SymbolIndex::refs()
says "The returned result must be deep-copied if it's used outside Callback", so while it may work for some index implementations we can't rely on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it is fine for in-memory index, it is not safe for other implementations (e.g. remote index).
c72cc8d
to
8b263d7
Compare
… a range as pertaining to the wrong file
8b263d7
to
d8f5962
Compare
Updated patch with the following changes:
Should be ready for another round of review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description of the PR is stale and needs to be updated.
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the same file as the item for which 'outgoingCalls' was called. It's possible for outgoing calls to be in a different file than that item if the item is just a declaration (e.g. in a header file). Now, such calls are dropped instead of being returned to the client. This is the same as the change made in #111616, but now for outgoing calls. Fixes clangd/clangd#2350 --------- Co-authored-by: Nathan Ridge <[email protected]>
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the same file as the item for which 'outgoingCalls' was called. It's possible for outgoing calls to be in a different file than that item if the item is just a declaration (e.g. in a header file). Now, such calls are dropped instead of being returned to the client. This is the same as the change made in llvm#111616, but now for outgoing calls. Fixes clangd/clangd#2350 --------- Co-authored-by: Nathan Ridge <[email protected]>
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the same file as the item for which 'outgoingCalls' was called. It's possible for outgoing calls to be in a different file than that item if the item is just a declaration (e.g. in a header file). Now, such calls are dropped instead of being returned to the client. This is the same as the change made in llvm#111616, but now for outgoing calls. Fixes clangd/clangd#2350 --------- Co-authored-by: Nathan Ridge <[email protected]>
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the same file as the item for which 'outgoingCalls' was called. It's possible for outgoing calls to be in a different file than that item if the item is just a declaration (e.g. in a header file). Now, such calls are dropped instead of being returned to the client. This is the same as the change made in llvm#111616, but now for outgoing calls. Fixes clangd/clangd#2350 --------- Co-authored-by: Nathan Ridge <[email protected]>
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the same file as the item for which 'outgoingCalls' was called. It's possible for outgoing calls to be in a different file than that item if the item is just a declaration (e.g. in a header file). Now, such calls are dropped instead of being returned to the client. This is the same as the change made in llvm#111616, but now for outgoing calls. Fixes clangd/clangd#2350 --------- Co-authored-by: Nathan Ridge <[email protected]>
I don't have a concrete motivating scenario here, just something I noticed during code reading:
CallHierarchyIncomingCall::fromRanges
are interpreted as ranges in the same file as theCallHierarchyItem
representing the caller (CallHierarchyIncomingCall::from
).In the server implementation, we populate them based on
Ref::Location
, taking only the range and discarding the file, and associate them with aCallHierarchyItem
representingRef::Container
.Now, logically it should be the case that the definition location of the symbol referred to by
Ref::Container
is in the same file as theRef
itself... but I don't think anything guarantees this, and if for some reason this doesn't hold, we are effectively taking ranges from one file and interpreting them as being in another file.The patch adds a check for this and discards ranges which are not in fact in the same file.