-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Handle filemaps with length 0 in codemap lookups #11904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This needs a test. |
@nick29581 Needs a test case to capture the problem here. |
Sorry, this has kind of been on the back burner. The bad news is the patch got a lot larger since rebasing. The other bad news is that I haven't managed to get a more minimal test case than running DXR on librustdoc. I'm not actually sure how to create a filemap with length 0 - I see them when running DXR indexing, but I can't create them in a minimal test case. I guess I might be able to come up with a unit test for this. |
It might be something to do with macros. That sounds like something @michaelwoerister came across when looking up spans for debug symbols. |
I think a unit test is acceptable. Regarding "running DXR on librustdoc", I think it should be filed as a separate issue. LGTM otherwise. |
@jdm I can't remember encountering this as an issue. There has recently been a small problem with invalid filemaps from |
… to bytepos_to_charpos.
Now with tests. (Also renamed bytepos_to_local_charpos to reflect what it actually does). |
// located in | ||
fn bytepos_to_local_charpos(&self, bpos: BytePos) -> CharPos { | ||
// Converts an absolute BytePos to a CharPos relative to the codemap. | ||
fn bytepos_to_charpos(&self, bpos: BytePos) -> CharPos { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming, but the documentation and name has changed, but neither has the functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we rather fix the return value? i.e. CharPos(bpos.to_uint() - map.start_pos - total_extra_bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not changed the behaviour. As far as I can tell, the behaviour was non-local.
I don't see why we should change the behaviour - the users seem to expect the current behaviour and it is a private fn. Also, it is not that simple to change because start_pos does not take account of any extra bytes. I guess you could change the total_extra_bytes to only count on the current line, but perhaps that won't work. Given that nothing is broken other than the name, I don't think we should change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be a bug in there. bytepos_to_local_charpos()
is used in just two places, and then the difference between the two results is used. Both values are off by the same value (filemap.start_pos) so the difference is correct again.
The char-pos calculated by the function can't be global to the whole codemap because it's it does not take into account any filemap before the current one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point. I'll try and come up with a better fix in another PR.
Update regex-syntax to support new word boundry assertions From the regex v1.10.0 release notes [1]: This is a new minor release of regex that adds support for start and end word boundary assertions. [...] The new word boundary assertions are: • \< or \b{start}: a Unicode start-of-word boundary (\W|\A on the left, \w on the right). • \> or \b{end}: a Unicode end-of-word boundary (\w on the left, \W|\z on the right)). • \b{start-half}: half of a Unicode start-of-word boundary (\W|\A on the left). • \b{end-half}: half of a Unicode end-of-word boundary (\W|\z on the right). [1]: https://github.com/rust-lang/regex/blob/master/CHANGELOG.md#1100-2023-10-09 changelog: [`regex`]: add support for start and end word boundary assertions ("\<", "\b{start}", etc.) introduced in regex v0.10
No description provided.