Skip to content

Handle filemaps with length 0 in codemap lookups #11904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 19, 2014
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 78 additions & 8 deletions src/libsyntax/codemap.rs
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,20 @@ impl CodeMap {
a = m;
}
}
// There can be filemaps with length 0. These have the same start_pos as the previous
// filemap, but are not the filemaps we want (because they are length 0, they cannot
// contain what we are looking for). So, rewind until we find a useful filemap.
loop {
let lines = files[a].lines.borrow();
let lines = lines.get();
if lines.len() > 0 {
break;
}
if a == 0 {
fail!("position {} does not resolve to a source location", pos.to_uint());
}
a -= 1;
}
if a >= len {
fail!("position {} does not resolve to a source location", pos.to_uint())
}
Expand All @@ -406,10 +420,10 @@ impl CodeMap {
fn lookup_pos(&self, pos: BytePos) -> Loc {
let FileMapAndLine {fm: f, line: a} = self.lookup_line(pos);
let line = a + 1u; // Line numbers start at 1
let chpos = self.bytepos_to_local_charpos(pos);
let mut lines = f.lines.borrow_mut();
let chpos = self.bytepos_to_charpos(pos);
let lines = f.lines.borrow();
let linebpos = lines.get()[a];
let linechpos = self.bytepos_to_local_charpos(linebpos);
let linechpos = self.bytepos_to_charpos(linebpos);
debug!("codemap: byte pos {:?} is on the line at byte pos {:?}",
pos, linebpos);
debug!("codemap: char pos {:?} is on the line at char pos {:?}",
Expand All @@ -432,9 +446,8 @@ impl CodeMap {
return FileMapAndBytePos {fm: fm, pos: offset};
}

// Converts an absolute BytePos to a CharPos relative to the file it is
// located in
fn bytepos_to_local_charpos(&self, bpos: BytePos) -> CharPos {
// Converts an absolute BytePos to a CharPos relative to the codemap.
fn bytepos_to_charpos(&self, bpos: BytePos) -> CharPos {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming, but the documentation and name has changed, but neither has the functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather fix the return value? i.e. CharPos(bpos.to_uint() - map.start_pos - total_extra_bytes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not changed the behaviour. As far as I can tell, the behaviour was non-local.

I don't see why we should change the behaviour - the users seem to expect the current behaviour and it is a private fn. Also, it is not that simple to change because start_pos does not take account of any extra bytes. I guess you could change the total_extra_bytes to only count on the current line, but perhaps that won't work. Given that nothing is broken other than the name, I don't think we should change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a bug in there. bytepos_to_local_charpos() is used in just two places, and then the difference between the two results is used. Both values are off by the same value (filemap.start_pos) so the difference is correct again.

The char-pos calculated by the function can't be global to the whole codemap because it's it does not take into account any filemap before the current one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. I'll try and come up with a better fix in another PR.

debug!("codemap: converting {:?} to char pos", bpos);
let idx = self.lookup_filemap_idx(bpos);
let files = self.files.borrow();
Expand All @@ -450,8 +463,8 @@ impl CodeMap {
total_extra_bytes += mbc.bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be total_extra_bytes += (mbc.bytes - 1);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? I don't think so - purely because this path is used in the compiler a fair bit and it doesn't go wrong - I would expect things to be broken if it were off by so much. Could you try and make a test and possibly a fix in a separate PR please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I find the time. However, it seems that nothing can wrong here as long as there are no multibyte characters involved, which is rather rare. I could imagine that a bug here could remain unnoticed for some time.

// We should never see a byte position in the middle of a
// character
assert!(bpos == mbc.pos
|| bpos.to_uint() >= mbc.pos.to_uint() + mbc.bytes);
assert!(bpos == mbc.pos ||
bpos.to_uint() >= mbc.pos.to_uint() + mbc.bytes);
} else {
break;
}
Expand Down Expand Up @@ -486,4 +499,61 @@ mod test {
fm.next_line(BytePos(10));
fm.next_line(BytePos(2));
}

fn init_code_map() ->CodeMap {
let cm = CodeMap::new();
let fm1 = cm.new_filemap(~"blork.rs",~"first line.\nsecond line");
let fm2 = cm.new_filemap(~"empty.rs",~"");
let fm3 = cm.new_filemap(~"blork2.rs",~"first line.\nsecond line");

fm1.next_line(BytePos(0));
fm1.next_line(BytePos(12));
fm2.next_line(BytePos(23));
fm3.next_line(BytePos(23));
fm3.next_line(BytePos(33));

cm
}

#[test]
fn t3() {
// Test lookup_byte_offset
let cm = init_code_map();

let fmabp1 = cm.lookup_byte_offset(BytePos(22));
assert_eq!(fmabp1.fm.name, ~"blork.rs");
assert_eq!(fmabp1.pos, BytePos(22));

let fmabp2 = cm.lookup_byte_offset(BytePos(23));
assert_eq!(fmabp2.fm.name, ~"blork2.rs");
assert_eq!(fmabp2.pos, BytePos(0));
}

#[test]
fn t4() {
// Test bytepos_to_charpos
let cm = init_code_map();

let cp1 = cm.bytepos_to_charpos(BytePos(22));
assert_eq!(cp1, CharPos(22));

let cp2 = cm.bytepos_to_charpos(BytePos(23));
assert_eq!(cp2, CharPos(23));
}

#[test]
fn t5() {
// Test zero-length filemaps.
let cm = init_code_map();

let loc1 = cm.lookup_char_pos(BytePos(22));
assert_eq!(loc1.file.name, ~"blork.rs");
assert_eq!(loc1.line, 2);
assert_eq!(loc1.col, CharPos(10));

let loc2 = cm.lookup_char_pos(BytePos(23));
assert_eq!(loc2.file.name, ~"blork2.rs");
assert_eq!(loc2.line, 1);
assert_eq!(loc2.col, CharPos(0));
}
}