Don't encode negative ints as bcf_*_vector_end or other reserved values #766

jmarshall · 2018-09-07T12:26:52Z

The VCF v4.3 spec reserved a bunch of negative integers back in 2014 to allow for future expansion of missing / end-of-vector / etc special values:

Integers may be encoded as 8, 16, or 32 bit values, in little-endian order. It is up to the encoder to determine the appropriate ranged value to use when writing the BCF2 file. […] In total, eight values are reserved for future use: 0x80-0x87, 0x8000-0x8007, 0x80000000-0x80000007.

but it appears that this was never implemented in HTSlib.

In addition, the previous code avoided bcf_int8/16_missing correctly but wrongly wrote -127 and -32767 as 8-bit and 16-bit values, where those bit patterns mean bcf_int8/16_vector_end.

This patch is sufficient to fix samtools/bcftools#874 but it's possible that other code also needs modifying to properly reserve these values.

Also adds a test case exercising -127 and -128.

pd3 · 2018-09-11T11:08:09Z

Thanks for looking at this. I think the same should be done in bcf_enc_vint(). Currently it works because the condition checks for the vector_end rather than missing, but the 8 reserved values are not respected.

jmarshall · 2018-09-11T12:14:40Z

You are quite right. I thought I looked at that function too, but apparently not carefully enough!
I'll amend the PR.

BCF_BT_INT8 values in the range 0x80-0x87 are reserved in VCFv4.3, so such integers must be encoded as BCF_BT_INT16 rather than BCF_BT_INT8. Similarly 0x8000-0x8007 is reserved in BCF_BT_INT16 so such integers must be encoded as BCF_BT_INT32. The range 0x80000000-0x80000007 is reserved in BCF_BT_INT32, but this commit does not add an error case if such integers present themselves.

In particular, the previous bcf_enc_int1() code was avoiding bcf_int8/16_missing correctly but was wrongly encoding -127 and -32767 as bcf_int8/16_vector_end. Fixes samtools/bcftools#874.

Add a test case exercising -127 and -128.
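
The width-selection rule in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the `sketch_*` and `SKETCH_*` names are hypothetical stand-ins rather than HTSlib identifiers, and the sketch assumes that missing and end-of-vector sentinels are handled separately before this point.

```c
#include <stdint.h>

/* Local stand-ins for the BCF2 integer type codes (assumed names, not
 * HTSlib's own macros). */
enum sketch_bt { SKETCH_BT_INT8 = 1, SKETCH_BT_INT16 = 2, SKETCH_BT_INT32 = 3 };

/* Lowest value that may be stored at each width: the eight bit patterns
 * counting up from INT*_MIN (0x80-0x87, 0x8000-0x8007) are reserved. */
#define SKETCH_INT8_LOWEST  (INT8_MIN + 8)    /* -120 */
#define SKETCH_INT16_LOWEST (INT16_MIN + 8)   /* -32760 */

/* Return the narrowest type code whose non-reserved range contains v. */
static int sketch_pick_int_type(int32_t v)
{
    if (v >= SKETCH_INT8_LOWEST && v <= INT8_MAX)
        return SKETCH_BT_INT8;
    if (v >= SKETCH_INT16_LOWEST && v <= INT16_MAX)
        return SKETCH_BT_INT16;
    /* Like the commit, this sketch adds no error case for values that
     * fall in the reserved 32-bit range. */
    return SKETCH_BT_INT32;
}
```

Under this rule -127 and -128 are written as 16-bit values, while -120 still fits in 8 bits.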

jmarshall · 2018-09-11T14:21:27Z

bcf_enc_vint() also now fixed.

This is probably NEWS-worthy, as previously bcftools has been writing slightly incorrect .bcf files. (Previously-written files remain readable with bcftools after this bug fix.)

pd3 · 2018-09-11T15:14:07Z

Thank you

Add NEWS item describing the bug.

daviesrob · 2018-09-12T15:21:46Z

Thanks, merged with NEWS update.

jmarshall · 2018-09-12T16:00:41Z

Thanks. Um, I think it's -121…-128 and -32761…-32768 that get bumped up to the next length.

Re the second NEWS paragraph: BTW I haven't investigated fully when it prints out -127 in non-vector context as hoped for and when (like in the filter expression that exposed this) it sees it as vector_end. But it looks like plain converting does indeed work. Usually, hopefully 😄

daviesrob · 2018-09-12T16:42:14Z

Yes, you're right. 0x88 and 0x8008 escape unchanged. I'll modify the text.

I checked what happens in both scalar and vector contexts. It does appear to work in both as far as writing out VCF. Probably not safe to claim anything more than that though...
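
For reference, the boundary values mentioned in the last two comments follow from reading the reserved bit patterns as two's-complement integers. A small sketch of that arithmetic, using hypothetical helper names that are not part of HTSlib:

```c
#include <stdint.h>

/* Interpret a raw byte / 16-bit word as a two's-complement signed value,
 * using plain arithmetic rather than an implementation-defined cast. */
static int32_t sketch_as_int8(uint8_t byte)
{
    return byte >= 0x80 ? (int32_t)byte - 0x100 : (int32_t)byte;
}

static int32_t sketch_as_int16(uint16_t word)
{
    return word >= 0x8000 ? (int32_t)word - 0x10000 : (int32_t)word;
}

/* True if the raw pattern lies in the block reserved by VCFv4.3. */
static int sketch_is_reserved8(uint8_t byte)
{
    return byte >= 0x80 && byte <= 0x87;
}

static int sketch_is_reserved16(uint16_t word)
{
    return word >= 0x8000 && word <= 0x8007;
}
```

Here 0x80 and 0x87 correspond to -128 and -121, 0x8000 and 0x8007 to -32768 and -32761, and 0x88 and 0x8008 (-120 and -32760) are the first patterns outside the reserved blocks.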

jmarshall force-pushed the reserved-negints branch from 5859f70 to 52368d6 Compare September 11, 2018 13:42

daviesrob merged commit 52368d6 into samtools:develop Sep 12, 2018

daviesrob added a commit that referenced this pull request Sep 12, 2018

Merge Don't encode negative ints as reserved values (PR #766)

c646f80

Add NEWS item describing the bug.

jmarshall deleted the reserved-negints branch September 12, 2018 16:01

jmarshall mentioned this pull request Mar 4, 2019

bcf_get_info_values fails to read values if value is -127 #832

Closed

jmarshall mentioned this pull request Dec 9, 2019

64bit integers break VCF/BCF #999

Closed

dlaehnemann mentioned this pull request Feb 5, 2020

Update to htslib 1.10.2 rust-bio/rust-htslib#184

Merged

