-
Notifications
You must be signed in to change notification settings - Fork 450
Don't encode negative ints as bcf_*_vector_end or other reserved values #766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for looking at this. I think the same should be done in |
You are quite right. I thought I looked at that function too, but apparently not carefully enough! |
BCF_BT_INT8 values in the range 0x80-0x87 are reserved in VCFv4.3, so such integers must be encoded as BCF_BT_INT16 rather than BCF_BT_INT8. Similarly 0x8000-0x8007 is reserved in BCF_BT_INT16 so such integers must be encoded as BCF_BT_INT32. The range 0x80000000-0x80000007 is reserved in BCF_BT_INT32, but this commit does not add an error case if such integers present themselves. In particular, the previous bcf_enc_int1() code was avoiding bcf_int8/16_missing correctly but was wrongly encoding -127 and -32767 as bcf_int8/16_vector_end. Fixes samtools/bcftools#874. Add a test case exercising -127 and -128.
5859f70
to
52368d6
Compare
This is probably NEWS-worthy, as previously bcftools has been writing slightly incorrect .bcf files. (Previously-written files remain readable with bcftools after this bug fix.) |
Thank you |
Add NEWS item describing the bug.
Thanks, merged with NEWS update. |
Thanks. Um, I think it's -121…-128 and -32761…-32768 that get bumped up to the next length. Re the second NEWS paragraph: BTW I haven't investigated fully when it prints out -127 in non-vector context as hoped for and when (like in the filter expression that exposed this) it sees it as vector_end. But it looks like plain converting does indeed work. Usually, hopefully 😄 |
Yes, you're right. 0x88 and 0x8008 escape unchanged. I'll modify the text. I checked what happens in both scalar and vector contexts. It does appear to work in both as far as writing out VCF. Probably not safe to claim anything more than that though... |
The VCF v4.3 spec reserved a bunch of negative integers back in 2014 to allow for future expansion of missing / end-of-vector / etc special values:
but it appears that this was never implemented in HTSlib.
In addition the previous code was avoiding
bcf_int8/16_missing
correctly but was wrongly encoding -127 and -32767 asbcf_int8/16_vector_end
.This patch is sufficient to fix samtools/bcftools#874 but it's possible that other code also needs modifying to properly reserve these values.
Also adds a test case exercising -127 and -128.