Fixed-length Character fields #235

Open
holtgrewe opened this issue Feb 21, 2024 · 2 comments
@holtgrewe
Contributor

holtgrewe commented Feb 21, 2024

The glnexus tool writes out a FORMAT/RNC field with the following header.

##FORMAT=<ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele">
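The Description in this header doubles as a code table. As a rough sketch, the codes could be mapped to their meanings like this; `rnc_reason` is a hypothetical helper named for illustration and is not part of glnexus or noodles.

```rust
/// Maps a single RNC code to the meaning given in the header's Description.
/// Hypothetical helper for illustration; not part of glnexus or noodles.
fn rnc_reason(code: char) -> Option<&'static str> {
    match code {
        '.' => Some("n/a"),
        'M' => Some("missing data"),
        'P' => Some("partial data"),
        'I' => Some("gVCF input site is non-called"),
        'D' => Some("insufficient depth of coverage"),
        '-' => Some("unrepresentable overlapping deletion"),
        'L' => Some("lost/unrepresentable allele (other than deletion)"),
        'U' => Some("multiple unphased variants present"),
        'O' => Some("multiple overlapping variants present"),
        '1' => Some("site is monoallelic, no assertion about presence of REF or ALT allele"),
        // Anything not listed in the header description is treated as unknown.
        _ => None,
    }
}
```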

In the data rows this looks as follows:

GT:RNC      ./.:II     1/1:15:1,14:1:16,4,0:..

In other words, glnexus writes the two values as a single string of length 2. noodles does not accept this; it expects I,I or .,. instead.
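To make the difference concrete, here is a minimal sketch of a preprocessing step that rewrites the fixed-width form into the comma-separated form. It works on raw text only, assumes the FORMAT keys and one sample column are already split out of the line, and `expand_rnc` is a hypothetical name, not a noodles function.

```rust
/// Rewrites the RNC value of one sample column from fixed-width form ("II")
/// to comma-separated form ("I,I"). `keys` is the FORMAT column, e.g. "GT:RNC".
/// Hypothetical preprocessing helper; not part of noodles.
fn expand_rnc(keys: &str, sample: &str) -> String {
    // Position of RNC among the FORMAT keys; leave the sample untouched if absent.
    let Some(rnc_index) = keys.split(':').position(|key| key == "RNC") else {
        return sample.to_string();
    };

    sample
        .split(':')
        .enumerate()
        .map(|(i, value)| {
            if i == rnc_index && !value.contains(',') {
                // Put a comma between every character: "II" -> "I,I".
                value
                    .chars()
                    .map(String::from)
                    .collect::<Vec<_>>()
                    .join(",")
            } else {
                value.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(":")
}
```

For example, `expand_rnc("GT:RNC", "./.:II")` yields `./.:I,I`, while a value that already contains a comma is passed through unchanged.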

Is this a special case that could be supported by noodles-vcf?

Also see:

@zaeleus
Owner

zaeleus commented Feb 21, 2024

There is indeed no consensus on how fixed-size character arrays are encoded/decoded. I'll try to push samtools/hts-specs#631 to see if it can be resolved first.

As a workaround in noodles, you can modify the RNC format header record type definition to be a single string value, e.g.,

use noodles::vcf::{self, header::{record::value::map::format::Type, Number}};

let mut header = reader.read_header()?;

// In a write context, write the original header before the modification to preserve the type
// definition.
writer.write_header(&header)?;

if let Some(format) = header.formats_mut().get_mut("RNC") {
    *format.number_mut() = Number::Count(1);
    *format.type_mut() = Type::String;
}

let mut record = vcf::Record::default();
reader.read_record(&header, &mut record)?;

writer.write_record(&header, &record)?;
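With this workaround the RNC value arrives as one string such as II, so the two per-allele codes have to be separated by the caller. A minimal sketch, assuming the string has already been extracted from the record (how that is done with the noodles API is not shown here); `rnc_pair` is a hypothetical helper:

```rust
/// Splits an RNC value that was read as a single string (e.g. "II") into its
/// two per-allele codes. Returns `None` unless exactly two characters are present.
/// Hypothetical helper; not part of noodles.
fn rnc_pair(value: &str) -> Option<(char, char)> {
    let mut chars = value.chars();
    // Take up to three characters; only "two present, third absent" is valid.
    match (chars.next(), chars.next(), chars.next()) {
        (Some(first), Some(second), None) => Some((first, second)),
        _ => None,
    }
}
```

Returning `None` for any other length keeps malformed values visible to the caller instead of silently truncating them.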

@holtgrewe
Contributor Author

Thank you very much!
