Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to index a VCF statically or on the fly #214

Open
nh13 opened this issue Nov 9, 2023 · 4 comments
Open

How to index a VCF statically or on the fly #214

nh13 opened this issue Nov 9, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request vcf

Comments

@nh13
Copy link
Contributor

nh13 commented Nov 9, 2023

I'd like to programmatically generate VCF records as part of my unit tests for various tools. In some cases I'll create new records then write them to disk to test the tool end to end (versus just passing methods the variant records themselves). The VCF needs to be indexed, and I was wondering how to do that after I've written the VCF as well as ideally on the fly.

I think to create the index, you can get the virtual position from the writer and perhaps something like this:

for (reference_sequence_name, start, end) in records {
writeln!(writer, "{}\t{}\t{}", reference_sequence_name, start, end)?;
let end_position = writer.virtual_position();
let chunk = Chunk::new(start_position, end_position);
indexer.add_record(reference_sequence_name, start, end, chunk)?;
start_position = end_position;
}
?

Any guidance would be greatly appreciated!

@zaeleus zaeleus added the vcf label Nov 9, 2023
@zaeleus
Copy link
Owner

zaeleus commented Nov 9, 2023

The tabix_write example can indeed be applied to create a tabix index for VFC on the fly. noodles-vcf/examples/vcf_index.rs shows how to create one from a file and what record information is need to build the index.

For the latter case, we could also add a fn index<P: AsRef<Path>>(src: P) -> io::Result<csi::Index> convenience function, if that would be useful.

@zaeleus zaeleus added the enhancement New feature or request label Nov 10, 2023
@zaeleus zaeleus self-assigned this Nov 10, 2023
@brentp
Copy link

brentp commented Nov 13, 2023

@zaeleus , I am trying to do this with the following code, but the index does not seem to be correct. what do I miss?
Thanks in advance:

    use noodles::core;
    use noodles::csi::{self as csi, index::reference_sequence::bin::Chunk};
    use noodles::vcf::header::record::value::{map::Contig, Map};
    use noodles::vcf::record::Position;

    use tempfile::tempdir;

    fn write_vcf(
        temp_dir: &PathBuf,
        chrom: String,
        start: usize,
        reference: &str,
        alternate: &str,
    ) -> String {
        let vcf_path = temp_dir.join("test.vcf.gz");
        let fs = File::create(&vcf_path).expect("error creating file");

        let mut wtr = bgzf::Writer::new(&fs);

        let mut writer = vcf::Writer::new(&mut wtr);
        let id = chrom.parse().expect("error parsing chrom");

        let contig = Map::<Contig>::new();
        let header = vcf::Header::builder().add_contig(id, contig.clone()).build();

        writer.write_header(&header).expect("error writing header");

        let start_position = writer.get_ref().virtual_position();

        let chrom = chrom.parse().expect("error parsing chrom");
        let record = vcf::Record::builder()
            .set_chromosome(chrom)
            .set_position(Position::from(start))
            .set_reference_bases(reference.parse().expect("error parsing reference"))
            .set_alternate_bases(alternate.parse().expect("error parsing alternate"))
            .set_info("AF=0.1".parse().expect("error parsing AF"))
            .build()
            .expect("error building record");

        writer.write_record(&header, &record).expect("error writing test record");

        let end_position = writer.get_ref().virtual_position();
        let chunk = Chunk::new(start_position, end_position);

        let alignment_context = Some((
            0 as usize,
            core::Position::try_from(start).expect("bad start in test"),
            core::Position::try_from(start + reference.len()).expect("bad end in test"),
            true,
        ));

        let mut indexer =
            csi::index::Indexer::default().set_header(csi::index::header::Builder::vcf().build());

        indexer.add_record(alignment_context, chunk).expect("error indexing");
        let index = indexer.build(1 as usize);

        let index_path = vcf_path.clone().into_os_string().into_string().unwrap() + ".csi";

        let index_file = File::create(&index_path).expect("error creating index file");
        let mut index_writer = csi::Writer::new(index_file);
        index_writer.write_index(&index).expect("error writing index");

        wtr.finish().expect("error finishing bgzf");
        return vcf_path.into_os_string().into_string().unwrap();
    }

@zaeleus
Copy link
Owner

zaeleus commented Nov 13, 2023

I am trying to do this with the following code, but the index does not seem to be correct. what do I miss?

Thanks for the example. The output is indeed incorrect, and the issues for these are being tracked in #215 and #216.

zaeleus added a commit that referenced this issue Nov 13, 2023
@zaeleus
Copy link
Owner

zaeleus commented Nov 13, 2023

For the latter case, we could also add a fn index<P: AsRef<Path>>(src: P) -> io::Result<csi::Index> convenience function, if that would be useful.

@nh13, this is added in noodles 0.58.0 / noodles-vcf 0.46.0 as vcf::index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vcf
Projects
None yet
Development

No branches or pull requests

3 participants