feat: transform subcommand #45

Merged
merged 23 commits into feat/haplotypes from feat/transform
May 14, 2022

@aryarm commented May 11, 2022

resolves #33
Note: please merge this before #43!

Overview

This PR adds a subcommand to output haplotypes in VCF format, so that we can obtain counts of each haplotype in each chromosome of each sample. This functionality is primarily handled by a new Haplotype.transform() method.
There is also a Haplotypes.transform() method to transform multiple haplotypes at once. At the moment, Haplotypes.transform() just calls Haplotype.transform() repeatedly, but I think there might be ways to make this faster.
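
To illustrate what the transform step computes, here is a minimal pure-Python sketch. It is not the implementation in this PR: the function names `transform_haplotype` and `transform_haplotypes`, the genotype layout (a nested list indexed by sample, variant, and chromosome copy), and the representation of a haplotype as a list of `(variant_index, allele)` pairs are all assumptions made for the example. The second function mirrors the "call it repeatedly" approach described above.

```python
def transform_haplotype(genotypes, hap_alleles):
    """Mark which chromosome copies carry every allele of one haplotype.

    genotypes:   genotypes[sample][variant][copy] -> int allele (assumed layout)
    hap_alleles: list of (variant_index, allele) pairs (assumed representation)
    returns:     result[sample][copy] -> 1 if the copy carries the haplotype, else 0
    """
    result = []
    for sample in genotypes:
        n_copies = len(sample[0])  # assumes at least one variant per sample
        row = []
        for copy in range(n_copies):
            # A copy carries the haplotype only if it matches at every variant.
            carries = all(sample[v][copy] == allele for v, allele in hap_alleles)
            row.append(int(carries))
        result.append(row)
    return result


def transform_haplotypes(genotypes, haplotypes):
    """Transform several haplotypes by calling transform_haplotype() for each."""
    per_hap = [transform_haplotype(genotypes, hap) for hap in haplotypes]
    # Reorder from [hap][sample][copy] to [sample][hap][copy].
    return [
        [per_hap[h][s] for h in range(len(haplotypes))]
        for s in range(len(genotypes))
    ]


# Two samples, three variants, two chromosome copies each (made-up data).
genotypes = [
    [[0, 1], [1, 1], [0, 0]],  # sample 0
    [[1, 1], [1, 0], [0, 1]],  # sample 1
]
hap_a = [(0, 1), (1, 1)]  # ALT allele at variants 0 and 1
hap_b = [(2, 0)]          # REF allele at variant 2

print(transform_haplotype(genotypes, hap_a))             # [[0, 1], [1, 0]]
print(transform_haplotypes(genotypes, [hap_a, hap_b]))   # [[[0, 1], [1, 1]], [[1, 0], [1, 0]]]
```

Looping over haplotypes like this repeats the per-sample scan once per haplotype, which is one reason a batched approach might end up faster.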

There are also new methods to write the haplotypes to a VCF and some small improvements to make reading from VCFs less memory-intensive.

Usage and docs

I've documented the transform subcommand in the commands section of the docs.

Usage of the transform() methods for the Haplotype and Haplotypes classes is documented in the API docs.

Testing

I added the following tests to the TestHaplotypes class in tests/test_data.py:

  1. test_hap_transform()
    Try the Haplotype.transform() method on a basic haplotype and its genotypes.
  2. test_haps_transform()
    Try the Haplotypes.transform() method on a set of basic haplotypes and genotypes.
  3. test_hap_gt_write()
    Try to write haplotypes to a VCF. Read the VCF back and check that it looks like what we would expect.
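
The write-then-read check in the third test follows a common round-trip pattern, sketched below in pure Python. This is only an illustration, not the test from tests/test_data.py: the helpers `write_hap_vcf` and `read_hap_vcf`, the record dictionaries, and the placeholder REF/ALT values ("A"/"T") are hypothetical, and the sketch uses an in-memory buffer rather than a real file.

```python
import io


def write_hap_vcf(handle, samples, records):
    """Write one VCF record per haplotype, with phased GT fields (hypothetical helper)."""
    handle.write("##fileformat=VCFv4.2\n")
    handle.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
    fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    handle.write("\t".join(fixed + samples) + "\n")
    for rec in records:
        # Each sample's pair of 0/1 values becomes a phased genotype like "0|1".
        gts = ["|".join(str(a) for a in pair) for pair in rec["gts"]]
        # REF "A" and ALT "T" are placeholders chosen for this sketch.
        cols = [rec["chrom"], str(rec["pos"]), rec["id"], "A", "T", ".", ".", ".", "GT"]
        handle.write("\t".join(cols + gts) + "\n")


def read_hap_vcf(handle):
    """Read back the sample names and records written by write_hap_vcf()."""
    samples, records = [], []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        gts = [[int(a) for a in gt.split("|")] for gt in fields[9:]]
        records.append(
            {"chrom": fields[0], "pos": int(fields[1]), "id": fields[2], "gts": gts}
        )
    return samples, records


def test_round_trip():
    samples = ["S1", "S2"]
    records = [
        {"chrom": "1", "pos": 10114, "id": "H1", "gts": [[0, 1], [1, 0]]},
        {"chrom": "1", "pos": 10116, "id": "H2", "gts": [[1, 1], [1, 0]]},
    ]
    buffer = io.StringIO()
    write_hap_vcf(buffer, samples, records)
    buffer.seek(0)
    # What we read back should match what we wrote.
    assert read_hap_vcf(buffer) == (samples, records)


test_round_trip()
```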

@aryarm mentioned this pull request May 11, 2022
@aryarm marked this pull request as ready for review May 14, 2022 05:10
@aryarm merged commit 34a839d into feat/haplotypes May 14, 2022
@aryarm deleted the feat/transform branch May 14, 2022 05:11
@aryarm linked an issue May 14, 2022 that may be closed by this pull request
Successfully merging this pull request may close these issues.

support for a haps format spec that extends the VCF format spec