feat: `transform` subcommand #45
Merged
Helps avoid memory "Killed" errors. Unfortunately, reading from a VCF is still unbearably slow; we need to use PLINK or BGEN instead.
resolves #33
Note: please merge this before #43!
Overview
This PR adds a subcommand to output haplotypes in VCF format, so that we can obtain counts of each haplotype in each chromosome of each sample. This functionality is primarily handled by a new `Haplotype.transform()` method. There is also a `Haplotypes.transform()` method to transform multiple haplotypes at once. At the moment, `Haplotypes.transform()` just calls `Haplotype.transform()` repeatedly, but I think there might be ways to make this faster.

There are also new methods to write the haplotypes to a VCF and some small improvements to make reading from VCFs less memory-intensive.
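To make the "each haplotype in each chromosome of each sample" semantics concrete, here is a minimal, stdlib-only sketch under stated assumptions. It is not the implementation in this PR: `transform_hap()`, `transform_haps()`, and the plain-list genotype layout are hypothetical. It assumes phased, biallelic genotypes stored as `genotypes[sample][variant] = (allele_on_chrom_1, allele_on_chrom_2)` and a haplotype given as `(variant_index, required_allele)` pairs. A haplotype counts as present on a chromosome only if every one of its alleles sits on that same chromosome.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: genotypes[sample][variant] is a phased pair of alleles
# (chromosome 1, chromosome 2), each coded 0 (REF) or 1 (ALT).
GenotypeMatrix = Sequence[Sequence[Tuple[int, int]]]
# Hypothetical haplotype: a list of (variant_index, required_allele) pairs.
HapAlleles = Sequence[Tuple[int, int]]


def transform_hap(hap: HapAlleles, genotypes: GenotypeMatrix) -> List[Tuple[bool, ...]]:
    """For each sample, report whether each of its two chromosomes carries every allele in hap."""
    result = []
    for sample in genotypes:
        present = tuple(
            all(sample[var][strand] == allele for var, allele in hap)
            for strand in (0, 1)
        )
        result.append(present)
    return result


def transform_haps(
    haps: Sequence[HapAlleles], genotypes: GenotypeMatrix
) -> List[List[Tuple[bool, ...]]]:
    """Naive multi-haplotype version: call transform_hap() once per haplotype."""
    return [transform_hap(hap, genotypes) for hap in haps]


if __name__ == "__main__":
    # Two samples, two variants.
    genotypes = [
        [(0, 1), (1, 1)],  # sample 1
        [(1, 1), (1, 0)],  # sample 2
    ]
    hap = [(0, 1), (1, 1)]  # ALT allele at both variants
    calls = transform_hap(hap, genotypes)
    print(calls)  # [(False, True), (True, False)]
    print([sum(pair) for pair in calls])  # [1, 1]
```

Summing each sample's pair gives a 0/1/2 count of the haplotype per sample. `transform_haps()` mirrors the "call it repeatedly" approach described above: one full pass over the samples per haplotype, which is the part a vectorized version could speed up.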
Usage and docs
I've documented the `transform` subcommand here in the commands section. Usage of the `transform()` methods for the `Haplotype` and `Haplotypes` classes is documented in the API docs here and here, respectively.
Testing
I added the following tests to the `TestHaplotypes` class in `tests/test_data.py`:
- `test_hap_transform()`: Try the `Haplotype.transform()` method on a basic haplotype and its genotypes.
- `test_haps_transform()`: Try the `Haplotypes.transform()` method on a set of basic haplotypes and genotypes.
- `test_hap_gt_write()`: Try to write haplotypes to a VCF. Read the VCF back and check that it looks like what we would expect.
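The write-then-read-back check in `test_hap_gt_write()` can be illustrated with a small, stdlib-only round trip. This is a sketch, not the writer added in this PR: `write_hap_vcf()` and `read_hap_vcf()` are hypothetical helpers. It assumes one VCF record per haplotype with a phased `GT` call per sample, where allele `1` marks that the haplotype is present on that chromosome; CHROM/POS are supplied by the caller and the REF/ALT values are arbitrary placeholders.

```python
import io
from typing import Dict, List, Sequence, TextIO, Tuple

# Hypothetical record: (chrom, pos, hap_id, one (chrom1, chrom2) bool pair per sample)
HapRecord = Tuple[str, int, str, Sequence[Tuple[bool, bool]]]


def write_hap_vcf(records: Sequence[HapRecord], samples: Sequence[str], out: TextIO) -> None:
    """Write one phased VCF record per haplotype; GT 1 means the haplotype is on that chromosome."""
    out.write("##fileformat=VCFv4.2\n")
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    out.write("\t".join(header + list(samples)) + "\n")
    for chrom, pos, hap_id, calls in records:
        gts = [f"{int(a)}|{int(b)}" for a, b in calls]
        # "A"/"T" are arbitrary placeholder alleles for this sketch.
        fields = [chrom, str(pos), hap_id, "A", "T", ".", ".", ".", "GT"]
        out.write("\t".join(fields + gts) + "\n")


def read_hap_vcf(handle: TextIO) -> Tuple[List[str], Dict[str, List[Tuple[bool, bool]]]]:
    """Parse the file written above back into sample names and per-haplotype calls."""
    samples: List[str] = []
    haps: Dict[str, List[Tuple[bool, bool]]] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]
            continue
        calls = []
        for gt in fields[9:]:
            a, b = gt.split("|")
            calls.append((a == "1", b == "1"))
        haps[fields[2]] = calls
    return samples, haps


if __name__ == "__main__":
    calls = [(False, True), (True, False)]
    buf = io.StringIO()
    write_hap_vcf([("1", 10114, "H1", calls)], ["S1", "S2"], buf)
    buf.seek(0)
    samples, haps = read_hap_vcf(buf)
    # Round-trip check in the spirit of test_hap_gt_write()
    assert samples == ["S1", "S2"]
    assert haps["H1"] == calls
```

Writing to an in-memory `io.StringIO` keeps the round trip free of temporary files; the same functions work unchanged with a real file handle.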