Freeze 7 #225

Open: wants to merge 168 commits into base: master

@ch-kr ch-kr commented Oct 6, 2021

This PR updates the code for the 455k VCF export that will be returned to UKBB.

Updates:

  • Created sample and vcf (variant) data dictionaries and summaries (required by UKBB): sample_data_dicts.py, vcf_data_dicts.py, data_dict_summaries.md
  • Updated raw MT path (old path has been moved from standard storage to nearline): basics.py
  • Updated excluded samples path to point to most recent file for samples who withdrew consent: basics.py
  • Updated get_ukbb_data to remove new sample IDs with withdrawn consent and to exclude samples with undefined batch when ukbb_samples_only is set: basics.py
  • Updated index dict functions to remove subset, subpop information: utils.py
  • Updated validity checks to remove subset/subpop information and remove unnecessary checks on gnomAD data: sanity_checks.py. NOTE: the code still expects an older version of the validity checks from gnomAD methods. This is the commit that works (a more recent commit cannot be used): d43951086bfea1b80b1bad34b3193f7c031dbc8d
  • Removed code adding/unfurling gnomAD fields for export: prepare_vcf_data_release.py
  • Added homalt release patch frequency to release VCF HT, MT: prepare_vcf_data_release.py
  • Updated homalt hotfix GT adjustment: prepare_vcf_data_release.py
  • Added code to get vcf shard start/end positions: prepare_vcf_data_release.py
  • Added code to generate manifest (required by DNAnexus for transfer): manifest.py
  • Added batch script to repackage VCF (required by DNAnexus): vcf_repackage.py
  • Added header repackage code from DNAnexus: ukbb_header_reformat.sh
  • Edited meta HT to remove control samples/samples with withdrawn consent/samples with undefined batch, rekey using UKBB ID, and rename ukbb_meta to ukb_meta: create_meta_ht.py
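
For reviewers, the sample-exclusion behaviour described for get_ukbb_data above can be sketched in plain Python. This is only an illustration of the intended logic, not the code in basics.py: `filter_samples`, `sample_batches`, and `withdrawn_ids` are hypothetical names, and the real function operates on Hail tables rather than dictionaries.

```python
from typing import Dict, Iterable, Optional, Set


def filter_samples(
    sample_batches: Dict[str, Optional[str]],
    withdrawn_ids: Iterable[str],
    ukbb_samples_only: bool = True,
) -> Dict[str, Optional[str]]:
    """Hypothetical stand-in for the filtering described above (not the real get_ukbb_data)."""
    withdrawn: Set[str] = set(withdrawn_ids)
    kept: Dict[str, Optional[str]] = {}
    for sample_id, batch in sample_batches.items():
        # Always drop samples whose consent has been withdrawn.
        if sample_id in withdrawn:
            continue
        # Drop samples with an undefined batch only when ukbb_samples_only is set.
        if ukbb_samples_only and batch is None:
            continue
        kept[sample_id] = batch
    return kept


# Made-up sample IDs and batches, for illustration only.
samples = {"S1": "batch_a", "S2": None, "S3": "batch_b", "S4": "batch_a"}
filtered = filter_samples(samples, withdrawn_ids=["S3"])
```

With this toy input, S3 is dropped for withdrawn consent and S2 is dropped for its undefined batch; passing `ukbb_samples_only=False` would keep S2.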

I think these are all the changes we need to return data -- please let me know if I missed anything!
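
As a rough illustration of the shard start/end step listed above, here is a minimal sketch, assuming each variant record carries a shard index, contig, and position, and that records arrive in file order. `shard_bounds` and the record layout are hypothetical; the actual code in prepare_vcf_data_release.py may differ.

```python
from typing import Dict, Iterable, Tuple

Locus = Tuple[str, int]  # (contig, position)


def shard_bounds(
    records: Iterable[Tuple[int, str, int]]
) -> Dict[int, Tuple[Locus, Locus]]:
    """Return the first and last locus seen for each shard.

    Assumes records are (shard_index, contig, position) in file order,
    so the first record of a shard is its start and the last is its end.
    """
    bounds: Dict[int, Tuple[Locus, Locus]] = {}
    for shard, contig, pos in records:
        locus = (contig, pos)
        if shard not in bounds:
            # First record for this shard: start and end are the same locus.
            bounds[shard] = (locus, locus)
        else:
            # Keep the original start, move the end forward.
            bounds[shard] = (bounds[shard][0], locus)
    return bounds


# Made-up records, for illustration only.
records = [(0, "chr1", 100), (0, "chr1", 250), (1, "chr1", 300), (1, "chr2", 50)]
bounds = shard_bounds(records)
```

Relying on file order rather than comparing positions keeps the sketch correct when a shard spans more than one contig.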

@ch-kr ch-kr requested a review from jkgoodrich October 6, 2021 19:59

@jkgoodrich jkgoodrich left a comment

This is looking good so far. I didn't get through it all because it's very big, but I wanted to put in my current comments because I will be in meetings all day tomorrow and won't get to the rest of the files until Friday.

ukbb_qc/assessment/sanity_checks.py
ukbb_qc/assessment/sanity_checks.py (Outdated)
ukbb_qc/assessment/sanity_checks.py (Outdated)
ukbb_qc/assessment/sanity_checks.py (Outdated)
ukbb_qc/release/prepare_vcf_data_release.py
ukbb_qc/sample_qc/create_meta_ht.py
ukbb_qc/utils/utils.py
ukbb_qc/utils/utils.py (Outdated)
ukbb_qc/utils/utils.py (Outdated)
ukbb_qc/utils/utils.py (Outdated)
@ch-kr ch-kr requested a review from jkgoodrich November 5, 2021 18:17
@ch-kr ch-kr requested a review from jkgoodrich November 5, 2021 19:04