Variant Annotations are missing from the 20220216 releases #2

Open
mckeowr1 opened this issue Jul 7, 2022 · 3 comments

@mckeowr1
Contributor

mckeowr1 commented Jul 7, 2022

Variant annotations for consequence == 'intron_variant' are missing in the most recent releases. The variants are still present but are not annotated with a gene or consequence.

@mckeowr1
Contributor Author

mckeowr1 commented Aug 3, 2022

I looked at the VCF after it is annotated by BCSQ, and the intronic variant annotations are present there. The loss most likely happens during generation of the flat file.

@mckeowr1
Contributor Author

mckeowr1 commented Aug 3, 2022

The first place I thought we could be losing these is the bcsq_extract_scores process, which runs a bcftools query to pull out the information used to build the TSV file. The intronic annotations are still present in its output:

head BCSQ_scores.tsv

III     1331    A       G       synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G     6       NA      94.46
III     7934    G       C       intron|WBGene00019183||protein_coding   NA      NA      NA
III     8031    T       C       intron|WBGene00019183||protein_coding   NA      NA      NA
III     23328   A       C       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23344   G       C       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23345   T       TC      intron|WBGene00019185||protein_coding   NA      NA      NA
III     23352   TA      T       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23356   C       CCG     intron|WBGene00019185||protein_coding   NA      NA      NA
III     23358   AAG     A       intron|WBGene00019185||protein_coding   NA      NA      NA
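
Since head only shows the first few rows, a tally over the whole file would confirm this more firmly. The following is a rough sketch, not part of the pipeline: it assumes the file is whitespace-delimited with the annotation in the fifth column and the consequence as the first pipe-delimited field, as in the rows above. The function name count_consequences and the "<missing>" label are hypothetical.

```python
from collections import Counter


def count_consequences(lines, annotation_col=4):
    """Tally the consequence (first '|' field) found in the annotation column.

    Assumes whitespace-delimited rows laid out like the BCSQ_scores.tsv
    excerpt: CHROM, POS, REF, ALT, annotation, then score columns.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) <= annotation_col:
            counts["<missing>"] += 1  # row has no annotation column at all
            continue
        counts[fields[annotation_col].split("|")[0]] += 1
    return counts


# Three rows copied from the excerpt above, used as a stand-in for the file.
sample = """\
III     1331    A       G       synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G     6       NA      94.46
III     7934    G       C       intron|WBGene00019183||protein_coding   NA      NA      NA
III     8031    T       C       intron|WBGene00019183||protein_coding   NA      NA      NA
"""
counts = count_consequences(sample.splitlines())
print(counts)
```

To run it on the real file, pass an open file handle, for example `count_consequences(open("BCSQ_scores.tsv"))`.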

@mckeowr1
Contributor Author

mckeowr1 commented Aug 3, 2022

In the next process, bcsq_extract_samples, it appears that we are losing the intronic annotations:
head BCSQ_samples.tsv

III     1331    A       G       JU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=
III     7934    G       C
III     8031    T       C
III     23328   A       C
III     23344   G       C
III     23345   T       TC
III     23352   TA      T
III     23356   C       CCG
III     23358   AAG     A
III     23363   A       G

The BCSQ_score_parsed.tsv file, which is generated from the same VCF, still has these intronic variant annotations:

CHROM	POS	REF	ALT	ANNOTATION	BLOSUM	Grantham	Percent_Protein
III	1331	A	G	synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G	6	NA	94.46
III	7934	G	C	intron|WBGene00019183||protein_coding	NA	NA	NA
III	8031	T	C	intron|WBGene00019183||protein_coding	NA	NA	NA
III	23328	A	C	intron|WBGene00019185||protein_coding	NA	NA	NA
III	23344	G	C	intron|WBGene00019185||protein_coding	NA	NA	NA
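
To list exactly which sites lose their annotation between the two files, something like the sketch below could help. It is only an illustration under assumptions: both files are whitespace-delimited, keyed by the first four columns, the annotation is the fifth column, and a row that lost its annotation simply has no fifth field (as in the BCSQ_samples.tsv excerpt). The helper name annotated_sites is hypothetical.

```python
def annotated_sites(lines, annotation_col=4, has_header=False):
    """Return the set of (CHROM, POS, REF, ALT) keys whose row has an annotation field."""
    rows = iter(lines)
    if has_header:
        next(rows, None)  # drop the header row, e.g. in BCSQ_score_parsed.tsv
    sites = set()
    for line in rows:
        fields = line.split()
        if len(fields) > annotation_col:
            sites.add(tuple(fields[:4]))
    return sites


# Rows copied from the excerpts in this thread, standing in for the real files.
score_parsed = """\
CHROM POS REF ALT ANNOTATION BLOSUM Grantham Percent_Protein
III 1331 A G synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G 6 NA 94.46
III 7934 G C intron|WBGene00019183||protein_coding NA NA NA
III 8031 T C intron|WBGene00019183||protein_coding NA NA NA
"""
samples = """\
III 1331 A G JU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=
III 7934 G C
III 8031 T C
"""

# Sites annotated in the parsed scores file but not in the samples file.
lost = sorted(
    annotated_sites(score_parsed.splitlines(), has_header=True)
    - annotated_sites(samples.splitlines())
)
for site in lost:
    print("\t".join(site))
```

On these excerpt rows the two intronic sites are reported as lost while the synonymous site is not, which matches what the head output shows.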
