utils.load_features cannot parse Nextclade gene map GFFs #1007

Closed
huddlej opened this issue Jul 20, 2022 · 1 comment

huddlej commented Jul 20, 2022

Current Behavior

Nextclade provides gene map GFFs for its many datasets. These gene maps use the gene_name qualifier key to define the name of each gene (e.g., the SARS-CoV-2 gene map). Augur's utils.load_features function currently only checks for gene names stored with qualifier keys of gene and locus_tag. When it tries to parse a Nextclade GFF, the load_features function fails to find the gene qualifier, defaults to locus_tag, and then crashes with a key error when that qualifier does not exist.
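The failure mode described above can be sketched without Augur or a GFF parser. This is a simplified illustration, not Augur's actual code: the `lookup_name` helper and the example qualifier value are hypothetical, and the feature's qualifiers are modelled as a plain dict mapping each key to a list of values.

```python
# Hypothetical stand-in for the qualifiers of one feature in a Nextclade
# gene map: the gene name is stored under "gene_name", not "gene".
qualifiers = {"gene_name": ["S"]}


def lookup_name(qualifiers):
    """Assumed simplification of the lookup described in this issue:
    prefer "gene", otherwise fall back to "locus_tag" unconditionally."""
    if "gene" in qualifiers:
        return qualifiers["gene"][0]
    # Nothing checks that "locus_tag" exists before indexing it.
    return qualifiers["locus_tag"][0]


try:
    lookup_name(qualifiers)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

Because neither `gene` nor `locus_tag` is present, the fallback lookup raises the same `KeyError: 'locus_tag'` shown in the traceback below.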

Expected behavior

We should be able to load features from GFF3 files with gene names referenced by alternate qualifier keys.

How to reproduce

Steps to reproduce the current behavior:

  1. Download the latest SARS-CoV-2 gene map.
curl -O https://raw.githubusercontent.com/nextstrain/nextclade_data/master/data/datasets/sars-cov-2/references/MN908947/versions/2022-07-12T12%3A00%3A00Z/files/genemap.gff
  2. Import the load_features function from a Python terminal and call it with the gene map GFF.
>>> from augur.utils import load_features
>>> load_features("genemap.gff")
Traceback (most recent call last):
  Input In [4] in <cell line: 1>
    load_features("genemap.gff")
  File ~/miniconda3/envs/nextstrain/lib/python3.9/site-packages/augur/utils.py:166 in load_features
    fname = feat.qualifiers["locus_tag"][0]
KeyError: 'locus_tag'

Possible solution

  1. Add a unit test for utils.load_features that attempts to load features from a Nextclade GFF.
  2. Add logic to the load_features function to check for gene_name among the other default qualifiers.
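A minimal sketch of the second step, assuming qualifiers behave like a dict of lists. The `feature_name` helper, its default key order, and the example values are assumptions for illustration, not the change that was eventually merged.

```python
def feature_name(qualifiers, keys=("gene", "gene_name", "locus_tag")):
    """Return the first available name among the candidate qualifier keys.

    The key order is an assumed priority; returns None when no candidate
    key is present so the caller can decide how to handle unnamed features.
    """
    for key in keys:
        values = qualifiers.get(key)
        if values:
            return values[0]
    return None


# Nextclade-style feature: the name lives under "gene_name".
print(feature_name({"gene_name": ["S"]}))
# Feature with neither "gene", "gene_name", nor "locus_tag".
print(feature_name({"product": ["spike"]}))
```

Returning `None` instead of raising keeps the lookup from crashing on features that have no recognized name. A unit test for the first step could assert on these same cases using a small Nextclade-style GFF fixture.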

Additional context

Related to #953. Although we plan to reimplement load_features with a more modern GFF parser, the solution proposed here will fix this minor but blocking bug in the short term.

@huddlej huddlej added the bug Something isn't working label Jul 20, 2022
@huddlej huddlej self-assigned this Jul 20, 2022
@huddlej huddlej moved this from New to Prioritized in Nextstrain planning (archived) Jul 20, 2022

huddlej commented May 17, 2023

This was resolved by #1017.

@huddlej huddlej closed this as completed May 17, 2023
@github-project-automation github-project-automation bot moved this from Prioritized to Done in Nextstrain planning (archived) May 17, 2023