Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index Error for variant using am37.c_to_g() #606

Open
sptaylor opened this issue Jul 30, 2020 · 7 comments
Open

Index Error for variant using am37.c_to_g() #606

sptaylor opened this issue Jul 30, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@sptaylor
Copy link

We encountered a bug when running c_to_g via assembly mapper for variant 'NM_001291722.1:c.283-3C>T':

$ hgvs-shell
Python 3.6.9 (default, Jan  8 2020, 08:00:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.


############################################################################
hgvs-shell -- interactive hgvs
hgvs version: 1.5.1
data provider url: postgresql://anonymous:[email protected]/uta/uta_20180821
schema_version: 1.1
data_version: uta_20180821
sequences source: SeqRepo (/Users/paige.taylor/Data/seqrepo/2018-08-21)

The following variables are defined:
* global_config
* hp, parser, hgvs_parser -- Parser instance
* hdp, hgvs_data_provider -- UTA data provider instance
* vm, variant_mapper, hgvs_variant_mapper -- VariantMapper instance
* am37, hgvs_assembly_mapper_37 -- GRCh37 Assembly Mapper instance
* am38, projector, hgvs_assembly_mapper_38 -- GRCh38 Assembly Mapper instances
* hn, normalizer, hgvs_normalizer -- Normalizer instance
* hv, validator, hgvs_validator) -- Validator instance

The following functions are available:
  * parse, normalize, validate
  * g_to_c, g_to_n, g_to_t,
  * c_to_g, c_to_n, c_to_p,
  * n_to_c, n_to_g,
  * t_to_g,
  * get_relevant_transcripts

When submitting bug reports, include the version header shown above
and use these variables/variable names whenever possible.



In [1]: var = hp.parse_hgvs_variant('NM_001291722.1:c.283-3C>T')

In [2]: am37.c_to_g(var)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/virtualenv/hgvs_v15/lib/python3.6/site-packages/hgvs/shell.py in <module>
----> 1 am37.c_to_g(var)

~/virtualenv/hgvs_v15/lib/python3.6/site-packages/hgvs/assemblymapper.py in c_to_g(self, var_c)
    114         alt_ac = self._alt_ac_for_tx_ac(var_c.ac)
    115         var_out = super(AssemblyMapper, self).c_to_g(
--> 116             var_c, alt_ac, alt_aln_method=self.alt_aln_method)
    117         return self._maybe_normalize(var_out)
    118

~/virtualenv/hgvs_v15/lib/python3.6/site-packages/hgvs/variantmapper.py in c_to_g(self, var_c, alt_ac, alt_aln_method)
    301             pos_n = mapper.g_to_n(pos_g)
    302             edit_g = hgvs.edit.NARefAlt(
--> 303                 ref='', alt=self._get_altered_sequence(mapper.strand, pos_n, var_n))
    304         pos_g.uncertain = var_c.posedit.pos.uncertain
    305         var_g = hgvs.sequencevariant.SequenceVariant(

~/virtualenv/hgvs_v15/lib/python3.6/site-packages/hgvs/variantmapper.py in _get_altered_sequence(self, strand, interval, var)
    524
    525         if edit.type == 'sub':
--> 526             seq[pos_start] = edit.alt
    527         elif edit.type == 'del':
    528             del seq[pos_start:pos_end]

IndexError: list assignment index out of range
@reece
Copy link
Member

reece commented Jul 31, 2020

@sptaylor: I can confirm this bug.

I accidentally discovered that this bug doesn't happen if I use a uta_20150827 (discovered through a sloppy copy-paste from old notes). That discovery only increases my curiosity.

I don't have funding, or time to donate, to work on this right now.

@cassiemk
Copy link

cassiemk commented Dec 2, 2020

@reece we are experiencing the same error in a different context.
when using the Assembly Mapper's g_to_c function with strict_bounds=True, this variant NC_000001.10:g.16890441C>G throws a HGVSInvalidIntervalError due to gaps in the sequence. when using the same function with strict_bounds=False we get

'Traceback (most recent call last):
{some lines ommitted for brevity}
File "/usr/local/lib/python3.6/site-packages/hgvs/assemblymapper.py", line 100, in g_to_c
var_g, tx_ac, alt_aln_method=self.alt_aln_method)
File "/usr/local/lib/python3.6/site-packages/hgvs/variantmapper.py", line 259, in g_to_c
ref='', alt=self._get_altered_sequence(mapper.strand, pos_g, var_g))
File "/usr/local/lib/python3.6/site-packages/hgvs/variantmapper.py", line 526, in _get_altered_sequence
seq[pos_start] = edit.alt
IndexError: list assignment index out of range'

We propose a try/except in the else statement code block at line 225 of variantmapper.py (also below, hgvs1.5.1) to raise the HGVSInvalidInteralError. Does this make sense or is there a deeper issue?

'if not pos_c.uncertain:
edit_c = self._convert_edit_check_strand(mapper.strand, var_g.posedit.edit)
if edit_c.type == 'ins' and pos_c.start.offset == 0 and pos_c.end.offset == 0 and pos_c.end - pos_c.start > 1:
pos_c.start.base += 1
pos_c.end.base -= 1
edit_c.ref = ''
else:
# variant at alignment gap
pos_g = mapper.c_to_g(pos_c)
edit_c = hgvs.edit.NARefAlt(
ref='', alt=self._get_altered_sequence(mapper.strand, pos_g, var_g))'

kyuhas pushed a commit to kyuhas/hgvs that referenced this issue Dec 23, 2020
kyuhas pushed a commit to kyuhas/hgvs that referenced this issue Dec 23, 2020
kyuhas pushed a commit to kyuhas/hgvs that referenced this issue Dec 23, 2020
kyuhas pushed a commit to kyuhas/hgvs that referenced this issue Dec 23, 2020
@davmlaw
Copy link
Contributor

davmlaw commented Sep 21, 2023

Validation problem?

Other tools seem to think this is an error, so perhaps it's a validation problem:

ClinGen allele registry: dies with "intronic position inside exon"
Variant Validator: ExonBoundaryError: Position c.283-3 does not correspond with an exon boundary for transcript NM_001291722.1

Exon positions in UTA

Where is "c.283" ?

In [46]: var_c = hp.parse_hgvs_variant('NM_001291722.1:c.283C>T') # remove the 3
In [47]: am37.c_to_n(var_c)
Out[47]: SequenceVariant(ac=NM_001291722.1, type=n, posedit=422C>T, gene=None)

Double check - NM_001291722.1 CDS start = 139
139 + 283 = transcript pos 422 so c_to_n is correct here

GRCh37 Tx exons around 422 are:

['CYFIP2', 'NM_001291722.1', 'NC_000005.9', 'splign', 1, 3, 346, 424, 156721791, 156721866, '75=3D', None, None, 738621, 366974, 6225535, 3655233, 3833217]
['CYFIP2', 'NM_001291722.1', 'NC_000005.9', 'splign', 1, 4, 424, 526, 156723677, 156723782, '3I102=', None, None, 738621, 366974, 6225536, 3655234, 3833382]

It looks like perhaps UTA is calling the exon wrong here? Why have a deletion of 3 bases at exon/intron boundary rather than shift the exon boundary?


Other investigation

I started to look into this and noticed something...

var_c = hp.parse_hgvs_variant('NM_001291722.1:c.283-3C>T')
mapper = am37._fetch_AlignmentMapper("NM_001291722.1", "NC_000005.9", "splign")

pos_c = var_c.posedit.pos
pos_c_to_n = mapper.c_to_n(pos_c)
pos_g = mapper.c_to_g(pos_c)
pos_c_to_g_to_n = mapper.g_to_n(pos_g)

print(pos_c_to_n)
print(pos_c_to_g_to_n)

These should be the same? Output:

422-3
418_419

Not sure if related, but I've found a few other examples of where conversion seems inconsistent (if you convert from c to g then back to c it's changed)

# These examples are from ./tests/data/clinvar.gz so presumably real HGVS from the wild
# Yes, they are probably wrong, eg "NM_000180.3:c.3233-3236dup" is probably a typo, dash is supposed to be underscore
example_hgvs = [
    'NM_001145026.1:c.715A>G',
    'NM_004543.4:c.7432-2025_7536+372del2502',
    'NM_000180.3:c.3233-3236dup',
    'NM_032383.3:c.1303+1G>A',
    'NM_001011.3:c.148+1G>A',
    'NM_000158.3:c.2052+5289_2052+5297delGTGTGGTGGinsTGTTTTTTACATGACAGGT',
    'NM_003611.2:c.2122-2125dupAAGA',
    'NM_001017975.4:c.1687-1G>C',
    'NM_006904.6:c.1776+1523dupA',
    'NM_001122679.1:c.2968-2A>T'
]

for hgvs_str in example_hgvs:
    var_c = hp.parse_hgvs_variant(hgvs_str)
    var_g = am37.c_to_g(var_c)
    var_c2 = am37.g_to_c(var_g, var_c.ac)  # This should in theory be the same as we started
    print(var_c)
    print(var_c2)
    print("-" * 20)

Between the dashed lines, the two should be consistent

NM_001145026.1:c.715A>G
NM_001145026.1:c.661-2_661-1insGAGAATAGTGAATCTTTTTTATGGAGTACAGCCAGCCCTTCTCCAACCCTTGGTGGAGTTACACCTCCATCGCGTACCACACATTCATCAAGCACGTTGACACAGAATGAGATCAGCTCTGTGTGGAAAGAGCCTATCAGTTTTGTAGTGACACACTTGAG
--------------------
NM_004543.4:c.7432-2025_7536+372del
NM_004543.4:c.7431+1917_7536+372del
--------------------
NM_000180.3:c.3233-3236dup
NM_000180.3:c.2226dup
--------------------
NM_032383.3:c.1303+1G>A
NM_032383.3:c.1304T>A
--------------------
NM_001011.3:c.148+1G>A
NM_001011.3:c.149=
--------------------
NM_000158.3:c.2052+5289_2052+5297delinsTGTTTTTTACATGACAGGT
NM_000158.3:c.2053-3358_2053-3350delinsTGTTTTTTACATGACAGGT
--------------------
NM_003611.2:c.2122-2125dup
NM_003611.2:c.1654+9dup
--------------------
NM_001017975.4:c.1687-1G>C
NM_001017975.4:c.1686G>C
--------------------
NM_006904.6:c.1776+1523dup
NM_006904.6:c.1777-723dup
--------------------
NM_001122679.1:c.2968-2A>T
NM_001122679.1:c.2966G>T

Some of these are def invalid, though in that case, they should throw an error?

Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 21, 2023
Copy link

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 29, 2023
@reece reece reopened this Feb 19, 2024
@reece reece added resurrected and removed stale Issue is stale and subject to automatic closing labels Feb 19, 2024
@reece
Copy link
Member

reece commented Feb 19, 2024

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label May 21, 2024
@jsstevenson jsstevenson added bug Something isn't working and removed stale Issue is stale and subject to automatic closing resurrected labels May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants