Fix out-of-bounds error #97

Merged
merged 1 commit into from
Apr 9, 2019

standage
Contributor

@standage standage commented Apr 8, 2019

This pull request provides a simple solution to the problem described in #96. If the computed index is greater than the length of the sequence, append A. Otherwise, append the requested nucleotide.
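The guard described above can be sketched roughly as follows. This is a minimal illustration rather than the actual diff: `base_at` and its parameters are hypothetical names, and the check is written as `>=` on the assumption that positions are 0-based, so an index equal to the sequence length is already out of range.

```python
def base_at(sequence, index, filler="A"):
    """Return sequence[index], or `filler` when index falls past the end.

    Hypothetical helper for illustration; not InSilicoSeq's actual API.
    """
    if index >= len(sequence):
        # The computed position ran off the end (e.g. after a deletion),
        # so fall back to a fixed filler base instead of raising IndexError.
        return filler
    return sequence[index]


template = "ACGT"
read = [base_at(template, position) for position in (2, 3, 4)]
print("".join(read))  # GTA (position 4 is one past the end)
```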

I'm assuming this problem doesn't occur much with whole genome sequences, since it's very rare for a read both to have an indel and line up right at the end of the chromosome. However, when sequencing short amplicons in draft mode this case occurred with high frequency.

There are probably more elegant solutions to this problem. One could:

  • draw a random base instead of A
  • draw a base from some distribution of nucleotide composition
  • append Illumina adapters or other technical sequences to the end of the genomic sequence
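The first two alternatives could look something like the sketch below. Both function names are hypothetical, and the composition-weighted variant assumes the distribution is estimated from the template sequence itself, which is only one of several reasonable choices.

```python
import random
from collections import Counter


def random_filler(rng=random):
    """Draw a filler base uniformly from the four nucleotides."""
    return rng.choice("ACGT")


def composition_filler(sequence, rng=random):
    """Draw a filler base weighted by the base composition of `sequence`."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    if not counts:
        # No unambiguous bases to learn from; fall back to a uniform draw.
        return random_filler(rng)
    bases = list(counts)
    weights = [counts[base] for base in bases]
    return rng.choices(bases, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` keeps the draws reproducible, which matters if simulated reads need to be regenerated exactly.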

But for now, this is a simple solution that allows InSilicoSeq to run without barfing on amplicon sequences.

@codecov

codecov bot commented Apr 8, 2019

Codecov Report

Merging #97 into master will decrease coverage by 0.36%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master      #97      +/-   ##
==========================================
- Coverage   91.54%   91.18%   -0.37%     
==========================================
  Files          12       12              
  Lines         769      771       +2     
==========================================
- Hits          704      703       -1     
- Misses         65       68       +3

Impacted Files                  Coverage Δ
iss/error_models/__init__.py    77.9% <0%> (-1.86%) ⬇️
iss/util.py                     97.29% <0%> (-0.91%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd73724...e8e7681.

@HadrienG
Owner

HadrienG commented Apr 9, 2019

Good catch, thanks for the PR.

This hasn't happened yet with whole genome sequences.
I like the adapter solution in theory, but in practice we might never encounter two deletions in a row at the end of a sequence, so an A will do.

It'd be a good idea to implement adapter sequences in general. By padding the reads with adapter sequences this bug should disappear and we can get rid of your "A" then 😄
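A rough sketch of what such padding might look like, assuming the adapter is simply appended (and repeated if needed) to the end of the template before bases are looked up. `pad_with_adapter` is a hypothetical name, and `PLACEHOLDER_ADAPTER` is a made-up string, not a real Illumina adapter sequence.

```python
# Made-up placeholder; substitute the real adapter for the library prep in use.
PLACEHOLDER_ADAPTER = "ACGTACGTAC"


def pad_with_adapter(sequence, adapter, pad_length):
    """Append `pad_length` bases of (repeated) adapter to `sequence`."""
    if not adapter:
        raise ValueError("adapter must be non-empty")
    repeats = -(-pad_length // len(adapter))  # ceiling division
    return sequence + (adapter * repeats)[:pad_length]


template = "ACGT"
padded = pad_with_adapter(template, PLACEHOLDER_ADAPTER, 4)
# Indices up to len(template) + 3 are now valid, so a lookup that overshoots
# the template by a few bases reads adapter sequence instead of failing.
print(padded)  # ACGTACGT
```

With enough padding, an index that runs past the genomic sequence lands in adapter bases, so the fixed "A" fallback would no longer be reached.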

@HadrienG HadrienG changed the base branch from master to dev April 9, 2019 05:18
@HadrienG
Owner

HadrienG commented Apr 9, 2019

I'm merging this and will publish a bugfix release soon. Thanks again!

@HadrienG HadrienG merged commit e8c55c8 into HadrienG:dev Apr 9, 2019
@standage standage deleted the outofrange branch April 9, 2019 15:11
@standage standage mentioned this pull request Jun 3, 2019
2 participants