Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TabixReader bug in query(String) method for coordiantes with 0 values #908

Open
phanikishore2 opened this issue Jun 22, 2017 · 4 comments
Open

Comments

@phanikishore2
Copy link

The following method breaks if coordinate value is 0.

For example chr2:0-1

TabixReader.query("chr2:0-1") returns (tid,-1,1)
TabixReader.query("chr2",0,1) returns (tid,0,1)

Digging down in method parseReg(String) method ret[1] value assignment does subtract 1.

public int[] parseReg(final String reg) { // FIXME: NOT working when the sequence name contains : or -.
String chr;
int colon, hyphen;
int[] ret = new int[3];
colon = reg.indexOf(':');
hyphen = reg.indexOf('-');
chr = colon >= 0 ? reg.substring(0, colon) : reg;
ret[1] = colon >= 0 ? Integer.parseInt(reg.substring(colon + 1, hyphen >= 0 ? hyphen : reg.length())) - 1 : 0;
ret[2] = hyphen >= 0 ? Integer.parseInt(reg.substring(hyphen + 1)) : 0x7fffffff;
ret[0] = this.chr2tid(chr);
return ret;
}

Thanks,
Phani

@yfarjoun
Copy link
Contributor

Genomic locations in the format that you use are interpreted using 1-based coordinate system.

this is not a bug.

@Vernando
Copy link

Hello,
so far I understand it there is a bug in the query(final String) and in the query(String, int, int) methods.
With a 1-indexed file query(final String) fails with an IndexOutOfBoundsException with "chr2:0-0" which is in my eyes a valid query string with indexes outside of the allowed range and with that the found result should be just an empty iterator (or maybe null, but not a crashing function). The commandline tool handles this correctly.

So with that experience changing to the query(final String, int, int) method fails again, but differently.
A valid search like ("chr3",178936275,178936275) delivers ZERO results instead of the actual findings in the used tabix files (which should be e.g. 1 in our case).

The inner most used query(final int tid, final int beg, final int end) which is the same in both cases seems to need a range information versus a position information for beginPos and endPosition.
I switched to use only the query(final String) as entry point and extended the out of bound index checks there. I think that should fix the problem in (only) this method.

    public Iterator query(final String reg) {
        int[] x = parseReg(reg);
        if(x[0]<0
        // (jh, 6/23/17) I found bugs in this method and query(String, int, int) for two different scenarios. This
        // method breaks if it is used with a 1-indexed tabix file and queried with e.g a "chr2:0-0" string which is a
        // valid string below the lowest allowed index of 1 (and should return zero found results, but not crash). In
        // this case it throws an IndexOutOfBounds exception. query(String, int, int) can handle the call with
        // "chr2:0-0" BUT FAILS delivering correct results for example "chr3:178936275-178936275". This call should
        // return one found result in our case, but actually returns zero! My fix is to use only this method and not
        // query(String, int, int) and extend the out of bounds checks here which were missing in the original code.
           || x[1]<0
           || x[2]<0
          ){
          return EOF_ITERATOR;
        }
        return query(x[0], x[1], x[2]);
    }

Both mentioned cases which end in a crash or wrong finding are handled correctly by the command tool.

@yfarjoun
Copy link
Contributor

yfarjoun commented Jun 24, 2017

@Vernando Thanks for the clear explanation. I think that we can now solve this problem.

@ronlevine
Copy link
Contributor

ronlevine commented Jun 24, 2017

For @phanikishore2's case, the code works correctly and an EOF_ITERATOR is returned.

The "fix" is to make the code more robust. The changes should go in public Iterator query(final int tid, final int beg, final int end) since it's used by public Iterator query(final String reg) and can be used anywhere, since it's public . The following change should fix this issue:

if(tid< 0 || tid>=this.mIndex.length) return EOF_ITERATOR;

to

if(tid< 0 || beg < 0 || end < 0 || tid>=this.mIndex.length) return EOF_ITERATOR;

@ronlevine ronlevine self-assigned this Jun 24, 2017
@ronlevine ronlevine removed their assignment Jul 29, 2017
yfarjoun pushed a commit that referenced this issue Feb 5, 2019
Follow up to #908.
Move the bad coordinates check to the root method, `public Iterator query(final String reg)`. If this public method is used by an application, the bad coordinates will return an `EOF_ITERATOR;`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants