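I am using seqtk 1.3. Running seqtk subseq <in.fq> <name.lst> to extract reads from a FASTQ file fails for some of the reads.
in.fq is a simulated file, so the query names carry an ascending ID number:
![image](https://user-images.githubusercontent.com/40780228/109685658-6dbb6d80-7bbc-11eb-92da-2a9a20b17ad3.png)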
I used awk to confirm that the query names listed in name.lst exist in the FASTQ file. I then tried extracting the first few reads, using a list file containing the names sim_sample_1_chr7-chr7-1, sim_sample_1_chr7-chr7-3 and sim_sample_1_chr7-chr7-5 (one name per line), and it worked. But when I chose a query name located much further down the FASTQ file, seqtk subseq failed to extract it.
In my tests, any read located before query name sim_sample_1_chr7-chr7-343063 is extracted correctly. Any query name that comes after it fails to be extracted.
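Here is an example. First, a screenshot of a test name.lst (every query name in this list exists in in.fq, as confirmed by awk):
![image](https://user-images.githubusercontent.com/40780228/109686416-36998c00-7bbd-11eb-94bf-32def1aa4b8d.png)
Then a screenshot of the sequences extracted by running seqtk subseq in.fq name.lst | less -S -
![image](https://user-images.githubusercontent.com/40780228/109686580-5fba1c80-7bbd-11eb-9a80-0a9904f37aba.png)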
I am confused about why this happens. Does seqtk only read part of the FASTQ file into memory for inspection? Please take a look at this issue when convenient. Much appreciated.
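The awk check mentioned above is not shown in the post. As a rough equivalent, here is a minimal Python sketch of the same idea: it reports which names from the list never appear as a read name in the FASTQ. It assumes strict 4-line FASTQ records and takes the read name to be the first whitespace-delimited token after the leading @. The function names fastq_names and missing_names are hypothetical helpers, not part of seqtk.

```python
from itertools import islice


def fastq_names(lines):
    """Yield the read name of each record, assuming strict 4-line FASTQ."""
    # Every 4th line (0, 4, 8, ...) is a header line under this assumption.
    for header in islice(lines, 0, None, 4):
        parts = header.rstrip("\r\n")[1:].split()
        if parts:
            # Name = first whitespace-delimited token after the leading '@'.
            yield parts[0]


def missing_names(fastq_lines, wanted_lines):
    """Return the sorted names from wanted_lines that are absent from the FASTQ."""
    wanted = {w.strip() for w in wanted_lines if w.strip()}
    found = set()
    for name in fastq_names(fastq_lines):
        if name in wanted:
            found.add(name)
    return sorted(wanted - found)
```

Both functions accept any iterable of lines, so open file handles for in.fq and name.lst can be passed directly. An empty result means every listed name is present in the file, which would match what the awk check showed.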
I just ran seqtk seq on the same FASTQ file and found that its output ends at query name sim_sample_1_chr7-chr7-343063. Why is there a limit on how much of the file seqtk reads? I cannot find any mention of such a limit in the manual, nor any option I can use to lift it.
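One explanation worth ruling out, not confirmed anywhere in this thread, is a malformed record right after sim_sample_1_chr7-chr7-343063, since a parser that hits an inconsistent record may simply stop there. The following minimal Python sketch scans a file for the first record that breaks basic 4-line FASTQ structure. It assumes one sequence line and one quality line per record, and first_bad_record is a hypothetical helper, not a seqtk function.

```python
def first_bad_record(lines):
    """Return (record_number, name, reason) for the first malformed record, or None.

    Assumes 4-line FASTQ records: header, sequence, '+' separator, quality.
    Record numbers are 1-based.
    """
    it = iter(lines)
    n = 0
    while True:
        header = next(it, None)
        if header is None:
            return None  # reached end of input without finding a problem
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        n += 1
        name = (header[1:].split() or [""])[0]
        if not header.startswith("@"):
            return (n, name, "header does not start with '@'")
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            return (n, name, "record is truncated")
        seq, plus, qual = (s.rstrip("\r\n") for s in (seq, plus, qual))
        if not plus.startswith("+"):
            return (n, name, "separator line does not start with '+'")
        if len(seq) != len(qual):
            return (n, name, "sequence and quality lengths differ")
```

Passing an open handle for in.fq would show whether the simulated file has a structural problem at or just after the read where seqtk seq stops. A result of None means the file passes these particular checks and the cause lies elsewhere.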