Indexing tools unable to handle large MAF files #8

Open
blankenberg opened this issue Jul 29, 2016 · 5 comments

Comments

@blankenberg
Contributor

One example is the hg38 projected chr1 multiz100way from UCSC: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/chr1.maf.gz (5.9G gz'd, 66G gunzip'd)

maf_build_index.py chr1.maf
Traceback (most recent call last):
  File "/path_to/bin/maf_build_index.py", line 83, in <module>
    if __name__ == "__main__": main()
  File "/path_to/bin/maf_build_index.py", line 80, in main
    indexes.write( out )
  File "/path_to/lib/python2.7/site-packages/bx/interval_index_file.py", line 332, in write
    write_packed( f, ">I", base )
  File "/path_to/lib/python2.7/site-packages/bx/interval_index_file.py", line 463, in write_packed
    f.write( pack( pattern, *vals ) )
struct.error: 'I' format requires 0 <= number <= 4294967295

One possibility is to bump the version number and store these unsigned integers as unsigned long long (>Q), which maxes out at 18446744073709551615 instead of 4294967295. That would double the packed size of each such value, though.
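To make the limit concrete, here is a minimal sketch using only the standard struct module. The helper name pack_value and its wide flag are made up for illustration and are not part of bx-python; the point is only to show the ">I" overflow from the traceback and the size cost of ">Q".

```python
import struct

LIMIT_32 = 2**32 - 1  # 4294967295, the largest value ">I" can hold


def pack_value(value, wide=False):
    """Pack one unsigned integer big-endian (hypothetical helper, not bx-python API)."""
    fmt = ">Q" if wide else ">I"
    return struct.pack(fmt, value)


too_big = LIMIT_32 + 1

# With ">I" this raises the same struct.error seen in the traceback.
try:
    pack_value(too_big)
    overflowed = False
except struct.error:
    overflowed = True

# With ">Q" the same value packs fine, at twice the size (8 bytes vs 4).
packed_wide = pack_value(too_big, wide=True)
narrow_size = struct.calcsize(">I")
wide_size = struct.calcsize(">Q")
```

Only the fields that can actually exceed the 32-bit range would need the wider format, so the size increase applies to those values rather than to the whole file.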

Another potential workaround could be to break the MAF up into multiple files, but I haven't tested this.
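A rough sketch of what such a split could look like, cutting only at alignment block boundaries (lines starting with "a") and repeating the leading "#" header lines in every chunk. The function name split_maf, the output naming scheme, and the 2 GiB default are assumptions for illustration. Like the workaround itself, this has not been tested against maf_build_index.py, and the thread does not confirm that smaller inputs keep the packed values under the 32-bit limit.

```python
def split_maf(path, out_prefix, max_bytes=2 * 1024**3):
    """Split a MAF file into chunks of roughly max_bytes each.

    Hypothetical helper: chunks are only cut where a new alignment
    block begins, so no block is split across two files.
    """
    header = []        # leading '#' lines, copied into every chunk
    out_paths = []
    out = None
    written = 0
    in_header = True
    with open(path, "rb") as src:
        for line in src:
            if in_header and line.startswith(b"#"):
                header.append(line)
                continue
            in_header = False
            # An alignment block starts with a line whose first field is "a".
            starts_block = line.split(None, 1)[:1] == [b"a"]
            if out is None or (starts_block and written >= max_bytes):
                if out is not None:
                    out.close()
                out_path = "%s.%03d.maf" % (out_prefix, len(out_paths))
                out_paths.append(out_path)
                out = open(out_path, "wb")
                out.writelines(header)
                written = sum(len(h) for h in header)
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return out_paths
```

Each resulting chunk would then be indexed separately, for example by running maf_build_index.py on every file in the returned list.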

xref: https://biostar.usegalaxy.org/p/18196/

@jxtx
Contributor

jxtx commented Jul 29, 2016

I think pushing the version is the right solution. Somebody just needs to find time to work on this...

-- jt
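For anyone picking this up, a minimal sketch of how a version field could select the integer width while keeping older files readable. The layout here (a version and a count followed by the values) and the names OFFSET_FORMATS, write_offsets and read_offsets are hypothetical, chosen only for illustration; this is not bx-python's actual index format.

```python
import io
import struct

# Hypothetical mapping: version 1 keeps 32-bit values, version 2 uses 64-bit.
OFFSET_FORMATS = {1: ">I", 2: ">Q"}


def write_offsets(f, version, offsets):
    """Write a small header (version, count) followed by the packed values."""
    fmt = OFFSET_FORMATS[version]
    f.write(struct.pack(">2I", version, len(offsets)))
    for off in offsets:
        f.write(struct.pack(fmt, off))


def read_offsets(f):
    """Read the header, then pick the value width from the stored version."""
    version, count = struct.unpack(">2I", f.read(8))
    fmt = OFFSET_FORMATS[version]
    size = struct.calcsize(fmt)
    values = [struct.unpack(fmt, f.read(size))[0] for _ in range(count)]
    return version, values


# Round trip with a value that does not fit in 32 bits.
buf = io.BytesIO()
write_offsets(buf, 2, [10, 2**40])
buf.seek(0)
roundtrip = read_offsets(buf)
```

Because the reader dispatches on the stored version, files written with the older 32-bit layout would still load unchanged under this scheme.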




@poojanarang

Thanks for looking into this.
This issue has been bothering us for several weeks. We hope to have a solution to this problem soon.

@jodyhey

jodyhey commented Mar 22, 2021

Any progress on this?
I just ran into what seems to be the same problem on a large MAF file (48 GB):

(base) /mnt/e/genemod/better_dNdS_models/drosophila/11_6_2020/cactus_work$ python /home/jodyhey/miniconda3/bin/maf_build_index.py drosophila_cactus.maf drosophila_cactus.mafindex

Traceback (most recent call last):
  File "/home/jodyhey/miniconda3/bin/maf_build_index.py", line 82, in <module>
    main()
  File "/home/jodyhey/miniconda3/bin/maf_build_index.py", line 77, in main
    indexes.write(out)
  File "/home/jodyhey/miniconda3/lib/python3.8/site-packages/bx/interval_index_file.py", line 351, in write
    write_packed(f, ">I", base)
  File "/home/jodyhey/miniconda3/lib/python3.8/site-packages/bx/interval_index_file.py", line 486, in write_packed
    f.write(pack(pattern, *vals))
struct.error: 'I' format requires 0 <= number <= 4294967295

@nsoranzo
Collaborator

@jodyhey No one is working on this issue, sorry, but pull requests are welcome!

@pkncsk

pkncsk commented Mar 24, 2022

Is there any update on this issue? With .maf files getting bigger lately, this is likely to become a more common problem.

Here I ran the script on a 30 GB .lzo file (85 GB uncompressed):

python3 maf_index.py

Traceback (most recent call last):
  File "maf_index.py", line 75, in <module>
    main()
  File "maf_index.py", line 70, in main
    indexes.write(out)
  File "/home/pc575/jupyter-env-icelake/lib/python3.7/site-packages/bx/interval_index_file.py", line 351, in write
    write_packed(f, ">I", base)
  File "/home/pc575/jupyter-env-icelake/lib/python3.7/site-packages/bx/interval_index_file.py", line 486, in write_packed
    f.write(pack(pattern, *vals))
struct.error: 'I' format requires 0 <= number <= 4294967295
