
genbank-d2 database is too big... #646

Closed
ctb opened this issue Feb 23, 2019 · 4 comments

Comments

ctb (Contributor) commented Feb 23, 2019

I'm working on revamping the tutorials, and the genbank-d2 database doesn't fit in ~15 GB when unpacked, which makes it impossible to run on a default Jetstream m1.medium machine. Sigh.

What are our options for making a db that's smaller when unpacked, @luizirber?

ctb (Contributor, Author) commented Feb 23, 2019

Now that I think of it, one simple solution is to use the LCA database, since sourmash gather works on it the same way and it's much smaller. What do you think of that as a stopgap measure?

ctb (Contributor, Author) commented Feb 23, 2019

This is what I did in ff22e2f, part of #631.

It would still be good to get a rundown of what our options are for SBTs!

luizirber (Member) commented

I pulled #648 out of some other indexing refactoring I was doing in #456. It supports loading data directly from a zip file. It would work with tar too, but tar files don't support random access well (see #490 for more info).
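
For anyone wondering why zip rather than tar, here is a minimal Python sketch using only the standard library's `zipfile` and `tarfile` modules. It illustrates the general format difference and is not sourmash's actual loading code from #648; the helper names (`build_zip`, `build_tar_gz`, `read_from_zip`, `read_from_tar_gz`, `zip_index`) and the member names used with them are hypothetical.

```python
import io
import tarfile
import zipfile


def build_zip(members):
    """Hypothetical helper: pack {name: bytes} into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_gz(members):
    """Hypothetical helper: pack {name: bytes} into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def zip_index(archive_bytes):
    """Map each member name to the byte offset of its local header.

    A zip archive ends with a central directory that records these offsets,
    so a reader can find any member without scanning the whole file.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: info.header_offset for info in zf.infolist()}


def read_from_zip(archive_bytes, name):
    # ZipFile reads the central directory, then seeks straight to this member.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


def read_from_tar_gz(archive_bytes, name):
    # A tar stream has no index: each header sits just before its data.
    # getmember() makes tarfile walk the headers in sequence (decompressing
    # the gzip stream as it goes) before it can locate the entry.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tf:
        member = tf.getmember(name)
        return tf.extractfile(member).read()
```

Both readers return the same bytes; the difference is how much of the archive has to be touched to get there, which matters once the archive is many gigabytes.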

ctb (Contributor, Author) commented Jun 20, 2020

As of #648 sourmash now supports direct search of SBTs in zip files!
