--work-on-disk skips steps #97
Comments
As a test of reproducibility, I killed the
...however,
|
I get the same error with
|
Please check the --work-on-disk option in the latest release, v0.7.3; it should work properly now. |
With v0.7.3, I'm still getting the error described at #52. My build directory includes:
|
What is your command? I would like to reproduce the error.
…On Thu, Jun 23, 2022 at 8:48 AM Nick Youngblut ***@***.***> wrote:
I get the following error when using --work-on-disk with v0.7.3:
Kraken build set to minimize RAM usage.
Found 500 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Loaded database with 1623560677 keys with k of 31 [val_len 4, key_len 8].
set_lcas: unable to open database.idx: No such file or directory
xargs: cat: terminated by signal 13
No such error occurs if I don't use --work-on-disk:
Kraken build set to minimize disk writes.
Found 500 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (18.145 GB) ...
--
Dr. Alexey V. Zimin
Associate Research Scientist
Department of Biomedical Engineering,
Johns Hopkins University,
Baltimore, MD, USA
(301)-437-6260
website http://ccb.jhu.edu/people/alekseyz/
blog http://masurca.blogspot.com
|
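A simple ./krakenuniq-build --kmer-len 31 --build --threads 12 --db $DB, with $DB denoting the database base directory path. Using --rebuild does not help (just checked again). |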
The command I have been using to test was:
krakenuniq-build --db . --threads 32 --work-on-disk
I have library and taxonomy folders in the current dir. I will test with
library and taxonomy in another folder
…On Thu, Jun 23, 2022 at 8:55 AM Nick Youngblut ***@***.***> wrote:
A simple ./krakenuniq-build --kmer-len 31 --build --threads 12 --db $DB,
with $DB denoting the database base directory path.
Using --rebuild does not help (just checked again)
|
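I tried krakenuniq-build --db . --threads 32 --work-on-disk in the appropriate directory, but I still got the same error. Maybe it's due to how I'm adding genomes to the library? My simple helper script for that: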
|
It is possible. The command worked fine for me just now, see below.
***@***.*** test_krakenuniq]$ krakenuniq-build --db DBDIR --threads 32 --work-on-disk
Kraken build set to minimize RAM usage.
Finding all library files
Found 1 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /ccb/sw/bin/jellyfish-install/bin/jellyfish
Hash size not specified, using '2575692630'
K-mer set created. [13m43.538s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [48m52.013s]
Creating seqID to taxID map (step 4 of 6)..
705 sequences mapped to taxa. [0.059s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp.
Done, got 2426193 taxa
taxDB construction finished. [1m4.789s]
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Reading sequence ID to taxonomy ID mapping ... got 705 mappings.
Finished processing 705 sequences (skipping 0 empty sequences, and 0 sequences with no taxonomy mapping)
Writing kmer counts to database.kdb.counts...
LCA database created. [28m27.253s]
Creating database summary report database.report.tsv ...
/ccb/sw/bin/classify -d ././database.kdb -i ././database.idx -t 32 -r database.report.tsv -a ././taxDB -p 12
Database ././database.kdb
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Reading taxonomy index from ././taxDB. Done.
705 sequences (3298.43 Mbp) processed in 153.354s (0.3 Kseq/m, 1290.51 Mbp/m).
705 sequences classified (100.00%)
0 sequences unclassified (0.00%)
Writing report file to database.report.tsv ..
Reading genome sizes from ././database.kdb.counts ... done
Setting values in the taxonomy tree ... done
Printing classification report ... done
Report finished in 0.006 seconds.
Finishing up ...Database construction complete. [Total: 1h36m33.683s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want
Here are the contents of DBDIR:
***@***.*** test_krakenuniq]$ ls DBDIR/*
DBDIR/database0.kdb DBDIR/database.idx DBDIR/database.kdb
DBDIR/database.kraken.tsv DBDIR/library-files.txt DBDIR/taxDB
DBDIR/database-build.log DBDIR/database.jdb DBDIR/database.kdb.counts
DBDIR/database.report.tsv DBDIR/seqid2taxid.map
DBDIR/library:
vertebrate_mammalian
DBDIR/taxonomy:
citations.dmp database-build.log delnodes.dmp division.dmp gc.prt
gencode.dmp merged.dmp names.dmp nodes.dmp readme.txt taxdump.tar.gz
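A quick way to compare a database directory against the successful run above is to check for the files named in the closing message of the build log (database.kdb, database.idx and taxDB). The following is a minimal sketch of such a check; the function name missing_final_files is hypothetical, and treating those three names as the complete set of required outputs is an assumption based only on that log message.

```python
import os
import tempfile

# File names taken from the build log's closing message
# ("You can delete all files but database.{kdb,idx} and taxDB").
# Treating them as the full set of required outputs is an assumption.
FINAL_FILES = ("database.kdb", "database.idx", "taxDB")


def missing_final_files(db_dir):
    """Return the final database files that are absent from db_dir."""
    return [name for name in FINAL_FILES
            if not os.path.isfile(os.path.join(db_dir, name))]


if __name__ == "__main__":
    # Demo on a throwaway directory (hypothetical state, not taken from
    # the thread): database.kdb and taxDB exist, database.idx does not.
    with tempfile.TemporaryDirectory() as db_dir:
        for name in ("database.kdb", "taxDB"):
            open(os.path.join(db_dir, name), "w").close()
        print(missing_final_files(db_dir))
```

An empty list means all three files are present; anything else names what a re-run would still have to produce.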
…On Thu, Jun 23, 2022 at 9:16 AM Nick Youngblut ***@***.***> wrote:
I tried krakenuniq-build --db . --threads 32 --work-on-disk in the
appropriate directory, but I still got the same error.
Maybe it's due to how I'm adding genomes to the library? My simple helper
script for that:
#!/usr/bin/env python
from __future__ import print_function
import os
import sys
import re
import gzip
import argparse
import logging

# logging
logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.DEBUG)

# argparse
class CustomFormatter(argparse.ArgumentDefaultsHelpFormatter,
                      argparse.RawDescriptionHelpFormatter):
    pass

desc = 'Adding genome to krakenuniq database'
epi = """DESCRIPTION:
Write output files to db_dir:
* renamed genome fasta (all special characters removed from names)
* krakenuniq map file
"""
parser = argparse.ArgumentParser(description=desc, epilog=epi,
                                 formatter_class=CustomFormatter)
parser.add_argument('fasta_file', type=str,
                    help='Input genome fasta file')
parser.add_argument('taxid', type=str,
                    help='Taxonomy ID for the genome')
parser.add_argument('sample', type=str,
                    help='Genome name')
parser.add_argument('db_dir', type=str,
                    help='Output database location (e.g., ku_db/library/)')
parser.add_argument('--version', action='version', version='0.0.1')

# NOTE: _open() was not included in the posted script. This definition is an
# assumption: gzip files are read as bytes (matching the decode() call below),
# everything else is read as text.
def _open(infile):
    if infile.endswith('.gz'):
        return gzip.open(infile, 'rb')
    return open(infile)

def copy_genome(infile, outdir, sample):
    outfile = os.path.join(outdir, sample + '.fna')
    # every character except >, letters, digits, '-' and newline becomes '_'
    regex = re.compile(r'[^>A-Za-z0-9-\n]')
    gz = infile.endswith('.gz')
    contigs = list()
    with _open(infile) as inF, open(outfile, 'w') as outF:
        for line in inF:
            if gz:
                line = line.decode('utf-8')
            # seq header
            if line.startswith('>'):
                line = regex.sub('_', line)
                contigs.append(line.lstrip('>').rstrip())
            # writing to output directory
            outF.write(line)
    logging.info(f'File written: {outfile}')
    # return the sanitized sequence IDs for the map file
    return contigs

def write_map(contigs, outdir, sample, taxid):
    outfile = os.path.join(outdir, sample + '.map')
    with open(outfile, 'w') as outF:
        for contig in contigs:
            # one line per contig: <seqID> <taxID> <genome name>, tab-separated
            outF.write('\t'.join([contig, taxid, sample]) + '\n')
    logging.info(f'File written: {outfile}')

## main interface function
def main(args):
    if not os.path.isdir(args.db_dir):
        os.makedirs(args.db_dir)
    contigs = copy_genome(args.fasta_file, args.db_dir, args.sample)
    write_map(contigs, args.db_dir, args.sample, args.taxid)

## script main
if __name__ == '__main__':
    args = parser.parse_args()
    main(args)
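To make the effect of the header regex in copy_genome() concrete, here is a small worked example. It reuses the same pattern and the same tab-separated layout as write_map(). The names sanitize_header and map_line, and the sample header, taxid and genome name, are hypothetical and only serve as illustration.

```python
import re

# Same pattern as in copy_genome() above.
HEADER_REGEX = re.compile(r'[^>A-Za-z0-9-\n]')


def sanitize_header(line):
    """Replace every character other than >, letters, digits, '-' and newline with '_'."""
    return HEADER_REGEX.sub('_', line)


def map_line(header, taxid, sample):
    """Build one tab-separated map line the same way write_map() does."""
    contig = sanitize_header(header).lstrip('>').rstrip()
    return '\t'.join([contig, taxid, sample]) + '\n'


if __name__ == "__main__":
    # Hypothetical header, taxid and genome name, for illustration only.
    print(repr(sanitize_header('>contig.1 some description\n')))
    print(repr(map_line('>contig.1 some description\n', '562', 'sampleA')))
```

Note that spaces are replaced too, so the whole header line, description included, ends up as the sequence ID in both the rewritten FASTA file and the map file.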
|
I tried creating a new krakenuniq library, and now I'm getting the following:
I installed krakenuniq v0.7.3 via:
...since that version isn't on bioconda yet |
Did jellyfish compile and install properly? Can you check if
/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish
works? If you have jellyfish1 installed elsewhere, you can specify its path
with the appropriate option to build.
…On Thu, Jun 23, 2022 at 11:07 AM Nick Youngblut ***@***.***> wrote:
I tried creating a new krakenuniq library, and now I'm getting the
following:
krakenuniq-build --kmer-len 31 --build --threads 12 --db $DB
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish
Hash size not specified, using '32573424'
/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish: error while loading shared libraries: libjellyfish-1.1.so.1: cannot open shared object file: No such file or directory
I installed krakenuniq v0.7.3 via:
git clone https://github.com/fbreitwieser/krakenuniq
cd krakenuniq
./install_krakenuniq /PATH/TO/INSTALL_DIR
...since that version isn't on bioconda yet
|
There may be a problem with your environment. A simple:
export LD_LIBRARY_PATH=/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/lib/
should fix it, but in general it should not be necessary.
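When the build is launched from a script rather than an interactive shell, the same workaround can be applied to the child process only. This is a minimal sketch: the helper name env_with_library_dir is hypothetical, and it prepends to LD_LIBRARY_PATH rather than overwriting it, which is a design choice of this sketch and not something stated in the thread.

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Keep any existing entries after the new directory.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return env


if __name__ == "__main__":
    # Library directory from the thread, next to the jellyfish binary.
    lib_dir = ("/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/"
               "krakenuniq/jellyfish-install/lib")
    env = env_with_library_dir(lib_dir)
    print(env["LD_LIBRARY_PATH"])
    # The result can be handed to a child process, for example:
    # subprocess.run(["krakenuniq-build", "--db", ".", "--work-on-disk"], env=env)
```

The calling process's own environment is left untouched; only the returned dictionary carries the extra directory.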
|
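Yeah, the path was just messed up. The run worked:
...but the error set_lcas: unable to open database.idx: No such file or directory is generated if you try to re-build the database after building (or attempting to build) the database once. |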
Thank you for reporting this bug -- it must have been there for a while. I
fixed it, please go to your krakenuniq folder and git pull and reinstall.
…On Thu, Jun 23, 2022 at 11:37 AM Nick Youngblut ***@***.***> wrote:
Yeah, the path was just messed up.
The run worked:
Kraken build set to minimize RAM usage.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Skipping step 6, LCAs already set.
Database construction complete. [Total: 0.014s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want
...but the error set_lcas: unable to open database.idx: No such file or
directory is generated if you try to re-build the database after building
(or attempting to build) the database once.
|
Yep, that fixed the issue. Thanks @alekseyzimin for all of your help! |
krakenuniq-build died due to an out-of-memory error:
I then tried running krakenuniq-build --work-on-disk, and the job took ~5 seconds:
...however, the job never generated the database.kdb output file. If I instead don't use --work-on-disk, krakenuniq-build seems to actually work on producing the database.kdb output:
I'm using krakenuniq=0.6 due to #95