canu/2.0 failed to find the number of jobs in 'correction/0-mercounts/meryl-count.sh' #1740

Closed
einzigsue opened this issue Jun 17, 2020 · 14 comments

Comments

@einzigsue

I installed canu/2.0 under CentOS 8 on our cluster and tested the installation with two cases. Both failed with the following error in the file canu.out.

Loading python3/3.7.4
  Loading requirement: intel-mkl/2019.3.199

Found perl:
   /bin/perl
   This is perl 5, version 26, subversion 3 (v5.26.3) built for x86_64-linux-thread-multi

Found java:
   /bin/java
   openjdk version "1.8.0_242"

Found canu:
   /apps/canu/2.0/bin/canu
   Canu 2.0

-- Canu 2.0
--
-- Detected Java(TM) Runtime Environment '1.8.0_242' (from 'java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 4   ' (from 'gnuplot') and image format 'png'.
-- Detected 96 CPUs and 189 gigabytes of memory.
-- Detecting PBSPro resources.
--
-- Found 2875 hosts with  48 cores and  192 GB memory under PBSPro control.
-- Found 153 hosts with  48 cores and  203 GB memory under PBSPro control.
-- Found  49 hosts with  48 cores and 1536 GB memory under PBSPro control.
-- Found 160 hosts with  48 cores and  383 GB memory under PBSPro control.
-- Found   1 host  with  48 cores and 1503 GB memory under PBSPro control.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     12.000 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8.000 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtmhap    6.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  utgmhap    6.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  cor        8.000 GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16.000 GB    4 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16.000 GB    4 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    4 CPUs  (consensus)
-- Grid:  gfa       16.000 GB    4 CPUs  (GFA alignment and processing)
...
----------------------------------------
-- Starting command on Tue Jun  9 09:44:08 2020 with 4913847.217 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Tue Jun  9 09:44:09 2020 (one second) with 4913847.217 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--
--  For 1572 reads with 16809400 bases, limit to 1 batch.
--  Will count kmers using  jobs, each using  GB and 4 threads.
--
-- Report changed.
-- Finished stage 'merylConfigure', reset canuIteration.

ABORT:
ABORT: Canu 2.0
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to find the number of jobs in 'correction/0-mercounts/meryl-count.sh'.
ABORT:

Is it because the file correction/0-mercounts/meryl-count.sh is not correctly generated in the following lines?

if [ $jobid -gt  ]; then
  echo Error: Only  jobs, you asked for $jobid.
  exit 1
fi
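
A quick way to confirm this symptom is to search the generated script for a `-gt` comparison whose right-hand side is empty. This is only a minimal sketch: the function name `has_empty_job_count` is made up for illustration and is not part of Canu.

```shell
# Hypothetical helper (not part of Canu): succeeds, and prints the matching
# line numbers, if the script contains a "-gt" test with nothing between the
# operator and the closing bracket, as in the snippet above.
has_empty_job_count() {
  grep -nE -e '-gt[[:space:]]+\]' "$1"
}

# Usage: has_empty_job_count correction/0-mercounts/meryl-count.sh
```

A match only shows that the job count was never filled in; it does not say why the count was missing.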

Does canu use some sort of template to generate the meryl-count script? Have there been any recent changes since version 1.9? We have not had any issues with the earlier version 1.9.
Here is the command we used to start canu. The jobwrapper.sh passed to gridEngineSubmitCommand, together with the other grid-engine options, is how we adapt canu to our cluster; it should have very little impact on how the file correction/0-mercounts/meryl-count.sh is generated.

canu \
    -d canu_out4 \
    -p rdna \
    overlapper=mhap \
    utgReAlign=true \
    genomeSize=84k \
    useGrid=true \
    gridEngineSubmitCommand="${CANU_BASE}/Linux-amd64/bin/jobwrapper.sh -j oe" \
    gridEngine=pbspro \
    gridEngineResourceOption="-lncpus=THREADS,mem=MEMORY" \
    gridEngineNameToJobIDCommand="qstat -f |grep -F -B 1 WAIT_TAG | grep Id: | cut -c 9-" \
    stageDirectory=\$PBS_JOBFS \
    gridEngineStageOption="-ljobfs=10GB" \
    gridEngineArrayMaxJobs=500 \
    gridOptionsExecutive="-lwalltime=4:00:00" \
    gridOptions="-q normal -lwd -P a00 -lstorage=tmp/a00" \
    -nanopore-raw 20samples.fastq 2> canu.log

Let us know your diagnosis
Yue

@skoren
Member

skoren commented Jun 17, 2020

I'm pretty sure this changed between 1.9 and 2.0 because Meryl had some large-scale changes between those releases.

I'd guess either the Meryl binary is failing (this is what gives the configuration) or it's something with PBSPro since that cluster manager seems to always be unstable and breaking. Can you post the output from the configure steps (correction/0-mercounts/*config*)?

@einzigsue
Author

Hi skoren,

Thank you for the quick reply. Here is what I saw in the output dir.

$ cat correction/0-mercounts/meryl-configure.sh 
#!/bin/sh


#  Path to Canu.

syst=`uname -s`
arch=`uname -m | sed s/x86_64/amd64/`

bin="/apps/canu/2.0/bin/$syst-$arch/bin"

if [ ! -d "$bin" ] ; then
  bin="/apps/canu/2.0/bin"
fi

#  Report paths.

echo ""
echo "Found perl:"
echo "  " `which perl`
echo "  " `perl --version | grep version`
echo ""
echo "Found java:"
echo "  " `which java`
echo "  " `java -showversion 2>&1 | head -n 1`
echo ""
echo "Found canu:"
echo "  " $bin/canu
echo "  " `$bin/canu -version`
echo ""


#  Environment for any object storage.

export CANU_OBJECT_STORE_CLIENT=
export CANU_OBJECT_STORE_CLIENT_UA=
export CANU_OBJECT_STORE_CLIENT_DA=
export CANU_OBJECT_STORE_NAMESPACE=
export CANU_OBJECT_STORE_PROJECT=



if [ z$PBS_O_WORKDIR != z ] ; then
  cd $PBS_O_WORKDIR
fi

/apps/canu/2.0/bin/meryl -C k=16 threads=4 memory=12 \
  count segment=1/01 ../../rdna.seqStore \
> rdna.ms16.config.01.out 2>&1
exit 0

As far as I can tell, the working directory for the PBS job is just the output directory passed through -d in the command I used to start canu, and rdna.seqStore is right in it, but somehow meryl thinks it is two levels above the output directory.

$ ls
canu-logs  canu-scripts  jobwrapper_1591659825.pbs  rdna.report    rdna.seqStore.err
canu.out   correction    rdna.ms16.config.01.out    rdna.seqStore  rdna.seqStore.sh

$ cat canu-logs/$(ls -t canu-logs | head -1)
Canu v2.0 (+0 commits) r0 .

Current Working Directory:
/scratch/z00/01PROJECTS/canu/testcase_rDNA/canu_out4

Command:
/apps/canu/2.0/bin/meryl \
  -C k=16 threads=4 memory=12 count segment=1/01 ../../rdna.seqStore

Cheers
Yue

@skoren
Member

skoren commented Jun 19, 2020

I wanted to see the output of Meryl's run, rdna.ms16.config.01.out.

@einzigsue
Author

Here you go

$ cat rdna.ms16.config.01.out 
usage: /apps/canu/2.0/bin/meryl ...

  A meryl command line is formed as a series of commands and files, possibly
  grouped using square brackets.  Each command operates on the file(s) that
  are listed after it.

  COMMANDS:

    print                display kmers on the screen as 'kmer<tab>count'.  accepts exactly one input.

    count                Count the occurrences of canonical kmers in the input.  must have 'output' specified.
    count-forward        Count the occurrences of forward kmers in the input.  must have 'output' specified.
    count-reverse        Count the occurrences of reverse kmers in the input.  must have 'output' specified.
      k=<K>              create mers of size K bases (mandatory).
      n=<N>              expect N mers in the input (optional; for precise memory sizing).
      memory=M           use no more than (about) M GB memory.
      threads=T          use no more than T threads.

    less-than N          return kmers that occur fewer than N times in the input.  accepts exactly one input.
    greater-than N       return kmers that occur more than N times in the input.  accepts exactly one input.
    equal-to N           return kmers that occur exactly N times in the input.  accepts exactly one input.
    not-equal-to N       return kmers that do not occur exactly N times in the input.  accepts exactly one input.

    increase X           add X to the count of each kmer.
    decrease X           subtract X from the count of each kmer.
    multiply X           multiply the count of each kmer by X.
    divide X             divide the count of each kmer by X.
    modulo X             set the count of each kmer to the remainder of the count divided by X.

    union                return kmers that occur in any input, set the count to the number of inputs with this kmer.
    union-min            return kmers that occur in any input, set the count to the minimum count
    union-max            return kmers that occur in any input, set the count to the maximum count
    union-sum            return kmers that occur in any input, set the count to the sum of the counts

    intersect            return kmers that occur in all inputs, set the count to the count in the first input.
    intersect-min        return kmers that occur in all inputs, set the count to the minimum count.
    intersect-max        return kmers that occur in all inputs, set the count to the maximum count.
    intersect-sum        return kmers that occur in all inputs, set the count to the sum of the counts.

    difference           return kmers that occur in the first input, but none of the other inputs
    symmetric-difference return kmers that occur in exactly one input

  MODIFIERS:

    output O             write kmers generated by the present command to an output  meryl database O
                         mandatory for count operations.

  EXAMPLES:

  Example:  Report 22-mers present in at least one of input1.fasta and input2.fasta.
            Kmers from each input are saved in meryl databases 'input1' and 'input2',
            but the kmers in the union are only reported to the screen.

            meryl print \
                    union \
                      [count k=22 input1.fasta output input1] \
                      [count k=22 input2.fasta output input2]

  Example:  Find the highest count of each kmer present in both files, save the kmers to
            database 'maxCount'.

            meryl intersect-max input1 input2 output maxCount

  Example:  Find unique kmers common to both files.  Brackets are necessary
            on the first 'equal-to' command to prevent the second 'equal-to' from
            being used as an input to the first 'equal-to'.

            meryl intersect [equal-to 1 input1] equal-to 1 input2

Don't know what to do with '../../rdna.seqStore'.

@skoren
Member

skoren commented Jun 24, 2020

Hmm, that looks like you have a bad Canu installation. How did you install Canu? If it was through a package manager like conda, we do not recommend this; instead I suggest you download the package hosted under the releases page.

@einzigsue
Author

Here is how I installed canu/2.0.

wget https://github.com/marbl/canu/archive/v2.0.tar.gz
tar -xzf v2.0.tar.gz
cd canu-2.0/src
make

@skoren
Member

skoren commented Jun 25, 2020

If that is how you installed it, I don't think that is the version you're running, because then you wouldn't end up with a path like canu/2.0/bin/meryl. Have you tried giving the full path to the version you installed as described above?

@einzigsue
Author

I copied the folder to where all our applications are installed. Is it a show stopper?

cp -r ../Linux-amd64/* /apps/canu/2.0

@skoren
Member

skoren commented Jun 26, 2020

Ah OK, that's why I was surprised by the path change. No, as long as it's finding all the binaries it shouldn't matter that the path is different. I just wanted to make sure it wasn't picking up some system-wide Canu installation instead of yours or mixing the two up.

Everything in the failing Meryl command looks correct. It should be looking for the seqStore two levels up because it first changes directories; you can see this in the log:

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

So Meryl and all its output should be running under the correction/0-mercounts folder. It's weird that the output log (rdna.ms16.config.01.out) is in the top-level folder, unless you tried to run the command by hand after it failed. Is there an rdna.ms16.config.01.out in the correction/0-mercounts folder? Is there a correction/0-mercounts/meryl-configure.err file? If so, post both of those, assuming they're not identical to what you already posted.

The other thing to check is the sequence store. What's the contents of your seqStore folder? Can you post the seqStore.err log?
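
One way to answer the "which folder did the logs land in" question in a single step is to search the whole run directory for the two file names mentioned above. This is a minimal sketch; the helper name `find_meryl_configure_logs` is hypothetical and not part of Canu.

```shell
# Hypothetical helper (not part of Canu): list every meryl configure log
# under a Canu run directory, relative to that directory, so it is obvious
# which folder each one landed in.
find_meryl_configure_logs() {
  ( cd "$1" && find . -type f \( -name '*.config.*.out' -o -name 'meryl-configure.err' \) | sort )
}

# Usage: find_meryl_configure_logs canu_out4
```

A path starting with `./correction/0-mercounts/` would mean the configure step ran where Canu expects it to; a path directly under `./` would mean it ran in the top-level folder.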

@einzigsue
Author

I didn't run any part of the job manually.
There is neither rdna.ms16.config.01.out nor meryl-configure.err in the correction/0-mercounts folder

$ ls
canu-logs  canu-scripts  jobwrapper_1591659825.pbs  rdna.report    rdna.seqStore.err
canu.out   correction    rdna.ms16.config.01.out    rdna.seqStore  rdna.seqStore.sh
$ ls correction/0-mercounts/
meryl-configure.sh  meryl-count.sh  meryl-make-ignore.pl  meryl-process.sh

This is what the seqStore looks like

$ ls rdna.seqStore
blobs.0001  info.txt       readlengths-cor.dat  readlengths-cor.txt  reads-corc  reads-rawu
errorLog    libraries      readlengths-cor.gp   readNames.txt        reads-coru  version.001
info        libraries.txt  readlengths-cor.png  reads                reads-rawc

The seqStore.err is in the top-level folder and is 583 KB, so I have only put the meaningful part here

Found perl:
   /bin/perl
   This is perl 5, version 26, subversion 3 (v5.26.3) built for x86_64-linux-thread-multi

Found java:
   /bin/java
   openjdk version "1.8.0_242"

Found canu:
   /apps/canu/2.0/bin/canu
   Canu 2.0


Creating library '20samples' for Nanopore raw reads.

               reads               bases
---------- --------- ------ ------------ ------
Loaded         16000  90.5%    166468512  99.3%  /scratch/z00/01PROJECTS/canu/testcase_rDNA/20samples.fastq
Short           1689   9.5%      1204630   0.7%


All reads processed.

               reads               bases
---------- --------- ------ ------------ ------
Loaded         16000  90.5%    166468512  99.3%
Short           1689   9.5%      1204630   0.7%

sqStore_loadMetadata()-- Using 'raw' 0x01 reads.

EXCESSIVE COVERAGE DETECTED.  Sampling reads.

For genome size of        84000 bases,
            retain     16800000 bases (200.00X coverage).
Found        16000 reads with    166468512 bases (1981.77X coverage).
readID    length        score
------- -------- ------------
12276       1629       1.0000
15045       2533       0.9999
2          12045       0.9998
10326       2671       0.9997
1002       13364       0.9997
5883       18910       0.9996
...
10708       3036       0.9040
9073        2186       0.9039
12610       2334       0.9039
5013       10559       0.9039
175         1104       0.9036 REMOVED
14399       1946       0.9036 REMOVED
436         3313       0.9036 REMOVED
....
54Dropped      14428 reads with    149659112 bases (1781.66X coverage).
Retained      1572 reads with     16809400 bases (200.11X coverage).

Bye.
08        7007       0.0094 REMOVED
14657       4241       0.0094 REMOVED
6838        4097       0.0093 REMOVED
8893       14944       0.0093 REMOVED
....

5433       12740       0.0003 REMOVED
1950        8102       0.0003 REMOVED
11638       4390       0.0002 REMOVED
12111       8388       0.0002 REMOVED
6101        7457       0.0001 REMOVED
77          1268       0.0000 REMOVED

I know the failure is quite mysterious, and I don't understand it either. My initial guess is that the incorrect correction/0-mercounts/meryl-count.sh script causes the meryl configuration to fail, which in turn breaks some assumption about what the working directory should be at each step, and that leads to the failure.

@skoren
Member

skoren commented Jun 26, 2020

No, the error can't have anything to do with meryl-count.sh; the configure step comes before that and it is what isn't running properly. I don't see how it can be ending up in the wrong folder; it is as if your system is ignoring the chdir command in perl. That code hasn't changed in a long time.

Are you able to run on a single node with useGrid=false without this error?

@cgjosephlee

Hi @skoren,
I have run into a similar issue, and I can say that running with useGrid=false works well.
The correction/0-mercounts/genome.ms16.config.??.out files are generated in the right place.
So it might be a grid-related issue.

@skoren
Member

skoren commented Jul 8, 2020

@einzigsue after carefully reading your logs I think I see what is happening, and this is a PBS-specific bug. Since PBS doesn't maintain working directories for submitted jobs, Canu generates its scripts with code that changes into the right folder on start, but for the configure step this wipes out the initial cd to the right place. So essentially the flow is:

cd correction/0-mercounts            # ends up in the right place
if [ z$PBS_O_WORKDIR != z ] ; then   # meryl-configure.sh jumps back to the incorrect location
  cd $PBS_O_WORKDIR
fi

and then configuration fails. It should be enough to remove the work directory shell code from the configure script. Commenting out line 399 in Meryl.pm should do it:

399     #print F setWorkDirectoryShellCode($path);

If you don't want to rebuild, you can change the file in <path_to_canu>/Linux-amd64/lib/site_perl/canu/Meryl.pm. If you can give that a try and see if it fixes your issue I'd appreciate it.
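
The directory flow described above can be reproduced without PBS or Canu. The sketch below is illustrative only: the function name is made up, a temporary directory stands in for the Canu output directory, and `PBS_O_WORKDIR` is set by hand, whereas on a real cluster PBS sets it to the directory the job was submitted from.

```shell
# Hypothetical demo (not part of Canu). Runs in a subshell so the caller's
# working directory and environment are left untouched.
demo_pbs_workdir_reset() (
  top=$(mktemp -d) || exit 1
  top=$(cd "$top" && pwd -P)          # resolve symlinks so paths compare cleanly
  mkdir -p "$top/correction/0-mercounts"
  PBS_O_WORKDIR=$top                  # assumption: stands in for the PBS-provided value

  cd "$top/correction/0-mercounts"    # what Canu does before running the script

  # Same guard as in meryl-configure.sh, with quoting added for safety.
  if [ "z$PBS_O_WORKDIR" != z ] ; then
    cd "$PBS_O_WORKDIR"
  fi

  # Now back at the top level, so ../../rdna.seqStore no longer resolves.
  [ "$(pwd -P)" = "$top" ] && echo "back in top-level directory"

  cd / && rm -rf "$top"
)
```

Calling `demo_pbs_workdir_reset` shows that the guard undoes the earlier `cd`, which matches the config output file appearing in the top-level folder instead of in correction/0-mercounts.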

@skoren
Member

skoren commented Jul 28, 2020

Idle, resolved by commit above or the code change listed.
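
For anyone applying the listed code change by hand, the edit can be scripted. This is a sketch under assumptions: the helper name is hypothetical, and it relies on line 399 of the installed Meryl.pm being exactly the `print F setWorkDirectoryShellCode($path);` call quoted above, which may not hold for other Canu versions. If line 399 is anything else, the file is left unchanged.

```shell
# Hypothetical helper (not part of Canu): comment out the call on line 399
# of the given Meryl.pm, keeping a backup copy next to it with a .bak suffix.
comment_out_workdir_call() {
  sed -i.bak '399s/^\([[:space:]]*\)print F setWorkDirectoryShellCode(\$path);/\1#print F setWorkDirectoryShellCode($path);/' "$1"
}

# Usage: comment_out_workdir_call <path_to_canu>/Linux-amd64/lib/site_perl/canu/Meryl.pm
```

Comparing the file against its `.bak` copy with `diff` afterwards confirms whether the line was actually changed.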

@skoren skoren closed this as completed Jul 28, 2020