Workflow element | GitHub | bio.tools | BioContainers | bioconda |
---|---|---|---|---|
Pacific Biosciences assembly tool suite v0.0.8 | ● | ● | ● |
The Pacific Biosciences assembly tool suite covers the three main functions of Falcon:
- fc_run,
- fc_unzip, and
- fc_phase
- As per parameter settings in each .cfg file
Requires Conda and Nextflow to run on Zeus.
Inputs are FASTA files (fc_run), BAM files (fc_unzip) and Hi-C files (fc_phase). This release of the pipeline does not support HiFi data; support is planned for a future release. Hi-C data is optional. If you have only BAM files and no FASTA files, you can convert them with https://github.com/PacificBiosciences/bam2fastx.
subreads.fasta.fofn: list of FASTA files for analysis. All names must be on one line.
subreads.bam.fofn: list of BAM files for analysis. All names must be on separate lines.
subreads.hi-c.fofn: list of Hi-C files for analysis. All names must be on one line.
fc_run.cfg, fc_unzip.cfg, and fc_phase.cfg: files that specify the parameters for pb-assembly
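
The two .fofn layouts described above (all names on one line, or one name per line) can be sketched with a small helper. This is only an illustration: the function `write_fofn`, the example file names, and the use of a single space as the separator for the one-line layout are assumptions, not something the pipeline documents.

```python
from pathlib import Path


def write_fofn(paths, fofn_path, one_line=True, sep=" "):
    """Write file names to a .fofn file and return the text written (without the final newline).

    one_line=True  -> all names on a single line, joined by `sep` (assumed separator)
    one_line=False -> one name per line
    """
    names = [str(p) for p in paths]
    text = sep.join(names) if one_line else "\n".join(names)
    Path(fofn_path).write_text(text + "\n")
    return text


# Hypothetical input names, for illustration only:
# write_fofn(["run1.subreads.fasta", "run2.subreads.fasta"], "subreads.fasta.fofn")
# write_fofn(["run1.subreads.bam", "run2.subreads.bam"], "subreads.bam.fofn", one_line=False)
```

If the pipeline expects a different separator in the one-line layout, pass it through `sep`.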
Assembly parameters are altered via the .cfg files. See https://github.com/PacificBiosciences/pb-assembly for details.
pb-assembly outputs many files, which can be used for quality checking as shown in the tutorial section. The final outputs, however, are 5-phase/output/phased.0.fasta and 5-phase/output/phased.1.fasta.
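
As a quick sanity check on the two final FASTA files, contig count, total length and N50 can be computed with a few lines of Python. This is a minimal sketch and not part of the workflow; the helper names `fasta_lengths` and `n50` are hypothetical, and the files are assumed to be plain (uncompressed) FASTA.

```python
def fasta_lengths(path):
    """Return the length of each sequence in a FASTA file, in file order."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)          # start a new record
            elif line and lengths:
                lengths[-1] += len(line)   # add bases to the current record
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no sequences


# Example use on the final outputs named above:
# for name in ("5-phase/output/phased.0.fasta", "5-phase/output/phased.1.fasta"):
#     lengths = fasta_lengths(name)
#     print(name, len(lengths), sum(lengths), n50(lengths))
```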
Workflow Tool | Version | Infrastructure | Scheduler | Workflow manager | Container | Install method |
---|---|---|---|---|---|---|
Pacific Biosciences assembly tool suite v0.0.8 | Pre-release 2nd July 2020 | Zeus, Pawsey | SLURM | Nextflow (v19.10.0) | Singularity (v3.5.2) | Miniconda3 - this environment will be activated from the workflow install. Independent install is not required. |
Table with embedded Compute infrastructure name optimisation -> "HPC-HTC" column
Workflow | Version | Group | Sample name (e.g. organism) | Other sample detail (e.g. Genus species) | Other sample detail (e.g. genome size (GB)) | Hours required | Cores | Peak RAM in GB (requested) | Drive (GB) | HPC-HTC | Month-Year |
---|---|---|---|---|---|---|---|---|---|---|---|
Any comment on major features being introduced, or default/API changes that might result in unexpected behaviours.
We encountered a bug in the 2-asm_falcon ovlp_filtering stage, where preads.m4 had an erroneous '---' at the end of the file. We fixed this by following GitHub issue PacificBiosciences/pbbioconda#294. The Nextflow pipeline now handles this step automatically.
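
For readers who hit the same problem outside the pipeline, the clean-up can be sketched as below. This is an assumption-laden illustration, not necessarily the fix described in the linked issue or the one used in the pipeline: it assumes the stray '---' sits on its own line at the end of the file, and the function name `strip_trailing_marker` is hypothetical. It reads the whole file into memory, which may be unsuitable for a very large preads.m4.

```python
from pathlib import Path


def strip_trailing_marker(path, marker="---"):
    """Drop trailing lines equal to `marker` (and trailing blank lines) from a text file.

    Returns the number of marker lines removed.
    """
    lines = Path(path).read_text().splitlines()
    removed = 0
    while lines and lines[-1].strip() in ("", marker):
        if lines[-1].strip() == marker:
            removed += 1
        lines.pop()
    # Rewrite the file with exactly one newline after each remaining line.
    Path(path).write_text("".join(line + "\n" for line in lines))
    return removed


# Hypothetical use; the location of preads.m4 depends on your run directory:
# strip_trailing_marker("preads.m4")
```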
The fofn files need to have all entries on one line.
The all.log file is useful for checking job progress and, if troubleshooting is required, for identifying which step(s) caused the workflow to fail. Use it to find the detailed stderr file for each failed step. The detailed log files for each step are located in the sub-directories of each process (e.g. nf-work/<nextflowID1>/<nextflowID2>/0-rawreads/build/run-Pccabfacd84af34.bash.stderr). The SLURM log file contains only the Nextflow standard output/error, which is generally not very verbose.
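
When a run fails, the per-step log files under the Nextflow work directory can be listed with a short script. This is a sketch under stated assumptions: the function name `nonempty_stderr_files` is hypothetical, the work directory is assumed to be nf-work as in the example path above, and the per-step logs are assumed to end in `.stderr`; adjust `pattern` if yours are named differently.

```python
from pathlib import Path


def nonempty_stderr_files(work_dir="nf-work", pattern="*.stderr"):
    """Return non-empty files matching `pattern` below `work_dir`, newest first."""
    files = [
        p for p in Path(work_dir).rglob(pattern)
        if p.is_file() and p.stat().st_size > 0
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


# Example use from the run directory:
# for path in nonempty_stderr_files():
#     print(path)
```

A non-empty stderr file does not always indicate a failure, so treat the list as a starting point and cross-check it against all.log.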
This workflow is for the reference genome assembly of the Fat-tailed Dunnart (Sminthopsis crassicaudata), as part of an Australian BioCommons collaboration. Raw sequences were generated by Stephen Frankenberg (University of Melbourne) as part of the Oz Mammals Genomics (OMG) Framework Initiative, and downloaded from the Bioplatforms Australia data portal. The Initiative is supported by funding from Bioplatforms Australia through the Australian Government National Collaborative Research Infrastructure Strategy (NCRIS).
The first release is credited to Pawsey Supercomputing Centre in collaboration with the Australian BioCommons.