Falcon

Description

Workflow element GitHub bio.tools BioContainers bioconda
Pacific Biosciences assembly tool suite v0.0.8

The Pacific Biosciences assembly tool suite covers the three main functions of Falcon:

  1. fc_run,
  2. fc_unzip, and
  3. fc_phase

Required (minimum) inputs/parameters

  • As per parameter settings in each .cfg file

Third party tools / dependencies

Requires Conda and Nextflow to run on Zeus.

Input(s)

Data types:

Fasta (fc_run), bam files (fc_unzip), and Hi-C files (fc_phase). This release of the pipeline does not support HiFi data, but support is planned. Hi-C data is optional. If you only have bam files and no fasta files, you can use https://github.com/PacificBiosciences/bam2fastx to convert them.

pb-assembly specific files required:

subreads.fasta.fofn: list of fasta files for analysis. All names must be on one line.

subreads.bam.fofn: list of bam files for analysis. All names must be on separate lines.

subreads.hi-c.fofn: list of hi-c files for analysis. All names must be on one line.

fc_run.cfg, fc_unzip.cfg, and fc_phase.cfg: files that specify the parameters for pb-assembly
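
The fofn files described above can be generated with a short script rather than by hand. The following is a minimal sketch, not part of the pipeline: the function name write_fofn is hypothetical, and the use of a single space to separate names in the one-line layout is an assumption that should be checked against your own working fofn files.

```python
from pathlib import Path


def write_fofn(paths, fofn_path, one_line):
    """Write a file-of-file-names (fofn) listing `paths`.

    one_line=True  -> all names on a single line, separated by spaces
                      (the space separator is an assumption).
    one_line=False -> one name per line.
    """
    names = [str(Path(p)) for p in paths]
    separator = " " if one_line else "\n"
    target = Path(fofn_path)
    target.write_text(separator.join(names) + "\n")
    return target
```

For example, write_fofn(["a.bam", "b.bam"], "subreads.bam.fofn", one_line=False) writes the bam list with one name per line, and one_line=True produces the single-line layout described for the fasta and hi-c lists.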

Parameter(s)

Assembly parameters are altered via the .cfg files. See https://github.com/PacificBiosciences/pb-assembly for details.
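
Before launching a run it can help to print the parameters a .cfg file actually sets. The sketch below assumes the .cfg files are INI-style and readable by Python's standard configparser module; the helper name read_cfg is hypothetical, and the authoritative list of sections and keys is the pb-assembly documentation linked above.

```python
import configparser


def read_cfg(cfg_path):
    """Return {section: {key: value}} for an INI-style .cfg file."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of parameter names
    with open(cfg_path) as handle:
        parser.read_file(handle)
    return {section: dict(parser[section]) for section in parser.sections()}
```

Calling read_cfg("fc_run.cfg") returns a nested dictionary that can be inspected or compared between runs without editing the file.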

Output(s)

pb-assembly outputs many files, which can be used for quality checking as shown in the tutorial section. The final outputs, however, are 5-phase/output/phased.0.fasta and 5-phase/output/phased.1.fasta.
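
A quick way to confirm that a run reached the end is to check that both final files exist and contain sequences. This is a minimal sketch under the assumption that the two paths above are relative to the run directory; the function name summarise_final_outputs is hypothetical.

```python
from pathlib import Path

# Final output paths as named above, assumed relative to the run directory.
FINAL_OUTPUTS = (
    "5-phase/output/phased.0.fasta",
    "5-phase/output/phased.1.fasta",
)


def summarise_final_outputs(run_dir):
    """Map each expected final fasta to its record count, or None if missing."""
    summary = {}
    for relative in FINAL_OUTPUTS:
        fasta = Path(run_dir) / relative
        if not fasta.is_file():
            summary[relative] = None
            continue
        with fasta.open() as handle:
            # Each fasta record starts with a '>' header line.
            summary[relative] = sum(1 for line in handle if line.startswith(">"))
    return summary
```

A value of None flags a missing file, and a count of 0 flags a file that exists but holds no records.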

Diagram

Usage

Summary

  • Workflow: Pacific Biosciences assembly tool suite
  • Tool version: v0.0.8 (pre-release, 2nd July 2020)
  • Infrastructure: Zeus, Pawsey
  • Scheduler: SLURM
  • Workflow manager: Nextflow (v19.10.0)
  • Container: Singularity (v3.5.2)
  • Install method: Miniconda3 (this environment is activated by the workflow install; an independent install is not required)

High level resource usage

Table with embedded Compute infrastructure name optimisation -> "HPC-HTC" column

Workflow Version Group Sample name (e.g. organism) Other sample detail (e.g. Genus species) Other sample detail (e.g. genome size (GB)) Hours required Cores Peak RAM in GB (requested) Drive (GB) HPC-HTC Month-Year

Additional notes

Any comment on major features being introduced, or default/API changes that might result in unexpected behaviours.

Install

Tutorials

Help / FAQ / Troubleshooting

Note 1

We encountered a bug in the 2-asm_falcon ovlp_filtering stage, where preads.m4 had an erroneous '---' at the end of the file. We fixed this by following this GitHub issue: PacificBiosciences/pbbioconda#294. The Nextflow pipeline now handles this step automatically.
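
For anyone who meets the same symptom outside the pipeline, the clean-up amounts to dropping the stray trailing line. The sketch below illustrates that idea only: it is not the pipeline's own code and not necessarily the exact fix described in the issue, and the function name strip_trailing_marker is hypothetical. It reads the whole file into memory, which may be unsuitable for very large preads.m4 files.

```python
from pathlib import Path


def strip_trailing_marker(m4_path, marker="---"):
    """Remove trailing lines consisting only of `marker`; return True if changed."""
    path = Path(m4_path)
    lines = path.read_text().splitlines()
    changed = False
    # Pop marker-only lines from the end of the file, leaving all other lines intact.
    while lines and lines[-1].strip() == marker:
        lines.pop()
        changed = True
    if changed:
        path.write_text("".join(line + "\n" for line in lines))
    return changed
```

The function leaves the file untouched when no trailing marker is present, so it is safe to run more than once.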

Note 2

The fofn files need to have all entries on one line.

Troubleshooting

The all.log file is useful for checking job progress and for identifying which step(s) caused the workflow to fail. Use it to find the detailed stderr file for each failed step. The detailed log files are located in the sub-directories of each process (e.g. nf-work/<nextflowID1>/<nextflowID2>/0-rawreads/build/run-Pccabfacd84af34.bash.stderr). The SLURM log file contains only the Nextflow standard output/error, which is generally not very verbose.
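
Because the per-step logs are nested several directories deep, a small search helper can save time. This is a minimal sketch, assuming the Nextflow work directory is named nf-work as in the example path and that the step logs end in .stderr; the function name find_nonempty_stderr is hypothetical. A non-empty stderr file does not by itself mean the step failed, so treat the result as a list of files to read, starting with the most recent.

```python
from pathlib import Path


def find_nonempty_stderr(work_dir, pattern="*.stderr"):
    """Return non-empty files matching `pattern` under work_dir, newest first."""
    matches = [
        p for p in Path(work_dir).rglob(pattern)
        if p.is_file() and p.stat().st_size > 0
    ]
    # Sort by modification time so the most recently written logs come first.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, iterating over find_nonempty_stderr("nf-work") and printing each path gives a short list of candidate logs to cross-check against all.log.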

License(s)

Acknowledgements / citations / credits

This workflow is for the reference genome assembly of the Fat-tailed Dunnart (Sminthopsis crassicaudata), as part of an Australian BioCommons collaboration. Raw sequences were generated by Stephen Frankenberg (University of Melbourne) as part of the Oz Mammals Genomics (OMG) Framework Initiative, and downloaded from the Bioplatforms Australia data portal. The Initiative is supported by funding from Bioplatforms Australia through the Australian Government National Collaborative Research Infrastructure Strategy (NCRIS).

The first release is credited to Pawsey Supercomputing Centre in collaboration with the Australian BioCommons.