Releases: naobservatory/mgs-workflow

v2.6.0.0

14 Jan 17:15
38899ac

v2.6.0.0

Updated version to reflect the new versioning scheme, which is described in docs/version_schema.md.

v2.5.4

Fixed a fatal bug in configs/run_validation.config that prevented users from running the RUN_VALIDATION workflow.

v2.5.3

06 Jan 16:56
cef7648
  • Added new LOAD_SAMPLESHEET subworkflow to centralize samplesheet processing
  • Updated tags to prevent inappropriate S3 auto-cleanup
  • Testing infrastructure
    • Split up the tests in the End-to-end MGS workflow test so that they can be run in parallel on GitHub Actions.
    • Implemented an end-to-end test that checks whether the RUN workflow produces the correct output. The expected output is saved in test-data/gold-standard-results, so users can diff their test output against it to see where their pipeline might be failing.
  • Began development of single-end read processing (still in progress)
    • Restructured RAW, CLEAN, QC, TAXONOMY, and PROFILE workflows to handle both single-end and paired-end reads
    • Added new FASTP_SINGLE, TRUNCATE_CONCAT_SINGLE, BBDUK_SINGLE, CONCAT_GROUP_SINGLE, SUBSET_READS_SINGLE and SUBSET_READS_SINGLE_TARGET processes to handle single-end reads
    • Created separate end-to-end test workflow for single-end processing (which will be removed once single-end processing is fully integrated)
    • Modified samplesheet handling to support both single-end and paired-end data
    • Updated generate_samplesheet.sh to handle single-end data with --single_end flag
    • Added read_type.config to handle single-end vs paired-end settings (set automatically based on samplesheet format)
    • Created run_dev_se.config and run_dev_se.nf for single-end development testing (which will be removed once single-end processing is fully integrated)
    • Added single-end samplesheet to test-data
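
The gold-standard comparison described above can be approximated outside the test suite. The following is a minimal sketch, not the repository's own test code: the function name `find_mismatches` and the example output path are hypothetical, and it assumes a plain byte-for-byte comparison is adequate for the files being checked.

```python
import filecmp
from pathlib import Path


def find_mismatches(expected_dir, actual_dir):
    """List files under expected_dir that are missing from, or differ in, actual_dir.

    Hypothetical helper: paths are returned relative to expected_dir,
    using forward slashes, in sorted order.
    """
    expected_dir, actual_dir = Path(expected_dir), Path(actual_dir)
    mismatches = []
    for expected in sorted(expected_dir.rglob("*")):
        if not expected.is_file():
            continue
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        # shallow=False forces a comparison of file contents, not just metadata.
        if not actual.is_file() or not filecmp.cmp(expected, actual, shallow=False):
            mismatches.append(rel.as_posix())
    return mismatches


if __name__ == "__main__":
    # "my-test-output" is a placeholder for wherever the test run wrote its results.
    for path in find_mismatches("test-data/gold-standard-results", "my-test-output"):
        print("differs or missing:", path)
```

Compressed outputs can differ byte-for-byte even when their decompressed contents match, so a mismatch reported for such a file may need a second look before it is treated as a pipeline failure.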

v2.5.2

27 Nov 16:17
b75ddc6

From the CHANGELOG:

  • Changes to default read filtering:
    • Relaxed FASTP quality filtering (--cut_mean_quality and --average_qual reduced from 25 to 20).
    • Relaxed BBDUK viral filtering (switched from requiring 3 matching 21-mers to 1 matching 24-mer).
  • Overhauled BLAST validation functionality:
    • BLAST now runs on forward and reverse reads independently
    • BLAST output filtering no longer assumes specific filename suffixes
    • Paired BLAST output includes more information
    • RUN_VALIDATION can now directly take in FASTA files instead of a virus read DB
    • Fixed issues with publishing BLAST output under new Nextflow version
  • Implemented nf-test for end-to-end testing of pipeline functionality
    • Implemented test suite in tests/main.nf.test
    • Reconfigured INDEX workflow to enable generation of miniature index directories for testing
    • Added GitHub Actions workflow in .github/workflows/end-to-end.yml
    • Pull requests will now fail if any of INDEX, RUN, or RUN_VALIDATION crashes when run on test data.
    • Generated first version of new, curated test dataset for testing RUN workflow. Samplesheet and config file are available in test-data. The previous test dataset in test has been removed.
  • Implemented S3 auto-cleanup:
    • Added tags to published files to facilitate S3 auto-cleanup
    • Added S3 lifecycle configuration file to ref, along with a script in bin to add it to an S3 bucket
  • Minor changes
    • Added a check that the grouping variable in nextflow.config matches the input samplesheet; the pipeline throws an error if it does not.
    • Externalized resource specifications to resources.config, removing hardcoded CPU/memory values
    • Renamed index-params.json to params-index.json to avoid clash with GitHub Actions
    • Removed redundant subsetting statement from TAXONOMY workflow.
    • Added --group_across_illumina_lanes option to generate_samplesheet
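
The grouping consistency check mentioned under "Minor changes" could look roughly like the sketch below. This is an illustration under stated assumptions rather than the pipeline's actual logic (which lives in Nextflow): it assumes the samplesheet is a CSV file, that grouping is signalled by a column named `group`, and that the config exposes a boolean grouping setting. The function name `check_grouping` and the column names in the usage example are hypothetical.

```python
import csv
import io


def check_grouping(samplesheet_text, grouping_enabled, group_column="group"):
    """Raise ValueError if the grouping setting disagrees with the samplesheet header.

    Assumptions (not taken from the pipeline): CSV input, and a column called
    `group_column` that is present exactly when grouping is enabled.
    """
    # Only the header row is needed to decide whether the column exists.
    header = next(csv.reader(io.StringIO(samplesheet_text)), [])
    has_group = group_column in header
    if grouping_enabled and not has_group:
        raise ValueError(
            f"grouping is enabled but the samplesheet has no '{group_column}' column"
        )
    if not grouping_enabled and has_group:
        raise ValueError(
            f"grouping is disabled but the samplesheet has a '{group_column}' column"
        )


if __name__ == "__main__":
    # Hypothetical samplesheet contents for illustration only.
    sheet = "sample,fastq_1,fastq_2,group\ns1,s1_R1.fq.gz,s1_R2.fq.gz,g1\n"
    check_grouping(sheet, grouping_enabled=True)  # consistent, so no error
    print("samplesheet and grouping setting agree")
```

Failing early on a mismatch, before any reads are processed, avoids spending compute on a run whose grouping would be silently ignored or misapplied.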

v2.5.1

15 Nov 14:06
fa9fbc9

Includes changes from both v2.5.0 and v2.5.1:

v2.5.1

  • Enabled extraction of BBDuk-subset putatively-host-viral raw reads for downstream chimera detection.
  • Restored viral read fields that were being accidentally discarded by COLLAPSE_VIRUS_READS.

v2.5.0

  • Reintroduced user-specified sample grouping and concatenation (e.g. across sequencing lanes) for deduplication in PROFILE and EXTRACT_VIRAL_READS.
  • Generalised pipeline to detect viruses infecting arbitrary host taxa (not just human-infecting viruses) as specified by ref/host-taxa.tsv and config parameters.
  • Configured index workflow to enable hard-exclusion of specific virus taxa (primarily phages) from being marked as infecting host taxa of interest.
  • Updated pipeline output code to match changes made in latest Nextflow update (24.10.0).
  • Created a new script bin/analyze-pipeline.py to analyze pipeline structure and identify unused workflows and modules.
  • Cleaned up unused workflows and modules made obsolete in this and previous updates.
  • Moved module scripts from bin to module directories.
  • Modified trace filepath to be predictable across runs.
  • Removed addParams calls when importing dependencies (deprecated in latest Nextflow update).
  • Switched from nt to core_nt for BLAST validation.
  • Reconfigured QC subworkflow to run FASTQC and MultiQC on each pair of input files separately (fixes bug arising from allowing arbitrary filenames for forward and reverse read files).

v2.4.0

21 Oct 13:43
8667e9f

See CHANGELOG for details.

v2.3.0

03 Aug 13:15

Significant refactor to increase efficiency on large datasets.

v2.2.0

28 Jun 15:44

Major pipeline refactor; changes include:

  • Introduced module/subworkflow structure from nf-core.
  • Updated and expanded reference files.
  • Much more extensive documentation.
  • Several bug fixes and other resolved issues.

v2.1.0

21 Jun 12:21

Working version of v2 pipeline used for published version of p2ra report.