Add NGSCheckMate #993

Open: wants to merge 18 commits into base: dev

@SPPearce commented Mar 31, 2023

This PR adds the NGSCheckMate tool to the pipeline. I find it an essential tool and part of my initial QC for all the sequencing we perform. It takes a BED file with a set of SNPs and tries to determine whether samples come from the same individual. It was developed for, and is generally used on, human data (SNP BED files are available for hg19/hg38/GRCh37/GRCh38), but it could be used in other species if a suitable set of common SNPs were provided.
This should be configured to run automatically for GRCh37/GRCh38; for hg19/hg38 I think the files need adding to iGenomes, which I still need to check.

This ensures that no sample swaps have occurred by checking:

  • Replicates come from the same individual
  • Paired samples (e.g. timepoints or tissue samples) come from the same individual
  • Non-paired samples didn't come from the same individual!

It works on human-derived cell lines, but it won't separate different treatments applied to the same cell line.
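The three checks above boil down to comparing the identity you *expect* from sample metadata with the identity NGSCheckMate *observes*. The Python sketch below shows that comparison under stated assumptions. It assumes a hypothetical mapping from each sample to an individual ID, for example taken from a samplesheet column. It also assumes a set of sample pairs that NGSCheckMate called as matched; parsing those calls from its output is not shown. Because different treatments of one cell line share an individual ID, they are expected to match, which is consistent with the caveat above.

```python
from itertools import combinations


def find_identity_conflicts(individual_of, matched_pairs):
    """Return sample pairs whose expected and observed identity disagree.

    individual_of: hypothetical {sample_name: individual_id} from metadata.
    matched_pairs: set of frozenset({a, b}) pairs called "matched" (assumed input).
    Each conflict is (sample_a, sample_b, expected_same, observed_same).
    """
    conflicts = []
    for a, b in combinations(sorted(individual_of), 2):
        expected_same = individual_of[a] == individual_of[b]
        observed_same = frozenset((a, b)) in matched_pairs  # order-insensitive
        if expected_same != observed_same:
            conflicts.append((a, b, expected_same, observed_same))
    return conflicts


if __name__ == "__main__":
    individual_of = {"rep1": "P1", "rep2": "P1", "tumour": "P2", "normal": "P2"}
    matched = {frozenset(("rep1", "rep2")), frozenset(("tumour", "rep1"))}
    for a, b, exp, obs in find_identity_conflicts(individual_of, matched):
        print(f"{a} vs {b}: expected same={exp}, observed same={obs}")
```

In this toy input, `normal`/`tumour` should match but were not called as matching, and `rep1`/`tumour` were called as matching but should not be. Both patterns are what a sample swap would produce.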

Testing: I have used a BED file with the test datasets to get it running on yeast, but I haven't tried the full tests yet.
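For non-human species such as the yeast test data, a SNP BED file has to be derived from some set of known variants. The Python sketch below shows one possible way to do that under stated assumptions. It takes VCF text as input and keeps only biallelic single-nucleotide variants. It writes a minimal three-column BED, converting VCF's 1-based `POS` to BED's 0-based, half-open interval. This is not confirmed to be how the test file was made. Whether NGSCheckMate needs extra BED columns is also not checked here.

```python
VALID_BASES = {"A", "C", "G", "T"}


def vcf_snvs_to_bed(vcf_lines):
    """Return 3-column BED lines for biallelic single-nucleotide variants.

    VCF POS is 1-based; BED intervals are 0-based and half-open, so a SNP at
    POS p becomes the interval [p - 1, p).
    """
    bed = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        alts = alt.split(",")
        # keep only single-base REF with exactly one single-base ALT
        if ref.upper() in VALID_BASES and len(alts) == 1 and alts[0].upper() in VALID_BASES:
            p = int(pos)
            bed.append(f"{chrom}\t{p - 1}\t{p}")
    return bed


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "I\t100\t.\tA\tG\t50\tPASS\t.",
        "I\t200\t.\tAT\tA\t50\tPASS\t.",
        "II\t5\t.\tC\tT,G\t50\tPASS\t.",
    ]
    print("\n".join(vcf_snvs_to_bed(vcf)))
```

Indels and multi-allelic sites are dropped because a simple ref/alt allele-fraction comparison is easiest to interpret at biallelic SNVs.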

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • [X] If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@SPPearce changed the base branch from master to dev on March 31, 2023 15:37
@nf-core deleted a comment from github-actions bot on Mar 31, 2023

github-actions bot commented Jun 8, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a4a946c

✅ 145 tests passed
❔ 6 tests were ignored
❗ 4 tests had warnings

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in WorkflowRnaseq.groovy: Optionally add in-text citation tools to this list.

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
  • multiqc_config - multiqc_config

Run details

  • nf-core/tools version 2.10
  • Run at 2023-11-25 17:13:18

@pinin4fjords (Member) left a comment:

Inclusion is probably a call for @drpatelh, but here are a couple of minor comments in the meantime.

@@ -768,6 +772,15 @@ workflow RNASEQ {
ch_versions = ch_versions.mix(DUPRADAR.out.versions.first())
}

if (params.ngscheckmate_bed) {
BAM_NGSCHECKMATE (
@pinin4fjords (Member): Is there any way the output could be baked into the MultiQC report?

@SPPearce (Author): That has also been suggested over in Sarek; it's something I could consider.
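On the MultiQC question, one option is MultiQC's custom-content mechanism: a TSV file whose name ends in `_mqc.tsv` and whose `#` header lines carry section configuration such as `id`, `section_name` and `plot_type`. The Python sketch below converts pairwise NGSCheckMate results into such a table under stated assumptions. It assumes a tab-separated per-pair layout of `sample1, call, sample2, correlation, depth`. That layout, the section ID and the column names are all hypothetical and should be checked against the real NGSCheckMate output.

```python
import csv
import io

# MultiQC custom-content header; the id/section values here are hypothetical.
HEADER = (
    "# id: 'ngscheckmate'\n"
    "# section_name: 'NGSCheckMate'\n"
    "# description: 'Pairwise sample-identity calls from NGSCheckMate.'\n"
    "# plot_type: 'table'\n"
)


def to_mqc_table(ncm_lines):
    """Convert assumed NGSCheckMate pair lines into MultiQC custom-content TSV text."""
    out = io.StringIO()
    out.write(HEADER)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Pair", "Call", "Correlation", "Depth"])
    for line in ncm_lines:
        line = line.strip()
        if not line:
            continue
        # assumed column order: sample1, matched/unmatched, sample2, correlation, depth
        sample1, call, sample2, corr, depth = line.split("\t")[:5]
        writer.writerow([f"{sample1} vs {sample2}", call, corr, depth])
    return out.getvalue()


if __name__ == "__main__":
    lines = ["A\tmatched\tB\t0.91\t12.3\n", "A\tunmatched\tC\t0.12\t8.0\n"]
    with open("ngscheckmate_mqc.tsv", "w") as fh:
        fh.write(to_mqc_table(lines))
```

Keying rows by pair keeps the table readable for a handful of samples. For larger cohorts, a sample-by-sample correlation heatmap would probably scale better.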

docs/output.md (outdated)

</details>

[NGSCheckMate](https://github.com/parklab/NGSCheckMate) is a tool to verify that samples come from the same individual, by examining a set of single nucleotide polymorphisms (SNPs). This calculates correlations between the samples, and then applies a depth-dependent model of allele fractions to call samples as being related or not. The principal output is a dendrogram, where samples that are .
@pinin4fjords (Member): Unfinished sentence.
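The docs excerpt above describes NGSCheckMate as correlating SNP allele fractions between samples. The Python sketch below shows the correlation step only. It computes per-SNP alternate-allele fractions from hypothetical `(ref_reads, alt_reads)` counts, then takes a Pearson correlation over the SNPs covered in both samples. It deliberately leaves out NGSCheckMate's depth-dependent model for calling samples related, so it is an illustration and not a reimplementation.

```python
import math


def allele_fractions(counts, min_depth=1):
    """counts: {snp_id: (ref_reads, alt_reads)} -> {snp_id: alt allele fraction}."""
    return {
        snp: alt / (ref + alt)
        for snp, (ref, alt) in counts.items()
        if ref + alt >= min_depth
    }


def pearson(xs, ys):
    """Plain Pearson correlation; NaN if either vector has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def sample_correlation(counts_a, counts_b, min_depth=1):
    """Correlate allele fractions over SNPs covered in both samples."""
    fa = allele_fractions(counts_a, min_depth)
    fb = allele_fractions(counts_b, min_depth)
    shared = sorted(fa.keys() & fb.keys())
    if len(shared) < 2:
        return float("nan")
    return pearson([fa[s] for s in shared], [fb[s] for s in shared])


if __name__ == "__main__":
    a = {"s1": (10, 0), "s2": (5, 5), "s3": (0, 10)}
    b = {"s1": (8, 0), "s2": (4, 4), "s3": (0, 12)}
    c = {"s1": (0, 10), "s2": (10, 0), "s3": (5, 5)}
    print(sample_correlation(a, b), sample_correlation(a, c))
```

If every sample is reference at every site, each allele-fraction vector has zero variance and the correlation is undefined. This sketch returns NaN in that case, so it does not treat the samples as a match.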

ch_versions = ch_versions.mix(BCFTOOLS_MPILEUP.out.versions)

BCFTOOLS_MPILEUP
.out
@pinin4fjords (Member): Indent the ops (here and below)?

@SPPearce (Author): This is in the nf-core subworkflow, but I'll try to remember to do it when I convert it to nf-test.

@pinin4fjords (Member) commented:

@SPPearce - @drpatelh is concerned that we make sure the new input BED file is compatible with the other reference inputs (coordinates, 'chr' prefixes, etc.). Can we incorporate a check for those things?

@SPPearce (Author) replied:

Hmm, we could check for mismatched chr prefixes; that should be relatively simple, I guess?
A genome build mismatch (hg19/hg38) would probably just result in most of the samples being called as matching. The BED file would point to the wrong locations, and the ~25% of those that happen to carry the SNP's reference allele would probably be reference in every sample, so all samples would be considered homozygous there.
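The requested compatibility check could start from the reference FASTA index (`.fai`). Its first two tab-separated columns are contig name and length. The Python sketch below shows one way to use them under stated assumptions. It flags BED contigs that are missing from the reference and reports when adding or removing a `chr` prefix would fix the name. It also flags intervals that fall outside a contig's length. It does not map aliases such as `chrM`/`MT`. As the comment above notes, it also cannot detect a build mismatch whose coordinates happen to fall within range.

```python
def read_fai_lengths(lines):
    """Parse .fai lines into {contig_name: length} (columns 1 and 2)."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def _toggle_chr(name):
    """'chr1' -> '1' and '1' -> 'chr1'."""
    return name[3:] if name.startswith("chr") else "chr" + name


def check_bed_against_fai(bed_lines, fai_lengths):
    """Return human-readable problems found in a BED file vs. the reference index."""
    problems = []
    for n, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        if chrom not in fai_lengths:
            other = _toggle_chr(chrom)
            if other in fai_lengths:
                problems.append(f"line {n}: '{chrom}' not found, but '{other}' is (chr prefix mismatch)")
            else:
                problems.append(f"line {n}: contig '{chrom}' not in reference")
        elif start < 0 or start >= end or end > fai_lengths[chrom]:
            problems.append(
                f"line {n}: {chrom}:{start}-{end} is outside contig length "
                f"{fai_lengths[chrom]} or malformed"
            )
    return problems


if __name__ == "__main__":
    fai = read_fai_lengths(["1\t1000\t3\t60\t61\n", "2\t500\t1025\t60\t61\n"])
    bed = ["chr1\t10\t11\n", "2\t600\t601\n", "3\t1\t2\n", "2\t5\t6\n"]
    for problem in check_bed_against_fai(bed, fai):
        print(problem)
```

A pipeline could fail early if every BED contig hits the prefix-mismatch branch, and only warn for a few stray contigs.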

2 participants