Pipeline remodelling. Issues 24, 31 and 36 #44

Merged
merged 173 commits into from
Mar 28, 2022
173 commits
965804d
Split out the config profiles (#43)
abhi18av Nov 18, 2021
fb02022
trying to have a create dbs process
fmalmeida Nov 19, 2021
30b9c44
found way to download dbs
fmalmeida Nov 19, 2021
c5c6816
adding resfinder and plasmidfinder download rules
fmalmeida Nov 19, 2021
a65ecb6
added phigaro, vfdb and amrfinder download rules
fmalmeida Nov 19, 2021
1724796
added last database download rules
fmalmeida Nov 19, 2021
88c93c9
added prokka HMM download rule
fmalmeida Nov 19, 2021
abd066c
added label to use docker for tools
fmalmeida Nov 19, 2021
53cb5b4
fixed argminer download
fmalmeida Nov 19, 2021
cc2fb19
fixed victors download
fmalmeida Nov 19, 2021
4e631e2
fixed iceberg download
fmalmeida Nov 19, 2021
4e46026
prokka using given database and bioconda image
fmalmeida Nov 19, 2021
c4d9998
docker specific for downloading databases
fmalmeida Nov 19, 2021
4cf3b72
update packages
fmalmeida Nov 19, 2021
523e1f3
added mlst database download rule
fmalmeida Nov 19, 2021
ea48078
add bacannot db info
fmalmeida Nov 19, 2021
ad7cf28
also downloads PGAP db
fmalmeida Nov 19, 2021
932de14
added prokka and mlst
fmalmeida Nov 20, 2021
247093b
fix conditional
fmalmeida Nov 20, 2021
7d74fc1
fixed pgap conditional
fmalmeida Nov 20, 2021
64e01d1
added barrnap
fmalmeida Nov 20, 2021
9eec439
added 'compute_gc' module -- think about modules with two tools
fmalmeida Nov 21, 2021
63279dd
adding indentation
fmalmeida Dec 6, 2021
68a9ea3
fixing label
fmalmeida Dec 6, 2021
d71199f
first tries on conda envs
fmalmeida Dec 6, 2021
6a5e0f9
removing label
fmalmeida Dec 6, 2021
5cb719a
adding plasmidfinder
fmalmeida Dec 6, 2021
cc2320e
added platon
fmalmeida Dec 6, 2021
e563c29
working until islandpath
fmalmeida Dec 15, 2021
c29b581
added: VFDB
fmalmeida Dec 15, 2021
89d43b0
added: Victors
fmalmeida Dec 15, 2021
4a047b1
changing name and organization of MISC image
fmalmeida Dec 15, 2021
7b77d8f
added: PHAST
fmalmeida Dec 15, 2021
4cce360
added: phigaro
fmalmeida Dec 16, 2021
4104983
added: phispy
fmalmeida Dec 17, 2021
07d19d1
added: iceberg
fmalmeida Dec 17, 2021
e708082
added: kofamscan db download
fmalmeida Dec 17, 2021
cc86f1a
removing named outputs as it will be solved in another PR
fmalmeida Dec 21, 2021
d355889
added: kofamscan from downloaded database
fmalmeida Dec 21, 2021
bfe0c83
added: kegg_decoder
fmalmeida Dec 21, 2021
a50b4e7
Refactor channel identifiers and process names (#45)
abhi18av Jan 6, 2022
58893ea
Merge branch 'develop' into remodeling
fmalmeida Jan 6, 2022
280ebb8
trying to bring amrfinder, card_rgi and argminer
fmalmeida Jan 12, 2022
ca17425
trying to fix how the scale is calculated for amrfinder
fmalmeida Jan 12, 2022
32b29a5
changing scale to perl
fmalmeida Jan 12, 2022
91f84ba
typo fix
fmalmeida Jan 12, 2022
decf5c6
removing unnecessary comma
fmalmeida Jan 12, 2022
848ec5f
Update amrfinder.nf
fmalmeida Jan 13, 2022
72ebcb4
properly working until amrfinder and card_rgi
fmalmeida Jan 13, 2022
ed6f51a
added small dataset test profile
fmalmeida Jan 20, 2022
c0fcb09
Merge branch 'develop' into remodeling
fmalmeida Jan 20, 2022
b639174
fixing name of quicktest profile
fmalmeida Jan 20, 2022
7621731
fixing urls of testing samplesheets
fmalmeida Jan 21, 2022
ed49f29
removing unnecessary files
fmalmeida Jan 21, 2022
348ba18
Merge branch 'develop' into remodeling
fmalmeida Jan 21, 2022
6d8d147
Merge branch 'master' into remodeling
fmalmeida Jan 25, 2022
2b1165f
fix missing label
fmalmeida Jan 26, 2022
a7ad338
adding resfinder to miscellaneous image
fmalmeida Jan 26, 2022
327bf73
fixing gitignore
fmalmeida Jan 26, 2022
a7f678e
change manifest to upcoming version
fmalmeida Jan 26, 2022
d7a9060
updating db download workflow behaviour
fmalmeida Jan 26, 2022
83a04a7
removing .git dirs
fmalmeida Jan 26, 2022
6684f14
fixing argminer download
fmalmeida Jan 26, 2022
fba3be9
properly added resfinder
fmalmeida Jan 26, 2022
53f8665
adding script that parses nanopolish methyl call
fmalmeida Jan 31, 2022
56365cf
removing unnecessary labels
fmalmeida Jan 31, 2022
944a0e5
starting to change how images are done
fmalmeida Feb 9, 2022
e8ba6c2
added perl tools
fmalmeida Feb 9, 2022
e025c5c
added misc module
fmalmeida Feb 9, 2022
df92172
adding labels
fmalmeida Feb 9, 2022
290c766
added kofam analysis
fmalmeida Feb 9, 2022
29893f4
added main pyenv
fmalmeida Feb 9, 2022
5f34721
fixed perlenv image
fmalmeida Feb 9, 2022
e474e21
included virulence modules
fmalmeida Feb 9, 2022
afc5e2b
pyenv image updated
fmalmeida Feb 9, 2022
f2a43bb
added iceberg db module
fmalmeida Feb 9, 2022
088f798
added py36env image
fmalmeida Feb 10, 2022
91962a9
added resistance tools
fmalmeida Feb 10, 2022
453e3f8
added nanopolish
fmalmeida Feb 10, 2022
0d094ff
added refseq_masher
fmalmeida Feb 10, 2022
0dfee86
added digIS
fmalmeida Feb 15, 2022
4713250
added antismash
fmalmeida Feb 16, 2022
76d4117
added sequence server
fmalmeida Feb 16, 2022
e85bd2b
added merge_annotation module
fmalmeida Feb 17, 2022
0a17f43
added draw_gis modules
fmalmeida Feb 17, 2022
f08a4f4
added gff2gbk module
fmalmeida Feb 17, 2022
08443a6
added create_sql module
fmalmeida Feb 17, 2022
fa6c04d
added first resource management labels
fmalmeida Feb 17, 2022
0ead1df
fixing resource label for phigaro
fmalmeida Feb 17, 2022
6601aa3
little fix in env
fmalmeida Feb 17, 2022
b160221
fixing how tuples should be passed
fmalmeida Feb 18, 2022
2cabd4c
arrived at jbrowse step
fmalmeida Feb 18, 2022
83e841c
fixing draw_gis module input tuple
fmalmeida Feb 18, 2022
393ca59
jbrowse added
fmalmeida Feb 18, 2022
acdf004
Create test_pr.yml
fmalmeida Feb 20, 2022
7062d3c
Merge branch 'master' into remodeling
fmalmeida Feb 20, 2022
e70fd81
fixed phast db incorporation
fmalmeida Feb 20, 2022
748e52e
fixing inputs on report
fmalmeida Feb 20, 2022
305956e
Update test_pr.yml
fmalmeida Feb 20, 2022
58f901f
Merge branch 'master' into remodeling
fmalmeida Feb 20, 2022
590b74f
adding ENV VAR for current version
fmalmeida Feb 20, 2022
34d28d4
adding scripts to automatically build images
fmalmeida Feb 20, 2022
ca2b4cf
fixed iceberg db incorporation
fmalmeida Feb 20, 2022
64b9f79
Update digIS.nf
fmalmeida Feb 22, 2022
95f8e8d
fixed vfdb incorporation
fmalmeida Feb 23, 2022
5b061fd
changing file to path input resolution
fmalmeida Feb 23, 2022
c5cbf22
fixed victors db incorporation
fmalmeida Feb 23, 2022
144b082
changed channel names in main script
fmalmeida Feb 23, 2022
8845222
fixed argminer and prokka tables
fmalmeida Feb 23, 2022
03b4bbc
fixed custom db annotations incorporation
fmalmeida Feb 23, 2022
393d94d
fixed bacannot server loading
fmalmeida Mar 1, 2022
16f26b6
fixed custom db reports
fmalmeida Mar 1, 2022
4e2bc82
begin incorporation of nf-core framework
fmalmeida Mar 1, 2022
ba469ae
nf-core libs have been added to the pipeline
fmalmeida Mar 2, 2022
a99cca7
custom database annotation added to JBrowse
fmalmeida Mar 2, 2022
891f900
fixed custom database gff generation
fmalmeida Mar 2, 2022
9365da6
Update Dockerfile
fmalmeida Mar 2, 2022
298615c
Update docker.config
fmalmeida Mar 2, 2022
a031547
creating singularity profile
fmalmeida Mar 2, 2022
2ef8f20
begin documentation update
fmalmeida Mar 2, 2022
9c46c01
added singularity profile
fmalmeida Mar 3, 2022
c355326
made image compatible with earlier versions
fmalmeida Mar 3, 2022
44be846
given 777 permissions to workdir
fmalmeida Mar 3, 2022
3250045
adjusted default values
fmalmeida Mar 3, 2022
7924e60
fixed prokka to work with singularity
fmalmeida Mar 3, 2022
0348ecc
fixed rgi for singularity usage
fmalmeida Mar 4, 2022
d3ff2d5
updating PR test action
fmalmeida Mar 9, 2022
5b97c8b
update targeted branches
fmalmeida Mar 9, 2022
aee4f40
limiting process resources
fmalmeida Mar 9, 2022
988be72
updated image for singularity
fmalmeida Mar 9, 2022
1b4ff95
removing unused dbs in quicktest
fmalmeida Mar 9, 2022
9c2df04
not using big dbs in quicktest
fmalmeida Mar 9, 2022
7148102
Update resfinder.nf
fmalmeida Mar 9, 2022
5581b28
adding gitpod config
fmalmeida Mar 17, 2022
d0b822b
fixed yml
fmalmeida Mar 17, 2022
f7e6a7c
fixing custom db report getAttributeField snippet
fmalmeida Mar 17, 2022
f92a6ff
begin change to mkdocs
fmalmeida Mar 17, 2022
a4252b1
add requirements
fmalmeida Mar 17, 2022
0691107
Update .readthedocs.yml
fmalmeida Mar 17, 2022
20cf333
Update .readthedocs.yml
fmalmeida Mar 17, 2022
16edc6c
Update .readthedocs.yml
fmalmeida Mar 17, 2022
69323ca
Update .readthedocs.yml
fmalmeida Mar 17, 2022
b652218
update
fmalmeida Mar 17, 2022
d224916
update requirements
fmalmeida Mar 21, 2022
766a6b6
added index
fmalmeida Mar 21, 2022
de2d00e
fixed tags
fmalmeida Mar 21, 2022
e2871f9
added installation information
fmalmeida Mar 21, 2022
99cff69
now on quickstart
fmalmeida Mar 21, 2022
549036b
changed some admonitions
fmalmeida Mar 21, 2022
d255b62
added quickstart
fmalmeida Mar 21, 2022
0f4077b
added samplesheet page to mkdocs
fmalmeida Mar 22, 2022
6bc5aa2
included dir with images
fmalmeida Mar 22, 2022
c00fa7b
Update samplesheet.md
fmalmeida Mar 22, 2022
42348d1
outputs page added to mkdocs
fmalmeida Mar 22, 2022
9f489a4
Update standard.config
fmalmeida Mar 22, 2022
cf51c0d
Update defaults.config
fmalmeida Mar 22, 2022
2cba8e9
updated and tested quickstart
fmalmeida Mar 22, 2022
e0e5d97
updating gitpod.yml
fmalmeida Mar 22, 2022
c4b0d6d
creates a testing dir with more space
fmalmeida Mar 22, 2022
3dc2779
added profile selection information
fmalmeida Mar 23, 2022
60d4990
Merge branch 'remodeling' of https://github.com/fmalmeida/bacannot in…
fmalmeida Mar 23, 2022
6dad88a
Update manual.md
fmalmeida Mar 23, 2022
4fa2598
Update nextflow_schema.json
fmalmeida Mar 23, 2022
b3a7a90
Update nextflow.config
fmalmeida Mar 23, 2022
fb397b6
added config file page
fmalmeida Mar 23, 2022
51a67ff
information about custom databases added
fmalmeida Mar 23, 2022
b064d61
Update nextflow_schema.json
fmalmeida Mar 24, 2022
e7f89be
defaults need to be loaded before boilerplate
fmalmeida Mar 24, 2022
cf464cd
changing label of unicycler and flye
fmalmeida Mar 24, 2022
b273143
fixed antismash installation
fmalmeida Mar 26, 2022
113e8c4
fixed keggdecoder requires py36
fmalmeida Mar 26, 2022
08c0026
fixed resfinder module
fmalmeida Mar 26, 2022
1144f9e
pipeline fixed for docker
fmalmeida Mar 26, 2022
@@ -1,8 +1,8 @@
-name: Testing pipeline's core for the new PR
+name: Testing new PR with docker
 on:
   pull_request:
-    branches: master
-    types: [ opened, synchronize, reopened ]
+    branches: [ master, dev, develop ]
+    types: [ ready_for_review, synchronize, reopened ]
 
 jobs:
   run_nextflow:
@@ -23,9 +23,17 @@ jobs:
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/
 
-      - name: Run the pipeline
+      - name: Clean environment
         run: |
+          sudo rm -rf /usr/local/lib/android # will release about 10 GB if you don't need Android
+          sudo rm -rf /usr/share/dotnet # will release about 20GB if you don't need .NET
-          nextflow run main.nf -profile docker,quicktest --threads 2
 
+      - name: Build bacannot database
+        run: |
+          nextflow run main.nf -profile docker --get_dbs --output bacannot_dbs --max_cpus 2 --max_memory '6.GB' --max_time '6.h'
+          rm -rf bacannot_dbs/antismash_db bacannot_dbs/kofamscan_db bacannot_dbs/prokka_db/PGAP_NCBI.hmm # remove unused in quicktest to diminish size
 
+      - name: Run the pipeline
+        run: |
+          nextflow run main.nf -profile docker,quicktest --bacannot_db bacannot_dbs
41 changes: 41 additions & 0 deletions .github/workflows/test_pr_singularity.yml
@@ -0,0 +1,41 @@
name: Testing new PR with singularity
on:
  pull_request:
    branches: [ master, dev, develop ]
    types: [ ready_for_review, synchronize, reopened ]

jobs:
  run_nextflow:
    name: Run pipeline for the upcoming PR
    runs-on: ubuntu-latest

    steps:

      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/

      - name: Install Singularity
        uses: eWaterCycle/setup-singularity@v7
        with:
          singularity-version: 3.8.3

      - name: Clean environment
        run: |
          sudo rm -rf /usr/local/lib/android # will release about 10 GB if you don't need Android
          sudo rm -rf /usr/share/dotnet # will release about 20GB if you don't need .NET

      - name: Build bacannot database
        run: |
          nextflow run main.nf -profile singularity --get_dbs --output bacannot_dbs --max_cpus 2 --max_memory '6.GB' --max_time '6.h'
          rm -rf bacannot_dbs/antismash_db bacannot_dbs/kofamscan_db bacannot_dbs/prokka_db/PGAP_NCBI.hmm # remove unused in quicktest to diminish size

      - name: Run the pipeline
        run: |
          nextflow run main.nf -profile singularity,quicktest --bacannot_db bacannot_dbs
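Both workflows run the same three commands, differing only in the container profile. To reproduce the check locally before marking a PR ready for review, a small helper can assemble those commands for either engine. This is a sketch, assuming the commands stay exactly as written in the two workflow files above; `ci_commands` is a hypothetical helper, not part of the repository, and the runner-specific "Clean environment" step is deliberately left out.

```python
"""Assemble the quicktest CI steps from the test_pr workflows as local commands (sketch)."""
import shlex
from typing import List

# Databases the workflows delete after download, to keep the quicktest small.
UNUSED_IN_QUICKTEST = [
    "bacannot_dbs/antismash_db",
    "bacannot_dbs/kofamscan_db",
    "bacannot_dbs/prokka_db/PGAP_NCBI.hmm",
]


def ci_commands(engine: str) -> List[List[str]]:
    """Return the three CI steps for 'docker' or 'singularity' as argument lists."""
    if engine not in ("docker", "singularity"):
        raise ValueError(f"unsupported engine: {engine}")
    # Step "Build bacannot database": download all databases with capped resources.
    build_db = [
        "nextflow", "run", "main.nf", "-profile", engine,
        "--get_dbs", "--output", "bacannot_dbs",
        "--max_cpus", "2", "--max_memory", "6.GB", "--max_time", "6.h",
    ]
    # Second line of the same step: drop databases the quicktest profile does not use.
    prune = ["rm", "-rf", *UNUSED_IN_QUICKTEST]
    # Step "Run the pipeline": quicktest profile against the freshly built databases.
    run_test = [
        "nextflow", "run", "main.nf", "-profile", f"{engine},quicktest",
        "--bacannot_db", "bacannot_dbs",
    ]
    return [build_db, prune, run_test]


if __name__ == "__main__":
    # Only prints; pass each list to subprocess.run(cmd, check=True) to execute it.
    for cmd in ci_commands("docker"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list rather than a shell string avoids quoting problems (the workflows' `'6.GB'` becomes the plain argument `6.GB`), and `shlex.join` turns the list back into a copy-pasteable line.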
1 change: 0 additions & 1 deletion .gitignore
@@ -4,4 +4,3 @@
.Ruserdata
TESTE
docs/_html
teste
29 changes: 29 additions & 0 deletions .gitpod.yml
@@ -0,0 +1,29 @@
image: nfcore/gitpod:latest

tasks:
  - before: |
      wget -qO- get.nextflow.io | bash
      chmod 777 nextflow
      sudo mv nextflow /usr/local/bin/
      pip install tiptop
      pip install nf-core
      mkdir -p /testing
      sudo chmod 777 -R /testing
      ln -rs /testing .

vscode:
  extensions: # based on nf-core.nf-core-extensionpack
    - codezombiech.gitignore # Language support for .gitignore files
    # - cssho.vscode-svgviewer # SVG viewer
    - davidanson.vscode-markdownlint # Markdown/CommonMark linting and style checking for Visual Studio Code
    - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed
    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors
    # - nextflow.nextflow # Nextflow syntax highlighting
    - oderwat.indent-rainbow # Highlight indentation level
    - streetsidesoftware.code-spell-checker # Spelling checker for source code

ports:
  - port: 3000
    onOpen: open-preview
25 changes: 11 additions & 14 deletions .readthedocs.yml
@@ -1,23 +1,20 @@
# .readthedocs.yml
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# Set the version of Python and other tools you might need
build:
os: ubuntu-20.04
tools:
python: "3.9"

# Build documentation with MkDocs
# mkdocs:
# configuration: mkdocs.yml
mkdocs:
configuration: mkdocs.yml

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

# Optionally set the version of Python and requirements required to build your docs
# Optionally declare the Python requirements required to build your docs
python:
version: 3.7
install:
- requirements: docs/requirements.txt
install:
- requirements: docs/requirements.txt
4 changes: 2 additions & 2 deletions .zenodo.json
@@ -1,8 +1,8 @@
 {
-  "description": "<p>The pipeline</p>\n\n<p>bacannot, is a customisable, easy to use, pipeline that uses state-of-the-art software for comprehensively annotating prokaryotic genomes having only Docker and Nextflow as dependencies. It is able to annotate and detect virulence and resistance genes, plasmids, secondary metabolites, genomic islands, prophages, ICEs, KO, and more.</p>",
+  "description": "<p>The pipeline</p>\n\n<p>bacannot, is a customisable, easy to use, pipeline that uses state-of-the-art software for comprehensively annotating prokaryotic genomes having only Docker and Nextflow as dependencies. It is able to annotate and detect virulence and resistance genes, plasmids, secondary metabolites, genomic islands, prophages, ICEs, KO, and more, while providing nice and beautiful interactive documents for results exploration.</p>",
   "license": "other-open",
   "title": "fmalmeida/bacannot: A generic but comprehensive bacterial annotation pipeline",
-  "version": "v3.0",
+  "version": "v3.1",
   "upload_type": "software",
   "creators": [
     {
2 changes: 2 additions & 0 deletions README.md
@@ -7,6 +7,8 @@
[![Nextflow version](https://img.shields.io/badge/Nextflow%20>=-v20.07-important)](https://www.nextflow.io/docs/latest/getstarted.html)
[![License](https://img.shields.io/badge/License-GPL%203-black)](https://github.com/fmalmeida/bacannot/blob/master/LICENSE)

[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/github.com/fmalmeida/bacannot)

<p align="center">

<h1 align="center">bacannot pipeline</h2>
121 changes: 121 additions & 0 deletions bin/addBedtoolsIntersect.R
@@ -0,0 +1,121 @@
#!/usr/bin/Rscript
# Setting Help
'usage: addBedtoolsIntersect.R [--txt=<file> --gff=<file> --type=<chr> --source=<chr> --out=<chr>]

options:
  -g, --gff=<file>    GFF file to merge annotation
  -t, --txt=<file>    Bedtools intersect file
  --type=<chr>        Feature type [default: BLAST]
  --source=<chr>      Feature source [default: CDS]
  -o, --out=<chr>     Output file name [default: out.gff]' -> doc

# Parse parameters
suppressMessages(library(docopt))
opt <- docopt(doc)

if (is.null(opt$gff)){
  stop("At least one argument must be supplied (gff file)\n", call.=FALSE)
}

if (is.null(opt$txt)){
  stop("At least one argument must be supplied (intersection file)\n", call.=FALSE)
}

# Load libraries
suppressMessages(library(ballgown))
suppressMessages(library(DataCombine))
suppressMessages(library(dplyr))
suppressMessages(library(stringr))
suppressMessages(library(tidyr))

# Function used to remove redundancy
reduce_row = function(i) {
  d <- unlist(strsplit(i, split=","))
  paste(unique(d), collapse = ',')
}

# Function to get Attribute Fields
getAttributeField <- function (x, field, attrsep = ";") {
  s = strsplit(as.character(x), split = attrsep, fixed = TRUE)
  sapply(s, function(atts) {
    a = strsplit(atts, split = "=", fixed = TRUE)
    m = match(field, sapply(a, "[", 1))
    if (!is.na(m)) {
      rv = a[[m]][2]
    }
    else {
      rv = as.character(NA)
    }
    return(rv)
  })
}

# Operator to discard patterns found
'%ni%' <- Negate('%in%')

if (file.info(opt$txt)$size > 0) {

  # Load GFF file
  gff <- gffRead(opt$gff)

  # Create a column in the GFF file with ids
  gff$ID <- getAttributeField(gff$attributes, "ID", ";")

  # Load intersection file
  bedtools_intersect <- read.csv(opt$txt, header = F, sep = "\t")
  colnames(bedtools_intersect) <- c("seqname1", "source1", "feature1", "start1", "end1", "score1", "strand1", "frame1", "attributes1",
                                    "seqname2", "source2", "feature2", "start2", "end2", "score2", "strand2", "frame2", "attributes2",
                                    "len")

  # Create a column in the intersection file with ids
  bedtools_intersect$ID <- getAttributeField(bedtools_intersect$attributes2, "ID", ";")

  # save ids
  ids <- bedtools_intersect$ID

  # Subset based on gene IDs
  ## Lines with our IDs
  sub <- gff %>%
    filter(ID %in% ids) %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes, ID)
  ## Lines without our IDs
  not <- gff %>%
    filter(ID %ni% ids) %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes)

  # Change fields values
  ## source
  s <- sub$source
  sn <- as.character(opt$source)
  snew <- paste(s, sn, sep = ",")
  sub$source <- snew

  ## feature
  f <- sub$feature
  fn <- as.character(opt$type)
  fnew <- paste(f, fn, sep = ",")
  sub$feature <- fnew

  ## attributes
  sub <- merge.data.frame(sub, bedtools_intersect, by = "ID", all = TRUE)
  new_ID <- paste(opt$source, "_ID=", sep = "", collapse = "")
  sub$attributes1 <- gsub(pattern = "ID=", replacement = as.character(new_ID), x=sub$attributes1)
  sub <- unite(sub, "attributes", c("attributes", "attributes1"), sep = ";") %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes)

  # Merge files
  merged_df <- merge.data.frame(sub, not, all = TRUE)
  feat <- merged_df$feature
  merged_df$feature <- sapply(feat, reduce_row)
  source <- merged_df$source
  merged_df$source <- sapply(source, reduce_row)
  merged_df <- merged_df[str_order(merged_df$attributes, numeric = TRUE), ]

  # Write output
  write.table(merged_df, file = opt$out, quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)

} else {
  # Load GFF file
  gff <- gffRead(opt$gff)
  # Write output
  write.table(gff, file = opt$out, quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
}
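Most of the string handling in `addBedtoolsIntersect.R` rests on two helpers: `getAttributeField` pulls one `key=value` pair out of the GFF attributes column, and `reduce_row` removes duplicates from the comma-joined `source` and `feature` columns after the merge. For readers more at home in Python, here is a rough port of both. It is an illustrative sketch only, not part of the pipeline; the snake_case names and the toy GFF line are hypothetical.

```python
"""Illustrative Python port of two helpers from addBedtoolsIntersect.R (sketch)."""
from typing import Optional


def get_attribute_field(attributes: str, field: str, attrsep: str = ";") -> Optional[str]:
    """Return the value of `field` in a GFF attributes string, or None if absent.

    Mirrors getAttributeField: split on `attrsep`, then on '=', and return the
    second element of the first pair whose key equals `field`.
    """
    for pair in attributes.split(attrsep):
        parts = pair.split("=")
        if parts[0] == field:
            return parts[1] if len(parts) > 1 else None
    return None


def reduce_row(value: str) -> str:
    """Drop repeated comma-separated entries, keeping first-seen order (like R's unique)."""
    return ",".join(dict.fromkeys(value.split(",")))


if __name__ == "__main__":
    # Toy GFF line: the ninth tab-separated column holds the attributes.
    gff_line = "contig_1\tProkka\tCDS\t337\t2799\t.\t+\t0\tID=gene_0001;Name=dnaA"
    attributes = gff_line.split("\t")[8]
    print(get_attribute_field(attributes, "ID"))  # gene_0001
    print(reduce_row("CDS,BLAST,BLAST"))          # CDS,BLAST
```

`dict.fromkeys` preserves insertion order, which is what makes it a drop-in for R's order-preserving `unique` here.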
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions bin/build_image.sh
@@ -0,0 +1,3 @@
#!/bin/bash
name=$(basename $(pwd))
docker build -t fmalmeida/bacannot:${1}_${name} .
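`build_image.sh` derives the image tag from its first argument and the name of the current directory, so it is meant to be run from inside the directory holding the Dockerfile. The sketch below reproduces only that naming rule. The directory name `perlenv` and version `v3.1` in the test are assumptions: the commit history mentions `perlenv`, `pyenv` and `py36env` images and `.zenodo.json` moves to v3.1, but the actual directory layout is not shown in this diff. `image_tag` is a hypothetical helper.

```python
"""Sketch of the tag naming rule used by bin/build_image.sh."""
import os


def image_tag(version: str, build_dir: str) -> str:
    """Return 'fmalmeida/bacannot:<version>_<dirname>' as build_image.sh would."""
    # `basename $(pwd)` in the script corresponds to the last path component here;
    # normpath strips a trailing slash so basename does not return an empty string.
    name = os.path.basename(os.path.normpath(build_dir))
    return f"fmalmeida/bacannot:{version}_{name}"


if __name__ == "__main__":
    print(image_tag("v3.1", os.getcwd()))
```

One design note on the shell version: `$(pwd)` is unquoted, so a path containing spaces is word-split before it reaches `basename`; writing `basename "$(pwd)"` would avoid that.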
78 changes: 78 additions & 0 deletions bin/calculate_methylation_frequency.py
@@ -0,0 +1,78 @@
#! /usr/bin/env python3

import sys
import csv
import argparse
import gzip

class SiteStats:
    def __init__(self, g_size, g_seq):
        self.num_reads = 0
        self.called_sites = 0
        self.called_sites_methylated = 0
        self.group_size = g_size
        self.sequence = g_seq

def update_call_stats(key, num_called_cpg_sites, is_methylated, sequence):
    if key not in sites:
        sites[key] = SiteStats(num_called_cpg_sites, sequence)

    sites[key].num_reads += 1
    sites[key].called_sites += num_called_cpg_sites
    if is_methylated > 0:
        sites[key].called_sites_methylated += num_called_cpg_sites

parser = argparse.ArgumentParser( description='Calculate methylation frequency at genomic CpG sites')
parser.add_argument('-c', '--call-threshold', type=float, required=False, default=2.0)
parser.add_argument('-s', '--split-groups', action='store_true')
args, input_files = parser.parse_known_args()
assert(args.call_threshold is not None)

sites = dict()
# iterate over input files and collect per-site stats
for f in input_files:
    if f[-3:] == ".gz":
        in_fh = gzip.open(f, 'rt')
    else:
        in_fh = open(f)
    csv_reader = csv.DictReader(in_fh, delimiter='\t')
    for record in csv_reader:

        num_sites = int(record['num_motifs'])
        llr = float(record['log_lik_ratio'])

        # Skip ambiguous call
        if abs(llr) < args.call_threshold * num_sites:
            continue
        sequence = record['sequence']

        is_methylated = llr > 0

        # if this is a multi-cpg group and split_groups is set, break up these sites
        if args.split_groups and num_sites > 1:
            c = str(record['chromosome'])
            s = int(record['start'])
            e = int(record['end'])

            # find the position of the first CG dinucleotide
            sequence = record['sequence']
            cg_pos = sequence.find("CG")
            first_cg_pos = cg_pos
            while cg_pos != -1:
                key = (c, s + cg_pos - first_cg_pos, s + cg_pos - first_cg_pos)
                update_call_stats(key, 1, is_methylated, "split-group")
                cg_pos = sequence.find("CG", cg_pos + 1)
        else:
            key = (str(record['chromosome']), int(record['start']), int(record['end']))
            update_call_stats(key, num_sites, is_methylated, sequence)

# header
print("\t".join(["chromosome", "start", "end", "num_motifs_in_group", "called_sites", "called_sites_methylated", "methylated_frequency", "group_sequence"]))

sorted_keys = sorted(list(sites.keys()), key = lambda x: x)

for key in sorted_keys:
    if sites[key].called_sites > 0:
        (c, s, e) = key
        f = float(sites[key].called_sites_methylated) / sites[key].called_sites
        print("%s\t%s\t%s\t%d\t%d\t%d\t%.3f\t%s" % (c, s, e, sites[key].group_size, sites[key].called_sites, sites[key].called_sites_methylated, f, sites[key].sequence))
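The core of `calculate_methylation_frequency.py` is a threshold on the nanopolish log-likelihood ratio: a call is kept only when |llr| ≥ call_threshold × num_motifs, it counts as methylated when llr > 0, and the reported frequency is called_sites_methylated / called_sites. The sketch below applies the same rule to in-memory records so it can be checked without a nanopolish output file. It leaves out the `--split-groups` handling, and the function name and toy records are hypothetical.

```python
"""Minimal sketch of the per-site aggregation in calculate_methylation_frequency.py."""
from collections import defaultdict


def methylation_frequency(records, call_threshold=2.0):
    """Aggregate nanopolish-style call records into per-site methylation frequency.

    Each record is a dict with 'chromosome', 'start', 'end', 'num_motifs' and
    'log_lik_ratio'. Returns {(chrom, start, end): (called, methylated, frequency)}.
    """
    called = defaultdict(int)
    methylated = defaultdict(int)
    for rec in records:
        num_sites = int(rec["num_motifs"])
        llr = float(rec["log_lik_ratio"])
        # Ambiguous calls (|llr| below threshold * motifs) are skipped, as in the script.
        if abs(llr) < call_threshold * num_sites:
            continue
        key = (str(rec["chromosome"]), int(rec["start"]), int(rec["end"]))
        called[key] += num_sites
        if llr > 0:
            methylated[key] += num_sites
    return {
        key: (n, methylated[key], methylated[key] / n)
        for key, n in sorted(called.items())
        if n > 0
    }


# Toy records for one CpG site: one methylated, one unmethylated, one ambiguous call.
records = [
    {"chromosome": "contig_1", "start": 100, "end": 100, "num_motifs": 1, "log_lik_ratio": 5.1},
    {"chromosome": "contig_1", "start": 100, "end": 100, "num_motifs": 1, "log_lik_ratio": -3.4},
    {"chromosome": "contig_1", "start": 100, "end": 100, "num_motifs": 1, "log_lik_ratio": 0.7},
]

if __name__ == "__main__":
    for (chrom, start, end), (n, m, freq) in methylation_frequency(records).items():
        print(f"{chrom}\t{start}\t{end}\t{n}\t{m}\t{freq:.3f}")
```

With the default threshold of 2.0, the 0.7 call is discarded, leaving two called sites of which one is methylated, so the site is reported with frequency 0.500.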
17 changes: 17 additions & 0 deletions bin/config.yml
@@ -0,0 +1,17 @@
hmmer:
  bin: CHANGE_HMMSEARCH
  e_value_threshold: 0.00445
  pvog_path: CHANGE_PVOG
phigaro:
  mean_gc: 0.46354823199323625
  penalty_black: 2.2
  penalty_white: 0.7
  threshold_max_abs: 52.96
  threshold_max_basic: 46.0
  threshold_max_without_gc: 11.42
  threshold_min_abs: 50.32
  threshold_min_basic: 45.39
  threshold_min_without_gc: 11.28
  window_len: 32
prodigal:
  bin: CHANGE_PRODIGAL
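`bin/config.yml` ships with the placeholders `CHANGE_HMMSEARCH`, `CHANGE_PVOG` and `CHANGE_PRODIGAL`, which suggests they are replaced with real paths before phigaro runs; how the pipeline actually does that is not part of this diff. The sketch below shows one way such a substitution could work, assuming plain text replacement is enough (no YAML parser needed). The helper names and example paths are hypothetical.

```python
"""Sketch: fill the CHANGE_* placeholders in bin/config.yml with concrete paths."""
from pathlib import Path
from typing import Dict

PLACEHOLDERS = ("CHANGE_HMMSEARCH", "CHANGE_PVOG", "CHANGE_PRODIGAL")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace each placeholder with its value; fail loudly if one is left unset."""
    missing = [p for p in PLACEHOLDERS if p in template and p not in values]
    if missing:
        raise KeyError(f"no value given for: {', '.join(missing)}")
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


def render_config(src: Path, dst: Path, values: Dict[str, str]) -> None:
    """Write a copy of `src` with placeholders filled in to `dst`."""
    dst.write_text(fill_placeholders(src.read_text(), values))


if __name__ == "__main__":
    template = "hmmer:\n  bin: CHANGE_HMMSEARCH\n  pvog_path: CHANGE_PVOG\nprodigal:\n  bin: CHANGE_PRODIGAL\n"
    print(fill_placeholders(template, {
        "CHANGE_HMMSEARCH": "/usr/local/bin/hmmsearch",  # hypothetical path
        "CHANGE_PVOG": "/databases/pvog/allpvoghmms",     # hypothetical path
        "CHANGE_PRODIGAL": "/usr/local/bin/prodigal",     # hypothetical path
    }))
```

Raising on a missing value is a deliberate choice: a config that still contains `CHANGE_PVOG` would otherwise only fail later, inside phigaro.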
File renamed without changes.