Daniel Fischer, Natural Resources Institute Finland (Luke)
Until now I have implemented all my workflows in Snakemake, and so far I have been happy with it. However, since it turned out that we will implement workflows in Nextflow here (and since it is probably as good as Snakemake), I think it is very interesting to learn about Nextflow. My current Snakemake workflows are basic workflows for RNA-Seq (DE analysis), DNA-Seq (variant calling), GBS and metagenomics. I do not plan to translate them to Nextflow, but if Nextflow is convincing, I am happy to implement future pipelines with it.
Dynamics of direct transmission of Carbapenem resistance in Enterobacterales between human families and their companion animal
Juliana Menezes, Faculdade de Medicina Veterinária, Universidade de Lisboa
During the last fifty years, the number of companion animals has increased substantially, to the point that in many regions most people have regular and intensive contact with pets, which are nowadays often considered “family members” and enjoy close contact with their owners. This close contact between companion animals and humans provides excellent opportunities for interspecies transmission of resistant bacteria and their resistance genes in either direction.
Carbapenemase-producing Enterobacterales represent a major public health issue, and the frequency of their detection in companion animals has been increasing around the world. Surprisingly, among the pathogenic carbapenemase-producing Escherichia coli found so far, sequence types (ST) 38, 648 and 410 account for an increased proportion of detections. ST410 is considered a high-risk multidrug-resistant (MDR) clone with high potential for transmission between different hosts. Yet there is a knowledge gap regarding the dynamics of direct transmission of carbapenem resistance in Enterobacterales between human families and their companion animals. By using up-to-date and accurate next-generation sequencing methods, we aim to identify with high accuracy the dynamics of animal-to-human transmission of antimicrobial resistance. Ultimately, the safety measures needed to reduce the public health risks of antimicrobial resistance under a One Health approach will be established.
Siddharth Jayaraman, The Roslin Institute
We present BOmA (Bovine Omics Atlas), a web browser for visualising omics data across bovid species. The rapid decrease in the cost of sequencing has facilitated a revolution in the analysis of genotype–phenotype relationships in livestock and in the identification of selective sweeps, structural variants and chromatin data associated with performance traits and adaptation. Data presented on BOmA will not only inform future improvement of species such as cattle and water buffalo but also allow cross-species comparisons of omics data.
Bala Kiran Manthri, SLU University
The impact of structural variations on gene expression and phenotypic variability is known to be significant, but estimating it is considered quite difficult. In this project, we test computational approaches for identifying structural variations, as well as software for evaluating the number of individuals that should be sequenced with long-read sequencing, to improve our understanding of structural variability among Swedish cattle breeds at the population level. After a qualitative selection of genomes, a graph genome is built. Graph genomes, or variation-aware genome graphs, are constructed from a population of genome sequences such that each haploid genome in this population is represented by a sequence path through the graph. Once the graph has been built, it is indexed using a specialized algorithm. Sequencing reads are then aligned to the graph, which is more accurate than aligning reads to a single reference genome. The alignments are then reported and stored. In this way, the graph genome is used to call structural variations. These structural variations can give us a better understanding of Swedish dairy cattle.
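The idea that each haploid genome is a path through the graph can be illustrated with a toy example. The sketch below is not the software used in this project; the class name, node identifiers and sequences are all made up for illustration, and real graph-genome tools add indexing and read alignment on top of this structure.

```python
from typing import Dict, List, Set, Tuple


class GenomeGraph:
    """Toy variation graph: nodes hold sequence, haplotypes are node paths."""

    def __init__(self) -> None:
        self.nodes: Dict[int, str] = {}
        self.edges: Set[Tuple[int, int]] = set()
        self.paths: Dict[str, List[int]] = {}

    def add_node(self, node_id: int, sequence: str) -> None:
        self.nodes[node_id] = sequence

    def add_path(self, name: str, node_ids: List[int]) -> None:
        # Reject paths that visit nodes the graph does not contain.
        for node_id in node_ids:
            if node_id not in self.nodes:
                raise KeyError(f"unknown node {node_id}")
        # Every consecutive pair of nodes on a haplotype path becomes an edge.
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # Walking the path and concatenating node sequences recovers the haplotype.
        return "".join(self.nodes[n] for n in self.paths[name])


graph = GenomeGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "TTAGG")  # hypothetical insertion carried by one haplotype only
graph.add_node(3, "CCA")
graph.add_path("reference", [1, 3])
graph.add_path("sample_hap1", [1, 2, 3])

print(graph.spell("reference"))
print(graph.spell("sample_hap1"))
```

In this toy graph the insertion appears as a node that lies on one haplotype path but not on the other, which is the kind of signal a structural-variant caller looks for.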
Meenu Bhati, ETH, Zurich
Mobile genetic elements (MGE) constitute a large fraction of the genome, ranging from 12 to 85%, and contribute to genome plasticity in eukaryotes. MGE are one of the major sources of genome evolution, causing de novo mutations with regulatory and structural effects. The current work covers the identification of MGE insertion sites in the cattle genome, which is part of BovReg work package 2. For this, I am working at GIGA, Liège (Belgium) to deploy their MGE insertion site pipeline in Swiss cattle breeds. Currently, we are using whole-genome resequencing data from 480 cattle of the Brown Swiss and Original Braunvieh breeds, with coverage ranging from 3- to 60-fold. We have aligned the data using BWA-MEM and are now using an in-house script, “Locater”, to identify MGE insertion sites. Under the BovReg project, we hope to use Nextflow to set up a more scalable pipeline.
Praveen Krishna Chitneedi, Leibniz Institute for Farm Animal Biology (FBN)
The genotypes of Holstein*Charolais animals were imputed to whole-genome sequence (WGS) level using Beagle V5 and bash scripts, with stepwise imputation from 6k to 50k, from 50k to HD and finally from HD to WGS. These imputed genotypes, together with the phenotypes of different BovReg traits (fertility, milk production, feed efficiency) and plasma metabolites, were used to carry out GWAS with the GCTA tool. Our GWAS results were provided to different BovReg partners, and we will perform mQTL meta-analysis using the METAL tool. For eQTL analysis, 3,200 existing RNA-Seq datasets available to the BovReg partners from previous national and international projects, covering 12 tissues relevant to the BovReg target traits, will be processed and aligned to the transcriptome annotation provided by WP2. RNA-Seq read counts will be normalized and count matrices created per feature (exon, transcript, gene, splicing) for each RNA-Seq dataset.
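The stepwise imputation chain can be written down as an ordered list of density levels, where each step imputes from one level to the next denser one. The following is only a planning sketch: the function name is hypothetical, and it does not call Beagle or reproduce the bash scripts mentioned above.

```python
from typing import List, Tuple


def imputation_steps(densities: List[str]) -> List[Tuple[str, str]]:
    """Pair each marker density with the next denser one, in order."""
    return list(zip(densities, densities[1:]))


# Density levels taken from the text: 6k -> 50k -> HD -> WGS.
steps = imputation_steps(["6k", "50k", "HD", "WGS"])
for source, target in steps:
    # The output of one step is the input of the next step.
    print(f"impute {source} -> {target}")
```

Expressing the chain as data rather than as hard-coded script calls makes it easier to add or skip a density level, and it maps naturally onto the process-per-step structure of a workflow manager.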
Andreia Amaral, Faculdade de Medicina Veterinária da Universidade de Lisboa
IsomiR Window enables the discovery of isomiRs and the identification of all annotated non-coding RNAs for animals and plants. This platform comprises two main components: the IsomiR Window pipeline for data processing and the IsomiR Window web interface. It integrates over ten third-party software tools for the analysis of small-RNA-seq data and includes a new algorithm that allows the detection of all possible types of isomiRs: 3’ and 5’ end isomiRs, 3’ tailing, isomiRs with SNPs and potential RNA editing events, as well as all possible fuzzy combinations. Currently, IsomiR Window is deployed as a virtual machine that includes all third-party software plus the novel algorithms. In the near future we aim to provide a Nextflow and nf-core compliant version.
Gabriel Costa, University of Liège
Under the BovReg project, our main goal is to establish a map of functionally active regulatory and structural elements in the bovine genome using six new bovine cell lines and a comprehensive catalogue of at least 24 tissues collected from individuals of both sexes, from at least three divergent breeds/crosses kept in different environments. To achieve this goal, GIGA/ULIEGE will focus on polyA+ and whole-transcriptome sequencing using RNA-Seq and small RNA-Seq. ChIP-seq [transcription factor binding (CTCF) and histone modification marks (H3K4me3, H3K4me1, H3K27me3, H3K27ac)] and ATAC-seq will be used to reveal regions open for regulatory processes. In this context, Nextflow and nf-core could be useful, as we will apply standard and reproducible bioinformatics pipelines together with the new tools and methodology developed by CRG to analyse the data.
Tuan Nguyen, Agriculture Victoria
At Agriculture Victoria Research (AVR), we conduct multiple large-scale sequence-level genomics studies in the Australian and international dairy industry. This demands high-quality imputation, as well as testing of new imputation software, to ensure the integrity of downstream multi-omics analyses. Recently, for example, we imputed over 200,000 animals to sequence level. Many of these industry animals are genotyped using low-density custom SNP chips. Typically, we impute using a genotyped reference population first to medium density and then to high density, at each stage replacing any non-overlapping genotypes, before imputing to the current sequence-level variant set of the 1000 Bull Genomes Project. With numerous SNP chips available on the market and the need to repeat the task regularly, the multiple steps involved create a bottleneck for our analyses. We therefore anticipate that by employing Nextflow we could parallelize (and standardize) multiple SNP chip imputations, including evaluation of imputed sequence data integrity, and also build a pipeline for testing new imputation software. In addition, AVR is undertaking short-read and ultra-long-read sequencing and is a member of the BovReg and FAANG consortia. As a result, we generate many datasets (including GWAS, RNA-seq and ChIP-seq analyses), which also offer a great opportunity to create automated and scalable workflows for downstream analyses. I am personally looking forward to learning more and testing the capability of Nextflow to maintain a consistent environment across runs, resume execution upon failure and, most importantly, improve scalability for large projects.
Improving genome annotations with RNA-seq data: a new pipeline to combine transcript reconstruction and expression assessment
Cyril Kurylo, INRAE
In the context of the FAANG global initiative, many RNA-seq datasets are currently being generated for different cell types of different livestock species. It is therefore important to improve and extend the reference gene annotation of these species with transcripts found in these new samples, in a fully automated way. While many RNA-seq pipelines are currently available to build de novo gene models or to quantify gene expression levels using a provided gene annotation, none of them allows both transcript reconstruction and expression assessment from RNA-seq data in a reproducible way. To fill this gap, we have developed the GENE-SWitCH RNA-seq pipeline (https://github.com/FAANG/proj-gs-rna-seq), which uses STAR to map reads and StringTie to reconstruct transcripts and genes using a reference gene annotation as a template, and to quantify their expression. Our pipeline uses the Nextflow framework and follows the nf-core specifications. It provides a containerized environment that makes it compatible with a variety of high-performance computing platforms and workload orchestrators. It is designed to be easy to use and flexible. As such, it requires a minimal set of inputs: a set of RNA-seq read files, a reference genome and its gene annotation. Optionally, a simple tabulated metadata file can also be provided to describe the experimental design and seamlessly merge samples according to specified factors. The pipeline automatically generates a large variety of complementary quality controls for raw, trimmed and mapped reads, but also for various genomic features (exons, transcripts, genes). Expression tables are also provided, with read counts for annotated and predicted genes and transcripts, allowing further comparative expression analyses. We believe that the GENE-SWitCH RNA-seq pipeline offers a useful, powerful and easy-to-use way to process RNA-seq data, nicely complementing the existing nf-core catalogue of bioinformatics tools.
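Merging samples by a factor from a tabulated metadata file can be sketched as a simple grouping step. The example below is an illustration only: the column names (`sample`, `tissue`, `sex`), the sample identifiers and the function name are assumptions, not the metadata format actually expected by the GENE-SWitCH pipeline.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical tab-separated metadata table; column names are assumed.
METADATA_TSV = (
    "sample\ttissue\tsex\n"
    "S1\tliver\tF\n"
    "S2\tliver\tM\n"
    "S3\tmuscle\tF\n"
)


def group_samples(tsv_text: str, factor: str) -> Dict[str, List[str]]:
    """Group sample names by the value they carry in the chosen factor column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row[factor]].append(row["sample"])
    return dict(groups)


by_tissue = group_samples(METADATA_TSV, "tissue")
print(by_tissue)
```

Each resulting group lists the samples that would be merged together when the corresponding factor is chosen.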
Link-HD: a flexible tool to integrate and explore association between multiple microbial communities
Yuliaxis Ramayo Caldas, IRTA
We present a computational approach designed to integrate multiple datasets from a holistic view. Link-HD is a generalization of STATIS-ACT, a family of multivariate methods to integrate multiple datasets. We complement the classical methodology by incorporating distances and transformations to deal with compositional data. Link-HD also comprises clustering, regression techniques, differential abundance testing, taxonomic enrichment analysis and visualization tools that make it useful for analyzing multiple communities and capturing covariation with host performance. The functionalities of Link-HD are exemplified by integrating microbial communities from two gastrointestinal ecosystems: 1) ruminal communities (bacteria, archaea and protozoa) from 65 cows for which methane yield (CH4y) was individually measured; and 2) the gut microbial communities (bacteria and protozoa) from 400 pigs in which immunity traits were assessed. Our tool enables us to integrate multiple microbial datasets and associate them with host phenotypes. We confirm the relationships between rumen microbiota structure and CH4 emission. In addition, we identify microbial biomarkers associated with CH4. Focusing on the pig dataset, we recover the enterotype structure of the pig gut ecosystem and discover novel associations between pig microbiota structure and the concentrations of acute phase proteins (both C-reactive protein and haptoglobin) in serum and of circulating immunoglobulins in plasma. In summary, our results demonstrate the usefulness of Link-HD for integrating heterogeneous data from microbial communities and associating them with complex phenotypes. The source code, examples and usage manual can be found in Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/LinkHD.html
Jani de Vos, Wageningen University & Research
DNA methylation is the addition of a methyl group to a cytosine, typically one located 5’ of a guanine, in so-called CpG sites throughout the genome in vertebrates. Gene expression is influenced by DNA methylation, and DNA methylation is susceptible to environmental factors (Goldberg et al., 2007). ‘GENE-SWitCH aims to deliver new underpinning knowledge on the functional genomes of two main monogastric farm species (pig and chicken) and to enable immediate translation to the pig and poultry sectors’ (https://www.gene-switch.eu/). One of the objectives of GENE-SWitCH is to develop up-to-date, robust and system-transferable bioinformatics pipelines for the analysis of the different sequencing data. DNA methylation has been under investigation for some time, and a comprehensive bioinformatics pipeline is useful for achieving standardized results, which are important for comparative analysis across samples and species. Several methylation pipelines are available, as well as other methylation analysis software that can be used in the development of this pipeline. The development will follow a set of pipeline development guidelines, standards and coding principles from GENE-SWitCH. The existing nf-core/methylseq pipeline for methylation analysis will be further expanded to meet the criteria set by GENE-SWitCH. Additional downstream analyses required for other GENE-SWitCH work packages will be implemented in the pipeline, as well as additional visualization tools. The main aim of this methylation pipeline is to create a workflow suitable for a wide variety of analyses at large scale.
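As a small illustration of what a CpG site is, the sketch below scans a DNA string for every cytosine that is immediately followed by a guanine. The function name and the example sequence are made up for illustration; this is not part of nf-core/methylseq or of the planned pipeline, which work on aligned bisulfite sequencing reads rather than on plain strings.

```python
from typing import List


def cpg_sites(sequence: str) -> List[int]:
    """Return 0-based positions of the cytosine in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


positions = cpg_sites("ACGTCGCG")
print(positions)  # [1, 4, 6]
```

Positions are reported for the cytosine, since that is the base that carries the methyl group.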