MDIBL Transcriptome Assembly Learning Module

Glossary of Terms and Acronyms

Modern biology and biotechnology can be intimidating to learn because, beyond the body of knowledge itself, there is a large body of terminology that must be understood and internalized to work effectively in the field. Moving into the computational side of biology (i.e., bioinformatics and computational biology) increases this difficulty by adding the terminology of computational tools.

Below is a list of terms that we hope will serve as a helpful reference for the other pages in this module.

Terms and Concepts

Biology/Biotechnology

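Genome : The complete DNA content of an organism, generally organized into chromosomes. In an assembled genome sequence, it may instead be represented as contigs (contiguous stretches of assembled sequence).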

Transcript : A functional RNA molecule generated by transcription, the process that copies one strand of the DNA into RNA.

Transcriptome : The collection of all RNA transcripts encoded in an organism's genome. Generally speaking, most genes can produce one or more RNA transcripts upon being activated (expressed), so the number of transcripts in a transcriptome is significantly larger than the number of genes. The term transcriptome is sometimes used to refer to more specific sets of RNA transcripts, but in our materials, "transcriptome" with no qualifiers will always mean the complete set of all possible transcripts.

Expressed Transcriptome : The set of transcripts that is present (activated or expressed) and can be measured in a given sample.

Tissue-specific or Cell-specific Transcriptome : The subset of the transcriptome that is expressed in either a specific tissue or cell type.

Transcriptome Profile : An experimental characterization of a sample (either bulk tissue or a single cell) that quantifies the identity and relative abundance of all transcripts measured in the sample.

Sequence Assembly : A computational process in which short fragments of sequence are integrated through alignment and joining to produce a longer sequence.

Transcriptome Assembly : Sequence assembly in which the sequenced molecules are RNA transcripts. A transcriptome assembly will generally produce thousands to hundreds of thousands of putative transcript sequences.
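To make the idea of "alignment and joining" concrete, here is a minimal toy sketch that greedily merges fragments by their longest suffix/prefix overlap. It is an illustration only: the function names and the `min_len` parameter are hypothetical, the fragments are invented, and real assemblers use far more sophisticated methods that also cope with sequencing errors, repeats, and reverse-complement reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    seqs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return seqs  # nothing left to join
        # Join the two fragments, keeping the overlapping region only once
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


# Invented example fragments that overlap end to end
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Fragments that share no overlap of at least `min_len` bases are returned unmerged, which is one reason an assembly usually yields many separate sequences rather than a single one.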

FASTA/FASTQ Sequence file formats : Text file formats representing one or more biological sequences. In a FASTA file, each sequence has a description/header line (which always begins with '>') followed by one or more lines of sequence data. A FASTQ file additionally encodes sequence quality information for every nucleotide in each read sequence.
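As a rough illustration of the two formats, the sketch below reads FASTA and FASTQ records from lines of text. The function names and example records are hypothetical. The FASTQ reader assumes the common layout of four lines per record (an '@' header, the sequence, a '+' separator, and a quality string), and the quality decoding assumes the Phred+33 encoding; files that wrap sequences across lines or use another encoding would need a proper parsing library.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality), assuming four lines per record."""
    rows = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        head, seq, plus, qual = rows[i:i + 4]
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near line {i + 1}")
        yield head[1:], seq, qual


# Invented example data
fasta_text = ">tx1 example transcript\nATGGCG\nTACC\n>tx2\nGGGTTT\n"
fastq_text = "@read1\nACGT\n+\nIIII\n"

fasta_records = list(read_fasta(fasta_text.splitlines()))
fastq_records = list(read_fastq(fastq_text.splitlines()))

# Assumption: Phred+33 encoding, so each quality score is ord(char) - 33
scores = [ord(c) - 33 for c in fastq_records[0][2]]
print(fasta_records, fastq_records, scores)
```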

Computational

Workflow or Pipeline (computational) : A series of computational analysis steps carried out with a defined order and dependencies. Workflows can be conceptually defined and carried out within workflow control systems such as Nextflow.
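The phrase "defined order and dependencies" can be sketched with Python's standard-library `graphlib` module (Python 3.9+). The step names below are hypothetical, and this sketch only computes a valid execution order; it does not reflect how Nextflow or any particular workflow system is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis steps; each maps to the set of steps it depends on
steps = {
    "trim_reads": set(),
    "assemble": {"trim_reads"},
    "quantify": {"assemble", "trim_reads"},
    "annotate": {"assemble"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(steps)
for name in order:
    print("running", name)
```

Steps that do not depend on each other (here, `quantify` and `annotate`) may appear in either order, which is what allows a workflow system to run such steps in parallel. A circular dependency raises `graphlib.CycleError`.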

Container System : A program control system that sets up a virtual and protected environment within a larger computer that facilitates the safe installation and execution of programs.

Container Image : The working unit of the container system. A container image includes a specific executable program (or possibly a suite of programs), along with all of the necessary supporting libraries and auxiliary information. The contents of the container are accessible only while the container is active.

Acronyms

API : Application Programming Interface → A way that different programs can interact with each other.

GFF : General Feature Format → One of several plain-text transfer files that are used to map features onto a genome data set. The file is tab-delimited.

GTF : Gene Transfer Format → One of several plain-text transfer files that are used to map features onto a genome data set. The file is tab-delimited.
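Both GFF and GTF files describe one feature per line using nine tab-separated columns, with start and end given as 1-based, inclusive coordinates. The sketch below splits one such line into named fields. The `Feature` type, the function names, and the example lines are hypothetical, and the attribute handling is deliberately simplified (it does not cope with semicolons inside quoted values, for instance).

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_attributes(text: str) -> dict:
    """Parse GFF3-style `key=value` or GTF-style `key "value"` attributes."""
    attrs = {}
    for field in text.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field and '"' not in field:   # GFF3 style
            key, value = field.split("=", 1)
        else:                                    # GTF style
            key, _, value = field.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def parse_feature_line(line: str) -> Optional[Feature]:
    """Parse one feature line; return None for blank or comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, feature, start, end, score, strand, phase, attrs = cols
    return Feature(seqid, source, feature, int(start), int(end),
                   score, strand, phase, parse_attributes(attrs))


# Invented example lines
gff_line = "contig1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo"
gtf_line = ('contig1\texample\texon\t100\t300\t.\t+\t.\t'
            'gene_id "gene1"; transcript_id "tx1";')

gff_feature = parse_feature_line(gff_line)
gtf_feature = parse_feature_line(gtf_line)
print(gff_feature.attributes, gtf_feature.attributes)
```

Because the coordinates are inclusive, the length of a feature is `end - start + 1`.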

HTML : HyperText Markup Language → A markup language often used for web development. Many programs within TransPi produce an HTML output file that often gives a visual representation of the output.