forked from samtools/hts-specs
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'master' into cigar-64k
Showing 12 changed files with 229 additions and 23 deletions.
Binary file not shown.
Binary file not shown.
@@ -2,6 +2,7 @@
 \usepackage[margin=1in]{geometry}
 \usepackage{longtable}
 \usepackage[pdfborder={0 0 0},hyperfootnotes=false]{hyperref}
+\usepackage[title]{appendix}
 
 \newcommand{\mailtourl}[1]{\href{mailto:#1}{\tt #1}}
 \newcommand{\tagvalue}[1]{\tt #1}
@@ -55,16 +56,17 @@ \section{Standard tags}
 \hline
 {\tt AM} & i & The smallest template-independent mapping quality of segments in the rest \\
 {\tt AS} & i & Alignment score generated by aligner \\
-{\tt BC} & Z & Barcode sequence \\
+{\tt BC} & Z & Barcode sequence identifying the sample \\
 {\tt BQ} & Z & Offset to base alignment quality (BAQ) \\
+{\tt BZ} & Z & Phred quality of the unique molecular barcode bases in the {\tt OX} tag \\
 {\tt CC} & Z & Reference name of the next hit \\
 {\tt CG} & B,I & Intended to store the real {\sf CIGAR} if it contains $>$65535 operations\\
 {\tt CM} & i & Edit distance between the color sequence and the color reference (see also {\tt NM})\\
 {\tt CO} & Z & Free-text comments \\
 {\tt CP} & i & Leftmost coordinate of the next hit \\
 {\tt CQ} & Z & Color read base qualities \\
 {\tt CS} & Z & Color read sequence \\
-{\tt CT} & Z & Complete read annotation tag, used for consensus annotation dummy features.\\
+{\tt CT} & Z & Complete read annotation tag, used for consensus annotation dummy features \\
 {\tt E2} & Z & The 2nd most likely base calls \\
 {\tt FI} & i & The index of segment in the template \\
 {\tt FS} & Z & Segment suffix \\
@@ -76,32 +78,36 @@ \section{Standard tags}
 {\tt H1} & i & Number of 1-difference hits (see also {\tt NM}) \\
 {\tt H2} & i & Number of 2-difference hits \\
 {\tt HI} & i & Query hit index \\
-{\tt IH} & i & Number of stored alignments in SAM that contains the query in the current record\\
+{\tt IH} & i & Number of stored alignments in SAM that contains the query in the current record \\
 {\tt LB} & Z & Library \\
-{\tt MC} & Z & CIGAR string for mate/next segment\\
+{\tt MC} & Z & CIGAR string for mate/next segment \\
 {\tt MD} & Z & String for mismatching positions \\
 {\tt MF} & ? & Reserved for backwards compatibility reasons \\
+{\tt MI} & Z & Molecular identifier; a string that uniquely identifies the molecule from which the record was derived \\
 {\tt MQ} & i & Mapping quality of the mate/next segment \\
-{\tt NH} & i & Number of reported alignments that contains the query in the current record\\
+{\tt NH} & i & Number of reported alignments that contains the query in the current record \\
 {\tt NM} & i & Edit distance to the reference \\
 {\tt OC} & Z & Original CIGAR \\
 {\tt OP} & i & Original mapping position \\
 {\tt OQ} & Z & Original base quality \\
+{\tt OX} & Z & Original unique molecular barcode bases \\
 {\tt PG} & Z & Program \\
 {\tt PQ} & i & Phred likelihood of the template \\
 {\tt PT} & Z & Read annotations for parts of the padded read sequence \\
 {\tt PU} & Z & Platform unit \\
-{\tt QT} & Z & Barcode ({\tt BC} or {\tt RT}) phred-scaled base qualities \\
 {\tt Q2} & Z & Phred quality of the mate/next segment sequence in the {\tt R2} tag \\
+{\tt QT} & Z & Phred quality of the sample-barcode sequence in the {\tt BC} (or {\tt RT}) tag \\
+{\tt QX} & Z & Quality score of the unique molecular identifier in the {\tt RX} tag \\
 {\tt R2} & Z & Sequence of the mate/next segment in the template \\
 {\tt RG} & Z & Read group \\
 {\tt RT} & Z & Barcode sequence (deprecated; use {\tt BC} instead) \\
+{\tt RX} & Z & Sequence bases of the (possibly corrected) unique molecular identifier \\
 {\tt SA} & Z & Other canonical alignments in a chimeric alignment \\
 {\tt SM} & i & Template-independent mapping quality \\
 {\tt SQ} & ? & Reserved for backwards compatibility reasons \\
 {\tt S2} & ? & Reserved for backwards compatibility reasons \\
 {\tt TC} & i & The number of segments in the template \\
-{\tt U2} & Z & Phred probility of the 2nd call being wrong conditional on the best being wrong \\
+{\tt U2} & Z & Phred probability of the 2nd call being wrong conditional on the best being wrong \\
 {\tt UQ} & i & Phred likelihood of the segment, conditional on the mapping being correct \\
 {\tt X?} & ? & Reserved for end users \\
 {\tt Y?} & ? & Reserved for end users \\
@@ -244,10 +250,44 @@ \subsection{Barcodes}
 
 \begin{description}
 \item[BC:Z:\tagvalue{sequence}]
-Barcode sequence, with any quality scores stored in the {\tt QT} tag.
-
-\item[QT:Z:\tagvalue{qualities}]
-Phred quality of the barcode sequence in the {\tt BC} (or {\tt RT}) tag. Same encoding as {\sf QUAL}.
+Barcode sequence (Identifying the sample/library), with any quality scores (optionally) stored in the {\tt QT} tag.
+The {\tt BC} tag should match the {\tt QT} tag in length.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the barcodes and places a hyphen (`{\tt -}') between the barcodes from the same template.
+
+\item[QT:Z:\tagvalue{qualities}]
+Phred quality of the sample-barcode sequence in the {\tt BC} (or {\tt RT}) tag.
+Same encoding as {\sf QUAL}, i.e., Phred score + 33.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the quality strings with spaces (`{\tt \textvisiblespace}') between the different strings from the same template.
+
+\item[RX:Z:\tagvalue{sequence+}]
+Sequence bases from the unique molecular identifier.
+These could be either corrected or uncorrected. Unlike {\tt MI}, the value may be non-unique in the file.
+Should be comprised of a sequence of bases.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the barcodes with a hyphen (`{\tt -}') between the different barcodes.
+
+If the bases represent corrected bases, the original sequence can be stored in {\tt OX} (similar to {\tt OQ} storing the original qualities of bases.)
+
+\item[QX:Z:\tagvalue{qualities+}]
+Phred quality of the unique molecular identifier sequence in the {\tt RX} tag.
+Same encoding as {\sf QUAL}, i.e., Phred score + 33.
+The qualities here may have been corrected (Raw bases and qualities can be stored in {\tt OX} and {\tt BZ} respectively.)
+The lengths of the {\tt QX} and the {\tt RX} tags must match.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the quality strings with a space (`{\tt \textvisiblespace}') between the different strings.
+
+\item[MI:Z:\tagvalue{str}]
+Molecular Identifier.
+A unique ID within the SAM file for the source molecule from which this read is derived.
+All reads with the same {\tt MI} tag represent the group of reads derived from the same source molecule.
+
+\item[OX:Z:\tagvalue{sequence+}]
+Raw (uncorrected) unique molecular identifier bases, with any quality scores (optionally) stored in the {\tt BZ} tag.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the barcodes with a hyphen (`{\tt -}') between the different barcodes.
+
+\item[BZ:Z:\tagvalue{qualities+}]
+Phred quality of the (uncorrected) unique molecular identifier sequence in the {\tt OX} tag.
+Same encoding as {\sf QUAL}, i.e., Phred score + 33.
+The {\tt OX} tags should match the {\tt BZ} tag in length.
+In the case of multiple unique molecular identifiers (e.g., one on each end of the template) the recommended implementation concatenates all the quality strings with a space (`{\tt \textvisiblespace}') between the different strings.
 
 \item[RT:Z:\tagvalue{sequence}]
 Deprecated alternative to {\tt BC} tag originally used at Sanger.
@@ -345,4 +385,19 @@ \section{Locally-defined tags}
 \url{https://github.com/samtools/hts-specs/issues} and/or by sending email
 to \mailtourl{[email protected]}.
 
+\begin{appendices}
+\appendix
+\section{SAM Tags History}\label{sec:history}
+
+This lists the date of each tagged SAM version along with changes that
+have been made while that version was current.
+
+\subsection*{1.5: 23 May 2013 to current}
+\begin{itemize}
+\item Add UMI-related tags (RX, QX, OX, BZ, MI) and clarified usage of sample barcode tag BC. (August 2017)
+\item SAMtags.txt (this file) created with tags from SAMv1
+\end{itemize}
+
+\end{appendices}
+
 \end{document}
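The barcode and UMI conventions added in this diff (Phred+33 qualities, matching sequence/quality lengths, hyphen-joined sequences, space-joined quality strings) can be illustrated with a small checker. This is a minimal sketch under stated assumptions: `decode_phred33` and `split_umi` are hypothetical helpers written for illustration, not part of hts-specs, samtools, or htslib, and the tag values are assumed to be plain strings already extracted from a record.

```python
from typing import List, Tuple

# "Same encoding as QUAL, i.e., Phred score + 33"
PHRED_OFFSET = 33


def decode_phred33(qualities: str) -> List[int]:
    """Convert a Phred+33 string (as in QT, QX or BZ) to integer scores."""
    scores = [ord(ch) - PHRED_OFFSET for ch in qualities]
    if any(s < 0 for s in scores):
        # Characters below '!' (ASCII 33) have no Phred+33 meaning.
        raise ValueError("character below '!' is not valid Phred+33")
    return scores


def split_umi(sequence: str, qualities: str) -> List[Tuple[str, List[int]]]:
    """Split a multi-identifier sequence/quality pair into per-identifier parts.

    Hypothetical helper. Assumes the recommended convention: sequences
    (RX, OX or BC) joined with '-', quality strings (QX, BZ or QT) joined
    with ' '. Returns a list of (bases, scores) tuples.
    """
    # The whole tags must match in length; '-' and ' ' are both one character.
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality tags must match in length")
    seqs = sequence.split("-")
    quals = qualities.split(" ")
    # Each identifier must line up with its own quality string.
    if [len(s) for s in seqs] != [len(q) for q in quals]:
        raise ValueError("separators in sequence and quality tags do not line up")
    return [(s, decode_phred33(q)) for s, q in zip(seqs, quals)]


if __name__ == "__main__":
    # Two UMIs, one from each end of the template.
    print(split_umi("ACGT-TTGA", "IIII FF+F"))
```

Splitting the quality string on a space is unambiguous because ' ' is ASCII 32, one below '!' (Phred 0), so it can never occur as a Phred+33 quality character. A hyphen, by contrast, is ASCII 45 (Phred 12), which is presumably why the sequence and quality tags use different separators.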
Binary file not shown.
Binary file not shown.
Binary file not shown.