Clarification of empty vector representation
pd3 committed Dec 16, 2021
1 parent 6835cd3 commit 9a5fdb0
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions VCFv4.3.tex
@@ -1800,6 +1800,7 @@ \subsubsection{Type encoding}
For example, one sample could have CN0:0,CN1:10 and another CN0:0,CN1:10,CN2:10.
In the situation when a genotype field contains vector values of different lengths, these are represented in BCF2 by a vector of the maximum length per sample, with all values in each vector aligned to the left, and END\_OF\_VECTOR values assigned to all values not present in the original vector.
The BCF2 encoder / decoder must automatically add and remove these END\_OF\_VECTOR values from the vectors. Note that the use of END\_OF\_VECTOR means that it is legal to encode a vector VCF field with MISSING values.
Empty vectors (i.e. vectors with no data available) are represented by one MISSING value followed by as many END\_OF\_VECTOR values as are required to pad the vector to the appropriate length.
For example, suppose I have two samples, each with a FORMAT field X.
Sample A has values [1], while sample B has [2,3].
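
The padding rule described in this hunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference encoder: the function names `pad_vectors` and `strip_padding` are hypothetical, the sentinels are assumed to be the 8-bit integer ones (0x80 for MISSING, 0x81 for END\_OF\_VECTOR), and an all-empty field is assumed to still need a width of one.

```python
# Assumed int8 sentinel values (0x80 and 0x81 read as signed bytes).
MISSING = -128
END_OF_VECTOR = -127


def pad_vectors(per_sample):
    """Pad each sample's vector to the longest length in the field.

    Hypothetical helper: values stay left-aligned, an empty vector becomes
    a single MISSING, and the remainder is filled with END_OF_VECTOR.
    """
    # Assumption: an empty vector still occupies one slot for MISSING.
    width = max([len(values) for values in per_sample] + [1])
    padded = []
    for values in per_sample:
        row = list(values) if values else [MISSING]
        row += [END_OF_VECTOR] * (width - len(row))
        padded.append(row)
    return padded


def strip_padding(row):
    """Drop everything from the first END_OF_VECTOR onwards."""
    values = []
    for value in row:
        if value == END_OF_VECTOR:
            break
        values.append(value)
    return values


if __name__ == "__main__":
    # Sample A has [1], sample B has [2, 3].
    print(pad_vectors([[1], [2, 3]]))
    # An empty vector next to a three-element one.
    print(pad_vectors([[], [2, 3, 4]]))
```

Note that `strip_padding` leaves a lone MISSING in place; whether a decoder reports that as an empty vector or as a single missing value is a design choice this sketch does not settle.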