CCF

Dynamic programming method to infer the cumulative coalescent function (CCF) from haplotype data and knowledge about allele age.

This document describes the usage of the dynamic programming technique used to infer the CCF between two genomic sequences, driven by estimates of allele age.

Dating genomic variants and shared ancestry in population-scale sequencing data
Patrick K. Albers and Gil McVean
doi: https://doi.org/10.1101/416610

See manuscript on bioRxiv: https://www.biorxiv.org/content/early/2018/09/13/416610

The CCF algorithm is described in detail in the Supplementary Text.

Compilation ...

... is straightforward, because the implementation is very simple. To compile the program, type

g++ -std=c++11 -O2 -o ccf ccf.cpp

on the command line, or execute the provided ./build.sh script. This generates a single executable called ccf.

Input

The program is executed on the command line:

./ccf /path/to/INPUT.txt

where /path/to/INPUT.txt refers to an input file, INPUT.txt, which is created separately by the user.
The description below demonstrates how such an input file is generated.

Consider two haplotype sequences (variant alleles; encoded as 0s and 1s); one from a given target individual, and one from another, comparator individual. As usual, variation is initially sorted by variant position along the chromosomal sequence. We only take the variants carried by the target genome (1s), and compare those to the same positions in the comparator, such that each variant can be labelled as either being shared (S) or unshared (U). Only the shared/unshared states are retained to form an observation sequence. This sequence is then sorted by the age of the variants (indicated by a, b, c, ..., k in the example below); from youngest to oldest. The sorted sequence of states is the input of the CCF algorithm (for simplicity encoded as 0s and 1s, but not to be confused with 0/1-encoded alleles; see example below).

Example:

Position:     0 -------------------------------- 0.5 -------------------------------- 1

Age:*                f     b  h           d  c     g     j  a  i  k           e      
                     |     |  |           |  |     |     |  |  |  |           |
  Target:      0  0  1  0  1  1  0  0  0  1  1  0  1  0  1  1  1  1  0  0  0  1  0  0
  Comparator:  1  0  1  0  1  0  0  1  0  1  1  1  0  0  1  0  0  1  0  0  0  1  1  0

State:               S     S  U           S  S     U     S  U  U  S           S

* Age of variants:     a < b < c < d < e < f < g < h < i < j < k
Observation sequence:  U   S   S   S   S   S   U   U   U   S   S
   (Encoded sequence:  0   1   1   1   1   1   0   0   0   1   1 )

In the above, the variants that are shared or unshared between the two haplotypes are used to create an encoded sequence of states (bottom). This sequence is written into an input file (INPUT.txt), where each state is given on a new line, which is then supplied to run the ccf executable.
The input file generated from the example above contains the following:

That's it.

Output

The inferred path sequence is printed to the standard output. This can be captured, for example, by typing:

./ccf /path/to/INPUT.txt > OUTPUT.txt

The output (e.g. captured in OUTPUT.txt) is the inferred fraction of the genome shared at each position along the time-sorted input sequence. To arrive at the fraction of the genome shared at different points in time, simply map the ages of the age-sorted variants to the output sequence of states.

Comment

Fractions are inferred at a discrete number of "hidden" states. By default, this is set to 200, which is hardcoded. To change this value, simply modify

#define NSTATES 200

in the ccf.cpp file before compilation.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
README.md		README.md
build.sh		build.sh
ccf.cpp		ccf.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CCF

See manuscript on bioRxiv: https://www.biorxiv.org/content/early/2018/09/13/416610

Compilation ...

Input

Example:

Output

Comment

About

Releases

Packages

Languages

License

pkalbers/ccf

Folders and files

Latest commit

History

Repository files navigation

CCF

See manuscript on bioRxiv: https://www.biorxiv.org/content/early/2018/09/13/416610

Compilation ...

Input

Example:

Output

Comment

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages