docs/api/data.rst
+124 -1
@@ -398,7 +398,7 @@ The ``load()`` method initializes an instance of the :class:`Phenotypes` class a
     phenotypes.read()
     phenotypes.data # returns a np array of shape p x k
 
-Both the ``load()`` and ``read()`` methods support the `samples` parameter that allows you to request a specific set of sample IDs to read from the file.
+Both the ``load()`` and ``read()`` methods support the ``samples`` parameter that allows you to request a specific set of sample IDs to read from the file.
 
 .. code-block:: python
 
@@ -454,3 +454,126 @@ Classes
 Covariates
 ++++++++++
 The :class:`Covariates` class is simply a sub-class of the :class:`Phenotypes` class. It has all of the same methods and properties. The only notable difference between the two classes is the file extension that each one uses.
+
+breakpoints.py
+~~~~~~~~~~~~~~
+Overview
+--------
+This module supports reading and writing files that follow the **.bp** file format specification.
+
+Lines from the file are parsed into an instance of the :class:`Breakpoints` class.
+
+Documentation
+-------------
+
+1. The **.bp** :ref:`format specification <formats-breakpoints>`
+2. The :ref:`breakpoints.py API docs <api-haptools-data-breakpoints>` contain example usage of the :class:`Breakpoints` class
+
+Classes
+-------
+Breakpoints
++++++++++++
+Properties
+**********
+Just like all other classes in the data module, the :class:`Breakpoints` class has a ``data`` property. It is a dictionary, keyed by sample ID, where each value is a two-element list of numpy arrays (one for each of the sample's two chromosomal strands). Each field of an array corresponds to a column in the breakpoints file:
+
+1. ``pop`` - A population label (str), like 'YRI'
+2. ``chrom`` - A chromosome name (str), like 'chr19' or simply '19'
+3. ``bp`` - The end position of the block in base pairs (int), like 1001038
+4. ``cm`` - The end position of the block in centiMorgans (float), like 43.078
+
+The dtype of each numpy array is stored as a variable called ``HapBlock``. It is available globally in the ``breakpoints`` and ``data`` modules.
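To make the shape of the ``data`` property concrete, here is a minimal sketch that builds a comparable structure with plain numpy. The dtype name ``ExampleBlock``, its field widths and numeric types, and the sample ID ``HG00096`` are assumptions made for illustration. They are not taken from the library, and the real ``HapBlock`` may be defined differently.

```python
import numpy as np

# Hypothetical stand-in for HapBlock: the field names follow the four columns
# described above, but the string widths and numeric types are assumptions.
ExampleBlock = np.dtype(
    [("pop", "U5"), ("chrom", "U10"), ("bp", np.uint32), ("cm", np.float64)]
)

# One structured array per chromosomal strand, one row per haplotype block
strand_1 = np.array(
    [("YRI", "chr19", 1001038, 43.078), ("CEU", "chr19", 2500000, 61.5)],
    dtype=ExampleBlock,
)
strand_2 = np.array([("CEU", "chr19", 2500000, 61.5)], dtype=ExampleBlock)

# A dictionary keyed by sample ID, where each value is a two-element list
example_data = {"HG00096": [strand_1, strand_2]}

first_strand = example_data["HG00096"][0]
print(first_strand["pop"])  # the population label of every block on strand 1
print(first_strand["bp"][0])  # the end position (bp) of the first block
```

Structured arrays let you pull out a whole column by field name, which is convenient when you only need the block end positions or the labels.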
+
+.. code-block:: python
+
+    from haptools import data
+    data.HapBlock # the dtype of each numpy array in the data property
+    breakpoints.data # returns a dictionary keyed by sample ID, where each value is a list of np arrays
+
+The ``load()`` method initializes an instance of the :class:`Breakpoints` class and calls the ``read()`` method, but you can also call the ``read()`` method manually.
+    breakpoints.data # returns a dictionary keyed by sample ID, where each value is a list of np arrays
+
+Both the ``load()`` and ``read()`` methods support the ``samples`` parameter that allows you to request a specific set of sample IDs to read from the file.
+If you're worried that the contents of the **.bp** file will be large, you may opt to parse the file sample-by-sample instead of loading it all into memory at once.
+
+In cases like these, you can use the ``__iter__()`` method in a for-loop.
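The idea behind sample-by-sample parsing can be sketched with a plain generator. This is not the library's ``__iter__()`` implementation. The function ``iter_samples`` and the sample names are hypothetical, and the sketch assumes that header lines hold a single field while block lines hold four tab-delimited fields.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_samples(handle: TextIO) -> Iterator[Tuple[str, List[list]]]:
    """Yield (sample ID, [strand 1 blocks, strand 2 blocks]) one sample at a time."""
    sample, strands = None, []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 1 and fields[0]:
            # A header line of the form {sample}_1 or {sample}_2
            name, strand_num = fields[0].rsplit("_", 1)
            if strand_num == "1":
                # A new sample begins, so hand back the previous one first
                if sample is not None:
                    yield sample, strands
                sample, strands = name, []
            strands.append([])
        elif len(fields) == 4:
            pop, chrom, bp, cm = fields
            strands[-1].append((pop, chrom, int(bp), float(cm)))
    if sample is not None:
        yield sample, strands


text = (
    "Sample_A_1\n"
    "YRI\tchr1\t500\t0.5\n"
    "CEU\tchr1\t1000\t1.0\n"
    "Sample_A_2\n"
    "CEU\tchr1\t1000\t1.0\n"
    "Sample_B_1\n"
    "YRI\tchr1\t1000\t1.0\n"
    "Sample_B_2\n"
    "YRI\tchr1\t1000\t1.0\n"
)
for sample, strands in iter_samples(io.StringIO(text)):
    print(sample, [len(blocks) for blocks in strands])
```

Because the generator only holds one sample's blocks at a time, memory use stays roughly constant no matter how many samples the file contains.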
+In the end, we're usually only interested in the ancestral labels of a set of variant positions, as a matrix of values. The ``population_array()`` method generates a numpy array denoting the ancestral label of each sample for each variant you specify.
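One way to look up the label for each variant is a binary search over the block end positions. The sketch below illustrates that lookup for a single strand on a single chromosome. It is not the implementation of ``population_array()``. The array names are hypothetical, and treating a block's end position as inclusive is an assumption.

```python
import numpy as np

# End positions (bp) and labels of the blocks on one strand of one chromosome,
# sorted by position as the file format requires
block_ends = np.array([1000, 5000, 9000])
block_pops = np.array(["YRI", "CEU", "YRI"])

variant_positions = np.array([10, 1000, 1001, 8999])

# side="left" finds the first block whose end is >= the variant position, so a
# variant located exactly at a block's end is assigned to that block (assumption)
block_idx = np.searchsorted(block_ends, variant_positions, side="left")
labels = block_pops[block_idx]
print(labels)
```

A variant located past the last block's end would produce an out-of-range index here, so a fuller version would need to check for that case.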
docs/formats/breakpoints.rst
+36 -9
@@ -4,14 +4,41 @@
 Breakpoints
 ===========
 
-* ``Sample Header`` - Name of sample following the structure ``Sample_{number}_{hap}`` eg. ``Sample_10_1`` for sample number 10 haplotype 1
-* ``pop`` - Population label corresponding to the index of the population in the dat file so in the example above CEU = 1, YRI = 2
-* ``chr`` - chromosome (1-22, X)
+Breakpoints files (``.bp`` files) store your samples' local ancestry labels. Each line in the file denotes the ancestral population (ex: YRI or CEU) of a portion of a chromosomal strand (or *haplotype block*) of an individual.
 
-.. code-block::
+The set of haplotype blocks for an individual is delimited by a sample header of the form ``{sample}_1`` (for the first chromosomal strand) or ``{sample}_2`` (for the second chromosomal strand). Blocks from ``{sample}_1`` must be directly followed by blocks from ``{sample}_2``.
 
-    Sample Header
-    {pop}\t{chr}\t{pos bp}
-    ...
-    Sample Header 2
-    ...
+Each set of haplotype blocks follows a tab-delimited format with the following fields. Lines within a sample's set of blocks must be sorted according to ``chrom``, ``bp``, and ``cm`` - in that order.
+
+.. list-table::
+    :widths: 15 15 25
+    :header-rows: 1
+
+    * - Name
+      - Type
+      - Description
+    * - pop
+      - string
+      - The population label of this haplotype block (ex: CEU or YRI)
+    * - chrom
+      - string
+      - The name of the chromosome to which this haplotype block belongs (ex: chr1)
+    * - bp
+      - integer
+      - The base-pair position of the end of the haplotype block (ex: 1001038)
+    * - cm
+      - float
+      - The centimorgan position of the end of the haplotype block (ex: 43.078)
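The layout described above can be sketched as a small writer that emits a header per strand followed by one tab-delimited line per block. The function ``format_breakpoints``, the ``Block`` alias, and the sample ID ``HG00096`` are hypothetical names used for illustration, and the sketch assumes the blocks are already sorted by ``chrom``, ``bp``, and ``cm``.

```python
from typing import Dict, List, Tuple

Block = Tuple[str, str, int, float]  # (pop, chrom, bp, cm)


def format_breakpoints(samples: Dict[str, List[List[Block]]]) -> str:
    """Render blocks as .bp-style text: a header per strand, then one block per line."""
    lines = []
    for sample, strands in samples.items():
        # Strand 1 is written first and is directly followed by strand 2
        for strand_num, blocks in enumerate(strands, start=1):
            lines.append(f"{sample}_{strand_num}")
            for pop, chrom, bp, cm in blocks:
                lines.append("\t".join((pop, chrom, str(bp), str(cm))))
    return "\n".join(lines) + "\n"


example = {
    "HG00096": [
        [("YRI", "chr1", 500000, 0.5), ("CEU", "chr1", 1001038, 43.078)],
        [("CEU", "chr1", 1001038, 43.078)],
    ]
}
print(format_breakpoints(example), end="")
```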
+
+Examples
+--------
+
+See `tests/data/outvcf_test.bp <https://github.com/cast-genomics/haptools/blob/main/tests/data/outvcf_test.bp>`_ for an example of a short breakpoint file:
+
+.. include:: ../../tests/data/outvcf_test.bp
+    :literal:
+
+See `tests/data/simple.bp <https://github.com/cast-genomics/haptools/blob/main/tests/data/simple.bp>`_ for a longer example: