docs: updates for v0.1.0 (#171)
Co-authored-by: Michael Lamkin <[email protected]>
aryarm and mlamkin7 authored Jan 30, 2023
1 parent 8ace893 commit 714142f
Showing 11 changed files with 100 additions and 15 deletions.
2 changes: 2 additions & 0 deletions docs/commands/index.rst
@@ -45,6 +45,8 @@ You can use the ``--no-sort`` flag to skip the sorting step if your file is alre
LC_ALL=C sort -k1,4 tests/data/simphenotype.hap | \
haptools index --no-sort --output tests/data/simphenotype.hap.gz /dev/stdin
All files used in these examples are described :doc:`here </project_info/example_files>`.


Detailed Usage
~~~~~~~~~~~~~~
4 changes: 3 additions & 1 deletion docs/commands/karyogram.rst
@@ -22,7 +22,7 @@ Additional Options
You may also specify the following options:

* ``--centromeres <FILE>`` - Path to a file describing the locations of chromosome ends and centromeres. An example file is given here: ``tests/data/centromeres_hg19.txt``. The columns are: chromosome, chrom_start, centromere, chrom_end. For acrocentric chromosomes, the centromere field is omitted. This file format was taken from `here <https://github.com/armartin/ancestry_pipeline>`_.
* ``--colors "pop1:color1,pop2:color2..."`` - You can optionally specify which colors should be used for each population. If colors are not given, the script chooses reasonable defaults.
* ``--colors "pop1:color1,pop2:color2..."`` - You can optionally specify which colors should be used for each population. Colors can be given as matplotlib `named colors <https://matplotlib.org/stable/gallery/color/named_colors.html>`_ or as hex codes. If colors are not given, the script chooses reasonable defaults.
* ``--title <TITLE>`` - Title for the resulting karyogram.
* ``--verbosity <LEVEL>`` - What level of output the logger should print to stdout. Please see `logging levels <https://docs.python.org/3/library/logging.html>`_ for output levels. Default = INFO [Optional]
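
The ``pop1:color1,pop2:color2`` syntax accepted by ``--colors`` can be illustrated with a short Python sketch. This is not the parser used by haptools; the function name ``parse_colors`` is hypothetical and the code only demonstrates how such a string maps populations to colors, assuming population labels contain neither commas nor colons.

```python
def parse_colors(spec: str) -> dict[str, str]:
    """Split a 'pop1:color1,pop2:color2' string into a {population: color} dict.

    Hypothetical helper for illustration only; haptools may parse this differently.
    """
    colors = {}
    for pair in spec.split(","):
        # partition() splits on the first colon only, so hex codes such as
        # '#2ca02c' pass through unchanged
        pop, sep, color = pair.partition(":")
        if not sep or not pop.strip() or not color.strip():
            raise ValueError(f"expected 'pop:color', got {pair!r}")
        colors[pop.strip()] = color.strip()
    return colors


if __name__ == "__main__":
    # A named color and a hex code can be mixed in the same string
    print(parse_colors("YRI:blue,CEU:#2ca02c"))
```

Both named colors and hex codes are plain strings at this stage; resolving them into actual colors would be left to the plotting library.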

@@ -44,6 +44,8 @@ This will output a file ``test_karyogram.png``. The example is shown below.
.. image:: ../images/test_karyogram.png
:alt: Example karyogram

All files used in this example are described :doc:`here </project_info/example_files>`.


Detailed Usage
~~~~~~~~~~~~~~
2 changes: 2 additions & 0 deletions docs/commands/ld.rst
@@ -62,6 +62,8 @@ You can select a subset of variants (or haplotypes) using the ``--id`` parameter
haptools ld --from-gts -i rs543363163 -i rs7412 APOe4 tests/data/apoe.vcf.gz tests/data/apoe4.hap
All files used in these examples are described :doc:`here </project_info/example_files>`.

Detailed Usage
~~~~~~~~~~~~~~

5 changes: 1 addition & 4 deletions docs/commands/simgenotype.rst
@@ -74,10 +74,7 @@ If speed is important, it's generally faster to use PGEN files than VCFs.
--pop_field \
--out tests/data/example_simgenotype.pgen
Example model files found in `haptools/example-files/models/ <https://github.com/CAST-genomics/haptools/tree/main/example-files/models>`_ and `haptools/tests/data/ <https://github.com/CAST-genomics/haptools/tree/main/tests/data>`_.
Example map files found in `haptools/tests/data/map/ <https://github.com/CAST-genomics/haptools/tree/main/tests/data/map>`_.
Example ref_vcf file can be found `haptools/tests/data/outvcf_test.vcf <https://github.com/CAST-genomics/haptools/blob/main/tests/data/outvcf_test.vcf>`_.
Example sample_info files found in `haptools/example-files/ <https://github.com/CAST-genomics/haptools/tree/main/example-files>`_ and `haptools/tests/data/outvcf_info.tab <https://github.com/CAST-genomics/haptools/blob/main/tests/data/outvcf_info.tab>`_.
All files used in these examples are described :doc:`here </project_info/example_files>`.


Detailed Usage
2 changes: 2 additions & 0 deletions docs/commands/simphenotype.rst
@@ -110,6 +110,8 @@ Let's simulate two replicates of a case/control trait that occurs in 60% of samp
--output simulated.pheno \
/dev/stdin tests/data/simphenotype.hap
All files used in these examples are described :doc:`here </project_info/example_files>`.

Detailed Usage
~~~~~~~~~~~~~~

2 changes: 2 additions & 0 deletions docs/commands/transform.rst
@@ -90,6 +90,8 @@ To match haplotypes as well as their ancestral population labels, use the ``--an
bcftools annotate -Ov --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' | \
haptools transform -o output.vcf.gz /dev/stdin file.hap
All files used in these examples are described :doc:`here </project_info/example_files>`.


Detailed Usage
~~~~~~~~~~~~~~
19 changes: 18 additions & 1 deletion docs/formats/haplotypes.rst
@@ -11,7 +11,24 @@ This document describes our custom file format specification for haplotypes: the
:align: center
:alt: The .hap file format

This is a tab-separated file composed of different types of lines. The first field of each line is a single, uppercase character denoting the type of line. The following line types are supported.
Motivation
~~~~~~~~~~
``.hap`` files are optimized to store information about haplotypes and the collections of alleles that they are composed of. Notably, they are not designed to store any kind of per-sample information. Instead, :doc:`the transform command </commands/transform/>` can be used to encode each haplotype as a biallelic variant in a VCF, BCF, or PGEN file. Our intent is for the ``.hap`` file format to play a supporting role to these per-sample formats.

Our file format addresses unique challenges. As far as we know, the only existing file format that stores information equivalent to ours is the `PLINK 1.9 .blocks.det <https://www.cog-genomics.org/plink/1.9/formats#blocks>`_ file. However, it may also be possible to store the columns of a ``.hap`` file within the INFO fields of a VCF. Compared to both of these formats, our file format has a few key advantages:

1. Unlike ``.blocks.det`` files, our format is designed to be indexed and queried efficiently via tabix. Our design offers an additional level of querying that is not possible for haplotypes encoded within a VCF.
2. Our format is more flexible than a ``.blocks.det`` or VCF file. In addition to storing SNP alleles within a ``.hap`` file, our format allows for the storage of either haplotype-level metadata (e.g. local ancestry labels, effect sizes) or allele-level metadata (e.g. custom scores or other information).
3. Our format is easier to generate or parse using simple unix commands or ad-hoc scripts because it uses a single field delimiter and guarantees a consistent number of fields for each line in the file.

Please refer to the supplement of our manuscript for a thorough justification of our file format.

.. TODO: add link to the manuscript here
Overview
~~~~~~~~

The ``.hap`` format describes a tab-separated file composed of different types of lines. The first field of each line is a single, uppercase character denoting the type of line. The following line types are supported.

.. list-table::
:widths: 25 25
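
Because every line of a ``.hap`` file is tab-separated and begins with a single uppercase character naming its type, the file can be split by line type in a few lines of Python. The sketch below is illustrative only: the function name ``group_by_line_type`` is hypothetical, and the sample lines and their fields are made up for the demonstration rather than taken from the specification.

```python
from collections import defaultdict


def group_by_line_type(text: str) -> dict[str, list[list[str]]]:
    """Group the tab-separated lines of a .hap-like text by their first field."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The first field is the line type; the rest are that line's fields
        line_type, *fields = line.split("\t")
        groups[line_type].append(fields)
    return dict(groups)


if __name__ == "__main__":
    # Made-up sample content; field meanings here are assumptions
    sample = "H\t21\t100\t200\thap1\nV\thap1\t100\t101\trs1\tA\n"
    for line_type, rows in group_by_line_type(sample).items():
        print(line_type, rows)
```

This relies only on the single field delimiter and the leading type character described above, which is what makes ad-hoc parsing straightforward.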
8 changes: 7 additions & 1 deletion docs/index.rst
@@ -34,7 +34,12 @@ For example ``haptools simgenotype`` outputs a VCF file with local ancestry info
The VCF and breakpoints file output by ``haptools simgenotype`` can be used as input to ``haptools transform``, which is then used by ``haptools simphenotype`` to simulate phenotypes for a list of haplotypes.
The local ancestry breakpoints from ``haptools simgenotype`` can also be visualized using ``haptools karyogram``.

Detailed information about each command can be found in `the commands section </commands>`_. Examples there utilize files from the `tests directory <https://github.com/CAST-genomics/haptools/tree/main/tests/data>`_.
Detailed information about each command can be found in the *Commands* section of our documentation. Examples there utilize a set of example files described :doc:`here </project_info/example_files>`.

Logging
~~~~~~~

All commands output log messages to standard error. The universal ``--verbosity`` flag controls the level of detail in our logging messages. By default, this is set to ``INFO``, which will yield errors, warnings, and info messages. To get more detailed messages, set it to ``DEBUG``. To get only error messages, set it to ``ERROR``. To get errors *and* warnings, set it to ``WARNING``. Refer to `the Python documentation on logging levels <https://docs.python.org/3/library/logging.html#levels>`_ for more information.
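
The effect of each verbosity level can be checked with Python's standard ``logging`` module. The following is a minimal sketch, not code from haptools: the helper ``emitted_levels`` is hypothetical, and it captures messages in memory rather than writing them to standard error so that the result is easy to inspect.

```python
import io
import logging


def emitted_levels(verbosity: str) -> list[str]:
    """Return the level names of the messages that pass a given verbosity."""
    # A dedicated, non-propagating logger keeps the demo isolated
    logger = logging.getLogger(f"verbosity_demo_{verbosity}")
    logger.setLevel(verbosity)  # setLevel accepts level names such as "INFO"
    logger.propagate = False

    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)

    # Emit one message at each of the four commonly used levels
    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warning message")
    logger.error("error message")

    logger.removeHandler(handler)
    return stream.getvalue().split()


if __name__ == "__main__":
    for level in ("DEBUG", "INFO", "WARNING", "ERROR"):
        print(level, "->", emitted_levels(level))
```

With ``INFO``, the debug message is filtered out while info, warning, and error messages pass, which matches the default behavior described above.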

Contributing
~~~~~~~~~~~~
@@ -51,6 +56,7 @@ Please read our :doc:`contribution guidelines </project_info/contributing>` and
:maxdepth: 1

project_info/installation
project_info/example_files
project_info/contributing

.. toctree::
2 changes: 1 addition & 1 deletion docs/project_info/contributing.rst
@@ -48,7 +48,7 @@ To create a pull request you need to do these steps:
5. Create a new branch off of the :code:`main` branch with :code:`git checkout -b <descriptive_branch_name>`. Please follow `best practices <https://www.conventionalcommits.org/>`_ when naming your branch
6. Setup our development environment by following the instructions in :ref:`dev-setup-instructions` below
7. Make your changes to the code
8. Add additional tests to the :code:`tests/` directory and add comments to the documentation to explain how to use your new code. We use pytest for testing and sphinx/numpydoc for documentation
8. Add additional tests to the :code:`tests/` directory and add comments to the documentation to explain how to use your new code. We use pytest for testing and sphinx/numpydoc for documentation. If you add example code or an example command to the documentation, you should make sure to create an automated test that executes it, as well.
9. Run the automated code-checking steps detailed in :ref:`code-check-instructions` below
10. Commit your changes. Please use informative commit messages and do your best to ensure the commit history is clean and easy to interpret
11. Now you can push your changes to your Github copy of haptools by running :code:`git push origin <descriptive_branch_name>`
52 changes: 52 additions & 0 deletions docs/project_info/example_files.rst
@@ -0,0 +1,52 @@
.. _project_info-example_files:

=============
Example files
=============

Locating the files
------------------

The examples throughout our documentation make use of two sets of files: test files and example files.

Test files can be found in the `tests/data/ directory <https://github.com/CAST-genomics/haptools/tree/main/tests/data>`_ of our Github repository. These are short, simplified files used exclusively by our automated test suite.

Example files can be found in the `example-files/ directory <https://github.com/CAST-genomics/haptools/tree/main/example-files>`_ of our Github repository. Unlike test files, we expect example files to be useful in your own commands. For example, if you use simgenotype with the 1000 Genomes dataset, you can use our `1000G sample_info file <https://github.com/cast-genomics/haptools/blob/main/example-files/1000genomes_sampleinfo.tsv>`_. We have also included a set of `model files <https://github.com/cast-genomics/haptools/blob/main/example-files/models>`_ that you can use to create pre-configured admixed populations.

.. _running-an-example-command:

Running an example command
--------------------------
To run any of the example code or commands in our documentation, follow these steps.

1. :doc:`Install haptools </project_info/installation>`
2. Clone our Github repository

   .. code-block:: bash

      git clone https://github.com/CAST-genomics/haptools.git

3. Change to the cloned directory

   .. code-block:: bash

      cd haptools

4. Execute the example command

Running all examples
--------------------
All of our examples are included within our test suite, which is executed regularly by our continuous integration system. To check that all of the examples work on your system, you can just have ``pytest`` automatically run all of our tests.

1. Follow the :ref:`first three steps above <running-an-example-command>`
2. Install ``pytest``

   .. code-block:: bash

      pip install 'pytest>=6.2.5'

3. Run our tests

   .. code-block:: bash

      pytest tests/
17 changes: 10 additions & 7 deletions haptools/__main__.py
@@ -56,14 +56,17 @@ def main():
"--colors",
type=str,
required=False,
help="Optional color dictionary. Format is e.g. 'YRI:blue,CEU:green'",
help=(
"Optional color dictionary. Input can be from the matplotlib list of colors "
"or in hexcode. Format is e.g. 'YRI:blue,CEU:green'"
),
)
@click.option(
"-v",
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def karyogram(bp, sample, out, title, centromeres, colors, verbosity):
@@ -193,7 +196,7 @@ def karyogram(bp, sample, out, title, centromeres, colors, verbosity):
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def simgenotype(
@@ -408,7 +411,7 @@ def simgenotype(
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def simphenotype(
@@ -573,7 +576,7 @@ def simphenotype(
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def transform(
@@ -736,7 +739,7 @@ def transform(
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def ld(
@@ -831,7 +834,7 @@ def ld(
"--verbosity",
type=click.Choice(["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]),
default="INFO",
show_default="only errors",
show_default=True,
help="The level of verbosity desired",
)
def index(