Changed executable name to fix errors, still need to update README
djwhiteastro committed Dec 11, 2018
1 parent a7dc6ab commit dcbad2f
Showing 3 changed files with 97 additions and 14 deletions.
109 changes: 96 additions & 13 deletions README.md
@@ -1,22 +1,105 @@
# SeqQC
# BioExcel_SeqQC

TO BE UPDATED SOON.

Python scripts for a Sequence Quality Control pipeline, based on workflows
Python package to run a Sequence Quality Control pipeline, based on workflows
defined by IGMM.

Each tool can be run standalone, or as a whole workflow.

Requirements:
## Requirements:

- FastQC
- Cutadapt
- Python (only tested with 2.7.10/13)
- Python 3.x
- Pyyaml

We recommend using the conda package manager, and making use of virtual
environments. This tool also exists in the bioconda channel. This has the
benefit of automatically installing all pre-requisites when installing this
tool.

## Installation

There are two main ways to install the package.

### Conda package installation

#### Set up a new conda environment (optional):

```bash
$ conda create -n my_env -c bioconda python=3
```

This creates a clean Python 3 environment in which to install and run the tool.
If you already have a conda environment you wish to use, make sure you add the
bioconda channel to that environment, or to your conda configuration as a whole.

#### Install BioExcel_SeqQC
```bash
$ conda install bioexcel_seqqc
```

This one line will install BioExcel_SeqQC and all of its dependencies.

### Manual installation

If you wish to install manually, follow the steps below. We still recommend
using some kind of virtual environment. Before running the workflow, install
the pre-requisite tools and ensure they are on your `$PATH`.

```bash
$ git clone https://github.com/bioexcel/BioExcel_SeqQC.git
$ cd BioExcel_SeqQC
$ python setup.py install
```

## Usage

Once installed, there are several ways to use the tool. The easiest is to call
the executable script, which runs the whole workflow and accepts several options
and arguments that the user can modify. List them with:

```bash
$ bioexcel_seqqc -h
```

An example of basic usage of the pipeline is:

```bash
$ bioexcel_seqqc --files in1.fa in2.fa --threads 4 --outdir ./output
```
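
For scripted runs, the same command line can be assembled in Python before being handed to `subprocess`. The sketch below only builds and prints the argument list, using the flags from the example above. The helper name `build_seqqc_command` is hypothetical, and the executable name is passed in as a parameter because it depends on the installed version.

```python
import shlex


def build_seqqc_command(executable, files, threads, outdir):
    """Return the argument list for one pipeline run (hypothetical helper)."""
    cmd = [executable, "--files", *files]
    cmd += ["--threads", str(threads), "--outdir", outdir]
    return cmd


cmd = build_seqqc_command("bioexcel_seqqc", ["in1.fa", "in2.fa"], 4, "./output")
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would then launch the workflow, provided the executable is on your `$PATH`.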

### Editing configuration for checkFastQC stage

The tool runs an automated set of checks based on output from FastQC. The
default decisions are based on our partner's preferences, but they can be
changed. First, output an example configuration file (which contains the
default values):

```bash
$ bioexcel_seqqc --printconfig
```

The file lists the summary outputs from FastQC and, for each one, the decision
to make: whether the files should be trimmed or rechecked, taking into account
whether they have already been trimmed automatically.
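
To illustrate the kind of logic such a configuration drives, here is a minimal sketch that maps FastQC summary statuses to decisions. The tab-separated `STATUS<TAB>module<TAB>filename` layout follows FastQC's `summary.txt` output. The `DECISIONS` table, the action names and the `decide` function are hypothetical stand-ins, not the tool's actual configuration schema or defaults.

```python
# Hypothetical decision table: (FastQC module, status) -> action.
DECISIONS = {
    ("Adapter Content", "FAIL"): "trim",
    ("Adapter Content", "WARN"): "trim",
    ("Per base sequence quality", "FAIL"): "trim",
    ("Per base sequence quality", "WARN"): "recheck",
}


def parse_summary(text):
    """Parse the content of a FastQC summary.txt into {module: status}."""
    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        status, module, _filename = line.split("\t")
        statuses[module] = status
    return statuses


def decide(statuses, already_trimmed=False):
    """Return the set of actions triggered by the summary statuses."""
    actions = set()
    for module, status in statuses.items():
        action = DECISIONS.get((module, status))
        if action == "trim" and already_trimmed:
            # Assumption: an already-trimmed file is rechecked, not trimmed again.
            action = "recheck"
        if action:
            actions.add(action)
    return actions


example = (
    "PASS\tBasic Statistics\tin1.fastq\n"
    "WARN\tPer base sequence quality\tin1.fastq\n"
    "FAIL\tAdapter Content\tin1.fastq\n"
)
print(sorted(decide(parse_summary(example))))
```

Keeping the decisions in a plain lookup table is what makes them easy to move into a configuration file: changing behaviour means editing data, not code.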

### Python Module

In addition to the executable version, the tool is installed as a Python
package, so each stage can be imported as a module into other scripts if the
user wishes to build more specialised or complex workflows. Each function
creates and returns a Python subprocess.

```python
import bioexcel_seqqc
import bioexcel_seqqc.runfastqc as rfq
import bioexcel_seqqc.runtrim as rt

# Do things before running FastQC

Above packages can be easily obtained via installing BCBio, which also includes packages used in further downstream analysis. - Make sure paths are correctly set up if done so.
fqc_process = rfq.run_fqc(infiles, fqcdir, tmpdir, threads)
fqc_process.wait()

## Future plans
# Do things after FastQC, and before trimming low quality reads

- Provide CWL-compliant examples and tool-descriptors for basic usage (cannot do loops within CWL so must still use Python as workflow management, but CWL runners could run whole scripts/workflow with correct tool descriptions)
- Create Singularity container of workflow
- Make it easier for user to configure logic behind PASS/WARN/FAIL flags from FastQC
trim_process = rt.trimQC(infiles, trimdir, threads)
trim_process.wait()
```
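
Because each stage hands back a subprocess, the calling script can do other work while a stage runs and then block until it completes. The pattern looks roughly like the following sketch, which uses a trivial Python child process as a stand-in for a FastQC or trimming stage. `start_stage` is a hypothetical helper and is not part of the package.

```python
import subprocess
import sys


def start_stage(message):
    """Launch a stand-in stage and return its Popen handle without waiting."""
    return subprocess.Popen(
        [sys.executable, "-c", f"print({message!r})"],
        stdout=subprocess.PIPE,
        text=True,
    )


stage = start_stage("stage finished")
# Other work could happen here while the child process runs.
output, _ = stage.communicate()  # waits for completion and collects stdout
if stage.returncode != 0:
    raise RuntimeError(f"stage failed with exit code {stage.returncode}")
print(output.strip())
```

Checking the return code after waiting is worth doing before starting the next stage, since a failed quality check or trim would otherwise pass unnoticed into later steps.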
File renamed without changes.
2 changes: 1 addition & 1 deletion setup.py
@@ -16,7 +16,7 @@
description=('Sequence Quality Control workflow python package'),
author='Darren White',
author_email='[email protected]',
scripts=['bin/bioexcel_seqqc'],
scripts=['bin/bxcl_seqqc'],
packages=['bioexcel_seqqc'],
package_dir={'bioexcel_seqqc': 'bioexcel_seqqc'},
install_requires=['pyyaml'],