Commit

Merge pull request #10 from tomdstanton/Abaumannii_release
Abaumannii release
tomdstanton authored Oct 5, 2022
2 parents 0f07766 + 7d8cb94 commit 03fa488
Showing 3 changed files with 48 additions and 31 deletions.
32 changes: 18 additions & 14 deletions README.md
@@ -19,7 +19,7 @@ If **Kaptive** has lower confidence in the match it may mean that your assembly

**Kaptive** cannot reliably extract or annotate novel locus sequences – if you think you have a novel locus type you should investigate this further. If you think you may have a variant of a known locus, and you haven't already done so, you could try rerunning Kaptive with the appropriate variant database.

If you do have a novel locus or novel variant and you would like it to be added to the database, [please let us know](https://github.com/kelwyres/Kaptive-Web/issues).
If you do have an intact novel locus or novel variant and you would like it to be added to the database, [please let us know](https://github.com/kelwyres/Kaptive-Web/issues).

If you use **Kaptive Web** in your research, please cite this paper alongside the appropriate [reference database citations](https://github.com/kelwyres/Kaptive-Web#citation):
[Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for _Klebsiella_ genomes. doi: 10.1101/260125](https://www.biorxiv.org/content/early/2018/02/05/260125)
@@ -71,7 +71,7 @@ This is a categorical measure of match quality that was originally optimised for
* `Low` = the locus was found in a single piece or with ≥90% coverage, with ≤ 3 missing genes and ≤ 2 extra genes.
* `None` = did not qualify for any of the above.

We have found that these categorial measures also work quite well with the _Klebsiella_ O locus database, as well as the _A. baumannii_ K and OC locus databases.
We have found that these categorical measures also work quite well with the _Klebsiella_ O locus database, as well as the _A. baumannii_ K and OC locus databases.

WARNING: If you use the variant _Klebsiella_ K locus database please inspect your results carefully and decide for yourself what constitutes a confident match!

@@ -134,12 +134,14 @@ If you have a locus database that you would like to be added to **Kaptive Web**

#### _Klebsiella_ K locus databases

The _Klebsiella_ K locus primary reference database (`Klebsiella_k_locus_primary_reference.gbk`) comprises full-length (_galF_ to _ugd_) annotated sequences for each distinct _Klebsiella_ K locus, where available:
The _Klebsiella_ K locus primary reference database (`Klebsiella_k_locus_primary_reference.gbk`) comprises
full-length (_galF_ to _ugd_) annotated sequences for each distinct _Klebsiella_ K locus, where available:
* KL1 - KL77 correspond to the loci associated with each of the 77 serologically defined K-type references.
* KL101 and above are defined from DNA sequence data on the basis of gene content.

Note that insertion sequences (IS) are excluded from this database since we assume that the ancestral sequence was likely IS-free and IS transposase genes are not specific to the K locus.
Synthetic IS-free K locus sequences were generated for K loci for which no naturally occurring IS-free variants have been identified to date.
Synthetic IS-free K locus sequences were generated for K loci for which no naturally occurring IS-free variants have been identified to date. Read more about the _Klebsiella_ K-locus
databases on the [Kaptive wiki!](https://github.com/katholt/Kaptive/wiki/Databases-distributed-with-Kaptive#klebsiella-k-locus-database)

The variants database (`Klebsiella_k_locus_variant_reference.gbk`) comprises full-length annotated sequences for variants of the distinct loci:
* IS variants are named as KLN -1, -2 etc e.g. KL15-1 is an IS variant of KL15.
@@ -154,39 +156,42 @@ Database versions:
* Kaptive v0.6.0 includes four novel primary _Klebsiella_ K locus references defined on the basis of gene content (KL162-KL165) in this [paper.](https://www.biorxiv.org/content/10.1101/557785v1)
* Kaptive v0.7.1 and above contain updated versions of the KL53 and KL126 loci (see table below for details). The updated KL126 locus sequence will be described in McDougall, F. et al. 2020 in prep.
* Kaptive v0.7.2 and above include a novel primary _Klebsiella_ K locus reference defined on the basis of gene content (KL166), which will be described in Li, M. et al. 2020. Characterization of clinically isolated hypermucoviscous _Klebsiella pneumoniae_ in Japan. _In prep._
* Kaptive v0.7.3 and above include four novel primary _Klebsiella_ K locus references defined on the basis of gene content (KL167-KL170), which will be described in Gorrie, C. et al. 2020. Opportunity and diversity: A year of _Klebsiella pneumoniae_ infections in hospital. _In prep._
* Kaptive v0.7.3 and above include four novel primary _Klebsiella_ K locus references defined on the basis of gene content (KL167-KL170), which will be described in [Gorrie, C. et al. Nat Commun (2022)](https://doi.org/10.1038/s41467-022-30717-6)
* Kaptive v2.0.0 and above include 16 novel primary _Klebsiella_ K locus references defined on the basis of gene content (KL171–KL186), as described in [Lam, MMC. et al. Microbial Genomics (2022).](https://doi.org/10.1099/mgen.0.000800)

Changes to the _Klebsiella_ K locus primary reference database:

| Locus | Change | Reason | Date of change | Kaptive version no. |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Locus | Change | Reason | Date of change | Kaptive version no. |
| ------------ | ------------- | ------------- | ------------- | ------------- |
| KL53 | Annotation update: _wcaJ_ changed to _wbaP_ | Error in original annotation | 21 July 2020 | v 0.7.1 |
| KL126 | Sequence update: new sequence from isolate FF923 includes _rmlBADC_ genes between _gnd_ and _ugd_ | Assembly scaffolding error in original sequence from isolate A-003-I-a-1 | 21 July 2020 | v 0.7.1 |

| KL126 | Sequence update: new sequence from isolate FF923 includes _rmlBADC_ genes between _gnd_ and _ugd_ | Assembly scaffolding error in original sequence from isolate A-003-I-a-1 | 21 July 2020 | v 0.7.1 |

#### _Klebsiella_ O locus database

The _Klebsiella_ O locus database (`Klebsiella_o_locus_primary_reference.gbk`) contains annotated sequences for 12 distinct _Klebsiella_ O loci.

O locus classification requires some special logic, as the O1 and O2 serotypes contain the same locus genes. It is two additional genes elsewhere in the chromosome (_wbbY_ and _wbbZ_) which results in the O1 antigen. Kaptive therefore looks for these genes to properly call an assembly as either O1 or O2. When only one of the two additional genes can be found, the result is ambiguous and Kaptive will report a locus type of O1/O2.
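
The O1/O2 rule described above can be sketched as a small decision function. This is only an illustration of the stated logic, not Kaptive's actual implementation; the function name `call_o1_o2` and its boolean arguments are hypothetical, and how the presence of each gene is detected is left out entirely.

```python
def call_o1_o2(has_wbbY: bool, has_wbbZ: bool) -> str:
    """Resolve an O1/O2 locus call from the two extra-locus genes.

    Hypothetical helper: the inputs say whether wbbY and wbbZ were
    found elsewhere in the assembly.
    """
    if has_wbbY and has_wbbZ:
        # Both additional genes present: O1 antigen.
        return "O1"
    if not has_wbbY and not has_wbbZ:
        # Neither gene present: O2.
        return "O2"
    # Only one of the two genes found: ambiguous result.
    return "O1/O2"
```

For example, `call_o1_o2(True, False)` gives the ambiguous `O1/O2` call.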

Read more about the O locus and its classification here: [The diversity of _Klebsiella_ pneumoniae surface polysaccharides](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320592/).
Read more about the O locus and its classification here: [The diversity of _Klebsiella_ pneumoniae surface polysaccharides](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320592/)
and on the [Kaptive wiki!](https://github.com/katholt/Kaptive/wiki/Databases-distributed-with-Kaptive#klebsiella-o-locus-database)

Database versions:
* Kaptive v0.4.0 and above include the original version of the _Klebsiella_ O locus database, as described in [Wick, R. et al. J Clin Microbiol (2019).](http://jcm.asm.org/content/56/6/e00197-18)

* Kaptive v0.4.0 and above include the original version of the _Klebsiella_ O locus database, as described in [Wick, R. et al. J Clin Microbiol (2019).](http://jcm.asm.org/content/56/6/e00197-18)
* Kaptive v2.0.0 and above include O locus database special logic for types influenced by non-locus genes, as described in [Lam, MMC. et al. Microbial Genomics (2022).](https://doi.org/10.1099/mgen.0.000800)

#### _Acinetobacter baumannii_ K and OC locus databases

The _A. baumannii_ K (capsule) locus reference database (`Acinetobacter_baumannii_k_locus_primary_reference.gbk`) contains annotated sequences for 92 distinct K loci.
The _A. baumannii_ K (capsule) locus reference database (`Acinetobacter_baumannii_k_locus_primary_reference.gbk`) contains annotated sequences for 237 distinct K loci.
The _A. baumannii_ OC (lipooligosaccharide outer core) locus reference database (`Acinetobacter_baumannii_OC_locus_primary_reference.gbk`) contains annotated sequences for 12 distinct OC loci.

WARNING: These databases have been developed and tested specifically for _A. baumannii_ and may not be suitable for screening other _Acinetobacter_ species. You can check that your assembly is a true _A. baumannii_ by screening for the _oxaAB_ gene e.g. using blastn.

Database versions:
* Kaptive v0.7.0 and above include the original _A. baumannii_ K and OC locus databases, as described in [Wyres, KL. et al. Microbial Genomics (2020).](https://doi.org/10.1099/mgen.0.000339)
* Kaptive v2.0.1 and v2.0.2 include 145 novel _A. baumannii_ K locus references and special logic respectively for types influenced by non-locus genes, as described in [this preprint.](https://doi.org/10.1099/mgen.0.000800)

Lists of papers describing each of the individual _A. baumannii_ reference loci can be found [here](https://github.com/katholt/Kaptive/tree/master/extras).
Likewise, you can read more about the _Acinetobacter baumannii_ K and OC locus databases on the [Kaptive wiki!](https://github.com/katholt/Kaptive/wiki/Databases-distributed-with-Kaptive#acinetobacter-baunannii-k-and-oc-locus-databases)



@@ -207,7 +212,6 @@ Kaptive uses 'tblastn' to screen for the presence of each locus gene with a cove
A small number of the original _Klebsiella_ K locus references are truncated, containing only a partial <i>ugd</i> sequence. The reference annotations for these loci do not include <i>ugd</i>, so are not identified by the 'tblastn' search. Instead <b>Kaptive</b> reports the closest match to the partial sequence (if it exceeds the 90% coverage threshold).



## Installation
If you would like to install and run your own version of **Kaptive Web**, follow the instructions [here](./INSTALL.md).

35 changes: 22 additions & 13 deletions models/job_process.py
@@ -491,18 +491,22 @@ def draw_locus_image(reference_db, job_result_path, upload_path, job_uuid, seq_n
    svg_temp_path = os.path.join(locus_image_folder_path, assemble_name + '_temp.svg')
    png_path = os.path.join(locus_image_folder_path, assemble_name + '.png')

    for record in SeqIO.parse(gbk_file, 'genbank'):
        for feature in record.features:
            if feature.type == 'source' and 'note' in feature.qualifiers:
                for note in feature.qualifiers['note']:
                    if ':' in note:
                        if note.split(':')[1].strip() == locus:
                            SeqIO.write(record, gbk_path, 'genbank')
                            logger.debug('[' + job_uuid + '] ' + "Locus genbank file created.")
                    elif '=' in note:
                        if note.split('=')[1].strip() == locus:
                            SeqIO.write(record, gbk_path, 'genbank')
                            logger.debug('[' + job_uuid + '] ' + "Locus genbank file created.")
    try: # This try-except clause prevents the re-submission of the same job if the locus image cannot be drawn due
         # to a problematic record in the reference database. Tom Stanton - 03.10.2022
        for record in SeqIO.parse(gbk_file, 'genbank'):
            for feature in record.features:
                if feature.type == 'source' and 'note' in feature.qualifiers:
                    for note in feature.qualifiers['note']:
                        if ':' in note:
                            if note.split(':')[1].strip() == locus:
                                SeqIO.write(record, gbk_path, 'genbank')
                                logger.debug('[' + job_uuid + '] ' + "Locus genbank file created.")
                        elif '=' in note:
                            if note.split('=')[1].strip() == locus:
                                SeqIO.write(record, gbk_path, 'genbank')
                                logger.debug('[' + job_uuid + '] ' + "Locus genbank file created.")
    except Exception as e:
        logger.error('[' + job_uuid + '] ' + 'Could not parse: ' + gbk_file + ' ' + str(e))

    if os.path.exists(gbk_path):
        A_rec = SeqIO.read(gbk_path, "genbank")
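
The note-matching rule and the new error handling in this hunk can be sketched in isolation, without Biopython. This is a simplified illustration rather than the code in `job_process.py`: `note_matches_locus` and `find_locus` are hypothetical names, records are modelled as plain `(name, notes)` pairs, and the sketch stops at the first match whereas the original keeps iterating and rewrites the file on every match.

```python
import logging

logger = logging.getLogger(__name__)


def note_matches_locus(note, locus):
    """Return True if the value after ':' (or else '=') in a note equals locus."""
    for sep in (':', '='):
        if sep in note:
            # Mirrors the hunk: ':' takes precedence over '=', and only the
            # field directly after the first separator is compared.
            return note.split(sep)[1].strip() == locus
    return False


def find_locus(records, locus):
    """Return the name of the first record whose notes match locus, else None.

    `records` is any iterable of (name, notes) pairs; it may raise while
    being iterated, as a parser would on a malformed reference record.
    """
    try:
        for name, notes in records:
            if any(note_matches_locus(n, locus) for n in notes):
                return name
    except Exception as exc:
        # Log and fall through instead of letting the failure propagate.
        logger.error("Could not parse records: %s", exc)
    return None
```

Catching the exception around the whole loop means one bad record ends the search quietly; the caller then sees "no locus file" rather than a crash, which matches the intent stated in the commit's comment.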
@@ -567,7 +571,7 @@ def draw_locus_image(reference_db, job_result_path, upload_path, job_uuid, seq_n
label=True,
name=gene_name[i] + gene_cov[i] + gene_id[i],
label_position="middle",
label_size=14,
label_size=20, # Increased from 14 - Tom Stanton 03.10.2022
label_angle=20,
label_strand=1)
i += 1
@@ -622,6 +626,11 @@ def draw_locus_image(reference_db, job_result_path, upload_path, job_uuid, seq_n
if elem.attrib['transform'] == "scale(1,-1) translate(0,-100)":
elem.attrib['transform'] = "scale(1,-1) translate(0,-300)"

for elem in svg.xpath('//*[attribute::style]'): # Removes the black lines and background
if not elem.text and 'path' in elem.tag: # Tom Stanton 03.10.2022
parent = elem.getparent()
parent.remove(elem)

svg = le.tostring(svg, pretty_print=True, encoding="utf-8")
cairosvg.svg2png(svg, write_to=png_path)
img = np.array(Image.open(png_path))
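
The SVG clean-up added in this hunk (dropping styled, text-free `path` elements) can be tried out on a toy document. The sketch below swaps in the standard-library `xml.etree.ElementTree` for the `lxml` calls used in the commit, so it builds its own child-to-parent map in place of `getparent()`; the function name `strip_styled_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_styled_paths(svg_text):
    """Remove elements that carry a style attribute, have no text and are paths."""
    root = ET.fromstring(svg_text)
    # ElementTree has no getparent(), so record each element's parent first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for elem in list(root.iter()):
        if 'style' in elem.attrib and not elem.text and 'path' in elem.tag:
            parent = parent_of.get(elem)
            if parent is not None:
                parent.remove(elem)
    return ET.tostring(root, encoding="unicode")
```

Unstyled paths and styled text elements are left in place, so only the decorative outlines are dropped.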
12 changes: 8 additions & 4 deletions views/default/index.html
@@ -52,16 +52,20 @@ <h2>Citations</h2>
<p></p>
<p></p>
<p style="font-size:20px"><b><i>K. pneumoniae</i> species complex capsule (K) locus databases</b></p>
<p style="font-size:20px">Wyres KL, Wick RR, Gorrie C, Jenney A, Follador R, Thomson NR and Holt KE 2016.<br> Identification of <i>Klebsiella</i> capsule synthesis loci from whole genome data. <i>Microbial Genomics</i> doi: <a href="http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000102" rel="nofollow">10.1099/mgen.0.000102 </a></p>
<p style="font-size:20px">Lam MMC, Wick RR, Judd LM, Holt KE and Wyres KL 2022.<br> Kaptive 2.0: updated capsule and lipopolysaccharide locus typing for the Klebsiella pneumoniae species complex. <i>Microbial Genomics</i> doi: <a href="https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000800" rel="nofollow">https://doi.org/10.1099/mgen.0.000800 </a></p>
<p></p>
<p style="font-size:20px"><b><i>K. pneumoniae</i> species complex LPS (O) locus database</b></p>
<p style="font-size:20px">Wick RR, Heinz E, Holt KE and Wyres KL 2018.<br> Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for <i>Klebsiella</i> genomes. <i>Journal of Clinical Microbiology</i>: <a href="https://jcm.asm.org/content/56/6/e00197-18.long" rel="nofollow">56(6). e00197-18</a></p>
<p></p>
<p style="font-size:20px"><b><i>A. baumannii</i> capsule (K) and LPS (OC) locus databases</b></p>
<p style="font-size:20px"><b><i>A. baumannii</i> capsule (K) locus databases</b></p>
<p style="font-size:20px">Cahill SM, Hall RM and Kenyon JJ 2022.<br> An update to the database for <i>Acinetobacter baumannii</i> capsular polysaccharide locus typing extends the extensive and diverse repertoire of genes found at and outside the K locus. <i>bioRxiv</i> doi: <a href="https://www.biorxiv.org/content/10.1101/2022.05.19.492579v1" rel="nofollow">https://doi.org/10.1101/2022.05.19.492579</a></p>
<p></p>
<p style="font-size:20px"><b><i>A. baumannii</i> LPS (OC) locus databases</b></p>
<p style="font-size:20px">Wyres KL, Cahill SM, Holt KE, Hall RM and Kenyon JJ 2020.<br> Identification of <i>Acinetobacter baumannii</i> loci for capsular polysaccharide (KL) and lipooligosaccharide outer core (OCL) synthesis in genome assemblies using curated reference databases compatible with Kaptive. <i>Microbial Genomics</i> doi: <a href="https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000339" rel="nofollow">10.1099/mgen.0.000339</a></p>
<p></p>
<p style="font-size:20px">A full list of citations for the individual <i>A. baumannii</i> K- and OC- locus reference sequences can be found <a href="https://github.com/katholt/Kaptive/tree/master/extras" rel="nofollow">here</a>.</p>
<p></p>
<p style="font-size:20px"><b>Current database version (as of 9th September 2020): Kaptive v0.7.3</b></p>
<p style="font-size:20px"><b>Current database version (as of 4th October 2022): Kaptive v2.0.3</b></p>

</div>
<table class="box">
@@ -77,7 +81,7 @@ <h2>Citations</h2>
</tr>
<tr>
<td style="vertical-align: top; padding:0 15px 0 15px;"><p><strong>Kaptive</strong> is an open source project for the biomedical community, licensed under GNU General Public License version 3.</p></td>
<td style="vertical-align: top; padding:0 15px 0 15px;"><p><strong>Kaptive</strong> is developed and maintained by Kelly Wyres, Ryan Wick and Kathryn Holt at the Holt Lab, Monash University.</p></td>
<td style="vertical-align: top; padding:0 15px 0 15px;"><p><strong>Kaptive</strong> is developed and maintained by Kelly Wyres, Ryan Wick, Tom Stanton and Kathryn Holt at the Holt Lab, Monash University.</p></td>
<td style="vertical-align: top; padding:0 15px 0 15px;"><p>The command line version of Kaptive can be deployed on your computer with <i>Klebsiella</i> or <i> A. baumannii</i> K-locus, O/OC-locus or other databases.</p></td>
</tr>
<tr>
