A tool to search for linked gene arrays in bacterial datasets
For details on how to use SLING, please see the SLING wiki page.
To cite SLING:
Horesh G, Harms A, Fino C, Parts L, Gerdes K, Heinz E, et al. SLING: a tool to search for linked genes in bacterial datasets. Nucleic Acids Res. 2018; doi:10.1093/nar/gky738
Added an option to provide gzipped FASTA files.
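As an illustration of what accepting gzipped FASTA input involves, here is a minimal sketch that reads either a plain or a gzip-compressed FASTA file. It is not SLING's own code: the function names `open_maybe_gzipped` and `read_fasta` are hypothetical, and detecting compression from the gzip magic bytes (rather than the file extension) is an assumption made for this example.

```python
import gzip


def open_maybe_gzipped(path):
    """Open a plain or gzip-compressed file in text mode (hypothetical helper)."""
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path):
    """Return a dict mapping each record ID to its full sequence."""
    records = {}
    current_id = None
    with open_maybe_gzipped(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the record ID.
                header = line[1:].split()
                current_id = header[0] if header else ""
                records[current_id] = []
            elif current_id is not None:
                records[current_id].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

With this sketch, `read_fasta("genome.fasta")` and `read_fasta("genome.fasta.gz")` would return the same dictionary for the same underlying sequences.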
Major changes:
- Input: a GFF file can be provided on its own if the FASTA sequence is at the end of the file (for instance, PROKKA output GFF files).
- There is no need to provide the IDs of previous steps (prepare, scan, filter) when running any other task; SLING assumes they are the same as the current ID. This can be changed when running steps separately with multiple IDs (see wiki).
- The GROUP step now uses a length coverage cutoff for the alignment (the fraction of the query length covered by the alignment) when deciding whether two proteins are considered the same in the sequence similarity network. The default length coverage is 0.75.
- Default values changed for maximum overlap (from 300 to 50) and minimum BLAST identity (from 30 to 75).
- Six-frame ORFs and annotation ORFs are now written to a single file at the end of the PREPARE step and are handled as a single file throughout the program.
- Outputs now return nucleotide rather than protein sequences, since it is easier to convert nucleotide to protein than the reverse. The headers have changed slightly; please refer to the wiki.
- Major code clean-up and restructuring.
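To make the length coverage cutoff concrete, the sketch below shows one way such a check could be computed from a BLAST-style hit. This is an assumption-laden illustration, not SLING's implementation: the function name and parameter names are hypothetical, the query coordinates are assumed to be 1-based and inclusive (as in BLAST tabular output), and the defaults simply mirror the values quoted above (coverage 0.75, identity 75).

```python
def passes_similarity_cutoffs(query_length, query_start, query_end,
                              percent_identity,
                              min_coverage=0.75, min_identity=75.0):
    """Return True if a hit meets both the coverage and identity cutoffs.

    Hypothetical helper: coverage is the fraction of the query length
    spanned by the alignment, using 1-based inclusive coordinates.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    aligned_length = query_end - query_start + 1
    coverage = aligned_length / query_length
    return coverage >= min_coverage and percent_identity >= min_identity
```

Under these assumptions, a hit covering positions 1 to 75 of a 100-residue query at 80% identity passes, while the same hit covering only positions 1 to 74 does not.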
Minor changes:
- Fixed a bug in `create_db`, which relied on a deprecated version of SLING that used a configuration file.
- `filter` now works with a pool object and is more efficient.
- Added more checks on the structural requirements supplied by the user.
- Nucleotide sequences with more than 5% unknown bases (Ns or Xs) are now removed in the preparation step.
- ORFs from the annotation file (GFF) that are shorter than the `min_orf_length` stated in the preparation step are removed from further analysis.
- Installation requirements are now set for packages (fixes a networkx bug).
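The two sequence filters described above (unknown bases and minimum ORF length) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 5% threshold is taken from the text, and ORF length is measured here simply as the length of the sequence string, which may not match the units SLING uses for `min_orf_length`.

```python
def unknown_fraction(sequence):
    """Fraction of characters in the sequence that are N or X (case-insensitive)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("N") + upper.count("X")) / len(upper)


def keep_sequence(sequence, max_unknown=0.05):
    """Keep a sequence unless more than max_unknown of its bases are unknown."""
    return unknown_fraction(sequence) <= max_unknown


def filter_orfs(orfs, min_orf_length):
    """Drop ORFs shorter than min_orf_length.

    `orfs` is assumed to be a dict of ORF ID -> sequence string; length is
    measured in sequence characters (an assumption for this sketch).
    """
    return {orf_id: seq for orf_id, seq in orfs.items()
            if len(seq) >= min_orf_length}
```

For example, `filter_orfs({"a": "ATGAAATAG", "b": "ATG"}, 9)` keeps only ORF `a`, and `keep_sequence` rejects a 20-base sequence containing two Ns.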