Ung Inhibitor Heuristics

This repository contains data and scripts from our paper, "A multimodal approach towards genomic identification of protein inhibitors of uracil-DNA glycosylase".

Scripts

The file filter_runs.py contains an example heuristics filter run. The broad workflow for a heuristics run is as follows:

Read in data from a single list of proteins (single_inputs.py) or a whole genome or list of genomes (genome_inputs.py)
Remove duplicates and trim translated sequences to the first start codon (sequence_processing.py)
Filter sequences by acidity and hydrophobicity (general_filters.py)
Filter based on glycine/proline residues or ratio of acidic and basic residues (residue_filters.py)
Filter based on presence of the ESI motif and surrounding residues (either esi_filters.py or lenient_esi_filters.py; the latter follows the lenient definition of the ESI motif)
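
The workflow above can be sketched in a few lines of plain Python. This is a hedged illustration only: the function names, the 0.15 acidic-fraction threshold, and the literal "ESI" substring check are assumptions made for the example, and do not reproduce the actual functions or cut-offs in filter_runs.py or the filter modules.

```python
ACIDIC = set("DE")

def trim_to_first_start(protein):
    """Trim a translated sequence so it begins at the first methionine."""
    idx = protein.find("M")
    return protein[idx:] if idx != -1 else ""

def deduplicate(seqs):
    """Drop repeated sequences while keeping first-seen order."""
    return list(dict.fromkeys(seqs))

def acidic_fraction(protein):
    """Fraction of residues that are Asp or Glu."""
    return sum(r in ACIDIC for r in protein) / len(protein)

def has_esi_motif(protein):
    """Assumption: a bare 'ESI' substring stands in for the real motif definition."""
    return "ESI" in protein

def run_filters(seqs, min_acidic=0.15):
    """Hypothetical end-to-end run: trim, deduplicate, then apply two filters."""
    trimmed = [trim_to_first_start(s) for s in seqs]
    # Deduplicate after trimming, since trimming can create new duplicates.
    unique = deduplicate(s for s in trimmed if s)
    acidic = [s for s in unique if acidic_fraction(s) >= min_acidic]
    return [s for s in acidic if has_esi_motif(s)]

example = ["LLMEDESIDEEK", "MEDESIDEEK", "MKKKKRRRESI", "MAAAAAAAAA"]
hits = run_filters(example)
```

Each stage narrows the candidate list, so cheaper filters are best placed first.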

Additional Scripts

plot_histograms.py contains scripts for plotting the distribution of lengths, molecular weights, isoelectric points, and various other physical properties across a set of proteins. This makes use of sequence_analysis.py which contains functions to get these properties for a single protein sequence.
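
As a rough illustration of per-sequence properties and their distribution, the sketch below computes a mean Kyte-Doolittle hydropathy score and bins sequence lengths. The function names and the bin width are assumptions for this example; sequence_analysis.py and plot_histograms.py may compute and plot these properties differently.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein):
    """Mean hydropathy; residues outside the standard 20 are skipped."""
    scores = [KYTE_DOOLITTLE[r] for r in protein if r in KYTE_DOOLITTLE]
    return sum(scores) / len(scores) if scores else 0.0

def length_histogram(proteins, bin_width=10):
    """Count sequences per length bin, keyed by the lower edge of each bin."""
    bins = Counter((len(p) // bin_width) * bin_width for p in proteins)
    return dict(sorted(bins.items()))

def print_histogram(hist):
    """Print a plain-text histogram, one row per bin."""
    for lower_edge, count in hist.items():
        print(f"{lower_edge:>5} {'#' * count}")

print_histogram(length_histogram(["A" * 12, "A" * 18, "A" * 25]))
```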

plot_hit_rates.py is a simple placeholder script that plots a bar chart from a list of floating-point values.

p56_filters.py contains functions for filtering protein sequences based on p56-type Ung inhibitor motifs (EXXYG and FXDSY). These are similar to the ESI-motif filters for Ugi and SAUGI-type UngIns but were not presented or used in the manuscript.
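
The two motifs can be expressed as regular expressions, with X read as any single residue. The sketch below is an assumption-laden illustration: the names are hypothetical, and whether p56_filters.py requires one motif, both, or particular spacing between them is not stated here.

```python
import re

# X in the motif notation is treated as "any single residue".
P56_MOTIFS = {
    "EXXYG": re.compile(r"E..YG"),
    "FXDSY": re.compile(r"F.DSY"),
}

def find_p56_motifs(protein):
    """Return the 0-based start positions of each motif in the sequence."""
    return {
        name: [m.start() for m in pattern.finditer(protein)]
        for name, pattern in P56_MOTIFS.items()
    }

def has_any_p56_motif(protein):
    """Hypothetical filter: keep a sequence if at least one motif is present."""
    return any(find_p56_motifs(protein).values())

positions = find_p56_motifs("MEAKYGFLDSY")
```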

window_analysis.py and window_filters.py contain functions to filter proteins based on sliding-window metrics, such as the molecular weight and pI of discrete stretches of residues. These were explored during heuristics development, for example to measure characteristics of putative β-strands containing ESI motifs. These analyses were not presented in the manuscript, but the scripts are provided here for reference.
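
A sliding-window metric can be sketched as follows. For brevity, a crude net charge (+1 per Lys/Arg, -1 per Asp/Glu) stands in for the molecular weight and pI metrics mentioned above; the function names are hypothetical and are not taken from window_analysis.py or window_filters.py.

```python
ACIDIC = set("DE")
BASIC = set("KR")

def sliding_windows(protein, size):
    """Yield (start, window) for every full-length window of the sequence."""
    for start in range(len(protein) - size + 1):
        yield start, protein[start:start + size]

def window_net_charge(window):
    """Crude net charge: histidine and the termini are ignored."""
    return sum(r in BASIC for r in window) - sum(r in ACIDIC for r in window)

def most_acidic_window(protein, size):
    """Return (start, window, charge) for the lowest-charge window.

    Returns None if the sequence is shorter than the window size.
    Ties are broken in favour of the earliest window.
    """
    scored = [(window_net_charge(w), s, w) for s, w in sliding_windows(protein, size)]
    if not scored:
        return None
    charge, start, window = min(scored)
    return start, window, charge

best = most_acidic_window("KKAEEDAK", 3)
```

A filter built on this would keep a sequence only if some window passes a chosen threshold, which localises the property to a stretch of residues rather than averaging over the whole protein.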

Data

There are six data files:

All_Ugis+SaUgis.fasta contains protein sequences for all Ugi and SaUgi-type Ung inhibitors, used to make curated multiple sequence alignments (MSAs).
supplementary_dna_sequences.fasta contains DNA sequences from Supplementary Section 1 of the Supplementary Information.
supplementary_protein_sequences.fasta contains protein sequences from Supplementary Section 1 of the Supplementary Information.
supplementary_table_s6_heuristics_matches.fasta contains protein sequences from Supplementary Table S6 of the Supplementary Information, corresponding to selected heuristics hit sequences.
Filter Tests.xlsx contains information and data from testing and tuning of heuristics filters, as well as information about each filter script.
s6_genome.fasta is an example genome from Staphylococcus phage S6.
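
The FASTA files can be read with any standard parser. As a dependency-free sketch (the function name is hypothetical, and the repository's own input scripts may read files differently), a minimal reader looks like this:

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

example_lines = [">seq1 example", "MEDESI", "DEEK", "", ">seq2", "MKKK"]
records = list(parse_fasta(example_lines))
```

An open file handle can be passed in place of the list, since iterating over a text file also yields lines.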

naailkhan28/ung_inhibitor_heuristics
