I'm currently working on a Python script that uses data from a multiple sequence alignment to color-code amino acids in a PDB structure. The script calculates amino acid frequencies and conservation, which makes it possible to visualize conservation patterns. This information is valuable for identifying conserved residues and gives insight into their possible roles (e.g. interaction with other subunits, active sites, structural roles). Additionally, the script generates a frequency table, which serves as the basis for creating a Multiple Sequence Alignment (MSA) visualization using the pyMSAviz Python package.
Using BLAST, 500 sequences of the studied enzyme were obtained. A Multiple Sequence Alignment (MSA) was built using the Clustal algorithm. The resulting aligned FASTA file is used as input for the Python script. The frequency of the most prevalent amino acid at each position was determined, as displayed in the following table.
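The per-position frequency calculation can be sketched as follows. This is a minimal illustration using only the standard library, not the script itself: the function names (`read_fasta`, `column_conservation`) and the toy alignment are hypothetical, and the gap handling (gaps are never reported as the top residue but still count in the denominator) is an assumption rather than something stated above.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def column_conservation(sequences):
    """Return (position, top residue, percent) for each alignment column.

    Assumption: gap characters are ignored when picking the top residue,
    but the percentage is relative to the total number of sequences.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    table = []
    for pos, column in enumerate(zip(*sequences), start=1):
        counts = Counter(c.upper() for c in column if c not in "-.")
        if counts:
            residue, n = counts.most_common(1)[0]
        else:
            residue, n = "-", 0  # column made of gaps only
        table.append((pos, residue, 100.0 * n / len(column)))
    return table


# Toy alignment (made up for illustration, not real data)
toy_alignment = """>a
MKT-A
>b
MKTLA
>c
MRTLG
>d
MKT-A
"""
sequences = [seq for _, seq in read_fasta(toy_alignment)]
table = column_conservation(sequences)
for pos, residue, percent in table:
    print(f"{pos}\t{residue}\t{percent:.1f}")
```

The percentages are on a 0 to 100 scale so that they can be compared directly with the threshold of 80 used for coloring.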
Subsequently, the PDB file was modified so that the B-factor field holds these values. In PyMOL, amino acids with an identity lower than 80 were selected (select br. b<80) and colored gray. A gradient from blue to red was used to color amino acids with higher identity (spectrum b, blue_red, minimum=80, maximum=100), resulting in the following image:
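One way to write per-residue values into the B-factor field is to edit the fixed-width columns of the ATOM records directly (columns 61 to 66 in the PDB format, residue number in columns 23 to 26). The sketch below is an assumption about how this step could look, not the script's actual code: `set_bfactors`, `ATOM_FMT` and the example records are hypothetical, and it assumes a dictionary that already maps PDB residue numbers to conservation values (the mapping from alignment columns to residue numbers is not shown).

```python
def set_bfactors(pdb_lines, score_by_resnum, default=0.0):
    """Return a copy of pdb_lines with ATOM B-factors replaced.

    pdb_lines: PDB records without trailing newlines.
    score_by_resnum: {residue number: value on a 0-100 scale}.
    Residues missing from the mapping get `default`.
    """
    updated = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            resnum = int(line[22:26])          # columns 23-26: residue number
            score = score_by_resnum.get(resnum, default)
            padded = line.ljust(66)            # make sure columns 61-66 exist
            line = padded[:60] + f"{score:6.2f}" + padded[66:]
        updated.append(line)
    return updated


# Hypothetical helper used only to build well-formed example records
ATOM_FMT = (
    "ATOM  {serial:>5} {name:<4} {res:>3} {chain}{num:>4}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
)
example = [
    ATOM_FMT.format(serial=1, name="CA", res="MET", chain="A", num=1,
                    x=11.104, y=6.134, z=-6.504, occ=1.0, b=20.0),
    ATOM_FMT.format(serial=2, name="CA", res="LYS", chain="A", num=2,
                    x=12.560, y=7.021, z=-5.880, occ=1.0, b=22.5),
    "END",
]
recolored = set_bfactors(example, {1: 100.0, 2: 75.0})
print("\n".join(recolored))
```

HETATM records (for example metal ions) are left untouched here; whether they should receive a value is a design choice.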
Finally, the Multiple Sequence Alignment (MSA) visualization of the desired range (in this case positions 100-160) is obtained as shown here:
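A plot of a sub-range of the alignment can be produced with pyMSAviz roughly as sketched below. The wrapper function `plot_msa_window`, the file names and the chosen options (`wrap_length`, `show_consensus`) are illustrative assumptions, not the settings used for the figure; pyMSAviz is a third-party package and must be installed separately.

```python
def plot_msa_window(msa_file, start, end, out_file, wrap_length=60):
    """Save an image of alignment columns start..end using pyMSAviz.

    The import is local so that this module can be loaded even when
    pyMSAviz is not installed.
    """
    from pymsaviz import MsaViz  # third-party: pip install pymsaviz

    mv = MsaViz(
        msa_file,
        format="fasta",
        start=start,
        end=end,
        wrap_length=wrap_length,
        show_consensus=True,
    )
    mv.savefig(out_file)


# Example call (file names are placeholders):
# plot_msa_window("alignment.fasta", 100, 160, "msa_100_160.png")
```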
Here is an example of a metalloenzyme with three putative active sites (Cu_A, Cu_B, and Cu_C). Cu_A exhibits low conservation, suggesting it is likely not an active site.
To add a color bar for the conservation values, follow these steps: on the studied object, click Action and copy it to a new object (named, for example, obj01). Then run "ramp_new color_bar, obj01, [min, max], [blue, white]", where [min, max] are the B-factor values chosen in the previous steps and [blue, white] is the color scale chosen for visualization.
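To make the PyMOL steps repeatable, the commands can be collected into a script file and run with "@color_by_conservation.pml" from the PyMOL command line. The helper below only builds the script text; its name, the selection name `low_cons`, the use of `create` in place of the Action menu copy, and the [blue, red] ramp colors (chosen to match the blue_red spectrum) are assumptions of this sketch. It also applies the gray coloring after the spectrum, so that the spectrum command cannot recolor the low-identity residues.

```python
def build_pymol_script(obj, low=80, high=100, copy_name="obj01"):
    """Return PyMOL commands that color `obj` by B-factor and add a color bar."""
    commands = [
        # Blue-to-red gradient over the chosen B-factor range
        f"spectrum b, blue_red, {obj}, minimum={low}, maximum={high}",
        # Residues below the threshold are selected and colored gray
        f"select low_cons, br. ({obj} and b<{low})",
        "color gray, low_cons",
        # Copy of the object that the color bar is attached to
        f"create {copy_name}, {obj}",
        f"ramp_new color_bar, {copy_name}, [{low}, {high}], [blue, red]",
    ]
    return "\n".join(commands)


script = build_pymol_script("enzyme")
print(script)

# To save it for PyMOL (file name is a placeholder):
# with open("color_by_conservation.pml", "w") as handle:
#     handle.write(script + "\n")
```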