pip install berteome
Berteome uses BERT-style masked language models to predict amino acid probabilities for every residue in a protein sequence.
The main berteome library can be imported as follows:
from berteome import berteome
The modelLoader class can be used to show which models are supported by berteome.
berteome_models = berteome.modelLoader()
berteome_models.supported_models
['Rostlab/prot_bert',
'facebook/esm2_t33_650M_UR50D',
'facebook/esm1b_t33_650M_UR50S']
All of these models are distributed through Hugging Face, and berteome makes heavy use of its API.
To load the prot_bert model, run the following:
bert_tokenizer, bert_model = berteome_models.load_model("Rostlab/prot_bert")
Some weights of the model checkpoint at Rostlab/prot_bert were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
The language models utilized by berteome were trained using a masked
token approach. In this approach, a random amino acid is masked in a
protein and the model is trained to predict what the amino acid should
be. These models do this on an incredibly large number of protein
sequences, to the point that they begin to learn the language of protein
sequence space as we currently know it. For instance, a model can start
to learn which residues are unlikely to exist at a given point in a
protein. Using these models, you can place a mask at any given residue
in the protein, and the model will generate a probability score for each
of the possible amino acids that could go there.
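The masking step described above can be sketched in plain Python. This is only an illustration: the helper name masked_variants is hypothetical and not part of berteome, and the space-separated residues with a [MASK] token follow the input convention documented for Rostlab/prot_bert. Other models may expect a different mask token.

```python
def masked_variants(seq, mask_token="[MASK]"):
    """Return one masked copy of `seq` per residue, residues separated by spaces."""
    residues = list(seq)
    variants = []
    for i in range(len(residues)):
        masked = residues.copy()
        masked[i] = mask_token  # hide the residue at position i
        variants.append(" ".join(masked))
    return variants


# One model input per residue of the example peptide.
for variant in masked_variants("MENDEL"):
    print(variant)
```

Each of these strings would then be passed to the model, which returns a probability for every amino acid at the masked position.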
berteome allows the user to take the models and investigate these
predictions for a given protein sequence, by masking every single
residue in the protein sequence and predicting the probabilities for
all the possible amino acids. The result is an easy-to-work-with pandas
dataframe. To make this dataframe for a very simple peptide sequence
(MENDEL), do the following:
mendel_berteome = berteome.modelPredDF("MENDEL",bert_tokenizer, bert_model)
mendel_berteome.df
| | wt | wtIndex | wtScore | n_effective | topAA | topAAscore | A | C | D | E | ... | M | N | P | Q | R | S | T | V | W | Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | M | 1 | 0.076602 | 16.680519 | E | 0.118906 | 0.036697 | 0.011504 | 0.048245 | 0.118906 | ... | 0.076602 | 0.072661 | 0.024722 | 0.038672 | 0.043105 | 0.070280 | 0.056544 | 0.049927 | 0.007781 | 0.021699 |
1 | E | 2 | 0.074830 | 17.599154 | L | 0.106501 | 0.045721 | 0.015662 | 0.041921 | 0.074830 | ... | 0.043581 | 0.062667 | 0.025277 | 0.036911 | 0.055543 | 0.064425 | 0.049955 | 0.056789 | 0.012691 | 0.029893 |
2 | N | 3 | 0.041990 | 14.518531 | E | 0.184364 | 0.043564 | 0.009685 | 0.162590 | 0.184364 | ... | 0.041484 | 0.041990 | 0.019992 | 0.025515 | 0.029433 | 0.048106 | 0.030303 | 0.054742 | 0.007430 | 0.024924 |
3 | D | 4 | 0.049748 | 17.561047 | L | 0.109088 | 0.042083 | 0.013244 | 0.049748 | 0.086194 | ... | 0.040080 | 0.060822 | 0.032024 | 0.039689 | 0.046228 | 0.062323 | 0.044901 | 0.058937 | 0.010875 | 0.026596 |
4 | E | 5 | 0.086915 | 17.921406 | L | 0.090807 | 0.046641 | 0.018770 | 0.079822 | 0.086915 | ... | 0.028962 | 0.062234 | 0.023879 | 0.030534 | 0.040489 | 0.065195 | 0.044938 | 0.068038 | 0.012156 | 0.038034 |
5 | L | 6 | 0.060736 | 16.068075 | E | 0.152547 | 0.038191 | 0.009217 | 0.065189 | 0.152547 | ... | 0.040042 | 0.096484 | 0.020712 | 0.035022 | 0.046888 | 0.049071 | 0.046247 | 0.048276 | 0.010486 | 0.022727 |
6 rows × 26 columns
This dataframe is where the true berteomic magic begins. Each row corresponds to one residue in the input protein sequence.
Here is a breakdown of some of the columns in the dataframe.
- wt represents the actual amino acid at the given position.
- wtIndex is just a one-based index of the residue, which makes plotting easier. It may not stick around forever, though.
- wtScore is a very interesting and important value. For a given protein, one would hope that the model would predict that the masked residue would be the same as the wild type in the sequence. This column gives us the actual probability that the model provided for the wild-type residue at that position.
- n_effective is a measure of site-specific variability, which gives a proxy of how many amino acids could occupy that site and is defined as $N_{eff}(i) = \exp\left(-\sum_{j} p_{ji} \ln p_{ji}\right)$, where $p_{ji}$ is the predicted probability of amino acid $j$ at position $i$.
- topAA is the top-scoring amino acid at a given position in the protein.
- topAAscore is the score of the top-scoring amino acid at a given position in the protein.
The remaining columns are simply the probabilities of each possible amino acid generated by the model when placing a mask at every residue in the input protein.
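The n_effective definition above is the exponential of the Shannon entropy of the per-position probabilities. A minimal sketch of that formula follows; the function name is hypothetical and this is not berteome's own code, and it assumes the probabilities passed in sum to 1.

```python
import math


def n_effective(probs):
    """Exponential of the Shannon entropy of one position's amino acid probabilities."""
    # Terms with p == 0 are skipped, following the convention 0 * ln(0) = 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)


uniform = [1 / 20] * 20        # every amino acid equally likely
one_hot = [1.0] + [0.0] * 19   # the model is certain about one amino acid

print(n_effective(uniform))  # close to 20: any residue could occupy the site
print(n_effective(one_hot))  # 1.0: only one residue is plausible
```

Values near 20 indicate a site where the model has little preference, while values near 1 indicate a site dominated by a single amino acid.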
The average scores for the wild-type sequence and the top-scoring
sequence are computed with the scoreSeq() function and recorded as
follows:
print(mendel_berteome.wtSeq, mendel_berteome.wtSeqScore)
MENDEL 0.06513695385878104
print(mendel_berteome.topAASeq, mendel_berteome.topAASeqScore)
ELELLE 0.127035315825644
To score another sequence of the same length as the input, pass it to
scoreSeq():
mendel_berteome.scoreSeq("LEDNEM")
0.08294879426692443
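The sequence scores shown above are consistent with a simple mean of the per-position probabilities: averaging the six wtScore values in the MENDEL dataframe gives approximately the reported wtSeqScore. The sketch below illustrates that idea on a toy table. The function score_sequence is hypothetical, and whether scoreSeq() is implemented exactly this way is an assumption.

```python
import pandas as pd


def score_sequence(prob_df, seq):
    """Mean probability of the residues in `seq`, one row of `prob_df` per position."""
    if len(seq) != len(prob_df):
        raise ValueError("sequence length must match the number of rows")
    picked = [prob_df.iloc[i][aa] for i, aa in enumerate(seq)]
    return sum(picked) / len(picked)


# Toy probabilities for a 3-residue protein over a 2-letter alphabet (made-up values).
toy = pd.DataFrame({"A": [0.7, 0.2, 0.1], "C": [0.3, 0.8, 0.9]})
print(score_sequence(toy, "ACC"))

# Averaging the wtScore column printed earlier gives roughly the reported wtSeqScore.
wt_scores = [0.076602, 0.074830, 0.041990, 0.049748, 0.086915, 0.060736]
print(sum(wt_scores) / len(wt_scores))
```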
To investigate how correlated the predictions of the different amino
acids are with each other for a given berteome dataframe,
aa_correlation() can be used to generate a correlation dataframe:
mendel_berteome.aa_correlation()
| | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 1.000000 | 0.728715 | 0.235810 | -0.389880 | 0.879478 | 0.295939 | 0.745629 | 0.281994 | -0.521591 | 0.733512 | -0.720194 | -0.611639 | 0.079973 | -0.433475 | -0.010752 | 0.051076 | -0.411044 | 0.833235 | 0.585926 | 0.854028 |
C | 0.728715 | 1.000000 | -0.335086 | -0.816555 | 0.854112 | 0.231240 | 0.948531 | 0.774243 | -0.042334 | 0.466360 | -0.382031 | -0.235096 | 0.369489 | 0.063834 | 0.313217 | 0.638680 | 0.247711 | 0.876376 | 0.736407 | 0.923179 |
D | 0.235810 | -0.335086 | 1.000000 | 0.765980 | 0.084237 | -0.105943 | -0.311785 | -0.822663 | -0.909457 | 0.087421 | -0.275042 | -0.581996 | -0.599214 | -0.924922 | -0.890910 | -0.671449 | -0.903984 | 0.053589 | -0.545103 | -0.021774 |
E | -0.389880 | -0.816555 | 0.765980 | 1.000000 | -0.555584 | -0.275365 | -0.756599 | -0.960437 | -0.445062 | -0.449607 | 0.096590 | -0.027763 | -0.732526 | -0.612387 | -0.710517 | -0.797275 | -0.600745 | -0.555534 | -0.767346 | -0.570185 |
F | 0.879478 | 0.854112 | 0.084237 | -0.555584 | 1.000000 | 0.456554 | 0.850721 | 0.485917 | -0.477467 | 0.699526 | -0.622552 | -0.579098 | 0.359107 | -0.254099 | -0.072739 | 0.316781 | -0.244826 | 0.988906 | 0.546931 | 0.916871 |
G | 0.295939 | 0.231240 | -0.105943 | -0.275365 | 0.456554 | 1.000000 | 0.469717 | 0.397913 | -0.077729 | 0.311335 | -0.730916 | 0.058536 | 0.495873 | 0.101611 | 0.103227 | -0.197846 | -0.268709 | 0.464575 | 0.501189 | 0.351613 |
H | 0.745629 | 0.948531 | -0.311785 | -0.756599 | 0.850721 | 0.469717 | 1.000000 | 0.780563 | -0.042422 | 0.403466 | -0.613977 | -0.096189 | 0.331730 | 0.020781 | 0.334186 | 0.428619 | 0.133945 | 0.884543 | 0.852824 | 0.949147 |
I | 0.281994 | 0.774243 | -0.822663 | -0.960437 | 0.485917 | 0.397913 | 0.780563 | 1.000000 | 0.529266 | 0.250584 | -0.168636 | 0.251695 | 0.680964 | 0.638904 | 0.732240 | 0.718683 | 0.641502 | 0.519188 | 0.816000 | 0.560873 |
K | -0.521591 | -0.042334 | -0.909457 | -0.445062 | -0.477467 | -0.077729 | -0.042422 | 0.529266 | 1.000000 | -0.363205 | 0.430718 | 0.773594 | 0.335643 | 0.889435 | 0.850884 | 0.411444 | 0.872260 | -0.447166 | 0.317516 | -0.325412 |
L | 0.733512 | 0.466360 | 0.087421 | -0.449607 | 0.699526 | 0.311335 | 0.403466 | 0.250584 | -0.363205 | 1.000000 | -0.360750 | -0.779562 | 0.554163 | -0.037801 | 0.062683 | 0.196178 | -0.320043 | 0.588138 | 0.326964 | 0.436263 |
M | -0.720194 | -0.382031 | -0.275042 | 0.096590 | -0.622552 | -0.730916 | -0.613977 | -0.168636 | 0.430718 | -0.360750 | 1.000000 | 0.161152 | 0.038821 | 0.444465 | 0.054119 | 0.430616 | 0.575785 | -0.620563 | -0.596729 | -0.652699 |
N | -0.611639 | -0.235096 | -0.581996 | -0.027763 | -0.579098 | 0.058536 | -0.096189 | 0.251695 | 0.773594 | -0.779562 | 0.161152 | 1.000000 | -0.116807 | 0.493941 | 0.512855 | -0.030990 | 0.583399 | -0.486083 | 0.203204 | -0.307123 |
P | 0.079973 | 0.369489 | -0.599214 | -0.732526 | 0.359107 | 0.495873 | 0.331730 | 0.680964 | 0.335643 | 0.554163 | 0.038821 | -0.116807 | 1.000000 | 0.711244 | 0.444691 | 0.584905 | 0.362911 | 0.320277 | 0.353894 | 0.130555 |
Q | -0.433475 | 0.063834 | -0.924922 | -0.612387 | -0.254099 | 0.101611 | 0.020781 | 0.638904 | 0.889435 | -0.037801 | 0.444465 | 0.493941 | 0.711244 | 1.000000 | 0.778685 | 0.589468 | 0.823070 | -0.252472 | 0.281362 | -0.272660 |
R | -0.010752 | 0.313217 | -0.890910 | -0.710517 | -0.072739 | 0.103227 | 0.334186 | 0.732240 | 0.850884 | 0.062683 | 0.054119 | 0.512855 | 0.444691 | 0.778685 | 1.000000 | 0.432145 | 0.713224 | -0.081317 | 0.706977 | 0.066199 |
S | 0.051076 | 0.638680 | -0.671449 | -0.797275 | 0.316781 | -0.197846 | 0.428619 | 0.718683 | 0.411444 | 0.196178 | 0.430616 | -0.030990 | 0.584905 | 0.589468 | 0.432145 | 1.000000 | 0.762126 | 0.338214 | 0.276284 | 0.313994 |
T | -0.411044 | 0.247711 | -0.903984 | -0.600745 | -0.244826 | -0.268709 | 0.133945 | 0.641502 | 0.872260 | -0.320043 | 0.575785 | 0.583399 | 0.362911 | 0.823070 | 0.713224 | 0.762126 | 1.000000 | -0.191172 | 0.265531 | -0.089424 |
V | 0.833235 | 0.876376 | 0.053589 | -0.555534 | 0.988906 | 0.464575 | 0.884543 | 0.519188 | -0.447166 | 0.588138 | -0.620563 | -0.486083 | 0.320277 | -0.252472 | -0.081317 | 0.338214 | -0.191172 | 1.000000 | 0.557270 | 0.944823 |
W | 0.585926 | 0.736407 | -0.545103 | -0.767346 | 0.546931 | 0.501189 | 0.852824 | 0.816000 | 0.317516 | 0.326964 | -0.596729 | 0.203204 | 0.353894 | 0.281362 | 0.706977 | 0.276284 | 0.265531 | 0.557270 | 1.000000 | 0.694812 |
Y | 0.854028 | 0.923179 | -0.021774 | -0.570185 | 0.916871 | 0.351613 | 0.949147 | 0.560873 | -0.325412 | 0.436263 | -0.652699 | -0.307123 | 0.130555 | -0.272660 | 0.066199 | 0.313994 | -0.089424 | 0.944823 | 0.694812 | 1.000000 |
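A correlation table of this shape can be produced with pandas by correlating the amino acid columns across positions. The sketch below uses made-up probabilities for three amino acids. It assumes a Pearson correlation, which is the pandas default; whether aa_correlation() uses the same method is not stated here.

```python
import pandas as pd

# Toy per-position probabilities for three amino acids (made-up values).
probs = pd.DataFrame({
    "A": [0.5, 0.2, 0.3, 0.1],
    "E": [0.1, 0.6, 0.3, 0.7],
    "L": [0.4, 0.2, 0.4, 0.2],
})

# Pearson correlation between every pair of amino acid columns.
corr = probs.corr()
print(corr.round(3))
```

A strongly negative entry means the model tends to favor one amino acid at positions where it disfavors the other.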
berteome can also be used to generate single-residue substitution
variants for the top k amino acids at each residue in a protein. To
generate the top 3 mutational variants for MENDEL, the generate
submodule can be loaded and used as follows:
from berteome import generate
generate.top_k_variants(mendel_berteome, 3)
| | sub | seq |
---|---|---|
0 | 0subE | EENDEL |
1 | 0subK | KENDEL |
2 | 0subN | NENDEL |
3 | 1subL | MLNDEL |
4 | 1subK | MKNDEL |
5 | 1subI | MINDEL |
6 | 2subE | MEEDEL |
7 | 2subD | MEDDEL |
8 | 2subL | MELDEL |
9 | 3subL | MENLEL |
10 | 3subK | MENKEL |
11 | 3subE | MENEEL |
12 | 4subL | MENDLL |
13 | 4subD | MENDDL |
14 | 4subI | MENDIL |
15 | 5subE | MENDEE |
16 | 5subK | MENDEK |
17 | 5subN | MENDEN |
This returns a dataframe with L x k possible single amino acid
variants, where L is the sequence length.
- sub is the substitution id that indicates which residue was
substituted with what amino acid, following the pattern
{residue_number}sub{substituted_amino_acid}. In the output above,
residue_number starts at 0.
- seq is the new variant sequence.
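The idea behind the variant table can be sketched as follows. The function top_k_substitutions is hypothetical and works on a plain probability table rather than a berteome object. Skipping the wild-type residue is an inference from the example output, where position 0 lists E, K and N but not M even though M scores higher than N in the MENDEL dataframe.

```python
import pandas as pd


def top_k_substitutions(prob_df, wt_seq, k):
    """For each position, substitute the k most probable non-wild-type amino acids."""
    rows = []
    for pos, wt in enumerate(wt_seq):
        # Rank the candidate amino acids at this position, leaving out the wild type.
        ranked = prob_df.iloc[pos].drop(wt).sort_values(ascending=False)
        for aa in ranked.index[:k]:
            rows.append({
                "sub": f"{pos}sub{aa}",
                "seq": wt_seq[:pos] + aa + wt_seq[pos + 1:],
            })
    return pd.DataFrame(rows)


# Toy probabilities for a 2-residue protein over a 3-letter alphabet (made-up values).
toy = pd.DataFrame({"A": [0.6, 0.1], "C": [0.3, 0.2], "D": [0.1, 0.7]})
print(top_k_substitutions(toy, "AD", 1))
```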
If you’d like to use the amino acid probabilities at each residue
position to randomly generate proteins from the probability dataframe
provided by berteome, you can use n_random_seqs:
generate.n_random_seqs(mendel_berteome, 10)
| | seq | score |
---|---|---|
0 | WYPDRI | 0.035417 |
1 | AIWDFM | 0.042938 |
2 | VIATPE | 0.064649 |
3 | APVMHK | 0.048056 |
4 | WRTYTY | 0.031315 |
5 | YESFPH | 0.037034 |
6 | YDEGGA | 0.065425 |
7 | PHTVQL | 0.037249 |
8 | FHNHWM | 0.025564 |
9 | ERAPYK | 0.066202 |
- seq is the randomly generated sequence
- score is the average score of the amino acids chosen in the randomly generated sequence
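One way to generate sequences from a probability table is to draw each position independently, weighted by that position's amino acid probabilities. The sketch below does this with NumPy on a toy table. The function sample_sequences and its seed parameter are hypothetical, and the exact sampling scheme used by n_random_seqs is an assumption.

```python
import numpy as np
import pandas as pd


def sample_sequences(prob_df, n, seed=0):
    """Draw n sequences, sampling every position from its own probability row."""
    rng = np.random.default_rng(seed)
    alphabet = list(prob_df.columns)
    rows = []
    for _ in range(n):
        residues, picked = [], []
        for pos in range(len(prob_df)):
            p = prob_df.iloc[pos].to_numpy(dtype=float)
            p = p / p.sum()  # renormalise so the weights sum to exactly 1
            idx = rng.choice(len(alphabet), p=p)
            residues.append(alphabet[idx])
            picked.append(prob_df.iloc[pos, idx])
        # The score is the mean probability of the residues that were drawn.
        rows.append({"seq": "".join(residues), "score": float(np.mean(picked))})
    return pd.DataFrame(rows)


# Toy probabilities for a 3-residue protein over a 2-letter alphabet (made-up values).
toy = pd.DataFrame({"A": [0.7, 0.2, 0.1], "C": [0.3, 0.8, 0.9]})
print(sample_sequences(toy, 5))
```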
from berteome import berteome_plot
If you would like to visualize how wtScore varies across the sequence,
do the following:
berteome_plot.wtScore_plot(mendel_berteome)
(<Figure size 432x288 with 1 Axes>,
<matplotlib.axes._subplots.AxesSubplot at 0x7fc806ab2460>)
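For readers who want to customize the figure, a similar line plot can be drawn directly with matplotlib. This is only a sketch of the kind of plot involved, not the code behind wtScore_plot. The values are copied from the MENDEL dataframe shown earlier, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# wtIndex and wtScore copied from the MENDEL dataframe shown earlier.
wt_index = [1, 2, 3, 4, 5, 6]
wt_score = [0.076602, 0.074830, 0.041990, 0.049748, 0.086915, 0.060736]

fig, ax = plt.subplots()
ax.plot(wt_index, wt_score, marker="o")
ax.set_xticks(wt_index)
ax.set_xticklabels(list("MENDEL"))  # label each position with its wild-type residue
ax.set_xlabel("residue")
ax.set_ylabel("wtScore")
fig.savefig("wtScore_sketch.png")
```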
Additionally, you can plot n_effective to visualize sites that the model infers as having a lower likelihood of possible substitutions.
berteome_plot.n_effective_plot(mendel_berteome)
(<Figure size 432x288 with 1 Axes>,
<matplotlib.axes._subplots.AxesSubplot at 0x7fc80699d070>)
berteome also provides a method for visually inspecting the
correlations of the amino acid predictions:
berteome_plot.aa_correlation_plot(mendel_berteome)
<seaborn.matrix.ClusterGrid at 0x7fc80640bb80>
If you would like to visualize the berteome predictions in the form of
a seqlogo, that can also be accomplished! Doing so may require
installing a few additional dependencies, something along the lines of:
!apt install ghostscript
!apt-get install -y pdf2svg
berteome_plot.seqlogo_plot(mendel_berteome)
To build the library, run the following:
nbdev export
Then, pip install in a development environment
pip install -e '.[dev]'
I do quite a bit of work on a Chromebook, which allows for working on
GitHub through Codespaces and also on Google Colab. To install a
particular commit hash of berteome, you can do the following:
!pip uninstall berteome
Found existing installation: berteome 0.1.5
Uninstalling berteome-0.1.5:
Would remove:
/usr/local/lib/python3.8/dist-packages/berteome-0.1.5.dist-info/*
/usr/local/lib/python3.8/dist-packages/berteome/*
Proceed (y/n)? y
Successfully uninstalled berteome-0.1.5
!pip install "berteome @ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2"
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2
Cloning https://github.com/tijeco/berteome (to revision 1e104ce687ed38a21ff72e6b58960aff6e0be6a2) to /tmp/pip-install-luvjcq66/berteome_93d26519699d4c9c816c0799b9962b68
Running command git clone --filter=blob:none --quiet https://github.com/tijeco/berteome /tmp/pip-install-luvjcq66/berteome_93d26519699d4c9c816c0799b9962b68
Running command git rev-parse -q --verify 'sha^1e104ce687ed38a21ff72e6b58960aff6e0be6a2'
Running command git fetch -q https://github.com/tijeco/berteome 1e104ce687ed38a21ff72e6b58960aff6e0be6a2
Running command git checkout -q 1e104ce687ed38a21ff72e6b58960aff6e0be6a2
Resolved https://github.com/tijeco/berteome to commit 1e104ce687ed38a21ff72e6b58960aff6e0be6a2
Preparing metadata (setup.py) ... done
Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (from berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (22.0.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (21.3)
Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (from berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.3.5)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (from berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.21.6)
Collecting seqlogo
Downloading seqlogo-5.29.8.tar.gz (28 kB)
Preparing metadata (setup.py) ... done
Collecting transformers
Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.8/5.8 MB 42.6 MB/s eta 0:00:00
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (from berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.13.0+cu116)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2022.7)
Collecting weblogo
Downloading weblogo-3.7.12-py3-none-any.whl (571 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 571.7/571.7 KB 39.2 MB/s eta 0:00:00
Collecting ghostscript
Downloading ghostscript-0.7-py2.py3-none-any.whl (25 kB)
Requirement already satisfied: pytest in /usr/local/lib/python3.8/dist-packages (from seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (3.6.4)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (4.4.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (3.8.2)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 78.9 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.10.0
Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 182.4/182.4 KB 19.9 MB/s eta 0:00:00
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.8/dist-packages (from transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.8/dist-packages (from transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2022.6.2)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.8/dist-packages (from transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (4.64.1)
Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2.25.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.15.0)
Requirement already satisfied: setuptools>=38.6.0 in /usr/local/lib/python3.8/dist-packages (from ghostscript->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (57.4.0)
Requirement already satisfied: py>=1.5.0 in /usr/local/lib/python3.8/dist-packages (from pytest->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.11.0)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from pytest->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (22.2.0)
Requirement already satisfied: atomicwrites>=1.0 in /usr/local/lib/python3.8/dist-packages (from pytest->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.4.1)
Requirement already satisfied: pluggy<0.8,>=0.5 in /usr/local/lib/python3.8/dist-packages (from pytest->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (0.7.1)
Requirement already satisfied: more-itertools>=4.0.0 in /usr/local/lib/python3.8/dist-packages (from pytest->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (9.0.0)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from requests->transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests->transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests->transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->transformers->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (2.10)
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from weblogo->seqlogo->berteome@ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2) (1.7.3)
Building wheels for collected packages: berteome, seqlogo
Building wheel for berteome (setup.py) ... done
Created wheel for berteome: filename=berteome-0.1.5-py3-none-any.whl size=18184 sha256=5e861df9e62c18af9645aea4b0e8032d8d03e3aacbd11dd66a8567aa1bfe2e6d
Stored in directory: /root/.cache/pip/wheels/b4/3a/ca/cdd13884728b51fc6a0b5a4d093d746507172dacf725c147dd
Building wheel for seqlogo (setup.py) ... done
Created wheel for seqlogo: filename=seqlogo-5.29.8-py2.py3-none-any.whl size=19417 sha256=dccf1fe6c88ff6821b6c5ffca1fe1fcb6a69cd35d05e55bcf250a8ad81538c3c
Stored in directory: /root/.cache/pip/wheels/e7/f2/16/c7eb18def88636c56ccc5bf482af7ba59a135dc0eb437a125d
Successfully built berteome seqlogo
Installing collected packages: tokenizers, ghostscript, weblogo, huggingface-hub, transformers, seqlogo, berteome
Successfully installed berteome-0.1.5 ghostscript-0.7 huggingface-hub-0.11.1 seqlogo-5.29.8 tokenizers-0.13.2 transformers-4.25.1 weblogo-3.7.12