AlphaPeptDeep (peptdeep for short) aims to make it easy to build new deep learning models for shotgun proteomics studies. Transfer learning is also easy to apply using AlphaPeptDeep.
It contains some built-in models such as retention time (RT), collision cross section (CCS), and tandem mass spectrum (MS2) prediction for given peptides. With these models, one can easily generate a predicted library from fasta files.
For details, check out our publications.
For documentation, see readthedocs.
- alphabase: Infrastructure for AlphaX Ecosystem
- alphapept: DDA search engine
- alphapeptdeep: Deep learning for proteomics
- alpharaw: Raw data accessing
- alphaviz: MS data and result visualization
- alphatims: timsTOF data accessing
- peptdeep_hla: the DL model that predicts whether a peptide is presented by an individual HLA or not.
- Dimethyl: the MS2/RT/CCS models for Dimethyl-labeled peptides.
Wen-Feng Zeng, Xie-Xuan Zhou, Sander Willems, Constantin Ammar, Maria Wahle, Isabell Bludau, Eugenia Voytik, Maximillian T. Strauss & Matthias Mann. AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics. Nat Commun 13, 7238 (2022). https://doi.org/10.1038/s41467-022-34904-3
AlphaPeptDeep was developed by the Mann Labs at the Max Planck Institute of Biochemistry and the University of Copenhagen and is freely available with an Apache License. External Python packages (available in the requirements folder) have their own licenses, which can be consulted on their respective websites.
AlphaPeptDeep can be installed and used on all major operating systems (Windows, macOS and Linux).
There are three different types of installation possible:
- One-click GUI installer: Choose this installation if you only want the GUI and/or keep things as simple as possible.
- Pip installer: Choose this installation if you want to use peptdeep as a Python package in an existing Python (recommended Python 3.8 or 3.9) environment (e.g. a Jupyter notebook). If needed, the GUI and CLI can be installed with pip as well.
- Developer installer: Choose this installation if you are familiar with CLI tools, conda and Python. This installation allows access to all available features of peptdeep and even allows you to modify its source code directly. Generally, the developer version of peptdeep outperforms the precompiled versions, which makes this the installation of choice for high-throughput experiments.
The GUI of peptdeep is a completely stand-alone tool that requires no knowledge of Python or CLI tools. Click on one of the links below to download the latest release for:
Older releases remain available on the release page, but no backwards compatibility is guaranteed.
Note that, as GitHub does not allow large release files, these installers do not have GPU support.
To create GPU version installers: clone the source code, install the GPU version of PyTorch (see here), and then use the build_installer_*.sh and build_package_*.sh scripts in the respective release/[macos, linux, windows] folder to build the installer locally.
For Linux you need to additionally pass the "GPU" flag, i.e. run
release/linux/build_installer_linux.sh GPU
release/linux/build_package_linux.sh
PythonNET must be installed to access Thermo or Sciex raw data.
This is a legacy dependency that should be replaced by AlphaRaw in the near future.
Automatically installed for Windows.
- Install Mono from mono-project website Mono Linux. NOTE, the installed mono version should be at least 6.10, which requires you to add the ppa to your trusted sources!
- Install PythonNET with pip install pythonnet.
- Install brew and pkg-config:
brew install pkg-config
- Install Mono from mono-project website Mono Mac.
- Register the Mono-Path to your system: For macOS Catalina, open the configuration of zsh via the terminal:
  - Type nano ~/.zshrc to open the configuration of the terminal.
  - Append the mono path to your PKG_CONFIG_PATH:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/Library/Frameworks/Mono.framework/Versions/Current/lib/pkgconfig:$PKG_CONFIG_PATH
  - Save everything and execute . ~/.zshrc
- Install PythonNET with pip install pythonnet.
peptdeep can be installed in an existing Python environment with a single bash command. This bash command can also be run directly from within a Jupyter notebook by prepending it with a !:
pip install peptdeep
Installing peptdeep like this avoids conflicts when integrating it in other tools, as this does not enforce strict versioning of dependencies. However, if new versions of dependencies are released, they are not guaranteed to be fully compatible with peptdeep. This should only occur in rare cases where dependencies are not backwards compatible.
You can always force peptdeep to use dependency versions which are known to be compatible with:
pip install "peptdeep[stable]"
NOTE: You might need to run
pip install pip
before installing peptdeep like this. Also note the double quotes (").
For those who are really adventurous, it is also possible to directly install any branch (e.g. @development) with any extras (e.g. #egg=peptdeep[stable,development-stable]) from GitHub with e.g.
pip install "git+https://github.com/MannLabs/alphapeptdeep.git@development#egg=peptdeep[stable,development-stable]"
To enable GPU support, the GPU version of PyTorch is required. It can be installed with:
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116 --upgrade
Note that the right build may depend on your NVIDIA driver version. Run the following command to check your NVIDIA driver:
nvidia-smi
For the latest PyTorch version, see pytorch.org.
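Before choosing a device type for peptdeep, it can help to check from Python which backend PyTorch actually sees. The helper below is a minimal sketch and not part of peptdeep: pick_device_type is a hypothetical name, and mapping the gpu choice from the settings template to CUDA is an assumption. It falls back to cpu when PyTorch is not installed.

```python
def pick_device_type() -> str:
    """Return 'gpu', 'mps' or 'cpu' depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to report.
        return "cpu"
    # Assumption of this sketch: 'gpu' means an NVIDIA/CUDA device.
    if torch.cuda.is_available():
        return "gpu"
    # The MPS backend only exists in newer PyTorch builds, so probe defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device_type())
```

If this prints cpu on a machine with an NVIDIA card, the installed PyTorch build is most likely the CPU-only one.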
peptdeep can also be installed in editable (i.e. developer) mode with a few bash commands. This allows you to fully customize the software and even modify the source code to your specific needs. When an editable Python package is installed, its source code is stored in a transparent location of your choice. While optional, it is advised to first (create and) navigate to e.g. a general software folder:
mkdir ~/alphapeptdeep/project/folder
cd ~/alphapeptdeep/project/folder
The following commands assume you do not perform any additional cd commands anymore.
Next, download the peptdeep repository from GitHub either directly or with a git command. This creates a new alphapeptdeep subfolder in your current directory.
git clone https://github.com/MannLabs/alphapeptdeep.git
For any Python package, it is highly recommended to use a separate conda virtual environment, as otherwise dependency conflicts can occur with already existing packages.
conda create --name peptdeep python=3.9 -y
conda activate peptdeep
Finally, peptdeep and all its dependencies need to be installed. To take advantage of all features and allow development (with the -e flag), this is best done by also installing the development dependencies instead of only the core dependencies:
pip install -e ".[development]"
By default this installs loose dependencies (no explicit versioning), although it is also possible to use stable dependencies (e.g. pip install -e ".[stable,development-stable]").
By using the editable flag -e, all modifications to the peptdeep source code folder are directly reflected when running peptdeep. Note that the peptdeep folder cannot be moved and/or renamed if an editable version is installed. In case of confusion, you can always retrieve the location of any Python module with e.g. the command import module followed by module.__file__.
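The module lookup mentioned above can be wrapped in a small function. This is only an illustration: module_location is a hypothetical helper, and the standard-library module json stands in for peptdeep so that the snippet runs anywhere.

```python
import importlib
from pathlib import Path


def module_location(module_name: str) -> Path:
    """Return the file a module was imported from."""
    module = importlib.import_module(module_name)
    return Path(module.__file__).resolve()


# "json" is used only because it ships with Python; replace it with
# "peptdeep" in an environment where peptdeep is installed.
print(module_location("json"))
```

For an editable install, the printed path should point into the cloned repository rather than into site-packages.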
There are three ways to use peptdeep:
- GUI
- CLI
- Python and Jupyter notebooks
NOTE: The first time you use a fresh installation of peptdeep, it is often quite slow because some functions might still need compilation on your local operating system and architecture. Subsequent use should be a lot faster.
If the GUI was not installed through a one-click GUI installer, it can be launched with the following bash command:
peptdeep gui
This command will start a web server and automatically open the default browser.
There are several options in the GUI (left panel):
- Server: Start/stop the task server, check tasks in the task queue
- Settings: Configure common settings, load/save current settings
- Model: Configure DL models for prediction or transfer learning
- Transfer: Refine the models
- Library: Predict a library
- Rescore: Perform ML feature extraction and Percolator
The CLI can be run with the following command (after activating the conda environment with conda activate peptdeep or if an alias was set to the peptdeep executable):
peptdeep -h
It is possible to get help about each function and its (required) parameters by using the -h flag. AlphaPeptDeep provides several commands for different tasks:
Run a command to check usages:
peptdeep $command -h
For example:
peptdeep library -h
peptdeep export-settings C:/path/to/settings.yaml
This command will export the default settings into settings.yaml as a template; users can edit the yaml file to run other commands.
Here is a section of the yaml file which controls global parameters for different tasks:
model_url: "https://github.com/MannLabs/alphapeptdeep/releases/download/pre-trained-models/pretrained_models.zip"
task_type: library
task_type_choices:
  - library
  - train
  - rescore
thread_num: 8
torch_device:
  device_type: gpu
  device_type_choices:
    - gpu
    - mps
    - cpu
  device_ids: []
log_level: info
log_level_choices:
  - debug
  - info
  - warning
  - error
  - critical
common:
  modloss_importance_level: 1.0
  user_defined_modifications: {}
  # For example,
  # user_defined_modifications:
  #   "Dimethyl2@Any_N-term":
  #     composition: "H(2)2H(2)C(2)"
  #     modloss_composition: "H(0)" # can be without if no modloss
  #   "Dimethyl2@K":
  #     composition: "H(2)2H(2)C(2)"
  #   "Dimethyl6@Any_N-term":
  #     composition: "2H(4)13C(2)"
  #   "Dimethyl6@K":
  #     composition: "2H(4)13C(2)"
peak_matching:
  ms2_ppm: True
  ms2_tol_value: 20.0
  ms1_ppm: True
  ms1_tol_value: 20.0
model_mgr:
  default_nce: 30.0
  default_instrument: Lumos
  mask_modloss: True
  model_type: generic
  model_choices:
    - generic
    - phos
    - hla # same as generic
    - digly
  external_ms2_model: ''
  external_rt_model: ''
  external_ccs_model: ''
  instrument_group:
    ThermoTOF: ThermoTOF
    Astral: ThermoTOF
    Lumos: Lumos
    QE: QE
    timsTOF: timsTOF
    SciexTOF: SciexTOF
    Fusion: Lumos
    Eclipse: Lumos
    Velos: Lumos # not important
    Elite: Lumos # not important
    OrbitrapTribrid: Lumos
    ThermoTribrid: Lumos
    QE+: QE
    QEHF: QE
    QEHFX: QE
    Exploris: QE
    Exploris480: QE
  predict:
    batch_size_ms2: 512
    batch_size_rt_ccs: 1024
    verbose: True
    multiprocessing: True
The model_mgr section in the yaml defines the common settings for MS2/RT/CCS prediction.
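One way to read the instrument_group entries is as a plain lookup from an instrument name to the instrument group used for prediction. The sketch below illustrates that reading with a subset of the entries listed above; resolve_instrument is a hypothetical helper, and falling back to the default instrument for unknown names is an assumption of this sketch, not documented peptdeep behavior.

```python
# Subset of the instrument_group mapping copied from the settings above.
INSTRUMENT_GROUP = {
    "ThermoTOF": "ThermoTOF",
    "Astral": "ThermoTOF",
    "Lumos": "Lumos",
    "QE": "QE",
    "timsTOF": "timsTOF",
    "Eclipse": "Lumos",
    "Exploris": "QE",
}


def resolve_instrument(name: str, default: str = "Lumos") -> str:
    """Map an instrument name to its group (hypothetical helper).

    Unknown names fall back to `default`; this fallback is an assumption.
    """
    return INSTRUMENT_GROUP.get(name, default)


print(resolve_instrument("Astral"))    # ThermoTOF
print(resolve_instrument("Exploris"))  # QE
```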
peptdeep cmd-flow ...
This command supports CLI parameters to control global_settings for CLI users. It supports three workflows: train, library, or train library, controlled by the CLI parameter --task_workflow, for example, --task_workflow train library. All settings in global_settings are converted to CLI parameters using -- as the dict level indicator; for example, global_settings["library"]["var_mods"] corresponds to --library--var_mods. See test_cmd_flow.sh for an example.
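The naming rule can be illustrated by walking a nested dict and joining the keys with --. This is a sketch of the rule as described here, not peptdeep's own code: iter_cli_flags is a hypothetical helper, and example_settings is a tiny stand-in for global_settings.

```python
from typing import Any, Dict, Iterator


def iter_cli_flags(settings: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield one CLI flag name per leaf of a nested settings dict."""
    for key, value in settings.items():
        path = f"{prefix}--{key}"
        if isinstance(value, dict) and value:
            # Descend one level; "--" marks each dict level.
            yield from iter_cli_flags(value, path)
        else:
            # Plain values, lists and empty dicts are treated as leaves.
            yield path


example_settings = {
    "library": {"var_mods": ["Oxidation@M"], "output_tsv": {"enabled": False}},
    "model_mgr": {"default_nce": 30.0},
}
for flag in iter_cli_flags(example_settings):
    print(flag)
```

With the stand-in dict this prints --library--var_mods, --library--output_tsv--enabled and --model_mgr--default_nce.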
There are three kinds of parameter types:
- value type (int, float, bool, str): The CLI parameter only has a single value, for instance: --model_mgr--default_nce 30.0.
- list type (list): The CLI parameter has a list of values separated by spaces, for instance: --library--var_mods "Oxidation@M" "Acetyl@Protein_N-term".
- dict type (dict): Only three parameters are dict type: --library--labeling_channels, --model_mgr--transfer--psm_modification_mapping, and --common--user_defined_modifications. Here are the examples:
  - --library--labeling_channels: labeling channels for the library. Example: --library--labeling_channels "0:Dimethyl@Any_N-term;Dimethyl@K" "4:xx@Any_N-term;xx@K"
  - --model_mgr--transfer--psm_modification_mapping: converts other search engines' modification names to alphabase modifications for transfer learning. Example: --model_mgr--transfer--psm_modification_mapping "Dimethyl@Any_N-term:_(Dimethyl-n-0);_(Dimethyl)" "Dimethyl@K:K(Dimethyl-K-0);K(Dimethyl)". Note that the X(UniMod:id) format can directly be recognized by alphabase.
  - --common--user_defined_modifications: user-defined modifications. Example: --common--user_defined_modifications "NewMod1@Any_N-term:H(2)2H(2)C(2)" "NewMod2@K:H(100)O(2)C(2)"
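To make the dict-type format concrete, the sketch below turns --library--labeling_channels values into the dict shape used in the yaml file. It is an illustration only: parse_labeling_channels is a hypothetical helper rather than peptdeep's parser, and treating the channel key as an integer is an assumption based on the yaml examples.

```python
from typing import Dict, List


def parse_labeling_channels(args: List[str]) -> Dict[int, List[str]]:
    """Turn 'channel:mod1;mod2' strings into {channel: [mod1, mod2]}."""
    channels: Dict[int, List[str]] = {}
    for arg in args:
        # Split on the first ':' only, because modification names such as
        # 'Dimethyl:2H(2)@K' contain ':' themselves.
        channel, _, mods = arg.partition(":")
        channels[int(channel)] = [m for m in mods.split(";") if m]
    return channels


cli_values = [
    "0:Dimethyl@Any_N-term;Dimethyl@K",
    "4:Dimethyl:2H(2)@Any_N-term;Dimethyl:2H(2)@K",
]
print(parse_labeling_channels(cli_values))
```

Splitting on the first colon works here because the channel comes first; a mapping whose keys are modification names would need a different rule.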
peptdeep library settings_yaml
This command will predict a spectral library for the given settings_yaml file (exported by export-settings). All the essential settings are in the library section of the settings_yaml file:
library:
  infile_type: fasta
  infile_type_choices:
    - fasta
    - sequence_table
    - peptide_table # sequence with mods and mod_sites
    - precursor_table # peptide with charge state
  infiles:
    - xxx.fasta
  fasta:
    protease: 'trypsin'
    protease_choices:
      - 'trypsin'
      - '([KR])'
      - 'trypsin_not_P'
      - '([KR](?=[^P]))'
      - 'lys-c'
      - 'K'
      - 'lys-n'
      - '\w(?=K)'
      - 'chymotrypsin'
      - 'asp-n'
      - 'glu-c'
    max_miss_cleave: 2
    add_contaminants: False
  fix_mods:
    - Carbamidomethyl@C
  var_mods:
    - Acetyl@Protein_N-term
    - Oxidation@M
  special_mods: [] # normally for Phospho or GlyGly@K
  special_mods_cannot_modify_pep_n_term: False
  special_mods_cannot_modify_pep_c_term: False
  labeling_channels: {}
  # For example,
  # labeling_channels:
  #   0: ['Dimethyl@Any_N-term','Dimethyl@K']
  #   4: ['Dimethyl:2H(2)@Any_N-term','Dimethyl:2H(2)@K']
  #   8: [...]
  min_var_mod_num: 0
  max_var_mod_num: 2
  min_special_mod_num: 0
  max_special_mod_num: 1
  min_precursor_charge: 2
  max_precursor_charge: 4
  min_peptide_len: 7
  max_peptide_len: 35
  min_precursor_mz: 200.0
  max_precursor_mz: 2000.0
  decoy: pseudo_reverse
  decoy_choices:
    - pseudo_reverse
    - diann
    - None
  max_frag_charge: 2
  frag_types:
    - b
    - y
  rt_to_irt: True
  generate_precursor_isotope: False
  output_folder: "{PEPTDEEP_HOME}/spec_libs"
  output_tsv:
    enabled: False
    min_fragment_mz: 200
    max_fragment_mz: 2000
    min_relative_intensity: 0.001
    keep_higest_k_peaks: 12
    translate_batch_size: 1000000
    translate_mod_to_unimod_id: False
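The protease choices above include regular expressions such as ([KR]) and ([KR](?=[^P])). The sketch below shows how such a pattern, together with max_miss_cleave and the peptide length limits, could drive an in-silico digest. It is not peptdeep's digestion code: digest is a hypothetical function, and cleaving directly after each regex match is an assumption of this sketch.

```python
import re
from typing import List


def digest(protein: str, protease_regex: str, max_miss_cleave: int = 2,
           min_len: int = 7, max_len: int = 35) -> List[str]:
    """Cleave after every regex match and allow missed cleavages."""
    # Cleavage positions: sequence start, end of each match, sequence end.
    sites = [0] + [m.end() for m in re.finditer(protease_regex, protein)]
    if sites[-1] != len(protein):
        sites.append(len(protein))
    peptides = []
    for i in range(len(sites) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + max_miss_cleave, len(sites))):
            pep = protein[sites[i]:sites[j]]
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
    return peptides


toy_protein = "ACDKPEFRGHIK"  # toy sequence, not a real protein
print(digest(toy_protein, r"([KR])", max_miss_cleave=0, min_len=1))
print(digest(toy_protein, r"([KR](?=[^P]))", max_miss_cleave=0, min_len=1))
```

On the toy sequence, the first pattern cuts after every K and R, while the second one skips the K that is followed by P.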
peptdeep will load sequence data based on library:infile_type and library:infiles for library prediction.
library:infiles contains the list of files with library:infile_type defined in library:infile_type_choices:
- fasta: Protein fasta files; peptdeep will digest the protein sequences into peptide sequences.
- sequence_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the column sequence for peptide sequences.
- peptide_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the columns sequence, mods, and mod_sites. peptdeep will not add modifications for peptides of this file type.
- precursor_table: Tab/comma-delimited txt/tsv/csv (text) files which contain the columns sequence, mods, mod_sites, and charge. peptdeep will not add modifications and charge states for peptides of this file type.
See examples:
import pandas as pd
df = pd.DataFrame({
    'sequence': ['ACDEFGHIK','LMNPQRSTVK','WYVSTR'],
    'mods': ['Carbamidomethyl@C','Acetyl@Protein_N-term;Phospho@S',''],
    'mod_sites': ['2','0;7',''],
    'charge': [2,3,1],
})
df[['sequence']]
| | sequence |
|---|---|
| 0 | ACDEFGHIK |
| 1 | LMNPQRSTVK |
| 2 | WYVSTR |
df[['sequence','mods','mod_sites']]
| | sequence | mods | mod_sites |
|---|---|---|---|
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 |
| 2 | WYVSTR | | |
df
| | sequence | mods | mod_sites | charge |
|---|---|---|---|---|
| 0 | ACDEFGHIK | Carbamidomethyl@C | 2 | 2 |
| 1 | LMNPQRSTVK | Acetyl@Protein_N-term;Phospho@S | 0;7 | 3 |
| 2 | WYVSTR | | | 1 |
Columns of proteins and genes are optional for these txt/tsv/csv files.
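The mods and mod_sites columns have to stay aligned entry by entry. Judging from the example rows, a site of 0 denotes the peptide N-terminus and residue sites are 1-based. The check below is a small sketch built on that reading; check_mod_sites is a hypothetical helper, not a peptdeep function, and it only checks N-terminal and single-residue targets.

```python
def check_mod_sites(sequence: str, mods: str, mod_sites: str) -> bool:
    """Check that ';'-separated mods and mod_sites agree with the sequence."""
    mod_list = [m for m in mods.split(";") if m]
    site_list = [int(s) for s in mod_sites.split(";") if s]
    if len(mod_list) != len(site_list):
        return False
    for mod, site in zip(mod_list, site_list):
        target = mod.split("@")[-1]  # e.g. 'C' or 'Protein_N-term'
        if target.endswith("N-term"):
            # Assumption from the example rows: the N-terminus is site 0.
            if site != 0:
                return False
        elif len(target) == 1:
            # Residue sites are 1-based in the example rows.
            if not (1 <= site <= len(sequence)) or sequence[site - 1] != target:
                return False
        # Other targets (e.g. C-terminal ones) are not checked in this sketch.
    return True


rows = [
    ("ACDEFGHIK", "Carbamidomethyl@C", "2"),
    ("LMNPQRSTVK", "Acetyl@Protein_N-term;Phospho@S", "0;7"),
    ("WYVSTR", "", ""),
]
print([check_mod_sites(*row) for row in rows])
```

Running such a check before writing a peptide_table or precursor_table can catch shifted site indices early.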
peptdeep supports multiple files for library prediction, for example (in the yaml file):
library:
  ...
  infile_type: fasta
  infiles:
    - /path/to/fasta/human.fasta
    - /path/to/fasta/yeast.fasta
  ...
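When many fasta files live in one folder, the infiles list can be collected programmatically instead of typed by hand. The snippet below is a sketch using only the standard library; library_infiles is a hypothetical helper, and the returned dict only mirrors the two yaml keys shown above.

```python
from pathlib import Path
from typing import Dict, List


def library_infiles(fasta_dir: str) -> Dict[str, Dict[str, object]]:
    """Build the 'library' entries for every *.fasta file in a folder."""
    files: List[str] = sorted(str(p) for p in Path(fasta_dir).glob("*.fasta"))
    return {"library": {"infile_type": "fasta", "infiles": files}}


# Uses the current directory so the example runs anywhere.
print(library_infiles("."))
```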
The library in HDF5 (.hdf) format will be saved into library:output_folder. If library:output_tsv:enabled is True, a TSV spectral library that can be processed by DIA-NN and Spectronaut will also be saved into library:output_folder.
peptdeep transfer settings_yaml
This command will apply transfer learning to refine RT/CCS/MS2 models based on model_mgr:transfer:psm_files and model_mgr:transfer:psm_type. All yaml settings (exported by export-settings) related to this command are:
model_mgr:
  transfer:
    model_output_folder: "{PEPTDEEP_HOME}/refined_models"
    epoch_ms2: 20
    warmup_epoch_ms2: 10
    batch_size_ms2: 512
    lr_ms2: 0.0001
    epoch_rt_ccs: 40
    warmup_epoch_rt_ccs: 10
    batch_size_rt_ccs: 1024
    lr_rt_ccs: 0.0001
    verbose: False
    grid_nce_search: False
    grid_nce_first: 15.0
    grid_nce_last: 45.0
    grid_nce_step: 3.0
    grid_instrument: ['Lumos']
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
      - maxquant
      - diann
      - speclib_tsv
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw
      - mgf
      - mzml
    ms_files: []
    psm_num_to_train_ms2: 100000000
    psm_num_per_mod_to_train_ms2: 50
    psm_num_to_test_ms2: 0
    psm_num_to_train_rt_ccs: 100000000
    psm_num_per_mod_to_train_rt_ccs: 50
    psm_num_to_test_rt_ccs: 0
    top_n_mods_to_train: 10
    psm_modification_mapping: {}
    # alphabase modification to modifications of other search engines
    # For example,
    # psm_modification_mapping:
    #   Dimethyl@Any_N-term:
    #     - _(Dimethyl-n-0)
    #     - _(Dimethyl)
    #   Dimethyl:2H(2)@K:
    #     - K(Dimethyl-K-2)
    #   ...
For DDA data, peptdeep can also extract MS2 intensities for all PSMs from the spectrum files given by model_mgr:transfer:ms_files and model_mgr:transfer:ms_file_type. This enables transfer learning of the MS2 model.
For DIA data, only the RT and CCS (if timsTOF) models will be refined.
An example of the settings yaml:
model_mgr:
  transfer:
    ...
    psm_type: pfind
    psm_files:
      - /path/to/pFind.spectra
      - /path/to/other/pFind.spectra
    ms_file_type: thermo_raw
    ms_files:
      - /path/to/raw1.raw
      - /path/to/raw2.raw
    ...
The refined models will be saved in model_mgr:transfer:model_output_folder. After transfer learning, users can apply the new models by replacing model_mgr:external_ms2_model, model_mgr:external_rt_model and model_mgr:external_ccs_model with the saved ms2.pth, rt.pth and ccs.pth in model_mgr:transfer:model_output_folder. This is useful to perform sample-specific library prediction.
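The replacement step can be scripted by deriving the three paths from the output folder. The snippet below is a minimal sketch: refined_model_settings is a hypothetical helper that only builds the corresponding settings fragment and does not check that the files exist.

```python
from pathlib import Path
from typing import Dict


def refined_model_settings(model_output_folder: str) -> Dict[str, Dict[str, str]]:
    """Point the external_*_model settings at the refined model files."""
    folder = Path(model_output_folder)
    return {
        "model_mgr": {
            "external_ms2_model": str(folder / "ms2.pth"),
            "external_rt_model": str(folder / "rt.pth"),
            "external_ccs_model": str(folder / "ccs.pth"),
        }
    }


# "refined_models" is a placeholder folder name for this example.
print(refined_model_settings("refined_models"))
```

The resulting fragment has the same nesting as the model_mgr section of the settings yaml, so it can be merged into the settings used for library prediction.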
peptdeep rescore settings_yaml
This command will apply Percolator to rescore DDA PSMs given by percolator:input_files:psm_files and percolator:input_files:psm_type. All yaml settings (exported by export-settings) related to this command are:
percolator:
  require_model_tuning: True
  raw_num_to_tune: 8
  require_raw_specific_tuning: True
  raw_specific_ms2_tuning: False
  psm_num_per_raw_to_tune: 200
  epoch_per_raw_to_tune: 5
  multiprocessing: True
  top_k_frags_to_calc_spc: 10
  calibrate_frag_mass_error: False
  max_perc_train_sample: 1000000
  min_perc_train_sample: 100
  percolator_backend: sklearn
  percolator_backend_choices:
    - sklearn
    - pytorch
  percolator_model: linear
  percolator_model_choices:
    pytorch_as_backend:
      - linear # not fully tested, performance may be unstable
      - mlp # not implemented yet
    sklearn_as_backend:
      - linear # logistic regression
      - random_forest
  lr_percolator_torch_model: 0.1 # learning rate, only used when percolator_backend==pytorch
  percolator_iter_num: 5 # percolator iteration number
  cv_fold: 1
  fdr: 0.01
  fdr_level: psm
  fdr_level_choices:
    - psm
    - precursor
    - peptide
    - sequence
  use_fdr_for_each_raw: False
  frag_types: ['b_z1','b_z2','y_z1','y_z2']
  input_files:
    psm_type: alphapept
    psm_type_choices:
      - alphapept
      - pfind
    psm_files: []
    ms_file_type: alphapept_hdf
    ms_file_type_choices:
      - alphapept_hdf
      - thermo_raw # if alpharaw is installed
      - mgf
      - mzml
    ms_files: []
    other_score_column_mapping:
      alphapept: {}
      pfind:
        raw_score: Raw_Score
      msfragger:
        hyperscore: hyperscore
        nextscore: nextscore
      maxquant: {}
  output_folder: "{PEPTDEEP_HOME}/rescore"
Transfer learning will be applied when rescoring if percolator:require_model_tuning is True.
The corresponding MS files (percolator:input_files:ms_files and percolator:input_files:ms_file_type) must be provided to extract experimental fragment intensities.
peptdeep install-models [--model-file url_or_local_model_zip] --overwrite True
When peptdeep runs for the first time, it downloads and installs the models from GitHub, as defined by model_url in the default yaml settings. This command will update pretrained_models.zip from --model-file url_or_local_model_zip.
It is also possible to use other models instead of the pretrained_models by providing model_mgr:external_ms2_model, model_mgr:external_rt_model and model_mgr:external_ccs_model.
Using peptdeep from Python script or notebook provides the most flexible way to access all features in peptdeep.
We will introduce several usages of peptdeep via Python notebook:
Most of the default parameters and attributes of peptdeep functions and classes are controlled by peptdeep.settings.global_settings, which is a dict.
from peptdeep.settings import global_settings
The default values of global_settings are defined in default_settings.yaml.
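This README refers to nested settings with colon-separated paths such as library:output_tsv:enabled. The sketch below resolves such a path against a nested dict; get_setting is a hypothetical helper, and the small settings dict is a stand-in for global_settings so that the snippet runs without peptdeep installed.

```python
from functools import reduce
from typing import Any, Dict


def get_setting(settings: Dict[str, Any], path: str) -> Any:
    """Look up 'a:b:c' as settings['a']['b']['c']."""
    return reduce(lambda node, key: node[key], path.split(":"), settings)


# Stand-in for peptdeep.settings.global_settings, using values from the yaml above.
settings = {"library": {"output_tsv": {"enabled": False}}, "thread_num": 8}
print(get_setting(settings, "library:output_tsv:enabled"))
print(get_setting(settings, "thread_num"))
```

A missing key raises KeyError, which makes typos in a path visible immediately.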
The pipeline APIs provide the same functionality as the CLI, including library prediction, transfer learning, and rescoring.
from peptdeep.pipeline_api import (
    generate_library,
    transfer_learn,
    rescore,
)
All these functions take a settings_dict as input; the dict structure is the same as that of the settings yaml file. See the documentation of generate_library, transfer_learn, and rescore in https://alphapeptdeep.readthedocs.io/en/latest/module_pipeline_api.html.
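Because settings_dict mirrors the yaml structure, a common pattern is to start from the defaults and override only a few nested values. The sketch below shows one way to do that merge; deep_update is a hypothetical helper, not a peptdeep function, and the small defaults dict stands in for the real settings.

```python
import copy
from typing import Any, Dict


def deep_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with nested `overrides` merged in."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge dicts level by level instead of replacing them wholesale.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"library": {"infile_type": "fasta", "infiles": [], "var_mods": ["Oxidation@M"]}}
overrides = {"library": {"infiles": ["/path/to/fasta/human.fasta"]}}
settings_dict = deep_update(defaults, overrides)
print(settings_dict)
```

The merged dict keeps every default that was not overridden and leaves the original defaults untouched; see the linked API documentation for how the pipeline functions consume it.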
from peptdeep.pretrained_models import ModelManager
The ModelManager class is the main entry point to access the MS2/RT/CCS models. It provides functionality to train or refine the models and then use the new models for prediction.
Check tutorial_model_manager.ipynb for details.
from peptdeep.protein.fasta import PredictSpecLibFasta
The PredictSpecLibFasta class provides functionality to deal with fasta files or protein sequences and spectral libraries.
Check out tutorial_speclib_from_fasta.ipynb for details.
from peptdeep.rescore.percolator import Percolator
The Percolator class provides functionality to rescore DDA PSMs searched by pFind and AlphaPept (and MaxQuant if the output FDR is 100%), …
Check out test_percolator.ipynb for details.
from peptdeep.model.model_interface import ModelInterface
import peptdeep.model.generic_property_prediction # model shop
Building new DL models for peptide property prediction is one of the key features of AlphaPeptDeep. The key functionalities are ModelInterface and the pre-designed models and model interfaces in the model shop (module peptdeep.model.generic_property_prediction).
For example, we can build an HLA classifier that distinguishes HLA peptides from non-HLA peptides, see https://github.com/MannLabs/PeptDeep-HLA for details.
In case of issues, check out the following:
- Issues. Try a few different search terms to find out if a similar problem has been encountered before.
- Discussions. Check if your problem or feature request has been discussed before.
If you like this software, you can give us a star to boost our visibility! All direct contributions are also welcome. Feel free to post a new issue or clone the repository and create a pull request with a new branch. For an even more interactive participation, check out the discussions and the Contributors License Agreement.
In order to have release notes automatically generated, changes need to be tagged with labels.
The following labels are used (should be self-explanatory): breaking-change, bug, enhancement.
This package uses a shared release process defined in the alphashared repository. Please see the instructions there.
It is highly recommended to use the provided pre-commit hooks, as the CI pipeline enforces all checks therein to pass in order to merge a branch.
The hooks need to be installed once by
pre-commit install
You can run the checks yourself using:
pre-commit run --all-files
See the HISTORY.md for a full overview of the changes made in each version.