Take a break, brew a cup of tea while 🧪 MetaboT 🍵 digs into mass spec data!
🧪 MetaboT 🍵 is an AI system that accelerates mass spectrometry-based metabolomics data mining. Leveraging advanced large language models and knowledge graph technologies, 🧪 MetaboT 🍵 translates natural language queries into SPARQL requests, enabling researchers to explore and interpret complex metabolomics datasets. Built in Python and powered by state-of-the-art libraries, 🧪 MetaboT 🍵 offers an intuitive chat interface that bridges the gap between data complexity and user-friendly access. 🧪 MetaboT 🍵 can be installed locally, and you can try our demo instance on an open 1,600 plant extract dataset available at https://metabot.holobiomicslab.eu.
Comprehensive documentation is available at https://holobiomicslab.github.io/MetaboT/. It includes:
- Installation and Quick Start Guides
- User Guide with configuration details
- API Reference for core components, agents, and graph management
- Usage Examples for both basic and advanced scenarios
- Contributing Guidelines
The documentation is automatically built and deployed using GitHub Actions on every push to the main branch.
To preview and build the documentation locally:
# Install the required dependencies
pip install mkdocs mkdocs-material mkdocstrings mkdocstrings-python
# To serve documentation locally, run:
mkdocs serve
# To build the documentation, run:
mkdocs build
If you use or reference 🧪 MetaboT 🍵 in your research, please cite it as follows:
🧪 MetaboT 🍵: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge
Madina Bekbergenova, Lucas Pradi, Benjamin Navet, Emma Tysinger, Matthieu Feraud, Yousouf Taghzouti, Martin Legrand, Tao Jiang, Franck Michel, Yan Zhou Chen, Soha Hassoun, Olivier Kirchhoffer, Jean-Luc Wolfender, Florence Mehl, Marco Pagni, Wout Bittremieux, Fabien Gandon, Louis-Félix Nothias. PREPRINT (Version 1) available at Research Square
Institutions:
- Université Côte d'Azur, CNRS, ICN, Nice, France
- Interdisciplinary Institute for Artificial Intelligence (3iA) Côte d'Azur, Sophia-Antipolis, France
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- INRIA, Université Côte d'Azur, CNRS, I3S, France
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Centre Médical Universitaire, Geneva, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, Centre Médical Universitaire, Geneva, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
Lab Websites:
Funding Support:
This work was supported by the French government through the France 2030 investment plan managed by the National Research Agency (ANR), as part of the Initiative of Excellence Université Côte d'Azur (ANR-15-IDEX-01) and served as an early prototype for the MetaboLinkAI project (ANR-24-CE93-0012-01). This work also benefited from project 189921 funded by the Swiss National Foundation (SNF).
To use 🧪 MetaboT 🍵, your mass spectrometry processing and annotation results must first be represented as a knowledge graph, with the corresponding endpoint deployed. You can utilize the Experimental Natural Products Knowledge Graph library for this purpose. See the ENPKG repository.
By default, 🧪 MetaboT 🍵 connects to the public ENPKG endpoint for the ENPKG knowledge graph, which hosts an open and reusable annotated mass spectrometry dataset derived from a chemodiverse collection of 1,600 plant extracts. For further details, please refer to the associated publication.
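If you deploy your own endpoint, you can quickly confirm that it answers SPARQL queries before pointing 🧪 MetaboT 🍵 at it. The snippet below is a minimal sketch using the SPARQLWrapper library; both the library choice and the endpoint URL are illustrative assumptions rather than part of the 🧪 MetaboT 🍵 codebase:

# Sketch: basic connectivity check against a SPARQL endpoint.
# SPARQLWrapper and the URL below are assumptions; substitute your own deployment.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "http://localhost:7200/repositories/enpkg"  # hypothetical endpoint URL
client = SPARQLWrapper(endpoint_url)
client.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
client.setReturnFormat(JSON)

results = client.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])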
- CPU: Any modern processor
- RAM: At least 8GB
This package has been tested on:
- macOS: Sonoma (14.5)
- Linux: Ubuntu 22.04 LTS, Debian 11
It should also work on other Unix-based systems. For more details on compatibility, check GitHub Issues if you run into trouble.
- Conda Installation
  - Ensure Conda (Anaconda/Miniconda) is installed.
  - Conda Installation Docs
- API Keys

Required API keys:
- Get an API key for your chosen language model:
- OpenAI API Key: Get it from OpenAI Platform
- DeepSeek API Key: Get it from DeepSeek
- Claude API Key: Get it from Anthropic
- Or other models supported by LiteLLM
Disclaimer: Most LLM APIs are commercial and paid services. Our default model is gpt-4o, and its usage will incur costs according to the provider's pricing policy.
Data Privacy: Please note that data submitted to LLM APIs is subject to their respective privacy policies. Avoid sending sensitive or confidential information, as data may be logged for quality assurance and research purposes.
Optional API keys:
- LangSmith API Key: Used to view interaction traces in LangSmith. This is free.
Create a .env file in the root directory with your credentials:

OPENAI_API_KEY=your_openai_key_here
LANGCHAIN_API_KEY=your_langsmith_key_here
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_PROJECT=metabot_project
Note: The system can also be used with other LLM models, namely Meta-Llama-3_1-70B-Instruct and deepseek-reasoner. For Meta-Llama-3_1-70B-Instruct (which runs on OVH Cloud; see OVH Cloud), add the API key OVHCLOUD_API_KEY to your .env file; for deepseek-reasoner, add DEEPSEEK_API_KEY. Detailed information on how to configure other LLM models is available here. Currently, all agents use the OpenAI model gpt-4o (including the SPARQL generation chain). Furthermore, if the initial query yields no results, a SPARQL improvement chain using the OpenAI o3-mini model is activated.
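Once the .env file is in place, a quick sanity check can confirm the keys are visible to Python. The snippet below is a sketch that assumes the python-dotenv package is available in your environment; 🧪 MetaboT 🍵 itself may load credentials differently:

# Sketch: verify that the .env keys can be loaded (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory
for key in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
    print(key, "is set" if os.getenv(key) else "is MISSING")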
- Clone the Repository

git clone https://github.com/holobiomicslab/MetaboT.git
cd MetaboT
git checkout dev
- Create and Activate the Conda Environment

For macOS:

conda env create -f environment.yml
conda activate metabot
For Linux:
# Update system dependencies first
sudo apt-get update
sudo apt-get install -y python3-dev build-essential

# Then create and activate the conda environment
conda env create -f environment.yml
conda activate metabot
For Windows (using WSL):
Install WSL if you haven't already:
wsl --install
Open WSL and install the required packages:
sudo apt-get update
sudo apt-get install -y python3-dev build-essential
Install Miniconda in WSL:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
Create and activate the conda environment:
conda env create -f environment.yml
conda activate metabot
Pro-tip: If you hit any issues with psycopg2, the environment.yml uses psycopg2-binary for maximum compatibility.
The application is structured as a Python module with dot notation imports, so choose your style, whether absolute (e.g., app.core.module1.module2) or relative (e.g., ..core.module1.module2).
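As a quick illustration, both styles resolve to the same module (app/core/utils.py ships with the package; the relative form assumes the importing file lives in app/core/agents/):

# Absolute import, usable from anywhere, e.g. in scripts run with python -m:
from app.core import utils

# Equivalent relative import from a module inside app/core/agents/:
from .. import utils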
To launch the application, use Python's -m option. The main entry point is in app.core.main.
To try one of the standard questions, run the following command:
cd MetaboT
python -m app.core.main -q 1
Here, the number following -q specifies the question number from the standard questions, which can be viewed in app/data/standard_questions.txt.
Expected output includes runtime metrics and a welcoming prompt.
To ask a custom question instead, run:

python -m app.core.main -c "Your custom question"
To launch the application through Streamlit, set the required environment variables, install the dependencies, and run the app. In your terminal, execute:
export ADMIN_OPENAI_KEY=your_openai_api_key
export LANGCHAIN_API_KEY=your_langchain_api_key
pip install -r requirements.txt
streamlit run streamlit_webapp/streamlit_app.py
If you encounter an error stating that the app directory cannot be found (e.g., "ModuleNotFoundError: No module named 'app'"), it means Python is unable to locate the module. To resolve this, add the current directory to your PYTHONPATH by running:
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
This command ensures that Python can locate the app directory.
If you prefer to run the application in a containerized environment, Docker support is provided. Make sure Docker and docker-compose are installed on your system.
To build the Docker image, run:
docker-compose build
To launch the application and run the first standard question, execute:
docker-compose run metabot python -m app.core.main -q 1
This command will start the container, run the application inside Docker, and process the first standard question from app/data/standard_questions.txt. You can adjust parameters as needed, for example replacing -q 1 with -c "Your custom question".
.
├── README.md
├── app
│   ├── config
│   │   ├── langgraph.json
│   │   ├── logging.ini
│   │   ├── logs
│   │   │   └── app.log
│   │   ├── params.ini
│   │   └── sparql.ini
│   ├── core
│   │   ├── agents
│   │   │   ├── agents_factory.py
│   │   │   ├── enpkg
│   │   │   │   ├── agent.py
│   │   │   │   ├── prompt.py
│   │   │   │   ├── tool_chemicals.py
│   │   │   │   ├── tool_smiles.py
│   │   │   │   ├── tool_target.py
│   │   │   │   └── tool_taxon.py
│   │   │   ├── entry
│   │   │   │   ├── agent.py
│   │   │   │   ├── prompt.py
│   │   │   │   └── tool_filesparser.py
│   │   │   ├── interpreter
│   │   │   │   ├── agent.py
│   │   │   │   ├── prompt.py
│   │   │   │   ├── tool_interpreter.py
│   │   │   │   └── tool_spectrum.py
│   │   │   ├── sparql
│   │   │   │   ├── agent.py
│   │   │   │   ├── prompt.py
│   │   │   │   ├── tool_merge_result.py
│   │   │   │   ├── tool_sparql.py
│   │   │   │   └── tool_wikidata_query.py
│   │   │   ├── validator
│   │   │   │   ├── agent.py
│   │   │   │   ├── prompt.py
│   │   │   │   └── tool_validator.py
│   │   │   └── supervisor
│   │   │       ├── agent.py
│   │   │       └── prompt.py
│   │   ├── graph_management
│   │   │   └── RdfGraphCustom.py
│   │   ├── main.py
│   │   ├── memory
│   │   │   ├── custom_sqlite_file.py
│   │   │   ├── database_manager.py
│   │   │   ├── test_db_connection.py
│   │   │   └── tools_database.py
│   │   ├── utils.py
│   │   ├── workflow
│   │   │   └── langraph_workflow.py
│   │   └── tests
│   │       ├── evaluation.py
│   │       └── test_utils.py
│   ├── data
│   │   ├── submitted_plants.csv
│   │   ├── npc_class.csv
│   │   └── evaluation_dataset.csv
│   ├── graphs
│   │   ├── graph.pkl
│   │   └── schema.ttl
│   └── notebooks
├── docs
│   ├── api-reference
│   ├── assets
│   ├── examples
│   ├── getting-started
│   ├── user-guide
│   ├── contributing.md
│   └── index.md
├── streamlit_webapp
│   ├── streamlit_app.py
│   └── streamlit_utils.py
├── environment.yml
├── mkdocs.yml
└── requirements.txt
Create a dedicated folder for your agent within the app/core/agents/ directory. See here.
- Agent (agent.py): Copy from an existing agent unless your tool requires private class property access. Refer to "If Your Tool Serves as an Agent" for special cases. Psst... don't let the complexities of Python imports overcomplicate your flow; trust the process!
- Prompt (prompt.py): Adapt the prompt for your specific context/tasks. Configure the MODEL_CHOICE; the default is llm-o for gpt-4o (per app/config/params.ini).
- Tools (tool_xxxx.py) (optional): Inherit from the LangChain BaseTool, defining the following (see the sketch after this list):
  - name and description
  - args_schema, a Pydantic model for input validation
  - the _run method for execution
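As a reference point, here is a minimal sketch of such a tool. It follows current LangChain and Pydantic conventions; the class, tool name, and lookup behavior are hypothetical illustrations, not existing 🧪 MetaboT 🍵 code:

# Minimal custom tool sketch (hypothetical names; adapt to your agent's needs).
from typing import Type
from pydantic import BaseModel, Field
from langchain.tools import BaseTool


class ChemicalLookupInput(BaseModel):
    """Pydantic schema used to validate the tool's input."""
    compound_name: str = Field(description="Name of the compound to look up")


class ChemicalLookupTool(BaseTool):
    name: str = "chemical_lookup"
    description: str = "Looks up basic information about a compound in the knowledge graph."
    args_schema: Type[BaseModel] = ChemicalLookupInput

    def _run(self, compound_name: str) -> str:
        # Replace this stub with the actual query logic for your agent.
        return f"No data available yet for {compound_name}"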
Modify the supervisor prompt (see supervisor prompt) to detect and select your agent. Our AI PR-Agent 🤖 is triggered automatically through issues and pull requests, so you'll be in good hands!
Update app/config/langgraph.json to include your agent in the workflow. For reference, see langgraph.json.
For LLM interaction, make sure additional class properties are set in agent.py (refer to tool_sparql.py and agent.py). Keep it snazzy and smart!
Contributing to 🧪 MetaboT 🍵
We use the dev branch for pushing our contributions here on GitHub. Please create your own branch (either user-centric like dev_benjamin or feature-centric like dev_langgraph) and submit a pull request to the dev branch when you're ready for review. Our AI PR-Agent 🤖 is always standing by to help with pull requests and even handle issues smartly, because why not let a smarty-pants bot lend a hand?
Documentation Standards
- Use Google Docstring Format
- Consider the Mintlify Doc Writer for VSCode for automatically generated, stylish, and precise docstrings.
Code Formatting
- Stick to PEP8
- Leverage the Black Formatter for a neat, uniform style.
Because code deserves to look as sharp as your ideas.
Pass keys as parameters instead of environment variables for scalable production deployments.
Centralized logging resides in app/config/logging.ini. See here.
Use the following snippet at the start of your Python scripts:
from pathlib import Path
import logging.config
parent_dir = Path(__file__).parent.parent
config_path = parent_dir / "config" / "logging.ini"
logging.config.fileConfig(config_path, disable_existing_loggers=False)
logger = logging.getLogger(__name__)
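Once configured, an illustrative call (not taken from the 🧪 MetaboT 🍵 codebase) looks like this:

logger.info("Starting the MetaboT workflow")  # routed according to app/config/logging.ini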
Pro-tip: Use logger over print for more elegant and traceable output.
- Explore the ENPKG project on GitHub: https://github.com/enpkg
- Visit the HolobiomicsLab GitHub organization: https://github.com/holobiomicslab
- Access the detailed 🧪 MetaboT 🍵 documentation at: https://holobiomicslab.github.io/MetaboT/
We warmly welcome your contributions! Here's how to dive in:
- Fork & Clone
- Fork the repo on GitHub and clone your fork.
- Create a Feature Branch
  - Branch from dev (e.g., dev_your_branch_name).
- Develop Your Feature
- Write clean code with clear documentation (Google Docstring format is preferred).
- Our AI PR-Agent 🤖 automatically kicks in when you raise an issue or a pull request.
- Commit
  - Use atomic commits with present-tense messages:
    git commit -m "Add new agent for processing chemical data"
  - That's the secret sauce to a smooth GitHub PR journey!
- Submit a Pull Request
  - Push your changes and create a PR against the dev branch. Fill out all necessary details, including links to related issues (e.g., GitHub Issues).
- Update documentation, run tests, and ensure your code is formatted.
- The AI PR-Agent is active and will provide first-line feedback!
- Write meaningful tests.
- Maintain rich inline documentation.
- Adhere to PEP8 and best practices.
For bug reports or feature requests, please use our GitHub Issues page.
Your contributions make 🧪 MetaboT 🍵 awesome! Thank you for being part of our journey and for keeping the code as sharp as your wit.
🧪 MetaboT 🍵 is open source and released under the Apache License 2.0. This license allows you to freely use, modify, and distribute the software, provided that you include the original copyright notice and license terms.
Take a break, brew a cup of tea, and have some fun with words while 🧪 MetaboT 🍵 digs into mass spec data!
Here's a little puzzle to steep your brain:
- Unscramble the letters in t-e-a-m-o-b-o-t to reveal the secret spice behind our data wizard!
- What do you get when you mix a hot cup of tea with a powerful AI? Absolutely tea-rific insights!
Remember: While you relax with your favorite treat, 🧪 MetaboT 🍵 is busy infusing data with meaning. Sip, smile, and let the insights steep into brilliance!
Enjoy your brew and happy puzzling!