Intelligent code search and analysis through Neo4j knowledge graphs
New to this project? Check out the comprehensive guides:
- Getting Started Guide - Comprehensive setup and usage documentation
Graph-Codebase-MCP is a specialized tool for creating knowledge graphs of codebases, combining the Neo4j graph database with the Model Context Protocol (MCP) to provide intelligent code search and analysis capabilities. The project uses Abstract Syntax Tree (AST) parsing to analyze Python code structures and employs OpenAI Embeddings for semantic encoding, storing code entities and relationships in Neo4j to form a comprehensive knowledge graph.
Through the MCP server interface, AI agents can understand and search code more intelligently, surpassing the limitations of traditional text search and achieving a deeper understanding of code structure and semantics.
The following diagram shows a knowledge graph of an example codebase:
The graph illustrates the network of relationships between files (pink), classes (blue), functions and methods (yellow), and variables (green), including:
- Import relationships between files (IMPORTS_FROM)
- Specific symbol imports from files (IMPORTS_DEFINITION)
- Class inheritance relationships (EXTENDS)
- Function call relationships (CALLS)
- Definition relationships between classes and their methods/attributes (DEFINES)
This structured representation enables AI to more effectively understand the structure and semantic relationships within code.
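As an illustration, these relationship types can be queried directly in Neo4j. Below is a minimal sketch using the official neo4j Python driver; the node label `Function` and its `name` property are assumptions inferred from the relationship names above, not a documented schema.

```python
from neo4j import GraphDatabase

# Assumed connection details; match your .env settings.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Hypothetical schema: Function nodes with a `name` property,
    # connected by the CALLS relationships described above.
    result = session.run(
        "MATCH (caller:Function)-[:CALLS]->(callee:Function {name: $name}) "
        "RETURN caller.name AS caller",
        name="process_data",
    )
    for record in result:
        print(record["caller"])

driver.close()
```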
- Multi-Language Code Parsing: Support for 6+ programming languages using both legacy AST parsers and modern ast-grep adapters
- Semantic Embeddings: Generate vector representations for code elements using configurable embedding providers (OpenAI, Google Gemini, DeepInfra)
- Knowledge Graph Construction: Store parsed code entities and relationships in Neo4j to form a comprehensive, queryable knowledge graph
- Cross-File Dependency Analysis: Track imports, symbols, and dependencies across file boundaries for complete codebase understanding
- Parallel Indexing: Dramatically speed up large codebase processing with automatic parallelization and intelligent fallback strategies
- MCP Query Interface: Provide an AI-agent-friendly interface following the Model Context Protocol standard
- Relationship Queries: Support complex queries including function call chains, inheritance hierarchies, and dependency networks
- Python
- JavaScript / TypeScript
- Java
- C++
- Rust
- Go
- Python 3.10 or higher (Python 3.14 free-threaded recommended for best performance)
- Neo4j graph database (version 5.x recommended)
- Docker (optional, for containerized deployment)
Graph-Codebase-MCP supports parallel indexing to dramatically speed up processing of large codebases. When available, the system automatically selects the optimal execution strategy based on your Python version and codebase size, and scales the worker count with available CPU cores. It automatically falls back to sequential processing for small codebases (fewer than 50 files) or when free-threading isn't available.
The system uses a two-pass architecture (see the sketch after this list):
- First Pass (Parallel): Each worker independently parses files and builds module definitions
- Second Pass (Sequential): Resolves cross-file imports using the complete module index
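A minimal sketch of this two-pass flow, with hypothetical `parse_file` and `resolve_imports` stubs standing in for the real parser (the actual pool management lives in `src/parallel/pool_manager.py`):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parse_file(path: Path) -> tuple[str, list[str]]:
    # Hypothetical stand-in for the real AST parser: returns the
    # module path and the names it defines.
    return str(path), []

def resolve_imports(path: str, definitions: list[str], module_index: dict) -> None:
    # Hypothetical stand-in: matches this module's imports against
    # definitions found anywhere in the index.
    pass

def index_codebase(root: str, max_workers: int = 8) -> None:
    files = list(Path(root).rglob("*.py"))

    # First pass (parallel): workers parse files independently;
    # no worker needs any other file's results yet.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        module_index = dict(pool.map(parse_file, files))

    # Second pass (sequential): with the complete module index built,
    # cross-file imports can be resolved deterministically.
    for path, definitions in module_index.items():
        resolve_imports(path, definitions, module_index)
```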
Parallel indexing is enabled by default. You can customize behavior via environment variables:
# Enable/disable parallel indexing (default: true)
PARALLEL_INDEXING_ENABLED=true
# Maximum worker threads/processes (default: min(cpu_count, 8))
MAX_WORKERS=8
# Minimum files required to use parallel mode (default: 50)
MIN_FILES_FOR_PARALLEL=50
# Neo4j connection pool size (default: MAX_WORKERS * 2)
NEO4J_MAX_CONNECTION_POOL_SIZE=16
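For illustration, this is how those variables could map to their documented defaults; the function name here is hypothetical, only the variable names and defaults come from the list above.

```python
import os

def load_parallel_config():
    # Defaults mirror those documented above.
    enabled = os.getenv("PARALLEL_INDEXING_ENABLED", "true").lower() == "true"
    max_workers = int(os.getenv("MAX_WORKERS", str(min(os.cpu_count() or 1, 8))))
    min_files = int(os.getenv("MIN_FILES_FOR_PARALLEL", "50"))
    pool_size = int(os.getenv("NEO4J_MAX_CONNECTION_POOL_SIZE", str(max_workers * 2)))
    return enabled, max_workers, min_files, pool_size
```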
Connection pool exhausted
- Increase `NEO4J_MAX_CONNECTION_POOL_SIZE` (recommended: `MAX_WORKERS * 2`)
- Reduce `MAX_WORKERS` if system resources are limited
Performance not improving
- Ensure you have Python 3.14 free-threaded for best results
- Check CPU utilization - you may already be I/O bound
git clone https://github.com/zadzanl/graph-codebase-mcp-extend.git
cd graph-codebase-mcp-extend

pip install -r requirements.txt

To unlock true parallel speedups on multi-core CPUs, you can install Python 3.14 free-threaded (GIL disabled). Standard Python works fine and the app will fall back to safe modes automatically, but free-threaded builds can deliver 2x+ speedups on large codebases.
- Windows/macOS installers from python.org include an option to install a free-threaded build.
- Verify with either method:
  - `python -VV` shows "free-threading build" in the version string
  - In Python: `import sys; hasattr(sys, "_is_gil_enabled") and sys._is_gil_enabled()` returns `False`
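For reference, a minimal detection sketch along the lines of what `src/utils/runtime_detection.py` presumably does (the function name here is illustrative):

```python
import sys

def gil_disabled() -> bool:
    """True only on a free-threaded build running with the GIL off."""
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    return is_enabled is not None and not is_enabled()

if gil_disabled():
    print("Free-threaded Python: threads can run in true parallel.")
else:
    print("Standard Python: the app falls back to safe modes automatically.")
```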
References:
- Python HOWTO: Python support for free threading (3.14)
- What’s New in Python 3.14 – Free-threaded mode improvements
Create a .env file in the project root (see Embedding Provider Configuration for more details):
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
OPENAI_API_KEY=your_openai_api_key
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
PARALLEL_INDEXING_ENABLED=true
MAX_WORKERS=4
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
EMBEDDING_PROVIDER=google
EMBEDDING_MODEL=text-embedding-004
GEMINI_API_KEY=your_gemini_api_key
PARALLEL_INDEXING_ENABLED=true
MAX_WORKERS=4
{
"mcpServers": {
"graph-codebase-mcp": {
"command": "python",
"args": [
"src/main.py",
"--codebase-path",
"path/to/your/codebase"
],
"env": {
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_USER": "neo4j",
"NEO4J_PASSWORD": "password",
"EMBEDDING_PROVIDER": "openai",
"EMBEDDING_MODEL": "text-embedding-3-small",
"OPENAI_API_KEY": "your_openai_api_key"
}
}
}
}

If using Docker:
docker run -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j:latest

Access the Neo4j browser at: http://localhost:7474
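Before indexing, you can confirm the database is reachable; a quick check with the official Python driver, using the credentials configured above:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # raises an exception if Neo4j is not reachable
driver.close()
```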
Execute the main program to analyze the codebase and build the knowledge graph:
python src/main.py --codebase-path /path/to/your/codebase

Then start the MCP server:

python src/mcp_server.py

This project supports various code-related queries (see the client sketch after this list), such as:
- Find all callers of a specific function: "find all callers of function:process_data"
- Find the inheritance structure of a specific class: "show inheritance hierarchy of class:DataProcessor"
- Query the dependencies of a file: "list dependencies of file:main.py"
- Find code related to a specific module: "search code related to module:data_processing"
- Cross-file tracking of symbol imports and usage: "trace imports and usages of class:Employee"
- Analyze the dependency network between files: "analyze dependency network starting from file:main.py"
graph-codebase-mcp/
├── src/
│ ├── ast_parser/ # Multi-language AST parsing module
│ │ ├── parser.py # Legacy Python AST parser
│ │ ├── multi_parser.py # Multi-language parser coordinator
│ │ ├── language_detector.py # Automatic language detection
│ │ └── adapters/ # Language-specific ast-grep adapters
│ │ ├── python_adapter.py
│ │ ├── javascript_adapter.py
│ │ ├── java_adapter.py
│ │ ├── cpp_adapter.py
│ │ ├── rust_adapter.py
│ │ └── go_adapter.py
│ ├── embeddings/ # Embedding provider module
│ │ ├── factory.py # Provider factory (OpenAI, Google Gemini, DeepInfra)
│ │ ├── openai_compatible.py # OpenAI-compatible API client
│ │ ├── base.py # Base embedding provider interface
│ │ └── embedder.py # Code embedding processor
│ ├── neo4j_storage/ # Neo4j database operations
│ │ └── graph_db.py # Neo4j graph database interface
│ ├── parallel/ # Parallel processing module
│ │ └── pool_manager.py # Thread/process pool manager
│ ├── utils/ # Utility functions
│ │ └── runtime_detection.py # Python runtime detection (3.14 free-threading)
│ ├── mcp/ # MCP Server implementation
│ │ └── server.py # MCP server entry point
│ ├── main.py # Main program entry point
│ └── mcp_server.py # MCP server startup script
├── tests/ # Comprehensive test suite
├── docs/ # Documentation and diagrams
│ └── images/ # Visual resources
├── .env # Environment configuration
├── requirements.txt # Dependencies
└── README.md # This file
- Languages: Python 3.10+ (Python 3.14 free-threaded recommended for best performance)
- Code Analysis: Python AST module, ast-grep, Tree-sitter
- Multi-Language Support: Dedicated adapters for Python, JavaScript/TypeScript, Java, C++, Rust, Go
- Vector Embeddings: OpenAI, Google Gemini, or DeepInfra APIs (OpenAI-compatible)
- Graph Database: Neo4j 5.x with connection pooling
- Parallel Processing: ThreadPoolExecutor (Python 3.14) or ProcessPoolExecutor with automatic selection
- Interface Protocol: Model Context Protocol (MCP) Python SDK
- Web Framework: Starlette/Uvicorn for MCP server hosting
MIT License
- Neo4j GraphRAG Python Package
- Model Context Protocol
- Neo4j Python Driver Documentation
- Python AST Module Documentation
The system supports multiple embedding providers, configured via environment variables (a minimal client sketch follows the provider table below):
OpenAI:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_openai_api_key

Google Gemini:

EMBEDDING_PROVIDER=google
EMBEDDING_MODEL=text-embedding-004
GEMINI_API_KEY=your_gemini_api_key

DeepInfra:

EMBEDDING_PROVIDER=deepinfra
EMBEDDING_MODEL=your_model_name
DEEPINFRA_API_KEY=your_deepinfra_api_key

Generic (any OpenAI-compatible endpoint):

EMBEDDING_PROVIDER=generic
EMBEDDING_MODEL=your_model_name
EMBEDDING_API_KEY=your_api_key
EMBEDDING_API_BASE_URL=https://your-provider-endpoint/v1

| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | Recommended default |
| OpenAI | text-embedding-3-large | 3072 | Higher quality |
| Google | text-embedding-004 | 768 | Legacy, being deprecated |
| Google | gemini-embedding-001 | 3072 | Latest, recommended |
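Under the hood, OpenAI-compatible providers can all be reached through the same client by swapping `base_url`; a minimal sketch using the `openai` package, where the endpoint URL and model are placeholders matching the generic configuration above:

```python
import os

from openai import OpenAI

# For OpenAI itself, omit base_url; for DeepInfra or another
# OpenAI-compatible provider, point base_url at its /v1 endpoint.
client = OpenAI(
    api_key=os.environ["EMBEDDING_API_KEY"],
    base_url=os.environ.get("EMBEDDING_API_BASE_URL"),  # e.g. https://your-provider-endpoint/v1
)

response = client.embeddings.create(
    model=os.environ.get("EMBEDDING_MODEL", "text-embedding-3-small"),
    input=["def process_data(x): ..."],
)
print(len(response.data[0].embedding))  # vector dimension, e.g. 1536
```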