@zavgorodnii

Description of the Change

Problem

PentAGI currently relies on vector-based semantic search (pgvector) for memory and knowledge storage. While effective for similarity matching, this approach lacks the ability to capture and query explicit relationships between entities such as tools, targets, vulnerabilities, and techniques. As a result, the system cannot answer complex questions like "What tools have been successful against Apache servers?" or "Show me the sequence of actions that led to privilege escalation."

Solution

This PR integrates Graphiti, a temporal knowledge graph system powered by Neo4j, to provide advanced semantic understanding and relationship tracking for AI agent operations. The integration uses a custom vxcontrol fork (pentagi-graphiti) that includes specialized entity and edge types for pentesting purposes.

Key Implementation Details:

  1. Client Wrapper (pkg/graphiti/client.go): Provides a simplified, non-blocking interface to the Graphiti API with health checks, timeout protection, and graceful degradation when disabled or unavailable.

  2. Provider Integration (pkg/providers/performer.go): Automatically captures two types of events:

    • Agent responses (reasoning, analysis, and decisions) for all agent types
    • Tool executions (commands, arguments, results, and status) excluding agent-type tools
  3. Templates (pkg/templates/graphiti/): Two templates format captured data:

    • agent_response.tmpl: Structures agent outputs with context
    • tool_execution.tmpl: Captures tool details including barrier function classification
  4. Infrastructure (docker-compose.yml): Adds Neo4j (graph database) and Graphiti (API layer) services with proper health checks and dependencies.

  5. Configuration: Three new environment variables control the feature:

    • GRAPHITI_ENABLED (default: false) - Feature flag
    • GRAPHITI_URL - Graphiti API endpoint
    • GRAPHITI_TIMEOUT - Operation timeout in seconds
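
As a rough illustration of item 3 above, the sketch below renders one tool execution to plain text with Go's text/template. The struct fields and the template body are hypothetical stand-ins; the actual tool_execution.tmpl in this PR may be laid out differently.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// ToolExecution is a hypothetical view model, not the PR's actual type.
type ToolExecution struct {
	Name      string
	Args      string
	Status    string
	IsBarrier bool
	Result    string
}

// toolExecutionTmpl is an illustrative stand-in for tool_execution.tmpl.
const toolExecutionTmpl = `Tool: {{.Name}}
Arguments: {{.Args}}
Status: {{.Status}}
Barrier function: {{.IsBarrier}}
Result:
{{.Result}}`

// Parse once at startup; template.Must panics on a malformed template.
var toolTmpl = template.Must(template.New("tool_execution").Parse(toolExecutionTmpl))

// RenderToolExecution formats one tool call as text for graph ingestion.
func RenderToolExecution(e ToolExecution) (string, error) {
	var buf bytes.Buffer
	if err := toolTmpl.Execute(&buf, e); err != nil {
		return "", fmt.Errorf("render tool execution: %w", err)
	}
	return buf.String(), nil
}

func main() {
	msg, err := RenderToolExecution(ToolExecution{
		Name:   "nmap",
		Args:   "-sV 10.0.0.5",
		Status: "success",
		Result: "80/tcp open http",
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(msg)
}
```

Keeping the layout in a template rather than in Go code means the text sent for entity extraction can be adjusted without touching the provider logic.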

The integration is designed to be completely optional and non-intrusive. When disabled or when operations fail, the system logs warnings but continues normal operation without interruption.
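
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/graphiti/client.go: the names Client, Sender, Record, and ErrUnavailable are hypothetical, and the transport is reduced to a plain function so the example stays self-contained.

```go
package main

import (
	"errors"
	"log"
)

// ErrUnavailable is a hypothetical sentinel for an unreachable Graphiti service.
var ErrUnavailable = errors.New("graphiti unavailable")

// Sender is a hypothetical transport; a real one would call the Graphiti API.
type Sender func(message string) error

// Client hides Graphiti failures from callers so agent flows keep running.
type Client struct {
	enabled bool
	send    Sender
}

// NewClient returns a Client; with enabled=false every call is a no-op.
func NewClient(enabled bool, send Sender) *Client {
	return &Client{enabled: enabled, send: send}
}

// Record reports whether the message was stored. It never returns an error:
// a disabled client does nothing, and a failed write is logged as a warning.
func (c *Client) Record(message string) bool {
	if c == nil || !c.enabled || c.send == nil {
		return false
	}
	if err := c.send(message); err != nil {
		log.Printf("warning: graphiti write skipped: %v", err)
		return false
	}
	return true
}

func main() {
	// Simulate an outage: the write fails, a warning is logged, nothing panics.
	down := NewClient(true, func(string) error { return ErrUnavailable })
	stored := down.Record("agent response: found Apache 2.4.49")
	log.Printf("stored=%v; the flow continues either way", stored)
}
```

Returning a boolean instead of an error is one way to make it hard for callers to abort a flow on a Graphiti failure by accident.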

Closes #

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔧 Configuration change
  • 🧪 Test update
  • 🛡️ Security update

Areas Affected

  • Core Services (Frontend UI/Backend API)
  • AI Agents (Researcher/Developer/Executor)
  • Security Tools Integration
  • Memory System (Vector Store/Knowledge Base)
  • Monitoring Stack (Grafana/OpenTelemetry)
  • Analytics Platform (Langfuse)
  • External Integrations (LLM/Search APIs)
  • Documentation
  • Infrastructure/DevOps

Testing and Verification

Test Configuration

PentAGI Version: main branch (post-integration)
Docker Version: 24.0+
Host OS: Linux amd64
LLM Provider: OpenAI (required for Graphiti entity extraction)
Enabled Features: [Neo4j, Graphiti]

Test Steps

  1. Set GRAPHITI_ENABLED=true in .env file
  2. Configure Neo4j credentials and OpenAI API key
  3. Build the pentagi-graphiti image (in its own repository): docker build -t graphiti-pentagi:latest .
  4. Start services: docker compose up -d
  5. Create a test flow with multiple agent interactions and tool executions
  6. Verify Graphiti service health: docker compose logs graphiti
  7. Access Neo4j Browser at http://localhost:7474 and verify graph data
  8. Test with GRAPHITI_ENABLED=false to verify graceful degradation
  9. Verify existing flows work without interruption when Graphiti fails

Test Results

  • ✅ Agent responses successfully captured to Graphiti
  • ✅ Tool executions logged with full context
  • ✅ Neo4j graph populated with entities and relationships
  • ✅ System continues operation when Graphiti is disabled
  • ✅ Failed Graphiti operations log warnings but don't interrupt workflow
  • ✅ No performance degradation observed in agent execution (Graphiti operations run asynchronously)

Security Considerations

New Security Requirements:

  • Neo4j credentials must be set securely (avoid default passwords in production)
  • OpenAI API key required for Graphiti entity extraction
  • Neo4j ports (7474, 7687) exposed only to localhost by default

Recommendations:

  1. Change default NEO4J_PASSWORD in production deployments
  2. Keep Neo4j and Graphiti services on internal network only
  3. Knowledge graph contains sensitive pentesting details - apply appropriate access controls
  4. Consider encryption at rest for Neo4j data in sensitive environments

No Changes To:

  • Existing authentication mechanisms
  • Data access patterns
  • User permissions model

Performance Impact

Resource Usage:

  • Neo4j: ~512MB-1GB RAM for typical usage
  • Graphiti: Minimal CPU/memory (Python API layer)
  • Disk: Graph size grows with usage (~10-100KB per flow)
  • Network: Small HTTP requests (~1-10KB per message)

Latency:

  • Async operations don't block agent execution
  • Timeout protection prevents hanging (configurable via GRAPHITI_TIMEOUT, default 30s)
  • Failed operations degrade gracefully without affecting agent execution
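
One way to combine the asynchronous dispatch and timeout protection listed above is sketched below. It assumes the operation honors context cancellation (as an HTTP request built with a context would); SubmitAsync and hang are hypothetical names used only for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// SubmitAsync starts op in a goroutine bounded by timeout and returns at once.
// The buffered channel receives exactly one result, so callers may ignore it.
func SubmitAsync(timeout time.Duration, op func(ctx context.Context) error) <-chan error {
	done := make(chan error, 1)
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		if err := op(ctx); err != nil {
			done <- fmt.Errorf("graphiti operation: %w", err)
			return
		}
		done <- nil
	}()
	return done
}

// hang simulates an unresponsive service: it returns only when ctx expires.
func hang(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	start := time.Now()
	result := SubmitAsync(100*time.Millisecond, hang)
	// The caller is free to continue while the operation runs in the background.
	fmt.Println("caller resumed after", time.Since(start).Round(time.Millisecond))
	fmt.Println("background result:", <-result)
}
```

Because the result channel is buffered, the goroutine can finish and exit even if nobody ever reads the outcome, which suits fire-and-forget logging.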

Documentation Updates

  • README.md updates

    • Added Graphiti to features list
    • Updated Container Architecture diagram
    • Added comprehensive "Knowledge Graph Integration" section
    • Added Neo4j/Graphiti configuration examples
    • Added security notes for Neo4j credentials
  • API documentation updates

    • N/A (no API changes)
  • Configuration documentation updates

    • Added Graphiti settings section to backend/docs/config.md
    • Documented all three configuration variables
    • Included usage examples and code snippets
  • GraphQL schema updates

    • N/A (no schema changes)
  • Other

Deployment Notes

New Environment Variables Required:

# Neo4j (Graph Database)
NEO4J_PORT=7687
NEO4J_URI=neo4j://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_secure_password  # CHANGE THIS!

# Graphiti (Knowledge Graph)
GRAPHITI_ENABLED=false  # Set to true to enable
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_TIMEOUT=30

# OpenAI (Required for Graphiti entity extraction)
OPEN_AI_KEY=sk-your-key-here
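
For illustration, the three GRAPHITI_* variables could be read along these lines. This is a sketch under assumptions: the Config type, ParseConfig, and the validation rules are hypothetical, and only the variable names, the disabled-by-default flag, and the 30-second default come from this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config mirrors the three GRAPHITI_* variables (hypothetical type).
type Config struct {
	Enabled bool
	URL     string
	Timeout time.Duration
}

// defaultTimeoutSeconds matches the 30s default stated in this PR.
const defaultTimeoutSeconds = 30

// ParseConfig builds a Config from a lookup function such as os.Getenv.
func ParseConfig(getenv func(string) string) (Config, error) {
	cfg := Config{Timeout: defaultTimeoutSeconds * time.Second}

	if v := getenv("GRAPHITI_ENABLED"); v != "" {
		enabled, err := strconv.ParseBool(v)
		if err != nil {
			return Config{}, fmt.Errorf("GRAPHITI_ENABLED: %w", err)
		}
		cfg.Enabled = enabled
	}

	cfg.URL = getenv("GRAPHITI_URL")

	if v := getenv("GRAPHITI_TIMEOUT"); v != "" {
		secs, err := strconv.Atoi(v)
		if err != nil || secs <= 0 {
			return Config{}, fmt.Errorf("GRAPHITI_TIMEOUT must be a positive integer, got %q", v)
		}
		cfg.Timeout = time.Duration(secs) * time.Second
	}

	// Assumed rule: an enabled integration needs an endpoint to talk to.
	if cfg.Enabled && cfg.URL == "" {
		return Config{}, fmt.Errorf("GRAPHITI_URL is required when GRAPHITI_ENABLED=true")
	}
	return cfg, nil
}

// LoadConfig reads the process environment.
func LoadConfig() (Config, error) {
	return ParseConfig(os.Getenv)
}

func main() {
	cfg, err := LoadConfig()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("enabled=%v url=%q timeout=%s\n", cfg.Enabled, cfg.URL, cfg.Timeout)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the parsing logic easy to exercise with a plain map in tests.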

Pre-Deployment Steps:

  1. Build the pentagi-graphiti Docker image:

    git clone https://github.com/vxcontrol/pentagi-graphiti.git
    cd pentagi-graphiti
    docker build -t graphiti-pentagi:latest .
  2. Add environment variables to .env file

  3. Start services: docker compose up -d

Optional Configuration:

  • Feature is disabled by default (GRAPHITI_ENABLED=false)
  • Can be enabled/disabled without rebuilding
  • Existing deployments continue working without changes

Rollback:

  • Set GRAPHITI_ENABLED=false to disable
  • No data migration required
  • No breaking changes to existing functionality

Checklist

Code Quality

  • My code follows the project's coding standards
  • I have added/updated necessary documentation
  • I have added tests to cover my changes
  • All new and existing tests pass
  • I have run go fmt and go vet (for Go code)
  • I have run npm run lint (for TypeScript/JavaScript code)

Security

  • I have considered security implications
  • Changes maintain or improve the security model
  • Sensitive information has been properly handled

Compatibility

  • Changes are backward compatible
  • Breaking changes are clearly marked and documented
  • Dependencies are properly updated

Documentation

  • Documentation is clear and complete
  • Comments are added for non-obvious code
  • API changes are documented

Additional Notes
