Add Prometheus metrics support and update configuration #46

cshiels-ie · 2025-10-31T17:30:32Z

Introduced a new metrics service for Prometheus integration, including counters and histograms for tracking tool executions, HTTP requests, and errors.
Updated the logger to record metrics for tool access, including execution duration and success/error status.
Added configuration options for enabling metrics in the YAML configuration file.
Enhanced the README with instructions on enabling and accessing Prometheus metrics.
Updated the main application to conditionally expose a metrics endpoint based on configuration.

AI Generated Details/Description:

Add Prometheus Metrics Support for Observability

Summary

This PR adds comprehensive Prometheus metrics support to the AAP MCP Server, enabling production-ready monitoring and observability of the service.

Changes

New Features

1. Metrics Service (`src/metrics.ts`)

Created a dedicated MetricsService class using prom-client
Collects default Node.js metrics (CPU, memory, GC, event loop lag)
Implements custom MCP-specific metrics with proper labels

2. Available Metrics

HTTP Metrics:

http_requests_total - Counter for all HTTP requests (labeled by method, route, status_code)
http_request_duration_seconds - Histogram of request duration

MCP Tool Metrics:

mcp_tool_executions_total - Counter for tool executions (labeled by tool_name, service, category, status)
mcp_tool_execution_duration_seconds - Histogram of tool execution duration (labeled by tool_name, service, category)
mcp_tool_errors_total - Counter for tool errors (labeled by tool_name, service, category, error_type)
mcp_active_tools - Gauge of currently active tools per service
mcp_active_sessions - Gauge of active MCP sessions

API Call Metrics:

mcp_api_calls_total - Counter for AAP API calls (labeled by service, endpoint, method, status_code)

System Metrics:

Standard Node.js process metrics (CPU, memory, GC, etc.)
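
To make the labeled counters above concrete, here is a dependency-free sketch of how a counter such as http_requests_total keeps one series per label combination and renders in the Prometheus text exposition format. The PR itself relies on prom-client for this; the `LabeledCounter` class, the help text, and the `/mcp` route below are hypothetical and only illustrate the data model. Label-value escaping is omitted for brevity.

```typescript
// Hypothetical illustration only: the PR uses prom-client, not this class.
type Labels = Record<string, string>;

class LabeledCounter {
  private readonly series = new Map<string, number>();
  private readonly name: string;
  private readonly help: string;
  private readonly labelNames: string[];

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  // Build the `a="x",b="y"` part of a series in declared label order.
  private key(labels: Labels): string {
    return this.labelNames.map((n) => `${n}="${labels[n] ?? ""}"`).join(",");
  }

  // Increment the series identified by this exact label combination.
  inc(labels: Labels, value = 1): void {
    const k = this.key(labels);
    this.series.set(k, (this.series.get(k) ?? 0) + value);
  }

  // Render HELP/TYPE headers followed by one line per series.
  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    this.series.forEach((v, k) => lines.push(`${this.name}{${k}} ${v}`));
    return lines.join("\n") + "\n";
  }
}

const httpRequestsTotal = new LabeledCounter(
  "http_requests_total",
  "Total number of HTTP requests", // assumed help text
  ["method", "route", "status_code"],
);
httpRequestsTotal.inc({ method: "GET", route: "/metrics", status_code: "200" });
httpRequestsTotal.inc({ method: "GET", route: "/metrics", status_code: "200" });
httpRequestsTotal.inc({ method: "POST", route: "/mcp", status_code: "500" });
console.log(httpRequestsTotal.render());
```

Each distinct label combination becomes its own time series, which is why label values should stay low-cardinality (route templates rather than raw URLs, for example).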

3. Integration Points

Logger Integration: src/logger.ts now records metrics for every tool execution
HTTP Middleware: Tracks all incoming HTTP requests automatically (when metrics enabled)
Tool Execution: Captures timing, success/error status, and category information
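
The tool-execution integration described above can be sketched as a wrapper that times a call and reports success or error exactly once. This is a minimal sketch, not the code in src/logger.ts: the `ToolObservation` shape, the `Recorder` callback, `timedToolCall`, and the sample tool and service names are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real logger and MetricsService APIs are not shown in this PR description.
interface ToolObservation {
  toolName: string;
  service: string;
  category: string;
  status: "success" | "error";
  durationSeconds: number;
}

type Recorder = (obs: ToolObservation) => void;

// Run a tool, then report its duration and outcome whether it resolved or threw.
async function timedToolCall<T>(
  meta: { toolName: string; service: string; category: string },
  record: Recorder,
  run: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  let status: "success" | "error" = "success";
  try {
    return await run();
  } catch (err) {
    status = "error";
    throw err; // the caller still sees the original failure
  } finally {
    // Prometheus convention: durations are recorded in seconds.
    record({ ...meta, status, durationSeconds: (Date.now() - start) / 1000 });
  }
}

// Usage with made-up tool metadata and a recorder that just logs.
const demoRecorder: Recorder = (obs) =>
  console.log(`${obs.toolName} ${obs.status} ${obs.durationSeconds.toFixed(3)}s`);

timedToolCall(
  { toolName: "jobs_list", service: "controller", category: "job_management" },
  demoRecorder,
  async () => "ok",
).then((result) => console.log(result));
```

Recording in `finally` keeps the counter and histogram updates in one place, so a thrown error cannot skip the duration measurement.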

4. Configuration

Metrics can be enabled/disabled via configuration:

# aap-mcp.yaml
enable_metrics: true

Or via environment variable:

export ENABLE_METRICS=true
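
The PR description does not state how the YAML option and the environment variable interact. As a sketch of one plausible resolution rule, the hypothetical helper below assumes that an explicitly set ENABLE_METRICS overrides the file and that anything other than the string "true" disables metrics; the actual precedence in the server may differ.

```typescript
// Hypothetical helper: names and precedence are assumptions, not taken from the PR.
interface AppConfig {
  enable_metrics?: boolean;
}

function resolveMetricsEnabled(
  config: AppConfig,
  env: Record<string, string | undefined>,
): boolean {
  const raw = env["ENABLE_METRICS"];
  if (raw !== undefined && raw.trim() !== "") {
    // Assumption: an explicit environment value wins over the YAML file.
    return raw.trim().toLowerCase() === "true";
  }
  // Opt-in: metrics stay disabled unless the config enables them.
  return config.enable_metrics === true;
}

console.log(resolveMetricsEnabled({ enable_metrics: true }, {})); // true
console.log(resolveMetricsEnabled({}, {})); // false
console.log(resolveMetricsEnabled({ enable_metrics: false }, { ENABLE_METRICS: "true" })); // true
```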

5. Endpoints

GET /metrics - Prometheus-formatted metrics endpoint (only available when metrics are enabled)
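
Conditional exposure of the endpoint can be pictured as registering the route only when the flag is set. The sketch below uses a plain route table rather than the server's real HTTP framework, so `buildRoutes`, `Handler`, and the route-key format are hypothetical. The content type shown is the one used for the Prometheus text exposition format.

```typescript
// Hypothetical route table: the real server's HTTP layer is not shown in the PR description.
type Handler = () => { status: number; contentType: string; body: string };

function buildRoutes(
  metricsEnabled: boolean,
  renderMetrics: () => string,
): Map<string, Handler> {
  const routes = new Map<string, Handler>();
  if (metricsEnabled) {
    // Only registered when metrics are enabled; otherwise the path does not exist.
    routes.set("GET /metrics", () => ({
      status: 200,
      contentType: "text/plain; version=0.0.4; charset=utf-8",
      body: renderMetrics(),
    }));
  }
  return routes;
}

const sampleBody = "# TYPE mcp_active_sessions gauge\nmcp_active_sessions 0\n";
console.log(buildRoutes(true, () => sampleBody).has("GET /metrics")); // true
console.log(buildRoutes(false, () => sampleBody).has("GET /metrics")); // false
```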

Key Enhancements

Category Tracking: Added getCategoryForTool() helper function to automatically determine which category a tool belongs to (e.g., job_management, inventory_management, etc.)
Automatic Labeling: All tool metrics now include category labels for better filtering and dashboards
Minimal Overhead When Disabled: When metrics are disabled, the HTTP middleware and the /metrics endpoint are not active, so overhead is minimal
Production-Ready: Uses industry-standard Prometheus client with proper histogram buckets
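
A category lookup like getCategoryForTool() could work roughly as follows. Only the function name and the category names job_management and inventory_management come from this description; the tool names, the table layout, the optional second parameter, and the "uncategorized" fallback are assumptions for illustration.

```typescript
// Hypothetical category table: the tool names here are invented.
const toolCategories: Record<string, string[]> = {
  job_management: ["jobs_list", "jobs_launch"],
  inventory_management: ["inventories_list"],
};

// Return the first category whose tool list contains the given tool.
function getCategoryForTool(
  toolName: string,
  categories: Record<string, string[]> = toolCategories,
): string {
  for (const category of Object.keys(categories)) {
    if (categories[category].indexOf(toolName) !== -1) {
      return category;
    }
  }
  // Assumption: unknown tools get a fixed fallback label so the label set stays bounded.
  return "uncategorized";
}

console.log(getCategoryForTool("jobs_launch")); // job_management
console.log(getCategoryForTool("not_a_tool")); // uncategorized
```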

Usage

Enable Metrics

# aap-mcp.yaml
enable_metrics: true

Access Metrics

curl http://localhost:3000/metrics

Configure Prometheus

# prometheus.yml
scrape_configs:
  - job_name: 'aap-mcp-server'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'

Example Queries

# Request rate per second
rate(http_requests_total[5m])

# Tool execution success rate by category
sum by (category) (rate(mcp_tool_executions_total{status="success"}[5m]))
  / sum by (category) (rate(mcp_tool_executions_total[5m]))

# 95th percentile tool execution time
histogram_quantile(0.95, rate(mcp_tool_execution_duration_seconds_bucket[5m]))

# Errors per minute in the job_management category (one series per tool, service, and error type)
rate(mcp_tool_errors_total{category="job_management"}[1m]) * 60
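
For readers unfamiliar with how the 95th percentile query above is derived from histogram buckets, here is a simplified sketch of the estimation: find the bucket in which the target rank falls, then interpolate linearly inside it. The `histogramQuantile` function and the bucket numbers are made up for illustration; PromQL's real implementation works on per-second bucket rates and handles more edge cases.

```typescript
// Simplified model of quantile estimation over cumulative histogram buckets.
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number; // cumulative number of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  if (sorted.length < 2) return NaN;
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const i = sorted.findIndex((b) => b.count >= rank);
  // A rank in the +Inf bucket is reported as the highest finite bound.
  if (sorted[i].le === Infinity) return sorted[i - 1].le;

  const lowerBound = i === 0 ? 0 : sorted[i - 1].le;
  const lowerCount = i === 0 ? 0 : sorted[i - 1].count;
  // Assume observations are spread evenly within the bucket.
  const fraction = (rank - lowerCount) / (sorted[i].count - lowerCount);
  return lowerBound + (sorted[i].le - lowerBound) * fraction;
}

// Invented sample: 100 tool executions in total.
const sampleBuckets: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: Infinity, count: 100 },
];
// Rank 95 falls in the (0.5, 1] bucket: 0.5 + 0.5 * (95 - 90) / (98 - 90), about 0.8125
console.log(histogramQuantile(0.95, sampleBuckets));
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.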

Benefits

Observability: Full visibility into service health and performance
Alerting: Set up alerts on error rates, latency, or resource usage
Debugging: Identify slow tools, error patterns, and bottlenecks
Capacity Planning: Track resource usage trends over time
Category Insights: Monitor performance and errors by functional category

Breaking Changes

None. Metrics are opt-in and disabled by default.

Testing

Metrics endpoint responds with valid Prometheus format
Metrics update correctly after tool executions
HTTP request metrics track all endpoints
Category labels correctly identify tool categories
No impact when metrics are disabled

Documentation

Updated README.md with Prometheus metrics section including:

Configuration instructions
Available metrics list
Prometheus integration examples
Example queries

Dependencies

Added prom-client@^15.1.3 for Prometheus metrics collection

Future Enhancements

Grafana dashboard templates
Custom alerting rules
Metrics for session lifecycle
Per-user metrics (if needed)

goneri · 2025-10-31T17:42:28Z

You also need to include the updated package-lock.json in the PR (this should address the CI errors 🤞).

goneri · 2025-11-03T20:50:57Z

thank you @cshiels-ie

cshiels-ie added 3 commits November 3, 2025 09:52

UpdatedPackage-lock

f6e8359

Package-lock fix

9354378

Prettier Fixes

5fd5d95

goneri approved these changes Nov 3, 2025

goneri merged commit 764778e into ansible:main Nov 3, 2025
5 checks passed
