10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -140,21 +140,21 @@ When adding new tools, follow this pattern:
@mcp.tool()
def your_new_tool(
    app_id: str,
    server_spec: ServerSpec,
    # other parameters
) -> YourReturnType:
    """
    Brief description of what this tool does.

    Args:
        app_id: The Spark application ID
        server_spec: The server specification (static server name or dynamic EMR cluster)

    Returns:
        Description of return value
    """
    ctx = mcp.get_context()
    client = get_client(ctx, server_spec)

    # Your implementation here
    return client.your_method(app_id)
@@ -163,7 +163,7 @@ def your_new_tool(
**Don't forget to add tests:**

```python
@patch("tools.get_client")
def test_your_new_tool(self, mock_get_client):
    """Test your new tool functionality"""
    # Setup mocks
    # ... (mock setup elided) ...
    mock_get_client.return_value = mock_client

    # Call the tool
    result = your_new_tool("spark-app-123", self.DEFAULT_SERVER_SPEC)

    # Verify results
    self.assertEqual(result, expected_result)
```
30 changes: 29 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
# MCP Server for Apache Spark History Server


[![CI](https://github.com/DeepDiagnostix-AI/mcp-apache-spark-history-server/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/DeepDiagnostix-AI/mcp-apache-spark-history-server/actions)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
@@ -112,6 +113,10 @@ servers:
    auth: # optional
      username: "user"
      password: "pass"

# Enable dynamic EMR clusters mode to specify EMR clusters directly in tool calls
dynamic_emr_clusters_mode: true

mcp:
  transports:
    - streamable-http # streamable-http or stdio.
@@ -243,13 +248,35 @@ servers:
    url: "http://staging-spark-history:18080"
```

With static configuration:

πŸ’ User Query: "Can you get application <app_id> using production server?"

πŸ€– AI Tool Request:
```json
{
  "app_id": "<app_id>",
  "server_spec": {
    "static_server_spec": {
      "server_name": "production"
    }
  }
}
```

With dynamic EMR configuration:

πŸ’ User Query: "Can you get application <app_id> using cluster j-I4VIWMNGOIP7"

πŸ€– AI Tool Request:
```json
{
  "app_id": "<app_id>",
  "server_spec": {
    "dynamic_emr_server_spec": {
      "emr_cluster_id": "j-I4VIWMNGOIP7"
    }
  }
}
```
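The two request shapes above suggest that `server_spec` is a tagged union with exactly one variant set. As an illustrative sketch only (the dataclass names below mirror the JSON keys in the examples; the project's actual models may differ), a tool could dispatch on it like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticServerSpec:
    server_name: str


@dataclass
class DynamicEmrServerSpec:
    emr_cluster_id: str


@dataclass
class ServerSpec:
    # Exactly one of these two variants should be set
    static_server_spec: Optional[StaticServerSpec] = None
    dynamic_emr_server_spec: Optional[DynamicEmrServerSpec] = None


def resolve(spec: ServerSpec) -> str:
    """Return a label describing which backend the spec selects."""
    if spec.static_server_spec is not None:
        return f"static:{spec.static_server_spec.server_name}"
    if spec.dynamic_emr_server_spec is not None:
        return f"emr:{spec.dynamic_emr_server_spec.emr_cluster_id}"
    raise ValueError("ServerSpec must set exactly one variant")
```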
πŸ€– AI Tool Response:
@@ -289,6 +316,7 @@
SHS_SERVERS_*_AUTH_TOKEN - Token for a specific server
SHS_SERVERS_*_VERIFY_SSL - Whether to verify SSL for a specific server (true/false)
SHS_SERVERS_*_TIMEOUT - HTTP request timeout in seconds for a specific server (default: 30)
SHS_SERVERS_*_EMR_CLUSTER_ARN - EMR cluster ARN for a specific server
SHS_DYNAMIC_EMR_CLUSTERS_MODE - Enable dynamic EMR clusters mode (default: false)
```
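Boolean environment variables such as `SHS_DYNAMIC_EMR_CLUSTERS_MODE` are typically parsed permissively. A minimal sketch of such parsing (the `env_flag` helper is hypothetical, not the project's actual code):

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Matches the documented default of false when the variable is unset
dynamic_mode = env_flag("SHS_DYNAMIC_EMR_CLUSTERS_MODE")
```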

## πŸ€– AI Agent Integration
3 changes: 3 additions & 0 deletions config.yaml
@@ -36,6 +36,8 @@ servers:
  # emr_persistent_ui:
  #   emr_cluster_arn: "<EMR Cluster ARN>"

# dynamic_emr_clusters_mode: true  # Allow referencing an EMR cluster by ID or name directly in the prompt

mcp:
  transports:
    - streamable-http # streamable-http or stdio. you can only specify one right now.
@@ -55,3 +57,4 @@ mcp:
# SHS_SERVERS_*_AUTH_TOKEN - Token for a specific server
# SHS_SERVERS_*_VERIFY_SSL - Whether to verify SSL for a specific server (true/false)
# SHS_SERVERS_*_EMR_CLUSTER_ARN - EMR cluster ARN for a specific server
# SHS_DYNAMIC_EMR_CLUSTERS_MODE - Enable dynamic EMR clusters mode (default: false)
6 changes: 6 additions & 0 deletions deploy/kubernetes/helm/README.md
@@ -81,6 +81,12 @@ config:
url: "http://dev-spark-history:18080"
```

#### 1a. Dynamic EMR Clusters Configuration
```yaml
config:
dynamic_emr_clusters_mode: true
```

#### 2. Authentication Setup
```yaml
auth:
35 changes: 31 additions & 4 deletions examples/aws/emr/README.md
@@ -4,7 +4,12 @@

[![Watch the demo video](https://img.shields.io/badge/YouTube-Watch%20Demo-red?style=for-the-badge&logo=youtube)](https://www.youtube.com/watch?v=FaduuvMdGxI)

If you are an existing Amazon EMR user looking to analyze your Spark Applications, you have **two options**:

1. **πŸ”§ Static Configuration** - Pre-configure EMR clusters in `config.yaml`
2. **⚑ Dynamic Configuration** - Specify EMR clusters directly in tool calls

The dynamic approach is particularly useful when analyzing Spark applications across multiple EMR clusters without pre-configuration.

## Step 1: Setup project on your laptop

@@ -24,16 +29,38 @@
task install # Install dependencies

Amazon EMR-EC2 users can use a service-managed [Persistent UI](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html) which automatically creates the Spark History Server for Spark applications on a given EMR Cluster. You can directly go to Step 3 and configure the MCP server with an EMR Cluster Id to analyze the Spark applications on that cluster.

## Step 3: Configure the MCP Server

You have **two configuration options**:

### Option A: πŸ”§ Static Configuration

- Identify the Amazon EMR Cluster Id for which you want the MCP server to analyze the Spark applications
- Edit SHS MCP Config: [config.yaml](../../../config.yaml) to add the EMR Cluster Id

```yaml
servers:
  emr_persistent_ui:
    emr_cluster_arn: "<emr_cluster_arn>"

dynamic_emr_clusters_mode: false # Disable dynamic mode
```

### Option B: ⚑ Dynamic Configuration

Enable dynamic EMR clusters mode in [config.yaml](../../../config.yaml):

```yaml
dynamic_emr_clusters_mode: true # Enable dynamic mode

# No need to pre-configure servers - specify clusters in tool calls
```

With dynamic mode, you can specify EMR clusters directly in AI queries:
- **By ARN**: `"arn:aws:emr:us-east-1:123456789012:cluster/j-1234567890ABC"`
- **By Cluster ID**: `"j-1234567890ABC"`
- **By Cluster Name**: `"my-production-cluster"` (active clusters only)
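Since the three identifier forms have distinct shapes, the server can plausibly tell them apart syntactically. A hedged sketch of such classification (illustrative only, not the project's actual resolution logic; a name requires a lookup against active clusters):

```python
import re


def classify_emr_identifier(value: str) -> str:
    """Guess whether an identifier is an ARN, a cluster ID, or a cluster name."""
    if value.startswith("arn:aws:"):
        return "arn"
    # EMR cluster IDs look like j-1234567890ABC
    if re.fullmatch(r"j-[A-Z0-9]+", value):
        return "cluster_id"
    # Anything else is treated as a cluster name (active clusters only)
    return "cluster_name"
```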

**Note**: The MCP Server manages creation of the Persistent UI and authenticates to it using tokens; you do not need to open the Persistent UI URL in a web browser. Please ensure the user running the MCP server has permission to create and view the Persistent UI for that cluster by following the [EMR Documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html#app-history-spark-UI-permissions).

## Step 4: Start the MCP Server
2 changes: 1 addition & 1 deletion src/spark_history_mcp/api/__init__.py
@@ -1 +1 @@
-"""API clients for interacting with Spark History Server."""
+"""API clients."""
62 changes: 62 additions & 0 deletions src/spark_history_mcp/api/client_factory.py
@@ -0,0 +1,62 @@
from typing import Optional

from spark_history_mcp.api.emr_persistent_ui_client import EMRPersistentUIClient
from spark_history_mcp.api.spark_client import SparkRestClient
from spark_history_mcp.config.config import ServerConfig


def create_spark_client_from_config(server_config: ServerConfig) -> SparkRestClient:
    """
    Create a SparkRestClient from a ServerConfig.

    Handles both regular Spark History Servers and EMR Persistent UI configurations.

    Args:
        server_config: The server configuration

    Returns:
        A properly configured SparkRestClient instance
    """
    # EMR server configurations are routed through the Persistent UI client
    if server_config.emr_cluster_arn:
        return create_spark_emr_client(server_config.emr_cluster_arn, server_config)
    # Regular Spark REST client
    return SparkRestClient(server_config)


def create_spark_emr_client(
    emr_cluster_arn: str, server_config: Optional[ServerConfig] = None
) -> SparkRestClient:
    """
    Create a SparkRestClient from an EMR cluster ARN and an optional ServerConfig.

    Handles EMR Persistent UI applications.

    Args:
        emr_cluster_arn: The EMR cluster ARN
        server_config: Optional base server configuration

    Returns:
        A properly configured SparkRestClient instance
    """
    # Copy the config (or create one) so the caller's instance is not mutated
    if server_config is None:
        server_config = ServerConfig()
    else:
        server_config = server_config.model_copy()
    server_config.emr_cluster_arn = emr_cluster_arn

    emr_client = EMRPersistentUIClient(server_config)

    # Initialize EMR client (create persistent UI, get presigned URL, set up session)
    base_url, session = emr_client.initialize()
    server_config.url = base_url

    # Create SparkRestClient with the authenticated session
    spark_client = SparkRestClient(server_config)
    spark_client.session = session

    return spark_client
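The factory copies the config (`model_copy`) before overwriting `url`, so the caller's object is left untouched. The same pattern can be shown with stdlib dataclasses (illustrative sketch; `DemoServerConfig` and `with_presigned_url` are hypothetical stand-ins for the project's pydantic models):

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DemoServerConfig:
    url: Optional[str] = None
    emr_cluster_arn: Optional[str] = None


def with_presigned_url(cfg: DemoServerConfig, base_url: str) -> DemoServerConfig:
    # replace() returns a shallow copy, mirroring pydantic's model_copy()
    return replace(cfg, url=base_url)


original = DemoServerConfig(emr_cluster_arn="arn:aws:emr:us-east-1:111122223333:cluster/j-ABC")
updated = with_presigned_url(original, "https://presigned.example/shs")

assert original.url is None  # caller's config untouched
assert updated.url == "https://presigned.example/shs"
```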