
Commit edeff23

implement dynamic emr clusters mode
Users can now specify an EMR cluster by name, ID, or ARN dynamically in the query. To enable this, set dynamic_emr_clusters_mode to true in config.yaml. When a cluster is specified by name, it must be active, since cluster names can be reused.

- Added the "dynamic_emr_clusters_mode" boolean to the config. This mode can't be used together with a static servers specification.
- All tool calls now require a "server_spec" parameter:

    server_spec = {
        "static_server_spec": {
            "server_name": str,
            "default_client": bool
        },
        "dynamic_emr_server_spec": {
            "emr_cluster_arn": str,
            "emr_cluster_id": str,
            "emr_cluster_name": str
        }
    }

  In static mode, static_server_spec is used. In dynamic mode, dynamic_emr_server_spec is used.
- Dynamically created Spark clients are cached:
  - by ARN: lifetime
  - by ID: lifetime
  - by name: for the session
- Created EMRclient to find the relevant cluster when needed.

Signed-off-by: yanivkrol <[email protected]>
1 parent b5fe828 commit edeff23
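
As a reading aid for the `server_spec` shape described in the commit message, here is a minimal sketch of how a tool might pick the active sub-spec for the configured mode. The dataclass names, the `resolve_server_spec` helper, the `emr_cluster_name` field name, and the "exactly one identifier" rule are assumptions made for illustration; they are not taken from this commit.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StaticServerSpec:
    server_name: Optional[str] = None
    default_client: bool = False


@dataclass
class DynamicEmrServerSpec:
    emr_cluster_arn: Optional[str] = None
    emr_cluster_id: Optional[str] = None
    emr_cluster_name: Optional[str] = None  # assumed field name


@dataclass
class ServerSpec:
    static_server_spec: Optional[StaticServerSpec] = None
    dynamic_emr_server_spec: Optional[DynamicEmrServerSpec] = None


def resolve_server_spec(
    spec: ServerSpec, dynamic_emr_clusters_mode: bool
) -> Union[StaticServerSpec, DynamicEmrServerSpec]:
    """Return the sub-spec that applies in the configured mode (hypothetical helper)."""
    if dynamic_emr_clusters_mode:
        dynamic = spec.dynamic_emr_server_spec
        if dynamic is None:
            raise ValueError("dynamic mode requires dynamic_emr_server_spec")
        identifiers = [
            dynamic.emr_cluster_arn,
            dynamic.emr_cluster_id,
            dynamic.emr_cluster_name,
        ]
        # Assumption: exactly one of ARN, ID, or name identifies the cluster.
        if sum(value is not None for value in identifiers) != 1:
            raise ValueError("specify exactly one of ARN, ID, or name")
        return dynamic
    if spec.static_server_spec is None:
        raise ValueError("static mode requires static_server_spec")
    return spec.static_server_spec
```

Rejecting a spec that does not match the mode keeps a misconfigured call from silently falling back to a different server.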

17 files changed: +1590 -213 lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 2 deletions
@@ -163,7 +163,6 @@ def your_new_tool(
 **Don't forget to add tests:**

 ```python
-@patch("tools.get_client_or_default")
 def test_your_new_tool(self, mock_get_client):
     """Test your new tool functionality"""
     # Setup mocks
@@ -172,7 +171,7 @@ def test_your_new_tool(self, mock_get_client):
     mock_get_client.return_value = mock_client

     # Call the tool
-    result = your_new_tool("spark-app-123")
+    result = your_new_tool("spark-app-123", self.DEFAULT_SERVER_SPEC)

     # Verify results
     self.assertEqual(result, expected_result)

README.md

Lines changed: 28 additions & 1 deletion
@@ -112,6 +112,10 @@ servers:
     auth:  # optional
       username: "user"
       password: "pass"
+
+# Enable dynamic EMR clusters mode to specify EMR clusters directly in tool calls
+dynamic_emr_clusters_mode: true
+
 mcp:
   transports:
     - streamable-http # streamable-http or stdio.
@@ -243,13 +247,35 @@ servers:
     url: "http://staging-spark-history:18080"
 ```

+With static configuration:
+
 💁 User Query: "Can you get application <app_id> using production server?"

 🤖 AI Tool Request:
 ```json
 {
   "app_id": "<app_id>",
-  "server": "production"
+  "server_spec": {
+    "static_server_spec": {
+      "server_name": "production"
+    }
+  }
+}
+```
+
+With dynamic EMR configuration:
+
+💁 User Query: "Can you get application <app_id> using cluster j-I4VIWMNGOIP7"
+
+🤖 AI Tool Request:
+```json
+{
+  "app_id": "<app_id>",
+  "server_spec": {
+    "dynamic_emr_server_spec": {
+      "emr_cluster_id": "j-I4VIWMNGOIP7"
+    }
+  }
 }
 ```
 🤖 AI Tool Response:
@@ -289,6 +315,7 @@ SHS_SERVERS_*_AUTH_TOKEN - Token for a specific server
 SHS_SERVERS_*_VERIFY_SSL - Whether to verify SSL for a specific server (true/false)
 SHS_SERVERS_*_TIMEOUT - HTTP request timeout in seconds for a specific server (default: 30)
 SHS_SERVERS_*_EMR_CLUSTER_ARN - EMR cluster ARN for a specific server
+SHS_DYNAMIC_EMR_CLUSTERS_MODE - Enable dynamic EMR clusters mode (default: false)
 ```

 ## 🤖 AI Agent Integration

config.yaml

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,8 @@ servers:
   # emr_persistent_ui:
   #   emr_cluster_arn: "<EMR Cluster ARN>"

+# dynamic_emr_clusters_mode: true # To be able to add cluster id or name to the prompt
+
 mcp:
   transports:
     - streamable-http # streamable-http or stdio. you can only specify one right now.
@@ -55,3 +57,4 @@ mcp:
 # SHS_SERVERS_*_AUTH_TOKEN - Token for a specific server
 # SHS_SERVERS_*_VERIFY_SSL - Whether to verify SSL for a specific server (true/false)
 # SHS_SERVERS_*_EMR_CLUSTER_ARN - EMR cluster ARN for a specific server
+# SHS_DYNAMIC_EMR_CLUSTERS_MODE - Enable dynamic EMR clusters mode (default: false)
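
The commit message states that dynamic mode cannot be combined with a static servers specification, and the config documents an `SHS_DYNAMIC_EMR_CLUSTERS_MODE` override. A minimal sketch of that check follows. The helper names, the set of accepted truthy strings, and the rule that the environment variable wins over the file are assumptions, not behavior confirmed by this commit.

```python
import os
from typing import Mapping, Optional


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_dynamic_mode(
    config: Mapping[str, object], env: Mapping[str, str] = os.environ
) -> bool:
    """Return whether dynamic mode is on, rejecting a mixed configuration."""
    file_value = bool(config.get("dynamic_emr_clusters_mode", False))
    # Assumption: the environment variable overrides the value in config.yaml.
    dynamic = parse_bool(env.get("SHS_DYNAMIC_EMR_CLUSTERS_MODE"), default=file_value)
    if dynamic and config.get("servers"):
        raise ValueError(
            "dynamic_emr_clusters_mode cannot be combined with static servers"
        )
    return dynamic
```

Failing at startup, rather than at the first tool call, surfaces a mixed configuration before any query is served.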

deploy/kubernetes/helm/README.md

Lines changed: 6 additions & 0 deletions
@@ -81,6 +81,12 @@ config:
       url: "http://dev-spark-history:18080"
 ```

+#### 1a. Dynamic EMR Clusters Configuration
+```yaml
+config:
+  dynamic_emr_clusters_mode: true
+```
+
 #### 2. Authentication Setup
 ```yaml
 auth:

examples/aws/emr/README.md

Lines changed: 31 additions & 4 deletions
@@ -4,7 +4,12 @@

 [![Watch the demo video](https://img.shields.io/badge/YouTube-Watch%20Demo-red?style=for-the-badge&logo=youtube)](https://www.youtube.com/watch?v=FaduuvMdGxI)

-If you are an existing Amazon EMR user looking to analyze your Spark Applications, then you can follow the steps below to start using the Spark History Server MCP in 5 simple steps.
+If you are an existing Amazon EMR user looking to analyze your Spark Applications, you have **two options**:
+
+1. **🔧 Static Configuration** - Pre-configure EMR clusters in `config.yaml`
+2. **⚡ Dynamic Configuration** - Specify EMR clusters directly in tool calls
+
+The dynamic approach is particularly useful when analyzing Spark applications across multiple EMR clusters without pre-configuration.

 ## Step 1: Setup project on your laptop

@@ -24,16 +29,38 @@ task install # Install dependencies

 Amazon EMR-EC2 users can use a service-managed [Persistent UI](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html) which automatically creates the Spark History Server for Spark applications on a given EMR Cluster. You can directly go to Step 3 and configure the MCP server with an EMR Cluster Id to analyze the Spark applications on that cluster.

-## Step 3: Configure the MCP Server to use the EMR Persistent UI
+## Step 3: Configure the MCP Server
+
+You have **two configuration options**:
+
+### Option A: 🔧 Static Configuration

 - Identify the Amazon EMR Cluster Id for which you want the MCP server to analyze the Spark applications
 - Edit SHS MCP Config: [config.yaml](../../../config.yaml) to add the EMR Cluster Id

 ```yaml
-emr_persistent_ui:
-  emr_cluster_arn: "<emr_cluster_arn>"
+servers:
+  emr_persistent_ui:
+    emr_cluster_arn: "<emr_cluster_arn>"
+
+dynamic_emr_clusters_mode: false # Disable dynamic mode
+```
+
+### Option B: ⚡ Dynamic Configuration
+
+Enable dynamic EMR clusters mode in [config.yaml](../../../config.yaml):
+
+```yaml
+dynamic_emr_clusters_mode: true # Enable dynamic mode
+
+# No need to pre-configure servers - specify clusters in tool calls
 ```

+With dynamic mode, you can specify EMR clusters directly in AI queries:
+- **By ARN**: `"arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-1234567890ABC"`
+- **By Cluster ID**: `"j-1234567890ABC"`
+- **By Cluster Name**: `"my-production-cluster"` (active clusters only)
+
 **Note**: The MCP Server manages the creation of the Persistent UI and its authentication using tokens with Persistent UI. You do not need to open the Persistent UI URL in a Web Browser. Please ensure the user running the MCP has access to create and view the Persistent UI for that cluster by following the [EMR Documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html#app-history-spark-UI-permissions).

 ## Step 4: Start the MCP Server
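
The three identifier forms listed for dynamic mode (ARN, cluster ID, cluster name) can be told apart by shape. The sketch below shows one way a caller might map a raw string onto a `dynamic_emr_server_spec` key. The function name, the regular expressions, and the `emr_cluster_name` key are illustrative assumptions; the project may classify identifiers differently or leave that choice to the AI agent.

```python
import re
from typing import Tuple

# Assumed shapes: cluster IDs look like "j-" followed by uppercase letters and digits,
# and cluster ARNs end in "cluster/<cluster id>".
_CLUSTER_ID = re.compile(r"^j-[A-Z0-9]+$")
_CLUSTER_ARN = re.compile(r"^arn:[^:]+:[^:]+:[^:]*:[^:]*:cluster/j-[A-Z0-9]+$")


def classify_cluster_identifier(value: str) -> Tuple[str, str]:
    """Return (spec key, value) for an ARN, a cluster ID, or a cluster name."""
    if _CLUSTER_ARN.match(value):
        return ("emr_cluster_arn", value)
    if _CLUSTER_ID.match(value):
        return ("emr_cluster_id", value)
    # Anything else is treated as a cluster name (active clusters only).
    return ("emr_cluster_name", value)
```

For example, `dict([classify_cluster_identifier("j-I4VIWMNGOIP7")])` yields a mapping that could serve as the body of `dynamic_emr_server_spec`.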
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-"""API clients for interacting with Spark History Server."""
+"""API clients."""
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+from typing import Optional
+
+from spark_history_mcp.api.emr_persistent_ui_client import EMRPersistentUIClient
+from spark_history_mcp.api.spark_client import SparkRestClient
+from spark_history_mcp.config.config import ServerConfig
+
+
+def create_spark_client_from_config(server_config: ServerConfig) -> SparkRestClient:
+    """
+    Create a SparkRestClient from a ServerConfig.
+
+    This function handles both regular Spark History Servers and EMR Persistent UI configurations.
+
+    Args:
+        server_config: The server configuration
+
+    Returns:
+        SparkRestClient instance properly configured
+    """
+    # Check if this is an EMR server configuration
+    if server_config.emr_cluster_arn:
+        return create_spark_emr_client(server_config.emr_cluster_arn, server_config)
+    else:
+        # Regular Spark REST client
+        return SparkRestClient(server_config)
+
+
+def create_spark_emr_client(
+    emr_cluster_arn: str, server_config: Optional[ServerConfig] = None
+) -> SparkRestClient:
+    """
+    Create a SparkRestClient from EMR cluster arn and optional ServerConfig.
+
+    This function handles EMR Persistent UI applications.
+
+    Args:
+        emr_cluster_arn: The EMR cluster ARN
+        server_config: The server configuration
+
+    Returns:
+        SparkRestClient instance properly configured
+    """
+    if server_config is None:
+        server_config = ServerConfig()
+    server_config.emr_cluster_arn = emr_cluster_arn
+    emr_client = EMRPersistentUIClient(server_config)
+
+    # Initialize EMR client (create persistent UI, get presigned URL, setup session)
+    base_url, session = emr_client.initialize()
+
+    # Create a modified server config with the base URL
+    if server_config is None:
+        server_config = ServerConfig()
+    else:
+        server_config = server_config.model_copy()
+    server_config.url = base_url
+
+    # Create SparkRestClient with the session
+    spark_client = SparkRestClient(server_config)
+    spark_client.session = session  # Use the authenticated session
+
+    return spark_client
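
The commit message says dynamically created clients are cached by ARN and by ID for the lifetime, and by name only for the session, since cluster names can be reused. A minimal sketch of such a cache is shown below. The class name, the `resolve_arn` callable, the `session_id` argument, and `end_session` are hypothetical; the caching code from this commit is not visible in this excerpt.

```python
from typing import Callable, Dict, Tuple


class DynamicClientCache:
    """Caches clients by ARN and ID for the process lifetime, by name per session."""

    def __init__(
        self,
        create_client: Callable[[str], object],
        resolve_arn: Callable[[str, str], str],
    ) -> None:
        self._create_client = create_client  # e.g. a wrapper around create_spark_emr_client
        self._resolve_arn = resolve_arn  # hypothetical lookup: (kind, value) -> ARN
        self._by_arn: Dict[str, object] = {}
        self._by_id: Dict[str, object] = {}
        self._by_name: Dict[Tuple[str, str], object] = {}

    def by_arn(self, arn: str) -> object:
        if arn not in self._by_arn:
            self._by_arn[arn] = self._create_client(arn)
        return self._by_arn[arn]

    def by_id(self, cluster_id: str) -> object:
        if cluster_id not in self._by_id:
            arn = self._resolve_arn("id", cluster_id)
            self._by_id[cluster_id] = self.by_arn(arn)
        return self._by_id[cluster_id]

    def by_name(self, session_id: str, name: str) -> object:
        # Names can be reused by later clusters, so the mapping is scoped to a session.
        key = (session_id, name)
        if key not in self._by_name:
            arn = self._resolve_arn("name", name)
            self._by_name[key] = self.by_arn(arn)
        return self._by_name[key]

    def end_session(self, session_id: str) -> None:
        """Drop name mappings for a finished session; ARN and ID entries remain."""
        for key in [k for k in self._by_name if k[0] == session_id]:
            del self._by_name[key]
```

Routing the ID and name lookups through `by_arn` means a cluster reached by several identifiers still shares one underlying client.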
