[Automatic Migrations] Dashboard migration agent initial Implementation #232637

Closed
logeekal wants to merge 61 commits into elastic:main from logeekal:feat/automatic_migration_dashboard_agent

@logeekal logeekal commented Aug 22, 2025

Summary

Initial implementation of the dashboard translation agent.

Dashboard migration graph

  • The Splunk XML is parsed into an array of panels to translate.
  • All individual panels are processed concurrently.
  • Query translation to ESQL uses the same logic as rule migrations.
  • The indexPattern selection node is implemented using the AI Assistant subgraph.
  • Dashboard panels are aggregated into a JSON document using predefined visualization templates.
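
As a rough illustration of the fan-out and aggregation steps listed above, here is a minimal sketch. The types `ParsedPanel` and `TranslatedPanel`, the function `translateDashboard`, and the stand-in `fakeTranslate` are all hypothetical names invented for this example; the actual agent state and node signatures are not shown in this description.

```typescript
// Hypothetical shapes: the real panel and state types are not part of this description.
interface ParsedPanel {
  id: string;
  title: string;
  query: string; // original SPL query
}

interface TranslatedPanel {
  id: string;
  title: string;
  esql: string;
}

type TranslatePanel = (panel: ParsedPanel) => Promise<TranslatedPanel>;

// Fan out: start every panel translation at once.
// Fan in: Promise.all resolves with the results in input order.
async function translateDashboard(
  panels: ParsedPanel[],
  translatePanel: TranslatePanel
): Promise<{ panels: TranslatedPanel[] }> {
  const translated = await Promise.all(panels.map((panel) => translatePanel(panel)));
  return { panels: translated };
}

// Stand-in translator so the sketch runs without a model call.
const fakeTranslate: TranslatePanel = async (panel) => ({
  id: panel.id,
  title: panel.title,
  esql: `// TODO translate: ${panel.query}`,
});
```

Note that `Promise.all` rejects as soon as one translation fails, so a real implementation would probably want per-panel error handling.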

Additional changes

  • Code encapsulation for reusable node helpers
  • Graph nodes improved (for both rules and dashboards):
    • Fixed ECS mapping node bug, which was not properly updating the query with the translated field names.
    • Inline query node improved to create all the missing macro and lookup placeholders
    • Query validation & fix self-healing loop improved to not remove placeholders
    • Query translation prompt improved.
  • SIEM migration client dependencies unified (same for rules and dashboards)
  • Legacy chat models removed in favour of the generic InferenceChatModel.
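
To make the placeholder behaviour mentioned above more concrete, here is a minimal sketch. It assumes a placeholder syntax of the form `[macro:name]` and `[lookup:name]`, which is an assumption of this example rather than something shown in the PR; `inlineMacros` and `keepsPlaceholders` are hypothetical helpers, and macro argument substitution is left out.

```typescript
// Hypothetical placeholder format; the exact syntax used by the agent is not shown here.
const macroPlaceholder = (name: string): string => `[macro:${name}]`;

// Splunk macros are referenced between backticks, optionally with arguments: `my_macro(arg)`.
const MACRO_CALL = /`([A-Za-z0-9_.-]+)(?:\([^`]*\))?`/g;

// Replace known macros with their definition and unknown ones with a placeholder.
// Simplification: macro arguments are dropped instead of being substituted.
function inlineMacros(spl: string, knownMacros: Record<string, string>): string {
  return spl.replace(MACRO_CALL, (_match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(knownMacros, name)
      ? knownMacros[name]
      : macroPlaceholder(name)
  );
}

// A fix step should leave placeholders intact, so check that before accepting a rewritten query.
function keepsPlaceholders(before: string, after: string): boolean {
  const placeholders = before.match(/\[(?:macro|lookup):[^\]]+\]/g) ?? [];
  return placeholders.every((placeholder) => after.includes(placeholder));
}
```

A check like `keepsPlaceholders` could guard a validate-and-fix loop: if a fixed query has lost a placeholder, the fix is rejected and the previous query is kept.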

About Rules migrations

This PR moves some logic from the rule migration agent to shared agent helpers, to be used by the dashboard migration agent as well.
At the same time, these node helpers (validate_esql, translate_spl_to_esql, inline_spl_query, fix_esql_errors...) have been improved and some bugs have been fixed, as mentioned above.
Hence, this PR not only introduces the dashboard migration agent but also impacts the rules translations, which will receive improvements for free. However, we'll also need to test rule migrations and ensure no regression has been introduced.

Dashboard migration graph

  • translatePanel sub-graph is executed on a per-panel basis, concurrently.
  • Panel translations are aggregated in the final aggregateDashboard node to create the Elastic dashboard JSON.
[Image: dashboard_migration_agent_graph]

Example Trace

https://smith.langchain.com/public/ace2e897-84d8-4cec-8a35-53177816a4a1/r

@logeekal logeekal changed the title from "Feat/automatic migration dashboard agent" to "[Security Solution] Automatic dashboard migration agent" Aug 25, 2025
@logeekal logeekal changed the title from "[Security Solution] Automatic dashboard migration agent" to "[Security Solution] Automatic dashboard migration agent initial Implementation" Aug 25, 2025
@logeekal

❗ Move to Inference Model

As mentioned in PR #206710, we no longer need per-provider models and can simply use InferenceChatModel. Let's see if we can make this change in this PR itself.

@logeekal logeekal changed the title from "[Security Solution] Automatic dashboard migration agent initial Implementation" to "[Automatic Migrations] Dashboard migration agent initial Implementation" Aug 25, 2025
@semd semd requested review from a team as code owners September 2, 2025 18:15
@elasticmachine

Pinging @elastic/security-threat-hunting (Team:Threat Hunting)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@semd semd added the ci:cloud-deploy, ci:cloud-persist-deployment, and ci:cloud-deploy-elser labels Sep 3, 2025

@logeekal logeekal left a comment

Great work, @semd. I'm not done with the full code review yet but will resume after today's meetings. After going through the trace you shared, I have some comments and questions.

  1. Regarding the empty panel: https://github.com/elastic/kibana/pull/232637/files#r2318624696
  2. Why do we need panel descriptions and panel dashboards in the index? Please ignore; I misunderstood. They are only part of the graph.
  3. And if we do, we can put them in the index when we import it so that we do not have to parse again.
  4. In this particular translatePanel node, the indexPattern is .*, which fails in the dashboard I have linked in one of the comments below. Maybe we should check for it specifically and replace it with a placeholder.
  5. If a panel is untranslatable, I think we should add a comment that pairs the panel title with the reason it was untranslatable, similar to rules.

We can of course discuss these points in the Eng. sync.
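
For point 4, a small guard along these lines might be enough. This is only a sketch: the placeholder token `[indexPattern]` and the helper `normalizeIndexPattern` are hypothetical names chosen for the example, not taken from the PR.

```typescript
// Hypothetical placeholder token; the actual value is not named in this discussion.
const INDEX_PATTERN_PLACEHOLDER = '[indexPattern]';

// Treat an empty selection or the regex-style ".*" as "no usable index pattern".
function normalizeIndexPattern(selected: string | undefined): string {
  const value = (selected ?? '').trim();
  const unusable = value === '' || value === '.*';
  return unusable ? INDEX_PATTERN_PLACEHOLDER : value;
}
```

With this, `normalizeIndexPattern('.*')` yields the placeholder, while a concrete pattern such as `logs-*` passes through unchanged.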

@logeekal logeekal left a comment

Overall, the code is looking great ✨. All comments are minor. I'm doing some desk testing and will get back to you.

Comment on lines +17 to +18
const parser = new SplunkXmlDashboardParser(state.original_dashboard.data);
const panels = await parser.extractPanels();

@logeekal logeekal Sep 4, 2025

The current method works, but SplunkXmlDashboardParser has a static method called isSupportedSplunkXml, which does a quick regex check, without parsing, of whether the dashboard is supported at all, and returns a relevant error.

I think it could be useful to run this quick check before spending time on parsing, and we could even return the relevant error. See if that is of any use.
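
To illustrate the idea of a cheap check before parsing, here is a sketch. It is a hypothetical stand-in, not the real isSupportedSplunkXml, whose signature and rules are not shown in this thread; the name `quickSupportCheck`, the `SupportCheck` shape, and the specific rules (root element must be `<dashboard>` or `<form>`, `version="2"` rejected) are assumptions made for this example.

```typescript
// Hypothetical stand-in: NOT the real SplunkXmlDashboardParser.isSupportedSplunkXml.
interface SupportCheck {
  isSupported: boolean;
  reason?: string;
}

function quickSupportCheck(xml: string): SupportCheck {
  // Look only at the root element; no XML parsing happens here.
  const root = xml.match(/<(dashboard|form)\b([^>]*)>/i);
  if (!root) {
    return { isSupported: false, reason: 'No <dashboard> or <form> root element found' };
  }
  // Assumption: version="2" marks a format this sketch does not handle.
  if (/\bversion\s*=\s*["']2["']/.test(root[2])) {
    return { isSupported: false, reason: 'Unsupported dashboard version' };
  }
  return { isSupported: true };
}
```

The parse node could call such a check first and return the `reason` as the error, only constructing the parser when `isSupported` is true.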

logeekal commented Sep 4, 2025

@semd, I found only one issue that I think should be fixed before this PR is merged.

If you check a trace like this : https://smith.langchain.com/o/a9ce6102-b198-4b3d-9190-95bedc24ca4f/projects/p/66aedda3-8cfd-4eee-950d-7ba2f93a317e?timeModel=%7B%22duration%22%3A%227d%22%7D&searchModel=%7B%22filter%22%3A%22eq%28is_root%2C+true%29%22%7D&peek=fd40dcdf-824f-4f05-9005-35dd5154c925&peeked_trace=fd40dcdf-824f-4f05-9005-35dd5154c925

Here the dashboard description generation is failing because there are no panels. I think we should have a node like translationResult for the complete dashboard as well, instead of only for single panels, which will assign missing/final/default values.

Maybe there should be a node that checks whether panels exist and, if not, skips the rest of the graph through a conditional edge, setting the correct properties, such as translation_result: untranslatable.

Otherwise, fields such as translationResult are not even populated.

semd commented Sep 4, 2025

Here the dashboard description generation is failing because there are no panels. I think we should have a node like translationResult for the complete dashboard as well, instead of only for single panels, which will assign missing/final/default values.

Maybe there should be a node that checks whether panels exist and, if not, skips the rest of the graph through a conditional edge, setting the correct properties, such as translation_result: untranslatable.

Otherwise, fields such as translationResult are not even populated.

@logeekal
The createDescriptions node is not failing; it is translatePanels that has no work to do and stops without sending anything to the next node. We only need to add a conditionalEdge right after the parse node to check whether there are panels.
The final node, which sets the translation_result, is aggregateDashboard.

fixed here: 171664f

Example (evaluation) trace: https://smith.langchain.com/public/213446e7-9819-43c8-a4db-02c20ee3bf2f/r
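
For reference, the conditional edge described above could be driven by a router along these lines. This is a minimal sketch, not the code from the fix commit: the state field `parsed_panels`, the `NextNode` union, and the choice to route straight to aggregateDashboard when there are no panels are assumptions made for the example.

```typescript
// Hypothetical state shape and node names, loosely following the names used in this thread.
interface DashboardState {
  parsed_panels: Array<{ id: string; title: string }>;
}

type NextNode = 'translatePanels' | 'aggregateDashboard';

// Router for the conditional edge placed right after the parse node.
// With no panels, skip translation and let the final node set the result.
function panelsRouter(state: DashboardState): NextNode {
  return state.parsed_panels.length > 0 ? 'translatePanels' : 'aggregateDashboard';
}
```

In a LangGraph graph, a function like this would typically be passed to `addConditionalEdges` on the parse node, with the returned string naming the next node.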

semd commented Sep 4, 2025

Appreciate the feedback, @logeekal.
This change is intentionally a first implementation to unblock the work. To keep it manageable, I’d like to limit the scope and iterate in separate PRs.
Let's keep track of the follow-up work in this separate ticket https://github.com/elastic/security-team/issues/13875

@semd semd removed the ci:cloud-deploy, ci:cloud-persist-deployment, and ci:cloud-deploy-elser labels Sep 4, 2025
semd commented Sep 4, 2025

@logeekal I created a PR on my side so I don't appear as a reviewer, and you can properly approve/request changes. Let's continue there. Closing this one.

#234046

@semd semd closed this Sep 4, 2025
@elasticmachine

💔 Build Failed

cc @semd

Labels

  • backport:skip (This PR does not require backporting)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team: SecuritySolution (Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.)
  • Team:Threat Hunting (Security Solution Threat Hunting Team)
  • v9.2.0

4 participants