[Automatic Import] Adding support for larger samples in ECS graph by P1llus · Pull Request #190426 · elastic/kibana

P1llus · 2024-08-13T14:39:44Z

Summary

This PR prepares the ECS mapping graph to support larger samples by splitting them into chunks, running certain parts of the graph concurrently on each chunk, and merging the results, rather than trying to fit everything into one large context.

More details below, but in general there is only a slight modification to the actual code. Most of the changed lines come from moving code into new files and updating tests.

There are also some minor tweaks to the ECS graph code in general. The related changes are listed below:

Moved some code out of graph.ts to make it smaller (moved the model* functions to a new model.ts and moved state to its own file).
Added chunkSize as an optional input to the graph (defaults to 10 fields with an actual string value per chunk), so that it can be overridden later if necessary.
Renamed the samples state to prefixedSamples and formattedSamples to combinedSamples, since the old names became confusing when debugging. The function argument names that used them were also updated to the new names, to make it clearer which sample type each function uses.
Renamed modifySamples to prefixSamples to clarify what it actually modifies.
Moved the mapping, invalid, duplicate, missing and validate nodes to their own subgraph. The combinedSamples state is now set when invoking the subgraph; its value is the related chunk, so the subgraph only needs to work on this smaller subset of data.
The currentMapping state is now only used by the subgraph. Once all the subgraphs have finished, they post their own results to the finalMapping state. This state uses a reducer function that combines the existing state with the new state, so the results from all X running subgraphs are merged into the same resulting object as before this PR.
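
To illustrate the chunk-and-merge idea described above, here is a minimal TypeScript sketch. It is not the code from this PR: `flattenLeaves`, `setPath`, `chunkSamples`, `mergeMappings` and `mapAllChunks` are hypothetical names, the sketch does not use LangGraph at all, and it counts every leaf field rather than only fields with a string value. It only assumes that a combined sample is a nested JSON object and that per-chunk results can be deep-merged.

```typescript
// Hypothetical helpers for illustration only; not the implementation in this PR.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return value !== undefined && value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Collect [path, value] pairs for every leaf field of a nested object.
function flattenLeaves(obj: JsonObject, prefix: string[] = []): Array<[string[], Json]> {
  const leaves: Array<[string[], Json]> = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = [...prefix, key];
    if (isPlainObject(value)) {
      leaves.push(...flattenLeaves(value, path));
    } else {
      leaves.push([path, value]);
    }
  }
  return leaves;
}

// Write a value into a nested object, creating intermediate objects as needed.
function setPath(target: JsonObject, path: string[], value: Json): void {
  let node = target;
  for (const key of path.slice(0, -1)) {
    const next: Json | undefined = node[key];
    if (isPlainObject(next)) {
      node = next;
    } else {
      const created: JsonObject = {};
      node[key] = created;
      node = created;
    }
  }
  node[path[path.length - 1]] = value;
}

// Split one combined sample into smaller objects of at most chunkSize leaf fields.
function chunkSamples(combined: JsonObject, chunkSize = 10): JsonObject[] {
  const leaves = flattenLeaves(combined);
  const chunks: JsonObject[] = [];
  for (let i = 0; i < leaves.length; i += chunkSize) {
    const chunk: JsonObject = {};
    for (const [path, value] of leaves.slice(i, i + chunkSize)) {
      setPath(chunk, path, value);
    }
    chunks.push(chunk);
  }
  return chunks;
}

// Reducer: deep-merge a new partial result into the existing one.
function mergeMappings(existing: JsonObject, incoming: JsonObject): JsonObject {
  const merged: JsonObject = { ...existing };
  for (const [key, value] of Object.entries(incoming)) {
    const current: Json | undefined = merged[key];
    merged[key] = isPlainObject(current) && isPlainObject(value) ? mergeMappings(current, value) : value;
  }
  return merged;
}

// Run a (hypothetical) per-chunk mapping step concurrently and merge all results.
async function mapAllChunks(
  chunks: JsonObject[],
  mapChunk: (chunk: JsonObject) => Promise<JsonObject>
): Promise<JsonObject> {
  const results = await Promise.all(chunks.map((chunk) => mapChunk(chunk)));
  return results.reduce((acc, result) => mergeMappings(acc, result), {} as JsonObject);
}
```

In this sketch, `mapChunk` stands in for invoking the subgraph with one chunk, and `mergeMappings` plays the role of the reducer on the final state: each subgraph only sees its own chunk, while the merged result has the same shape as a single large run would produce.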

Checklist

Delete any items that are not applicable to this PR.

Unit or functional tests were updated or added to match the most common scenarios

For maintainers

This was checked for breaking API changes and was labeled appropriately

P1llus · 2024-08-13T14:40:09Z

@spong FYI on dependency bump we talked about.

elasticmachine · 2024-08-14T09:28:35Z

Pinging @elastic/security-scalability (Team:Security-Scalability)

bhapas

Overall looks good. Just minor questions / comments

x-pack/plugins/integration_assistant/server/graphs/ecs/chunk.ts

x-pack/plugins/integration_assistant/server/graphs/ecs/graph.ts

x-pack/plugins/integration_assistant/server/graphs/ecs/model.ts

bhapas

LGTM

P1llus · 2024-08-15T13:10:21Z

For the remaining type failures, I am waiting on guidance from the code owners to see whether we can resolve the stricter type checking on agent state, which may be a result of bumping the dependencies.

@spong

## Summary

**NOTE** I will need help testing this before we merge it!

I spoke with @spong about an upcoming PR we have here: #190426 which bumps the langgraph version from 0.0.31 to 0.0.34, unfortunately this caused a lot of type errors in the default assistant. After some more discussion we proposed to open a PR that removes some of the more complex layers and to fix up the type issues. Though I have not worked on this graph before, the changes hopefully makes sense 👍

Graph flow:

![image](https://github.com/user-attachments/assets/911190c1-2cdc-429f-bd1b-2b4a6a343729)

The PR changes the below items to remove some of the abstractions and resolve some of the type issues, also adds a few improvements in general:

- Moves `llmType`, `bedrockChatEnabled`, `isStream` and `conversationId` to be invoke parameters rather than compile parameters. This allows them to be used in state, and removes the need to pass them everywhere as parameters. Adding them to the state also allows them to be available in langsmith.
- Removes the constants defining each node with wrappers and rather expose them directly as async functions. This removes a lot of the boilerplate code and it makes reading the stacktraces much easier.
- Moved to a single `stepRouter` used for the current conditional edges. This allows one to very easily extend the routing between either existing or new nodes, and makes it much easier to understand what conditions are routed where.
- Exports a common `NodeType` object constant (no need for the extra compile overhead of Enums here, we are only using strings), to make the node name strings auto-complete and prevent hardcoded names for the router.
- Added a `modelInput` node to be the starter node. This was first because adding nodes inside if conditions usually create errors, so it was created to be able to set the `hasRespondStep` state. However this node is nice to have as an entrypoint in which you find yourself wanting to change the state based on the invoke parameters or other conditions retrieved from other parts of the stack etc before it continues to any of the other nodes.
- Added a `yarn draw-graph` command, that outputs to `docs/img/default_assistant_graph.png`. This is then also included in the readme. This makes it better for changes by other teams (like me) to understand the intended graph workflows easier.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
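
As a rough illustration of the `NodeType` constant and single `stepRouter` idea mentioned in the list above, here is a small TypeScript sketch. It is an assumption-laden example, not the code from that PR: the node names other than `modelInput`, the `RouterState` shape, and the `lastNode` and `agentFinished` fields are all hypothetical, and the router is shown as a plain function without any LangGraph wiring.

```typescript
// Object constant instead of an enum: plain strings, but with auto-complete.
// Only 'modelInput' is named in the description; the other node names are hypothetical.
const NodeType = {
  MODEL_INPUT: 'modelInput',
  AGENT: 'agent',
  TOOLS: 'tools',
  RESPOND: 'respond',
  END: 'end',
} as const;

type NodeName = (typeof NodeType)[keyof typeof NodeType];

// Hypothetical slice of graph state that the router inspects.
interface RouterState {
  lastNode: NodeName;
  hasRespondStep: boolean;
  agentFinished: boolean;
}

// One router for all conditional edges: given the state, return the next node name.
function stepRouter(state: RouterState): NodeName {
  switch (state.lastNode) {
    case NodeType.MODEL_INPUT:
      return NodeType.AGENT;
    case NodeType.AGENT:
      if (!state.agentFinished) {
        return NodeType.TOOLS;
      }
      return state.hasRespondStep ? NodeType.RESPOND : NodeType.END;
    case NodeType.TOOLS:
      return NodeType.AGENT;
    default:
      return NodeType.END;
  }
}
```

Keeping every routing decision in one function like this means a new node only needs one new `case`, and a typo in a node name becomes a compile-time error rather than a runtime one.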

P1llus · 2024-08-23T06:12:10Z

@elasticmachine merge upstream

kibana-ci · 2024-08-23T14:00:49Z

💚 Build Succeeded

Buildkite Build
Commit: 79463fe

Metrics [docs]

Unknown metric groups

ESLint disabled in files

| id | before | after | diff |
| --- | --- | --- | --- |
| `integrationAssistant` | 3 | 4 | +1 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| `integrationAssistant` | 10 | 11 | +1 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

…astic#190426)

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 8e66a3e)

kibanamachine · 2024-08-26T20:06:48Z

💚 All backports created successfully

Status	Branch	Result
✅	8.15

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

@spong

(cherry picked from commit b660d42)

# Conflicts:
# x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/execute_tools.ts
# x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/generate_chat_title.ts
# x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/run_agent.ts
# x-pack/plugins/elastic_assistant/server/lib/langchain/graphs/default_assistant_graph/nodes/should_continue.ts

@spong

…191386) # Backport This will backport the following commits from `main` to `8.15`: - [[Elastic Assistant] Update default assistant graph (#190686)](#190686)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

…aph (#190426) (#191314) # Backport This will backport the following commits from `main` to `8.15`: - [[Automatic Import] Adding support for larger samples in ECS graph (#190426)](#190426)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)  Co-authored-by: Marius Iversen <marius.iversen@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

P1llus added 2 commits August 13, 2024 16:33

adding concurrent graph execution with mapping chunks to ecs graph

6d73896

adding some extra comments on the new subgraph parts

7e0df14

P1llus added the release_note:skip, backport:skip, v8.16.0 and Team:Security-Scalability labels Aug 13, 2024

P1llus added 4 commits August 14, 2024 11:10

update test fixtures for new chunk settings and updated state names

71d1a3a

update some ecs state names, add chunkSize option, clean up ECS graph code

a3589c8

rename modifySamples to prefixSamples

bb51728

Merge branch 'main' into automatic_import_ecs_chunking

25af2a8

P1llus marked this pull request as ready for review August 14, 2024 09:28

P1llus requested a review from a team as a code owner August 14, 2024 09:28

fix missing types in test fixture state

71f90fe

bhapas reviewed Aug 14, 2024

View reviewed changes

P1llus added 2 commits August 14, 2024 14:45

adding tests for chunking and merge

5235f35

Merge branch 'main' into automatic_import_ecs_chunking

47e94af

bhapas approved these changes Aug 14, 2024

View reviewed changes

P1llus added 5 commits August 14, 2024 15:45

move mergeSamples around as it was required by integration_builder as well as the ECS graph

e5e02f9

Merge branch 'main' into automatic_import_ecs_chunking

55e4305

update state type

8380d9a

Merge branch 'main' into automatic_import_ecs_chunking

f3f829b

adding comment for prefixSamples

6ef4f51

P1llus mentioned this pull request Aug 19, 2024

[Elastic Assistant] Update default assistant graph #190686

Merged

2 tasks

P1llus and others added 4 commits August 22, 2024 11:18

Merge branch 'main' into automatic_import_ecs_chunking

57f7db0

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

a13a797

Merge branch 'main' into automatic_import_ecs_chunking

fdf332a

Update state.ts

7420ef1

Merge branch 'main' into automatic_import_ecs_chunking

db2b0a5

bhapas mentioned this pull request Aug 23, 2024

[ AutoImport] Introduce automatic log type detection graph #190407

Merged

2 tasks

P1llus added 3 commits August 23, 2024 10:18

Merge branch 'main' into automatic_import_ecs_chunking

8886016

Merge branch 'main' into automatic_import_ecs_chunking

0e7fc28

Merge branch 'main' into automatic_import_ecs_chunking

79463fe

P1llus merged commit 8e66a3e into elastic:main Aug 23, 2024

P1llus added the backport:prev-minor label and removed the backport:skip label Aug 26, 2024

P1llus self-assigned this Aug 26, 2024

P1llus added the v8.15.1 label Aug 26, 2024

kibanamachine mentioned this pull request Aug 26, 2024

[8.15] [Automatic Import] Adding support for larger samples in ECS graph (#190426) #191314

Merged

kibanamachine mentioned this pull request Aug 27, 2024

[Automatic Import] resolve a bug in ECS missing fields detection #191502

Merged

[Automatic Import] Adding support for larger samples in ECS graph #190426
P1llus merged 22 commits into elastic:main from P1llus:automatic_import_ecs_chunking
