[Task Manager] Use strategy pattern for API key management#259373
[Task Manager] Use strategy pattern for API key management#259373ersin-erdal merged 46 commits intoelastic:mainfrom
Conversation
Introduces a strategy pattern for handling ES and UIAM API keys in Task Manager. - ApiKeyStrategy interface with EsApiKeyStrategy (default) and EsAndUiamApiKeyStrategy implementations - Grant UIAM keys proactively when security.authc.apiKeys.uiam is available - Use UIAM keys for fake requests only when config api_key_type is 'uiam', with ES fallback - Persist UIAM keys as raw essu_ credentials, ES keys as encoded id:value pairs - Wire registerUiamApiKeyInvalidateFn through invalidation task to runInvalidate - Add uiamApiKey to task SO (encrypted), uiamApiKeyId to userScope - Add task SO model version 9, schema v9, and mappings for uiamApiKeyId - Decrypt both apiKey and uiamApiKey when reading tasks for invalidation Made-with: Cursor
|
Pinging @elastic/response-ops (Team:ResponseOps) |
# Conflicts: # x-pack/platform/plugins/shared/task_manager/server/task_running/task_runner.ts
# Conflicts: # x-pack/platform/plugins/shared/task_manager/server/plugin.ts # x-pack/platform/plugins/shared/task_manager/server/polling_lifecycle.test.ts # x-pack/platform/plugins/shared/task_manager/server/polling_lifecycle.ts # x-pack/platform/plugins/shared/task_manager/server/task_running/task_runner.test.ts # x-pack/platform/plugins/shared/task_manager/server/task_running/task_runner.ts
| attributes: { | ||
| apiKeyId: target.apiKeyId, | ||
| createdAt: new Date().toISOString(), | ||
| ...(target.uiamApiKey ? { uiamApiKey: target.uiamApiKey } : {}), |
There was a problem hiding this comment.
is it okay to write an actual api key to a regular SO here?
There was a problem hiding this comment.
Good catch, no it is not, converted it to an encrypted field.
| attributesToEncrypt: new Set(['uiamApiKey']), | ||
| attributesToIncludeInAAD: new Set(['apiKeyId', 'createdAt']), | ||
| }); | ||
|
|
There was a problem hiding this comment.
@jeramysoucy Can you review this part please?
api_key_to_invalidate is an existing SO type, now we are adding a new encrypted field, uiamApiKey.
Do we need an intermediate release?
There was a problem hiding this comment.
I don't think this has come up before, but it is a similar case to adding an encrypted attribute to an existing ESO.
Is uiamApiKey a new attribute for this SO type? Or has it already been in use as an non-encrypted field? Is this attribute expected to be populated already in the current version of serverless Kibana?
The concern here would be that the unencrypted value that is already stored in objects will not be decryptable in the new version of Kibana. I think we can solve this scenario with the createModelVersion wrapper. However, the scenario where the previous version of Kibana expects the value to not be encrypted might be problematic.
Could you provide some additional info?
- Is the attribute already in use?
- If so, what is the life cycle for the ESO? Is it short-lived or persisted over a long duration?
Related, I see you're also adding an encrypted attribute to the task ESO. I have the same concerns/questions for that as well.
There was a problem hiding this comment.
Is the attribute already in use?
No it is not, hasn't been used before.
Is this attribute expected to be populated already in the current version of serverless Kibana?
No, it will be populated with this PR
Related, I see you're also adding an encrypted attribute to the task ESO. I have the same concerns/questions for that as well.
It is the same, new field as encrypted, hasn't been used before.
There was a problem hiding this comment.
Thanks! In this case we should be ok. The old version of Kibana will not attempt to decrypt or utilize the encrypted value.
Relocate internal task_manager schedule and delete HTTP routes from the task_manager plugin to ftr_apis, which hosts test-only APIs. Wire ftr_apis to requiredPlugins.taskManager and register the same paths unchanged so Scout tests keep working. Made-with: Cursor
doakalexi
left a comment
There was a problem hiding this comment.
LGTM! Tested locally and works as expected
| path: '/internal/task_manager/tasks/{taskId}', | ||
| security: { | ||
| authz: { | ||
| requiredPrivileges: [ReservedPrivilegesSet.superuser], |
There was a problem hiding this comment.
I am not sure if this is required, but I was looking at some other routes in ftr_apis and they use requiredPrivileges: ['ftrApis']. Not sure if this is something we should use too?
There was a problem hiding this comment.
switched to it, looks working
Gate EsAndUiamApiKeyStrategy on a new boolean config `xpack.task_manager.grant_uiam_api_keys` (default false) so UIAM API key granting can be rolled out independently of the existing `api_key_type` setting, which continues to govern usage. Made-with: Cursor
Scout Test Review🔴 Blocker:
|
- Enable xpack.task_manager.grant_uiam_api_keys in the serverless Scout base config so EsAndUiamApiKeyStrategy is used and scheduled tasks get both apiKey and uiamApiKey populated. - Regenerate .meta/api/parallel.json for task_manager to include the full set of tests (manifest was missing task_schedule_and_delete). - Add safety-net task cleanup in task_api_keys.spec.ts afterAll. - Use the local apiTest fixture in spec files instead of importing from @kbn/scout directly. - Declare COMMON_HEADERS as const. Made-with: Cursor
Please temporarily discard the Scout Test Review comment. We're whipping up a fix to make this more relevant. The blocker "Blocker: .meta manifest is incomplete" isn't something we encourage necessarily (no harm in regenerating config manifest files, but this isn't a blocker at all). |
Scout Test Review✅ Previous blocker resolvedThe 🔵 Minor1. Reuse credentials — cache
|
The `xpack.task_manager.grant_uiam_api_keys` rollout flag is not yet enabled on MKI, so tests that depend on it must not run there. Split the Scout specs: - `task_api_keys.spec.ts` moved to a new `scout_task_manager_uiam` Scout root, served by a new `task_manager_uiam` config set that turns the flag on. Runs only on Kibana CI. - `task_schedule_and_delete.spec.ts` stays under the default `scout` root and keeps running on MKI. Made-with: Cursor
Scout Test Review⚪ Nit: Duplicate fixture files across test directoriesRule: Use constants for shared test values Files:
These are byte-for-byte identical. Since both test suites belong to the same plugin, consider extracting shared constants (e.g. That said, the duplication is small and both directories are self-contained — so this is low priority. 👍 Good config set separationNice work moving Posted via Macroscope — Scout Test Review |
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing exports
History
cc @ersin-erdal |
Scout Test ReviewFound 1 new issue (minor). See inline comment for details. Previous feedback (duplicate fixtures nit, credential caching minor) still applies — no code changes observed for those, so not re-posting. This review is experimental. Share your feedback in the #appex-qa channel. Posted via Macroscope — Scout Test Review |
There was a problem hiding this comment.
Minor issue in the UIAM API keys test: fragile absolute count assertion.
Posted via Macroscope — Scout Test Review
Parallel execution was unreliable for these API tests. Move both Scout roots (`scout/api` and `scout_task_manager_uiam/api`) from the `parallel_tests` layout to the sequential `tests` layout with a single `playwright.config.ts` and `.meta/api/standard.json` manifests. Made-with: Cursor
dmlemeshko
left a comment
There was a problem hiding this comment.
src/platform/packages/shared/kbn-scout/src/servers/configs/config_sets/task_manager_uiam/serverless/observability_complete.serverless.config.ts LGTM
There was a problem hiding this comment.
Scout Test Review
Re-flagging 1 issue (minor) — the file moved from parallel_tests/ to tests/, making the prior inline comment orphaned. See inline comment for details.
Previous feedback (duplicate fixtures nit, credential caching minor) still applies — no substantive code changes observed for those, so not re-posting.
This review is experimental. Share your feedback in the #appex-qa channel.
Posted via Macroscope — Scout Test Review
| const { saved_objects: pendingBefore } = await kbnClient.savedObjects.find({ | ||
| type: 'api_key_to_invalidate', | ||
| }); | ||
| expect(pendingBefore).toHaveLength(0); |
There was a problem hiding this comment.
🔵 Minor — Keep test suites independent
(Re-flagging — file moved from parallel_tests/ to tests/, orphaning the previous comment.)
Asserting an absolute count of 0 before the delete is fragile: if a prior run's afterAll cleanup didn't execute (runner crash, timeout, etc.), leftover api_key_to_invalidate SOs will cause a misleading failure.
Consider using a relative count so the test is resilient to leftover state:
| const { saved_objects: pendingBefore } = await kbnClient.savedObjects.find({ | |
| type: 'api_key_to_invalidate', | |
| }); | |
| expect(pendingBefore).toHaveLength(0); | |
| const { saved_objects: pendingBefore } = await kbnClient.savedObjects.find({ | |
| type: 'api_key_to_invalidate', | |
| }); | |
| const countBefore = pendingBefore.length; |
Then assert the delta at the end:
expect(pendingAfter).toHaveLength(countBefore + 2);Posted via Macroscope — Scout Test Review
| @@ -886,24 +907,22 @@ export class TaskStore { | |||
|
|
|||
| private async _bulkRemove(taskIds: string[]): Promise<SavedObjectsBulkDeleteResponse> { | |||
| const taskInstances = await this._bulkGet(taskIds); | |||
There was a problem hiding this comment.
🟢 Low server/task_store.ts:909
In _bulkRemove, unwrap(taskInstance) returns either a ConcreteTaskInstance or an error object { type; id; error }, but the result is immediately cast to ConcreteTaskInstance without checking isOk(taskInstance) first. When a task retrieval fails, the error object lacks apiKey/uiamApiKey/userScope, so docHasEncryptedApiKey returns false and the API key is never marked for invalidation — leaving orphaned keys in the system.
🤖 Copy this AI Prompt to have your agent fix this:
In file x-pack/platform/plugins/shared/task_manager/server/task_store.ts around line 909:
In `_bulkRemove`, `unwrap(taskInstance)` returns either a `ConcreteTaskInstance` or an error object `{ type; id; error }`, but the result is immediately cast to `ConcreteTaskInstance` without checking `isOk(taskInstance)` first. When a task retrieval fails, the error object lacks `apiKey`/`uiamApiKey`/`userScope`, so `docHasEncryptedApiKey` returns `false` and the API key is never marked for invalidation — leaving orphaned keys in the system.
Evidence trail:
1. x-pack/platform/plugins/shared/task_manager/server/task_store.ts lines 132-134: `BulkGetResult` type definition showing error type as `{ type: string; id: string; error: SavedObjectError }`
2. x-pack/platform/plugins/shared/task_manager/server/task_store.ts lines 976-1003: `_bulkGet` returns `asErr({ id, type, error })` for failed retrievals
3. x-pack/platform/plugins/shared/task_manager/server/lib/result_type.ts line 77: `unwrap` returns `result.error` (not `result.value`) when result is Err
4. x-pack/platform/plugins/shared/task_manager/server/task_store.ts lines 912-917: `unwrap(taskInstance) as ConcreteTaskInstance` without `isOk` check
5. x-pack/platform/plugins/shared/task_manager/server/task_store.ts lines 1336-1339: `docHasEncryptedApiKey` checks `apiKey`/`uiamApiKey`/`userScope` which are undefined on error objects
6. x-pack/platform/plugins/shared/task_manager/server/task_store.ts lines 927-928: `bulkDelete` uses original `taskIds`, not filtered by retrieval success
Summary
Refactors the Task Manager API key handling to use a strategy pattern, replacing scattered conditional logic with two concrete strategy implementations.
Key design rule: Grant and Usage are independent
security.authc.apiKeys.uiamis available (proactively store both keys so switching is seamless)config.api_key_type === 'uiam'Strategy Pattern
ApiKeyStrategyinterface defines grant, use, invalidation, and marking behaviorsEsApiKeyStrategy(default): grants only ES keys, uses ES keys for fake requestsEsAndUiamApiKeyStrategy: grants both ES + UIAM keys, uses UIAM for fake requests when configured (with ES fallback for pre-existing tasks)createApiKeyStrategyselects based on UIAM availabilityAPI Key Storage
id:keypair (base64), used asApiKey <base64>in headersessu_*credential, used asApiKey essu_*in headersapiKeyanduiamApiKeyare encrypted via ESO on the task SOuiamApiKeyIdstored inuserScopefor invalidation trackingChanges
api_key_strategy.ts,es_api_key_strategy.ts,es_and_uiam_api_key_strategy.ts,create_api_key_strategy.tstask.ts: AddeduiamApiKeytoTaskInstance/SerializedConcreteTaskInstance,uiamApiKeyIdtoTaskUserScopetask_store.ts: UsesApiKeyStrategyfor granting, invalidation marking, and decrypts bothapiKeyanduiamApiKeytask_runner.ts: Usesstrategy.getApiKeyForFakeRequest()for key selectionplugin.ts: Creates strategy, addsregisterUiamApiKeyInvalidateFnto start contracttask_manager_dependencies: RegistersuiamApiKeyfor ESO encryption, wiresregisterUiamApiKeyInvalidateFninvalidate_api_keys_task.ts: PassesinvalidateUiamApiKeyFnthrough torunInvalidateuiamApiKeyIdmappingTo verify:
Add below configs to your serverless.dev.yml
xpack.task_manager.api_key_type: 'uiam'xpack.task_manager.invalidate_api_key_task.removalDelay: '10s'xpack.task_manager.invalidate_api_key_task.interval: '1m'so we can see the invalidation task works without a problem.
Run Kibana and Elasticsearch with the below commands:
yarn es serverless --projectType observability --uiamyarn start --serverless oblt --uiam --run-examplesLogin with the
system_indices_superuseruser.Go to: http://localhost:5601/app/triggersActionsUiExample/task_manager_with_api_key
Create
task 1andtask 2Use the below query to check if the task has both ES and UIAM API Keys
Wait for a minute and do the same as above, the API Keys should be invalidated and removed from the index.
There shouldn't be any error on the console.
Add the task you removed again and do the same tests with
Remove All Taskswhich uses bulkDelete.