feat: Add service health to `tctl bots instances ls|show` #60316
Merged
nicholasmarais1158 merged 7 commits into master on Oct 21, 2025
boxofrad approved these changes on Oct 17, 2025
    // services provided. Priority; unhealthy, unspecified, initializing, healthy
    func aggregateServiceHealth(services []*machineidv1pb.BotInstanceServiceHealth) machineidv1pb.BotInstanceHealthStatus {
        if services != nil {
            hasUnhealthy := slices.ContainsFunc(services, func(service *machineidv1pb.BotInstanceServiceHealth) bool {
Contributor
This is actually a lot more nuanced than what tbot's /readyz endpoint does - which just treats the presence of any non-healthy status as the entire bot being unhealthy 😄
Contributor
Author
Is it safe to leave it how it is, or should I keep the logic in-sync?
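For comparison, here is a minimal sketch of the stricter rule described in the review comment, where any non-healthy service makes the whole bot not ready. It is not taken from tbot's source: the `HealthStatus` type, its constants, and `allHealthy` are hypothetical stand-ins for illustration only.

```go
package main

import "fmt"

// HealthStatus is a hypothetical stand-in for the generated
// machineidv1pb.BotInstanceHealthStatus enum; names are illustrative.
type HealthStatus int

const (
	StatusUnspecified HealthStatus = iota
	StatusInitializing
	StatusHealthy
	StatusUnhealthy
)

// allHealthy applies the strict rule from the review comment: a single
// service that is not healthy makes the whole bot count as not ready.
// An empty slice returns true here; how tbot treats that case is not
// stated in the thread, so this is an assumption of the sketch.
func allHealthy(statuses []HealthStatus) bool {
	for _, s := range statuses {
		if s != StatusHealthy {
			return false
		}
	}
	return true
}

func main() {
	// One service is still initializing, so the strict rule reports false.
	fmt.Println(allHealthy([]HealthStatus{StatusHealthy, StatusInitializing}))
	// Every service is healthy, so the strict rule reports true.
	fmt.Println(allHealthy([]HealthStatus{StatusHealthy, StatusHealthy}))
}
```

Under this rule an initializing or unspecified service is indistinguishable from an unhealthy one, which is the nuance the prioritised aggregation in this PR keeps.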
…-health # Conflicts: # tool/tctl/common/bots_command.go # tool/tctl/common/bots_command_test.go
strideynet approved these changes on Oct 20, 2025
Contributor
@nicholasmarais1158 See the table below for backport results.
nicholasmarais1158 added a commit that referenced this pull request on Oct 24, 2025
* Add `service_health` to bot instance protos
* Add aggregated service health to `show`
* Add services section to `show`
* Add health status column to `ls`
* Extra tabs are not welcome
* Handle zero services when aggregating health status

# Conflicts:
#   tool/tctl/common/bots_command.go
Summary
This change adds service health to the outputs of `tctl bots instances ls` and `tctl bots instances show`. An aggregated health status is used for the list of instances as well as for the top-level status when showing an individual instance. The logic prioritises the least healthy status, in this order: unhealthy, unknown, initializing, healthy.
Changelog: Added service health to the output of the `tctl bots instances ls` and `tctl bots instances show` commands
Depends on: #60093 (for service health protos)
Updates: #55926
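The priority order in the summary can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `HealthStatus`, its constants, and `aggregate` are hypothetical stand-ins for the generated `machineidv1pb` types, and returning the unspecified status for zero services is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"slices"
)

// HealthStatus is a hypothetical stand-in for the generated
// machineidv1pb.BotInstanceHealthStatus enum; names are illustrative.
type HealthStatus int

const (
	StatusUnspecified HealthStatus = iota
	StatusInitializing
	StatusHealthy
	StatusUnhealthy
)

// String gives each status a readable label for printing.
func (s HealthStatus) String() string {
	switch s {
	case StatusInitializing:
		return "initializing"
	case StatusHealthy:
		return "healthy"
	case StatusUnhealthy:
		return "unhealthy"
	default:
		return "unspecified"
	}
}

// priority lists statuses from least to most healthy, matching the order
// given in the summary: unhealthy, unknown/unspecified, initializing, healthy.
var priority = []HealthStatus{
	StatusUnhealthy,
	StatusUnspecified,
	StatusInitializing,
	StatusHealthy,
}

// aggregate returns the least healthy status present among the services.
func aggregate(statuses []HealthStatus) HealthStatus {
	if len(statuses) == 0 {
		// Assumption: with no services reported there is nothing to
		// aggregate, so this sketch falls back to unspecified.
		return StatusUnspecified
	}
	for _, p := range priority {
		if slices.Contains(statuses, p) {
			return p
		}
	}
	// Reached only for values outside the known enum.
	return StatusUnspecified
}

func main() {
	fmt.Println(aggregate([]HealthStatus{StatusHealthy, StatusInitializing, StatusUnhealthy}))
	fmt.Println(aggregate([]HealthStatus{StatusHealthy, StatusInitializing}))
	fmt.Println(aggregate(nil))
}
```

Walking a fixed priority slice keeps the ordering in one place, so changing the precedence would only mean reordering that slice.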
Changes
`ls` and `show` commands
`show` output
Demo
A mix of services with different statuses

Zero services reported for the instance

List of instances with aggregate health status

Reviewer notes
Here's a script to insert a bunch of bot instances to make testing easier: bot_instances.sql
The bots for these instances won't exist, but that's not a problem within the scope of this change.
Be sure to restart your cluster afterwards, because the cache will not be notified of the changes.
Delete all instances afterwards to tidy up, if you like;