MWI: Generate AutoUpdateBotInstanceReport resource #59738

Merged
boxofrad merged 16 commits into master from
boxofrad/bot-version-report
Oct 7, 2025

Contributor

@boxofrad boxofrad commented Sep 30, 2025

Supersedes #59122.

Adds a new AutoUpdateBotInstanceReport resource to track the number of connected bot instances per version.

It's based on the AutoUpdateAgentReport, but is a true singleton (i.e. generated by a single leader-elected auth server) rather than one report per auth server, as it's calculated from cluster-wide state rather than the inventory.

I'll update RFD 222 with this approach shortly, just wanted to discover what was involved first.

Comment on lines +87 to +93
// Identifies the external updater process.
string external_updater = 10;
// Identifies the external updater version. Empty if no updater is configured.
string external_updater_version = 11;
// Information provided by the external updater, including the update group
// and updater status.
types.UpdaterV2Info updater_info = 12;
Contributor Author

@hugoShaka I've added these fields (based on the inventory hello message) but am not currently populating them.

I took a look at re-using agent.ReadHelloUpdaterInfo but I wasn't sure what we should do about the DBPID, etc?

@boxofrad boxofrad force-pushed the boxofrad/bot-version-report branch 3 times, most recently from 417aae5 to e81e36b Compare September 30, 2025 12:29
Comment thread api/proto/teleport/autoupdate/v1/autoupdate.proto Outdated
Comment thread api/proto/teleport/autoupdate/v1/autoupdate.proto Outdated
@boxofrad boxofrad force-pushed the boxofrad/bot-version-report branch from 708d1c1 to e19e015 Compare September 30, 2025 15:08
@nicholasmarais1158 nicholasmarais1158 changed the title MWI: Generate AutoUpdateBotReport resource MWI: Generate AutoUpdateBotInstanceReport resource Sep 30, 2025
Comment thread lib/auth/machineid/machineidv1/auto_update_version_reporter_test.go Outdated
Contributor

@hugoShaka hugoShaka left a comment

Can you, in a future PR, make sure that tctl get and tctl delete support autoupdate_bot_report?

Comment on lines +247 to +252
// Take the version information from the latest heartbeat.
heartbeats := inst.GetStatus().GetLatestHeartbeats()
if len(heartbeats) == 0 {
	continue
}
latest := heartbeats[len(heartbeats)-1]
Contributor

When do bot instances expire? Some backends don't seem to strictly respect expiry, so is it possible to check here whether the instance should be ignored as well? This would lower the chances of counting stale data.

Contributor Author

They expire when their backing "credentials" (certificates) expire. I've added a defensive check to filter out any expired instances ✅

Comment thread lib/services/autoupdates.go Outdated
Comment on lines +102 to +104

// SetAutoUpdateBotInstanceReport overwrites the singleton auto-update bot report.
SetAutoUpdateBotInstanceReport(ctx context.Context, spec *autoupdate.AutoUpdateBotInstanceReportSpec) error
Contributor

Even for singleton resources we want to use Get/Create/Update/Delete verbs as per RFD 153. This makes sure we stay consistent and always have a way to do conditional updates and delete the singleton.

Contributor Author

Gotcha. For now, I've renamed this method to UpsertAutoUpdateBotInstanceReport (because we want the unconditional overwrite semantics) and swapped the spec parameter for the full report object.

I've left the getter as-is (i.e. only accepting the context, not the name) to match AuthPreference, but if you think it's better to accept the name I'd be happy to do so in a later PR.

Comment thread lib/services/local/autoupdate.go Outdated

// SetAutoUpdateBotInstanceReport overwrites the singleton auto-update bot report.
func (s *AutoUpdateService) SetAutoUpdateBotInstanceReport(ctx context.Context, spec *autoupdate.AutoUpdateBotInstanceReportSpec) error {
_, err := s.botInstanceReport.UpsertResource(ctx, &autoupdate.AutoUpdateBotInstanceReport{
Contributor

Can we add validation to cap the number of groups at something reasonable (e.g. 20) before writing? The risk is that we end up with resources exceeding the gRPC max message size, which the proxy won't be able to stream.

Contributor Author

Done in 5fb4484

@boxofrad boxofrad requested a review from hugoShaka October 6, 2025 16:04
Contributor Author

boxofrad commented Oct 6, 2025

@hugoShaka Sure thing! Happy to add those tctl resource mappings in a follow-up PR.

Comment on lines +352 to 359
// UpsertAutoUpdateBotInstanceReport creates or updates the bot instance report.
func (s *AutoUpdateService) UpsertAutoUpdateBotInstanceReport(ctx context.Context, report *autoupdate.AutoUpdateBotInstanceReport) (*autoupdate.AutoUpdateBotInstanceReport, error) {
	if err := update.ValidateAutoUpdateBotInstanceReport(report); err != nil {
		return nil, trace.Wrap(err)
	}
	report, err := s.botInstanceReport.UpsertResource(ctx, report)
	return report, trace.Wrap(err)
}
Contributor

Can you also add a Delete, so we have a way to unblock things without backend surgery if this were to cause cache/backend issues?

@boxofrad boxofrad added this pull request to the merge queue Oct 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 7, 2025
@boxofrad boxofrad added this pull request to the merge queue Oct 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 7, 2025
@boxofrad boxofrad added this pull request to the merge queue Oct 7, 2025
Merged via the queue into master with commit 165ec1f Oct 7, 2025
43 checks passed
@boxofrad boxofrad deleted the boxofrad/bot-version-report branch October 7, 2025 14:54
@backport-bot-workflows
Contributor

@boxofrad See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v17 | Failed |
| branch/v18 | Failed |

github-merge-queue bot pushed a commit that referenced this pull request Oct 15, 2025
* MWI: Generate `AutoUpdateBotInstanceReport` resource (#59738)

* MWI: Add `tctl` get and delete mappings for `AutoUpdateBotInstanceReport` (#60017)

* MWI: Add `teleport_bot_instances` metric (#59774)

* MWI: Log on `AutoUpdateBotInstanceReport` generation failure (#60191)

* Fix passing lock by value

* Allow `machineid.AutoUpdateVersionReporter` to shut down correctly (#60219)
github-merge-queue bot pushed a commit that referenced this pull request Oct 15, 2025
* MWI: Generate `AutoUpdateBotInstanceReport` resource (#59738)

* MWI: Add `tctl` get and delete mappings for `AutoUpdateBotInstanceReport` (#60017)

* MWI: Add `teleport_bot_instances` metric (#59774)

* MWI: Log on `AutoUpdateBotInstanceReport` generation failure (#60191)

* Allow `machineid.AutoUpdateVersionReporter` to shut down correctly (#60219)
rhammonds-teleport pushed a commit that referenced this pull request Nov 6, 2025
* Add `AutoUpdateBotReport` resource definition

* Generate bot version report once per minute

* Add gRPC endpoint for reading bot report

* Add updater info to the bot heartbeat message

* Fix a couple of minor typos

* Add forgotten cache event plumbing

* Fix duplicate import

* Fix racy test

* Add missing license header

* Fill HostUUID in tests

* Rename "bot report" to "bot instance report"

* Fix formatting

* Make one of the expected values different

* Defend against expired instances being counted

* Closer align `AutoUpdateBotInstanceReport` resource with RFD 153

* Add endpoint for deleting report

4 participants