MWI: Generate AutoUpdateBotInstanceReport resource#59738
Conversation
| // Identifies the external updater process. | ||
| string external_updater = 10; | ||
| // Identifies the external updated version. Empty if no updater is configured. | ||
| string external_updater_version = 11; | ||
| // Information provided by the external updater, including the update group | ||
| // and updater status. | ||
| types.UpdaterV2Info updater_info = 12; |
There was a problem hiding this comment.
@hugoShaka I've added these fields (based the inventory hello message) but am not currently populating them.
I took a look at re-using agent.ReadHelloUpdaterInfo but I wasn't sure what we should do about the DBPID, etc?
417aae5 to
e81e36b
Compare
708d1c1 to
e19e015
Compare
AutoUpdateBotReport resourceAutoUpdateBotInstanceReport resource
hugoShaka
left a comment
There was a problem hiding this comment.
Can you, in a future PR, make sure that tctl get, and tctl delete support autoupdate_bot_report ?
| // Take the version information from the latest heartbeat. | ||
| heartbeats := inst.GetStatus().GetLatestHeartbeats() | ||
| if len(heartbeats) == 0 { | ||
| continue | ||
| } | ||
| latest := heartbeats[len(heartbeats)-1] |
There was a problem hiding this comment.
When do the bot instance expire? Some backends don't seem to strictly respect this, is it possible to check if the instance should be ignored here as well? This would lower the chances of counting stale data.
There was a problem hiding this comment.
They expire when their backing "credentials" (certificates) expire. I've added a defensive check to filter out any expired instances ✅
|
|
||
| // SetAutoUpdateBotInstanceReport overwrites the singleton auto-update bot report. | ||
| SetAutoUpdateBotInstanceReport(ctx context.Context, spec *autoupdate.AutoUpdateBotInstanceReportSpec) error |
There was a problem hiding this comment.
Even for singleton resources we want to use Get/Create/Update/Delete verbs as per RFD 153. This makes sure we stay consistent and always have a way to do conditional updates and delete the singleton.
There was a problem hiding this comment.
Gotcha. For now, I've renamed this method to UpsertAutoUpdateBotInstanceReport (because we want the unconditional overwrite semantics) and swapped the spec parameter for the full report object.
I've left the getter as-in (i.e. only accepting the context, not the name) to match AuthPreference, but if you think it's better to accept the name I'd be happy to do so in a later PR.
|
|
||
| // SetAutoUpdateBotInstanceReport overwrites the singleton auto-update bot report. | ||
| func (s *AutoUpdateService) SetAutoUpdateBotInstanceReport(ctx context.Context, spec *autoupdate.AutoUpdateBotInstanceReportSpec) error { | ||
| _, err := s.botInstanceReport.UpsertResource(ctx, &autoupdate.AutoUpdateBotInstanceReport{ |
There was a problem hiding this comment.
Can we add a validation to cap the number of groups to something reasonable before writing (e.g. 20)? The risk is that we end up with resources exceeding the grpc max size and that the proxy won't be able to stream it.
|
@hugoShaka Sure thing! Happy to add those |
| // UpsertAutoUpdateBotInstanceReport creates or updates the bot instance report. | ||
| func (s *AutoUpdateService) UpsertAutoUpdateBotInstanceReport(ctx context.Context, report *autoupdate.AutoUpdateBotInstanceReport) (*autoupdate.AutoUpdateBotInstanceReport, error) { | ||
| if err := update.ValidateAutoUpdateBotInstanceReport(report); err != nil { | ||
| return nil, trace.Wrap(err) | ||
| } | ||
| report, err := s.botInstanceReport.UpsertResource(ctx, report) | ||
| return report, trace.Wrap(err) | ||
| } |
There was a problem hiding this comment.
Can you also add a Delete so we have a way to unblock things without backend surgery if this were to cause cache/backend issue.
* MWI: Generate `AutoUpdateBotInstanceReport` resource (#59738) * MWI: Add `tctl` get and delete mappings for `AutoUpdateBotInstanceReport` (#60017) * MWI: Add `teleport_bot_instances` metric (#59774) * MWI: Log on `AutoUpdateBotInstanceReport` generation failure (#60191) * Fix passing lock by value * Allow `machineid.AutoUpdateVersionReporter` to shut down correctly (#60219)
* MWI: Generate `AutoUpdateBotInstanceReport` resource (#59738) * MWI: Add `tctl` get and delete mappings for `AutoUpdateBotInstanceReport` (#60017) * MWI: Add `teleport_bot_instances` metric (#59774) * MWI: Log on `AutoUpdateBotInstanceReport` generation failure (#60191) * Allow `machineid.AutoUpdateVersionReporter` to shut down correctly (#60219)
* Add `AutoUpdateBotReport` resource definition * Generate bot version report once per minute * Add gRPC endpoint for reading bot report * Add updater info to the bot heartbeat message * Fix a couple of minor typos * Add forgotten cache event plumbing * Fix duplicate import * Fix racy test * Add missing license header * Fill HostUUID in tests * Rename "bot report" to "bot instance report" * Fix formatting * Make one of the expected values different * Defend against expired instances being counted * Closer align `AutoUpdateBotInstanceReport` resource with RFD 153 * Add endpoint for deleting report
Supersedes #59122.
Adds a new
AutoUpdateBotInstanceReportresource to track the number of connected bot instances per version.It's based on the
AutoUpdateAgentReport, but is a true singleton (i.e. generated by a single leader-elected auth server) rather than one report per auth server, as it's calculated from cluster-wide state rather than the inventory.I'll update RFD 222 with this approach shortly, just wanted to discover what was involved first.