Collaborator

@smklein smklein commented Dec 3, 2025

Built on top of #9254

This PR instruments support bundle collection to track timing and status for each collection step. Each step now records its start time, end time, name, and final status (ok/skipped/failed). The report includes all this information, which may be displayed in omdb.

Steps that are intentionally not executed return `CollectionStepOutput::Skipped` rather than silently succeeding, and failures are explicitly tracked. This hopefully makes it easier to diagnose which parts of bundle collection are slow or failing, and provides better observability into the collection process.
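As a rough illustration of the per-step tracking described above, here is a minimal, std-only sketch. The names `StepStatus`, `StepOutput`, `StepReport`, and `run_step` are hypothetical and are not taken from this PR; the sketch also uses `Instant` for timing, whereas the real report appears to record wall-clock timestamps.

```rust
use std::time::{Duration, Instant};

/// Final status of one collection step (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
enum StepStatus {
    Ok,
    Skipped,
    Failed(String),
}

/// What a step body returns when it did not fail.
enum StepOutput {
    Done,
    Skipped,
}

/// One entry of the report: name, start, end, and final status.
#[derive(Debug)]
struct StepReport {
    name: String,
    start: Instant,
    end: Instant,
    status: StepStatus,
}

impl StepReport {
    /// Elapsed time between the recorded start and end.
    fn duration(&self) -> Duration {
        self.end.duration_since(self.start)
    }
}

/// Runs `body`, timing it and mapping its result onto a `StepStatus`.
/// A skipped step is reported as `Skipped`, never as `Ok`.
fn run_step<F>(name: &str, body: F) -> StepReport
where
    F: FnOnce() -> Result<StepOutput, String>,
{
    let start = Instant::now();
    let status = match body() {
        Ok(StepOutput::Done) => StepStatus::Ok,
        Ok(StepOutput::Skipped) => StepStatus::Skipped,
        Err(msg) => StepStatus::Failed(msg),
    };
    let end = Instant::now();
    StepReport { name: name.to_string(), start, end, status }
}
```

The point of the sketch is that every step produces a report entry regardless of outcome, so a slow or failing step is visible rather than inferred from missing output.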

@smklein smklein changed the title from "Improve the support bundle report on a step-by-step basis" to "[support bundle] Improve the report on a step-by-step basis" Dec 3, 2025
@smklein smklein changed the title from "[support bundle] Improve the report on a step-by-step basis" to "[support bundle] Collect 'step' timing information, names generically" Dec 3, 2025
@smklein smklein marked this pull request as ready for review December 3, 2025 21:22
@smklein smklein requested review from hawkw and wfchandler December 3, 2025 21:28
Member

@hawkw hawkw left a comment

nice!

.unwrap_or(Duration::from_millis(0));
StepRow {
step_name: step.name,
start_time: step.start.to_rfc3339(),
Member

take it or leave it: we have a helper for doing this, so you _could_ just use a `DateTime` as the struct field and add the attribute to use that. Not important either way, this is fine.

Collaborator Author

Sure, left as "DateTime" in row struct

step_name: step.name,
start_time: step.start.to_rfc3339(),
duration: format!("{:.3}s", duration.as_secs_f64()),
status: step.status.to_string(),
Member

since status is Display, we could just make the status field that type instead of ToStringing it...

Collaborator Author

Updated to use stronger type here
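For readers following along, a minimal sketch of what "stronger type" could look like: the row keeps the status as a `Display` type and only turns it into text at render time. `StepStatus`, `StepRow`, and `render_row` below are hypothetical, std-only stand-ins; the actual row struct and table rendering in this PR are not shown here.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical status type; the real one may carry more detail.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StepStatus {
    Ok,
    Skipped,
    Failed,
}

impl fmt::Display for StepStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            StepStatus::Ok => "ok",
            StepStatus::Skipped => "skipped",
            StepStatus::Failed => "failed",
        };
        f.write_str(s)
    }
}

/// The row stores typed values instead of pre-formatted strings.
struct StepRow {
    step_name: String,
    duration: Duration,
    status: StepStatus,
}

/// Formatting happens once, where the row is rendered.
fn render_row(row: &StepRow) -> String {
    format!(
        "{} {:.3}s {}",
        row.step_name,
        row.duration.as_secs_f64(),
        row.status
    )
}
```

Keeping the typed value in the row means callers can still match on the status, and the string form stays in one place (the `Display` impl).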

warn!(
collection.log,
"Step failed";
"name" => &self.name,
Member

nit, take it or leave it: perhaps this ought to be

Suggested change
- "name" => &self.name,
+ "step" => &self.name,

Collaborator Author

Updated

InlineErrorChain::new(err.as_ref()),
);
})
debug!(collection.log, "Running step"; "name" => &step.name);
Member

nit, take it or leave it: perhaps this ought to be

Suggested change
- debug!(collection.log, "Running step"; "name" => &step.name);
+ debug!(collection.log, "Running step"; "step" => &step.name);

as it seems a bit less likely to include totally unrelated logs when filtering looker output or similar based on a particular step?

Collaborator Author

Updated
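To make the filtering argument concrete, here is a small std-only sketch. `Record`, `filter_by`, and `sample_records` are hypothetical stand-ins for structured log output and a log filter; they are not the logger or tooling used in this repository, and the step name is invented for illustration.

```rust
/// A structured log record: a message plus key/value pairs.
/// (Hypothetical stand-in for the output of a structured logger.)
struct Record {
    msg: &'static str,
    fields: Vec<(&'static str, &'static str)>,
}

/// Returns the records that carry exactly `key => value`.
fn filter_by<'a>(records: &'a [Record], key: &str, value: &str) -> Vec<&'a Record> {
    records
        .iter()
        .filter(|r| r.fields.iter().any(|(k, v)| *k == key && *v == value))
        .collect()
}

/// Sample log stream: two records from a collection step and one
/// unrelated record that happens to reuse the generic "name" key.
fn sample_records() -> Vec<Record> {
    vec![
        Record { msg: "Running step", fields: vec![("step", "sled-info")] },
        Record { msg: "Step failed", fields: vec![("step", "sled-info")] },
        Record { msg: "Looked up sled", fields: vec![("name", "sled-info")] },
    ]
}
```

With a dedicated `"step"` key, filtering on a step name returns only step records; had the step logs used `"name"`, the unrelated third record would match the same filter.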

)
.await?;
- return Ok(CollectionStepOutput::None);
+ bail!("Could not contact sled");
Member

i realize that this was already not the case, but it kinda feels like the underlying error from the client ought to be formatted here as well?

Collaborator Author

Sounds good; I overhauled the error handling here to be more explicit
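The overhauled error handling itself is not shown in this thread. As a sketch of the general idea (surface the underlying client error alongside the step-level message), the following std-only example walks `std::error::Error::source`. `ClientError`, `StepError`, and `format_chain` are hypothetical names and do not reflect the actual code.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level error, e.g. from an HTTP client.
#[derive(Debug)]
struct ClientError(String);

impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl Error for ClientError {}

/// Hypothetical step-level error that keeps its cause attached.
#[derive(Debug)]
struct StepError {
    context: String,
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for StepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for StepError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Joins an error and all of its sources into one line,
/// outermost first, separated by ": ".
fn format_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

Compared with a bare `bail!("Could not contact sled")`, keeping the cause attached lets the report say why the sled could not be contacted.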

Base automatically changed from bundle-filter to main December 4, 2025 23:51
@smklein smklein enabled auto-merge (squash) December 5, 2025 00:34
@smklein smklein merged commit b7ba37f into main Dec 5, 2025
16 checks passed
@smklein smklein deleted the better-report branch December 5, 2025 04:30
smklein added a commit that referenced this pull request Dec 8, 2025
…re (#9466)

Built on top of #9463

This PR simplifies the support bundle collection report by removing the
redundant `listed_in_service_sleds` and `listed_sps` boolean fields from
`SupportBundleCollectionReport`. These fields were tracking whether
certain operations succeeded, but that information is now captured by
the per-step status tracking added in #9463.

The PR also consolidates `CollectionStepOutput::SpawnSleds` and
`CollectionStepOutput::SavingSpDumps` into the generic
`CollectionStepOutput::Spawn` variant, removing artificial distinctions
that only existed to set those boolean flags.
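
A minimal sketch of why the boolean fields become redundant once per-step status exists: the same question can be answered by looking up the step in the report. `StepStatus`, `StepReport`, `Report`, and the step names used below are hypothetical and are not the types or names from these commits.

```rust
/// Hypothetical per-step status.
#[derive(Debug, Clone, PartialEq)]
enum StepStatus {
    Ok,
    Skipped,
    Failed,
}

/// Hypothetical per-step entry in the report.
struct StepReport {
    name: String,
    status: StepStatus,
}

/// Report with no dedicated "did X succeed" booleans.
struct Report {
    steps: Vec<StepReport>,
}

impl Report {
    /// Derives what a dedicated boolean used to record:
    /// true only if the named step ran and finished with `Ok`.
    fn step_succeeded(&self, name: &str) -> bool {
        self.steps
            .iter()
            .any(|s| s.name == name && s.status == StepStatus::Ok)
    }
}
```

Because success is derived from the step list, step outputs no longer need special variants whose only job is to flip a flag.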