-
Notifications
You must be signed in to change notification settings - Fork 63
[support bundle] Collect 'step' timing information, names generically #9463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hawkw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
dev-tools/omdb/src/bin/omdb/nexus.rs
Outdated
| .unwrap_or(Duration::from_millis(0)); | ||
| StepRow { | ||
| step_name: step.name, | ||
| start_time: step.start.to_rfc3339(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take it or leave it: we have a helper for doing this, so you _ could_ just use a DateTime as the struct field and add the attribute to use that. Not important either way, this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, left as "DateTime" in row struct
dev-tools/omdb/src/bin/omdb/nexus.rs
Outdated
| step_name: step.name, | ||
| start_time: step.start.to_rfc3339(), | ||
| duration: format!("{:.3}s", duration.as_secs_f64()), | ||
| status: step.status.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since status is Display, we could just make the status field that type instead of ToStringing it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use stronger type here
| warn!( | ||
| collection.log, | ||
| "Step failed"; | ||
| "name" => &self.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, take it or leave it: perhaps this ought to be
| "name" => &self.name, | |
| "step" => &self.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| InlineErrorChain::new(err.as_ref()), | ||
| ); | ||
| }) | ||
| debug!(collection.log, "Running step"; "name" => &step.name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, take it or leave it: perhaps this ought to be
| debug!(collection.log, "Running step"; "name" => &step.name); | |
| debug!(collection.log, "Running step"; "step" => &step.name); |
as it seems a bit less likely to include totally unrelated logs when filtering looker output or similar based on a particular step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| ) | ||
| .await?; | ||
| return Ok(CollectionStepOutput::None); | ||
| bail!("Could not contact sled"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i realize that this was already not the case, but it kinda feels like the underlying error from the client ought to be formatted here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; I overhauled the error handling here to be more explicit
…re (#9466) Built on top of #9463 This PR simplifies the support bundle collection report by removing the redundant listed_in_service_sleds and listed_sps boolean fields from `SupportBundleCollectionReport`. These fields were tracking whether certain operations succeeded, but that information is now captured by the per-step status tracking added in #9463. The PR also consolidates `CollectionStepOutput::SpawnSleds` and `CollectionStepOutput::SavingSpDumps` into the generic `CollectionStepOutput::Spawn` variant, removing artificial distinctions that only existed to set those boolean flags.
Built on top of #9254
This PR instruments support bundle collection to track timing and status for each collection step. Each step now records its start time, end time, name, and final status (ok/skipped/failed). The report includes all this information, which may be displayed in omdb.
Steps that are intentionally not executed return
CollectionStepOutput::Skippedrather than silently succeeding, and failures are explicitly tracked. This hopefully makes it easier to diagnose which parts of bundle collection are slow or failing, and provides better observability into the collection process