Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(timeline,cobuilds): cobuilt operations should reflect the cobuild time and not cache restore time #4680

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

aramissennyeydd
Copy link
Contributor

Summary

OperationExecutionRecord is currently only tracking cache restore time for cobuilt operations that are marked as RemoteOperating and are then restored from cache. This is a confusing UX and causes developers + maintainers to have to search across multiple machine logs to determine operation run times. This PR adjusts that to use the nonCachedDurationMs from the state file. This felt like something that #3649 was intending to do or is in a similar vein.

Before:

Screenshot 2024-05-07 at 12 24 12 PM Screenshot 2024-05-07 at 12 24 01 PM

After:

Screenshot 2024-05-07 at 12 23 31 PM Screenshot 2024-05-07 at 12 23 18 PM

Details

This overwrites the existing stopwatch for operations that were not cobuilt on the specific agent. It adds a new beforeResult method to OperationExecutionRecord#executeAsync to handle the afterExecuteOperation hook instead of passing that work into the onResult method which ended up receiving running stopwatches and had other assumptions of state (collated terminal not closed). This does add possibly breaking behavior where cobuilds were showing cache times which might be useful, but I don't think those are as useful as having the cobuild time available across all agents. That also has the secondary effect of making the timeline views of all agents much more cohesive - though as we see above they're not the exact same across agents.

How it was tested

I tested this locally using the build-tests/rush-redis-cobuild-plugin-integration-test sandbox repo with 2 runners, confirmed that timings generally matched across the instances. There's about 10ms (rounded up) of difference between the agents, but this already seems much more useful.

Impacted documentation

None

@aramissennyeydd
Copy link
Contributor Author

Small adjustment to improve the dependency view, previously they were getting smushed into the same time slot, when they should be sequential.

Screenshot 2024-05-10 at 10 07 36 AM

@aramissennyeydd
Copy link
Contributor Author

aramissennyeydd commented May 10, 2024

And a few screen grabs from our internal instance showing the difference from the small adjustment,

Before

Screenshot 2024-05-10 at 10 24 55 AM Screenshot 2024-05-10 at 10 25 05 AM

After

Screenshot 2024-05-10 at 10 29 49 AM Screenshot 2024-05-10 at 10 29 55 AM

@UberMouse
Copy link
Contributor

Any reason this hasn't been merged? It's a real problem trying to work out performance of co-build enabled builds 😬

@aramissennyeydd
Copy link
Contributor Author

@UberMouse Lost track of this, I was waiting on #4755 to get merged so we didn't cause more telemetry skew with this.

@iclanton
Copy link
Member

@aramissennyeydd - this should be unblocked now.

@aramissennyeydd aramissennyeydd force-pushed the sennyeya/operation-timings branch from 77ffeee to 44b9ff2 Compare December 26, 2024 18:15
@aramissennyeydd
Copy link
Contributor Author

@iclanton This should be good for another round of 👀 🙏

Comment on lines +323 to +324
// Make sure that the stopwatch is stopped before reporting the result, otherwise endTime is undefined.
this.stopwatch.stop();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the stopwatch time include the before + after timings? The big issue right now is that when we go to report the times, the stopwatch hasn't ended so endTime is undefined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why hasn't it ended? Are we bypassing the code path that would tell it to stop()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been calling .stop in the finally block, which runs after onResult, this wasn't an issue really until now since the toString and duration use the current time as a fallback 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants