[Maybe controversial?] Add diagnostics to tasks#4752
gavofyork merged 1 commit into paritytech:master
mxinden
left a comment
Overall I think this is a great idea to get more visibility into what is happening. In addition I don't think this change is very intrusive.
Have you been able to benchmark the performance impact futures-diagnose has here? On the one hand we only do this for root futures, on the other hand futures-diagnose seems to serialize all calls through a single Mutex (correct me if I am wrong).
With regard to configuration through an environment variable, what do you think of making this configurable only at compile time? With the latter, the compiler can remove all the "if enabled" checks, so this change would have zero impact on performance when disabled. The downside of the compile-time option is that we can't tell users to just set something at runtime to diagnose an issue, but need to provide them with another binary instead.
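For illustration, the two options could look roughly like the sketch below. Only the `PROFILE_DIR` variable comes from this PR; the Cargo feature name `diagnose` and the helpers `profile_dir_from` and `wrapping_mode` are made up for the example and are not part of Substrate or futures-diagnose.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Compile-time switch, assuming a hypothetical Cargo feature named `diagnose`.
/// Without the feature this is a constant `false`, so the optimizer can drop
/// every branch guarded by it.
const DIAGNOSE_COMPILED_IN: bool = cfg!(feature = "diagnose");

/// Runtime switch: turns the raw value of `PROFILE_DIR` into an output directory.
/// The value is passed in (e.g. `std::env::var_os("PROFILE_DIR")`) so the
/// function stays easy to test.
fn profile_dir_from(value: Option<OsString>) -> Option<PathBuf> {
    value.map(PathBuf::from)
}

/// Decides how a root task would be spawned (hypothetical helper).
fn wrapping_mode(profile_dir: Option<&Path>) -> &'static str {
    if !DIAGNOSE_COMPILED_IN {
        // Compile-time path: no runtime check is left in the binary.
        return "plain";
    }
    // Runtime path: one check per spawned root task.
    match profile_dir {
        Some(_) => "diagnosed",
        None => "plain",
    }
}
```

With only the runtime check, users can enable tracing on an existing binary. With the feature gate, they need a build that has `diagnose` enabled.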
tomusdrw
left a comment
If it doesn't affect performance when profiling data is not being collected, I'm all in, but it would be really nice to get a document describing how to collect and analyze the outputs.
I didn't notice any performance degradation with this tool. It's true that this |
This adds names to tasks and integrates the `futures-diagnose` tool into Substrate.
Usage
Start your node with the `PROFILE_DIR=profiles` environment variable, and `futures-diagnose` will create a directory named `profiles` that will contain a trace of all the future tasks being executed in Substrate (at least, all the tasks that got wrapped around `futures-diagnose`, I hope I didn't forget any).
You can then open the traces by starting Chrome and browsing to `chrome://tracing`. There's a load button at the top.
Example output:
The X axis is the time, and the Y axis is the thread number.
Each block represents a task being polled. Here we can see that the import queue monopolizes an entire thread (unsurprisingly, this is while syncing). The little green lines are networking and telemetry sockets being polled. They are normally rectangles, but they are too thin in this screenshot.
As an example of usefulness, this would have easily diagnosed the performance issue of last week, where everything started running in a single thread.
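To make "each block represents a task being polled" concrete, here is a minimal sketch of a named wrapper that records one entry per `poll` call, with the thread and the duration. It is not how `futures-diagnose` is implemented. `Named`, `PollRecord`, `PollLog` and the busy-polling `block_on` are hypothetical names for this example, and a real tool would write trace events to a file instead of pushing them into a `Vec`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, ThreadId};
use std::time::{Duration, Instant};

/// One block in the trace: which task was polled, on which thread, for how long.
#[derive(Debug)]
struct PollRecord {
    task: &'static str,
    thread: ThreadId,
    duration: Duration,
}

/// Shared in-memory log of poll records (assumption made for this sketch).
type PollLog = Arc<Mutex<Vec<PollRecord>>>;

/// Wraps a future and records every call to `poll` under a task name.
struct Named<F> {
    name: &'static str,
    inner: Pin<Box<F>>,
    log: PollLog,
}

impl<F: Future> Named<F> {
    fn new(name: &'static str, inner: F, log: PollLog) -> Self {
        Named { name, inner: Box::pin(inner), log }
    }
}

impl<F: Future> Future for Named<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Named<F>` is `Unpin` because the inner future lives in a pinned box.
        let this = self.get_mut();
        let start = Instant::now();
        let result = this.inner.as_mut().poll(cx);
        this.log.lock().unwrap().push(PollRecord {
            task: this.name,
            thread: thread::current().id(),
            duration: start.elapsed(),
        });
        result
    }
}

/// Waker that does nothing; enough to drive a future by polling in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for demonstration only.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return out;
        }
    }
}
```

Running `block_on(Named::new("import-queue", fut, log.clone()))` leaves one `PollRecord` per poll in `log`. A task that monopolizes a thread shows up as long durations under a single `ThreadId`.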
Why "[Maybe controversial?]"
To me it seems like half of the planet is integrating their own profiling solution inside Substrate, so I'm not sure whether this one is appropriate. Another option is to add names to tasks (what this PR does), but leave out the `futures-diagnose` tool. It can then easily be restored by tweaking the source code in case there's a performance issue.
It also uses an environment variable, which isn't great compared to a CLI option. Ideally, we should use a single runtime for everything (including the import queue), wrap this runtime around the diagnose tool, and customize it there.
I also have no idea where to document this, and this seems like a hidden undiscoverable feature.