Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

[Maybe controversial?] Add diagnostics to tasks#4752

Merged
gavofyork merged 1 commit intoparitytech:masterfrom
tomaka:diagnose2
Jan 29, 2020
Merged

[Maybe controversial?] Add diagnostics to tasks#4752
gavofyork merged 1 commit intoparitytech:masterfrom
tomaka:diagnose2

Conversation

@tomaka
Copy link
Contributor

@tomaka tomaka commented Jan 28, 2020

This adds names to tasks and integrates the futures-diagnose tool into Substrate.

Usage

Start your node with the PROFILE_DIR=profiles environment variable, and futures-diagnose will create a directory named profiles that will contain a trace of all the future tasks being executed in Substrate (at least, all the tasks that got wrapped around futures-diagnose, I hope I didn't forget any).

You can then open the traces by starting Chrome and browsing to chrome://tracing. There's a load button top.

Example output:

Screenshot from 2020-01-28 18-10-08

The X axis is the time, and the Y axis is the thread number.
Each block represents a task being polled. Here we can see the the import queue monopolizes an entire thread (unsurprisingly, this is while syncing). The little green lines are networking and telemetry sockets being polled. They are normally rectangles, but they are too thin in this screenshot.

As an example of usefulness, this would have easily diagnosed the performance issue of last week, where everything started running in a single thread.

Why "[Maybe controversial?]"

To me it seems like half of the planet is integrating their own profiling solution inside Substrate, so I'm not sure whether this one is appropriate. Another option is to add names to tasks (what this PR does), but leave out the futures-diagnose tool. It can then easily be restored by tweaking the source code in case there's a performance issue.

It also uses an environment variable, which isn't great compared to a CLI option. Ideally, we should use a single runtime for everything (including the import queue), wrap this runtime around the diagnose tool, and customize it there.

I also have no idea where to document this, and this seems like a hidden undiscoverable feature.

Copy link
Contributor

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is a great idea to get more visibility into what is happening. In addition I don't think this change is very intrusive.

Have you been able to bechmark the performance impact futures-diagnose has here? On the one hand we only do this for root futures, on the other hand futures-diagnose seems to serialize all calls through a single Mutex (correct me if I am wrong).

In regards to configuration through an environment variable, what do you think of making this only configurable at compile time. With the latter the compiler can remove all the if enabled, thus this change having zero impact on performance when disabled. Downside of the compile time option is that we can't tell users to just set something at runtime to diagnose an issue but need to provide them with another binary instead.

Copy link
Contributor

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't affect performance when profiling data is not being collected, I'm all in, but it would be really nice to get a document describing how to collect and analyze the outputs.

@tomaka
Copy link
Contributor Author

tomaka commented Jan 29, 2020

On the one hand we only do this for root futures, on the other hand futures-diagnose seems to serialize all calls through a single Mutex (correct me if I am wrong).

I didn't notice any performance degradation with this tool.

It's true that this Mutex mostly comes out of laziness, as figuring out how to properly do log rotation without any locking isn't trivial.

@gavofyork gavofyork merged commit 38c5ed0 into paritytech:master Jan 29, 2020
@tomaka tomaka deleted the diagnose2 branch January 29, 2020 10:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants