Why the blkio_delay is so different with iotop-py? #65

Open
zylthinking opened this issue Feb 27, 2024 · 4 comments

Comments

@zylthinking

zylthinking commented Feb 27, 2024

I ran a heavy disk-read workload in ClickHouse and noticed that the GRAPH[IO]▽ column in this view is quite different from the IO> column in iotop.

For the main thread of ClickHouse, GRAPH[IO] shows 100% while IO> shows zero.
I think iotop is correct here.

Because the IO mainly happens in a background thread, the main thread should be idle.

I see that the data is fetched via netlink instead of /proc, but the algorithm is similar.
I don't understand the result. Is there something I missed that makes GRAPH[IO] not equivalent to IO>?
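
For reference, a minimal sketch of how an IO percentage can be derived from a cumulative block IO delay counter such as the taskstats `blkio_delay_total` field (nanoseconds). The function name and the clipping to 100% are assumptions for illustration, not code taken from either tool:

```python
def io_percent(prev_delay_ns, curr_delay_ns, interval_ns):
    """Share of the sampling interval spent waiting on block IO, in percent.

    prev_delay_ns / curr_delay_ns: two readings of a cumulative delay counter.
    interval_ns: wall-clock time between the two readings.
    """
    if interval_ns <= 0:
        return 0.0
    # The counter is cumulative, so only the growth since the last sample counts.
    delta = max(curr_delay_ns - prev_delay_ns, 0)
    # Clip at 100% (assumed here; a single thread cannot wait longer than the interval).
    return min(100.0 * delta / interval_ns, 100.0)
```

With this definition, a thread that waited 0.5 s during a 1 s interval reports 50%, and a thread that did no IO reports 0%.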

@zylthinking
Author

zylthinking commented Feb 27, 2024

I found the logic: the statistics of every thread are added to the main thread's entry in addition to the thread's own entry.

It seems the main thread's entry is used for filtering and ordering, so it has to hold the sum over all threads; however, that corrupts the output shown for the main thread itself.
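
The effect can be reproduced with a small sketch. The tuple layout `(tid, tgid, io_pct)` and the function name are hypothetical, chosen only to illustrate the double counting described above, not the actual iotop data structures:

```python
def collect_summed_into_main(samples):
    """Add each thread's IO to its own entry AND to the main thread's entry.

    samples: iterable of (tid, tgid, io_pct); the main thread has tid == tgid.
    Returns {tid: io_pct}.
    """
    table = {}
    for tid, tgid, io in samples:
        table[tid] = table.get(tid, 0.0) + io
        if tid != tgid:
            # The main thread's entry doubles as the per-process total,
            # so its own value can no longer be told apart from the sum.
            table[tgid] = table.get(tgid, 0.0) + io
    return table


# Main thread 100 is idle; background thread 101 is fully IO bound.
result = collect_summed_into_main([(100, 100, 0.0), (101, 100, 100.0)])
```

Here `result[100]` ends up at 100.0 even though the main thread itself did no IO, which matches the observation in the first comment.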

@bbonev
Collaborator

bbonev commented Feb 28, 2024

Right, the current behavior may be wrong. I need to double-check it anyway...

The intent was to have thread data summed in process mode, where only the main thread is displayed, while in thread mode each thread shows its own data without summing.

In order to switch between both modes, iotop will need to keep both the main thread's own data and the summed data. That will either make the thread list more complex (different structures) or use more RAM (doubling the data).
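
A rough sketch of the "doubling the data" option, assuming a hypothetical `Entry` record with separate fields for a thread's own value and the per-process value (none of these names come from the iotop source):

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tid: int
    tgid: int
    own_io: float = 0.0   # this thread's own IO percentage
    proc_io: float = 0.0  # per-process aggregate, meaningful only when tid == tgid


def display_value(entry, process_mode):
    """Pick the value to show without recomputing anything on a mode switch."""
    return entry.proc_io if process_mode else entry.own_io


main = Entry(tid=100, tgid=100, own_io=0.0, proc_io=100.0)
```

Because both values are stored, switching between thread mode and process mode only changes which field is read, at the cost of one extra value per entry.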

What do you think?

@zylthinking
Author

zylthinking commented Feb 29, 2024

Yes, it should maintain the process-level information independently.

I also found another possible problem: pidgen_cb may enumerate other threads before the main thread, which results in losing some threads' data at the process level.

Also, I think IO should not be summed up at the process level; max(IO) or average(IO) over all threads may be better.

@bbonev
Collaborator

bbonev commented Mar 1, 2024

Thanks for spotting these problems!

I will separate the information. But if average/max is added, the memory usage will definitely increase. My thinking is that memory is cheap nowadays and saving some cycles should be preferred. The possible aggregates should definitely be calculated during data collection and not during display (unless the number of possible views grows way too much). I prefer to be able to switch between different 'views' without starting over with the history data.

After checking the code, I can confirm that pidgen_cb can see a thread before the main thread, in which case that thread's data is not summed into the main thread. In thread mode, the display will be correct except that the main thread shows the summed data instead of only its own.
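
One way to make the result independent of enumeration order is to group samples by process id first and aggregate afterwards. This is only a sketch under that assumption; the `(tid, tgid, io_pct)` tuples and the function name are hypothetical and do not reflect how pidgen_cb is actually structured:

```python
from collections import defaultdict


def group_by_process(samples):
    """Collect per-thread IO values under their process id (tgid).

    samples: iterable of (tid, tgid, io_pct) in any order.
    Returns {tgid: [io_pct, ...]}.
    """
    groups = defaultdict(list)
    for _tid, tgid, io in samples:
        # The group is created on first use, so it does not matter whether
        # the main thread or a worker thread is seen first.
        groups[tgid].append(io)
    return dict(groups)


main_first = group_by_process([(100, 100, 0.0), (101, 100, 90.0)])
worker_first = group_by_process([(101, 100, 90.0), (100, 100, 0.0)])
```

Both orders yield the same set of values for process 100, so no thread's data is dropped.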

The aggregation function currently in use should have been average. Since IO is measured as time, summing the threads can go over 100% and that would later be clipped to 100%...

I think that adding a toggle to switch between max and average is a good idea. Of course that applies to process mode only.
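
To illustrate the difference between the aggregates, here is a small sketch. The `aggregate` function and its `mode` parameter are hypothetical names, and the final clipping mirrors the behavior described above:

```python
def aggregate(io_values, mode="average"):
    """Combine per-thread IO percentages into one per-process value."""
    if not io_values:
        return 0.0
    if mode == "sum":
        value = sum(io_values)
    elif mode == "average":
        value = sum(io_values) / len(io_values)
    elif mode == "max":
        value = max(io_values)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # A percentage of elapsed time is clipped to 100 for display.
    return min(value, 100.0)


threads = [80.0, 60.0, 10.0, 10.0]
```

For these four threads, `sum` gives 160 and is clipped to 100.0, `average` gives 40.0, and `max` gives 80.0, so only the last two carry usable information once several threads are busy.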

To summarize:

Bug 1) Do not show aggregated data for the main thread in thread mode; keep both the aggregated data and the main thread's original data to enable proper switching between thread and process mode.
Bug 2) Change pidgen_cb to properly handle any order of the threads.
Bug 3) Use a real aggregate function (average) instead of summing the data.
Feat 1) Add a toggle to use max/average as the aggregate function [process mode only].

@bbonev bbonev changed the title why the blkio_delay is so different with iotop? why the blkio_delay is so different with iotop-py? Mar 1, 2024
@bbonev bbonev changed the title why the blkio_delay is so different with iotop-py? Why the blkio_delay is so different with iotop-py? Mar 1, 2024