feat(core/bench): add fs, monoiofs and compfs benchmark #5095

Open
wants to merge 1 commit into
base: main

Conversation

NKID00
Contributor

@NKID00 NKID00 commented Sep 4, 2024

Part of #4552.

This PR introduces a benchmark that compares the performance of the OpenDAL services fs and monoiofs. compfs is covered too, but it is commented out for now because it is still a work in progress and did not finish the benchmark.

The concurrent benchmarks use .chunk(size).concurrent(parallel) rather than polling several independent IO tasks (which is how the bench/ops benchmarks work). I'm not sure which one simulates real-world usage better.
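To make the difference between the two benchmark shapes concrete, here is a minimal sketch using plain threads over an in-memory buffer. It does not use OpenDAL at all: `chunked_concurrent_read` and `independent_reads` are hypothetical names, and copying a slice stands in for real IO. The first function models one logical read that is split into chunks served by at most `parallel` workers at a time; the second models several unrelated reads running side by side.

```rust
use std::thread;

/// Shape 1: one logical read of `data`, split into `chunk`-sized pieces.
/// At most `parallel` pieces are copied at the same time, and the pieces
/// are reassembled in their original order.
/// Assumes `chunk > 0` and `parallel > 0` (both `chunks` calls panic on 0).
fn chunked_concurrent_read(data: &[u8], chunk: usize, parallel: usize) -> Vec<u8> {
    let pieces: Vec<&[u8]> = data.chunks(chunk).collect();
    let mut out = Vec::with_capacity(data.len());
    // Handle the pieces in waves so no more than `parallel` threads run at once.
    for wave in pieces.chunks(parallel) {
        let results: Vec<Vec<u8>> = thread::scope(|s| {
            let handles: Vec<_> = wave
                .iter()
                .map(|piece| s.spawn(move || piece.to_vec()))
                .collect();
            // Joining in spawn order keeps the chunks in order.
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        for r in results {
            out.extend_from_slice(&r);
        }
    }
    out
}

/// Shape 2: `tasks` independent reads, each copying the whole buffer
/// on its own thread, with no coordination between them.
fn independent_reads(data: &[u8], tasks: usize) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..tasks)
            .map(|_| s.spawn(move || data.to_vec()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The first shape measures how well a single large operation parallelizes, while the second measures how the service behaves under several unrelated requests, which is why the two can give different pictures of the same backend.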

The full benchmark results are as follows. I'm going to write a progress report on monoiofs, along with a brief analysis of the results, on the mailing list, so stay tuned. 😋

Benchmark result
read 4.00 KiB/fs        time:   [27.531 µs 27.614 µs 27.710 µs]
                        thrpt:  [140.97 MiB/s 141.46 MiB/s 141.89 MiB/s]
read 4.00 KiB/monoiofs  time:   [30.880 µs 32.507 µs 34.392 µs]
                        thrpt:  [113.58 MiB/s 120.17 MiB/s 126.50 MiB/s]
Found 20 outliers among 100 measurements (20.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  9 (9.00%) high severe

read 256 KiB/fs         time:   [64.932 µs 69.819 µs 73.971 µs]
                        thrpt:  [3.3005 GiB/s 3.4968 GiB/s 3.7599 GiB/s]
read 256 KiB/monoiofs   time:   [49.171 µs 50.267 µs 51.240 µs]
                        thrpt:  [4.7646 GiB/s 4.8568 GiB/s 4.9651 GiB/s]

read 4.00 MiB/fs        time:   [985.53 µs 986.56 µs 987.66 µs]
                        thrpt:  [3.9551 GiB/s 3.9595 GiB/s 3.9636 GiB/s]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
read 4.00 MiB/monoiofs  time:   [230.23 µs 232.47 µs 234.54 µs]
                        thrpt:  [16.655 GiB/s 16.803 GiB/s 16.967 GiB/s]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) low severe

Benchmarking read 16.0 MiB/fs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 12.5s, enable flat sampling, or reduce sample count to 60.
read 16.0 MiB/fs        time:   [2.4754 ms 2.4769 ms 2.4785 ms]
                        thrpt:  [6.3043 GiB/s 6.3084 GiB/s 6.3120 GiB/s]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
read 16.0 MiB/monoiofs  time:   [1.7233 ms 1.7337 ms 1.7459 ms]
                        thrpt:  [8.9495 GiB/s 9.0123 GiB/s 9.0668 GiB/s]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

read concurrent 16x4.00 KiB/fs
                        time:   [298.61 µs 306.31 µs 315.15 µs]
                        thrpt:  [198.32 MiB/s 204.04 MiB/s 209.31 MiB/s]
read concurrent 16x4.00 KiB/monoiofs
                        time:   [234.62 µs 244.44 µs 253.48 µs]
                        thrpt:  [246.57 MiB/s 255.69 MiB/s 266.39 MiB/s]

read concurrent 16x256 KiB/fs
                        time:   [399.76 µs 404.79 µs 409.38 µs]
                        thrpt:  [9.5420 GiB/s 9.6502 GiB/s 9.7715 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
read concurrent 16x256 KiB/monoiofs
                        time:   [327.93 µs 339.69 µs 351.85 µs]
                        thrpt:  [11.102 GiB/s 11.500 GiB/s 11.912 GiB/s]

read concurrent 16x4.00 MiB/fs
                        time:   [6.9919 ms 7.0599 ms 7.1296 ms]
                        thrpt:  [8.7663 GiB/s 8.8528 GiB/s 8.9389 GiB/s]
read concurrent 16x4.00 MiB/monoiofs
                        time:   [8.7125 ms 8.7150 ms 8.7180 ms]
                        thrpt:  [7.1691 GiB/s 7.1715 GiB/s 7.1736 GiB/s]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

read concurrent 16x16.0 MiB/fs
                        time:   [24.318 ms 24.443 ms 24.569 ms]
                        thrpt:  [10.175 GiB/s 10.228 GiB/s 10.280 GiB/s]
read concurrent 16x16.0 MiB/monoiofs
                        time:   [33.131 ms 33.149 ms 33.168 ms]
                        thrpt:  [7.5375 GiB/s 7.5417 GiB/s 7.5458 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

write 4.00 KiB/fs       time:   [32.943 µs 33.006 µs 33.073 µs]
                        thrpt:  [118.11 MiB/s 118.35 MiB/s 118.58 MiB/s]
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe
write 4.00 KiB/monoiofs time:   [62.287 µs 66.421 µs 70.833 µs]
                        thrpt:  [55.148 MiB/s 58.810 MiB/s 62.713 MiB/s]
Found 14 outliers among 100 measurements (14.00%)
  12 (12.00%) low severe
  2 (2.00%) high severe

write 256 KiB/fs        time:   [122.46 µs 129.67 µs 137.91 µs]
                        thrpt:  [1.7702 GiB/s 1.8828 GiB/s 1.9936 GiB/s]
Found 21 outliers among 100 measurements (21.00%)
  20 (20.00%) low severe
  1 (1.00%) high severe
write 256 KiB/monoiofs  time:   [125.98 µs 126.22 µs 126.46 µs]
                        thrpt:  [1.9306 GiB/s 1.9343 GiB/s 1.9380 GiB/s]

write 4.00 MiB/fs       time:   [1.8901 ms 1.9275 ms 1.9553 ms]
                        thrpt:  [1.9978 GiB/s 2.0265 GiB/s 2.0667 GiB/s]
write 4.00 MiB/monoiofs time:   [1.1394 ms 1.1405 ms 1.1416 ms]
                        thrpt:  [3.4218 GiB/s 3.4251 GiB/s 3.4283 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

write 16.0 MiB/fs       time:   [5.0289 ms 5.0352 ms 5.0416 ms]
                        thrpt:  [3.0992 GiB/s 3.1031 GiB/s 3.1071 GiB/s]
write 16.0 MiB/monoiofs time:   [5.3379 ms 5.3411 ms 5.3444 ms]
                        thrpt:  [2.9236 GiB/s 2.9254 GiB/s 2.9272 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

write concurrent 16x4.00 KiB/fs
                        time:   [150.81 µs 150.93 µs 151.05 µs]
                        thrpt:  [413.78 MiB/s 414.09 MiB/s 414.43 MiB/s]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
write concurrent 16x4.00 KiB/monoiofs
                        time:   [324.57 µs 335.41 µs 346.54 µs]
                        thrpt:  [180.36 MiB/s 186.34 MiB/s 192.56 MiB/s]
Found 12 outliers among 100 measurements (12.00%)
  10 (10.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe

write concurrent 16x256 KiB/fs
                        time:   [1.6521 ms 1.6540 ms 1.6558 ms]
                        thrpt:  [2.3591 GiB/s 2.3617 GiB/s 2.3644 GiB/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high severe
write concurrent 16x256 KiB/monoiofs
                        time:   [1.3098 ms 1.3107 ms 1.3118 ms]
                        thrpt:  [2.9779 GiB/s 2.9802 GiB/s 2.9824 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

write concurrent 16x4.00 MiB/fs
                        time:   [26.624 ms 26.687 ms 26.750 ms]
                        thrpt:  [2.3364 GiB/s 2.3420 GiB/s 2.3475 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
write concurrent 16x4.00 MiB/monoiofs
                        time:   [23.338 ms 23.354 ms 23.369 ms]
                        thrpt:  [2.6745 GiB/s 2.6762 GiB/s 2.6780 GiB/s]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) low mild

write concurrent 16x16.0 MiB/fs
                        time:   [95.318 ms 95.462 ms 95.609 ms]
                        thrpt:  [2.6148 GiB/s 2.6188 GiB/s 2.6228 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
write concurrent 16x16.0 MiB/monoiofs
                        time:   [94.194 ms 94.233 ms 94.275 ms]
                        thrpt:  [2.6518 GiB/s 2.6530 GiB/s 2.6541 GiB/s]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
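
As a reading aid for the numbers above: each thrpt value is consistent with the payload size divided by the measured time, in binary units. The sketch below recomputes a few of them; the helper names are hypothetical, and the assumption that concurrent runs count all 16 chunks as the payload is inferred from the figures rather than stated in the PR.

```rust
/// Bytes per mebibyte and gibibyte (binary units, as in the report).
const MIB: f64 = 1024.0 * 1024.0;
const GIB: f64 = 1024.0 * MIB;

/// Throughput in MiB/s for `bytes` transferred in `micros` microseconds.
fn throughput_mib_s(bytes: u64, micros: f64) -> f64 {
    bytes as f64 / (micros * 1e-6) / MIB
}

/// Throughput in GiB/s for `bytes` transferred in `micros` microseconds.
fn throughput_gib_s(bytes: u64, micros: f64) -> f64 {
    bytes as f64 / (micros * 1e-6) / GIB
}
```

For example, `throughput_mib_s(4096, 27.614)` evaluates to roughly 141.46, matching the median for read 4.00 KiB/fs, and `throughput_gib_s(4 * 1024 * 1024, 232.47)` gives roughly 16.80, matching read 4.00 MiB/monoiofs.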

@Xuanwo
Member

Xuanwo commented Sep 4, 2024

Thank you very much for your work! I have noticed that Monoio performs well on single reads but not as effectively on concurrent ones. Could this be related to our thread-per-core design? Are there any plans to improve it?

Member

How about calling this benchmark fs_alike? By the way, why do we need separate benchmark suites?

Contributor Author

This benchmark puts the services in one group, so criterion can generate summary graphs that show the differences between them.

Contributor Author

I'm thinking of designing a way to benchmark multiple services, like OPENDAL_TEST=fs,monoiofs,compfs. But it looks like that requires some refactoring of our current test infrastructure.

Member

I'm thinking of designing a way to benchmark multiple services, like OPENDAL_TEST=fs,monoiofs,compfs. But it looks like that requires some refactoring of our current test infrastructure.

Yep, this makes more sense to me.
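
A minimal sketch of how a comma-separated service list like the one proposed above could be parsed. This is only an illustration of the idea: the multi-value form of OPENDAL_TEST is a proposal in this thread, and `parse_services` and `services_from_env` are hypothetical helpers, not part of the existing test infrastructure.

```rust
/// Split a comma-separated list such as "fs,monoiofs,compfs" into
/// lowercase service names. Surrounding whitespace and empty entries are
/// dropped, and duplicates are removed while keeping first-seen order.
fn parse_services(value: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for name in value.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let name = name.to_ascii_lowercase();
        if !out.contains(&name) {
            out.push(name);
        }
    }
    out
}

/// Read the list from the environment variable `var`.
/// Returns an empty list when the variable is unset or not valid Unicode.
fn services_from_env(var: &str) -> Vec<String> {
    std::env::var(var)
        .map(|v| parse_services(&v))
        .unwrap_or_default()
}
```

A benchmark harness could then loop over `services_from_env("OPENDAL_TEST")` and register one benchmark per service in the same group.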

@@ -0,0 +1,13 @@
# OpenDAL services fs vs. monoiofs vs. compfs
Member

We could support more FS interfaces.


This benchmark compares the performance of OpenDAL services fs, monoiofs and compfs.

## Goal
Member

Hi, this isn't the benchmark goal. We shouldn't make assumptions about it. In fact, monoiofs can't outperform tokio-based fs in every case.

Contributor Author

In fact, monoiofs can't outperform tokio-based fs in every case.

Actually, monoiofs is expected to be slower since it is currently single-threaded. I'm trying to express the final goal we wish to achieve. Did I misunderstand the point of this section?

Member

Did I misunderstand the point of this section?

Our goal is to measure the performance of various fs services so users can choose for themselves, rather than trying to prove that monoiofs is faster than tokio fs. Please note that OpenDAL does not declare a winner; we simply offer options.

Contributor Author

Thanks for pointing that out!

@NKID00
Contributor Author

NKID00 commented Sep 4, 2024

Thank you very much for your work! I have noticed that Monoio performs well on single reads but not as effectively on concurrent ones. Could this be related to our thread-per-core design? Are there any plans to improve it?

Monoiofs is currently single-threaded (let worker_threads = 1;) while fs runs on multiple threads. There seems to be a deadlock bug with multiple worker threads that I'm investigating. Performance should improve after fixing the bug, enabling the worker thread pool, and binding the workers to CPU cores.
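
For readers unfamiliar with the worker-pool idea mentioned above, here is a rough sketch using only std threads and channels. It is not monoiofs's implementation and does not use monoio: `WorkerPool` is a hypothetical type, each worker simply runs closures from its own queue, and core pinning is left out because std has no API for it.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of work sent to a worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical pool: one queue per worker, jobs dispatched round-robin.
struct WorkerPool {
    senders: Vec<mpsc::Sender<Job>>,
    handles: Vec<thread::JoinHandle<()>>,
    next: usize,
}

impl WorkerPool {
    /// Spawn `worker_threads` workers; `worker_threads = 1` corresponds to
    /// the single-threaded setup described in the comment above.
    fn new(worker_threads: usize) -> Self {
        assert!(worker_threads > 0, "need at least one worker");
        let mut senders = Vec::new();
        let mut handles = Vec::new();
        for _ in 0..worker_threads {
            let (tx, rx) = mpsc::channel::<Job>();
            // Each worker drains its own queue until every sender is dropped.
            handles.push(thread::spawn(move || {
                for job in rx {
                    job();
                }
            }));
            senders.push(tx);
        }
        WorkerPool { senders, handles, next: 0 }
    }

    /// Send a job to the next worker in round-robin order.
    fn submit<F: FnOnce() + Send + 'static>(&mut self, job: F) {
        let idx = self.next % self.senders.len();
        self.next = self.next.wrapping_add(1);
        self.senders[idx].send(Box::new(job)).expect("worker exited");
    }

    /// Close all queues and wait for the workers to finish pending jobs.
    fn shutdown(self) {
        drop(self.senders);
        for handle in self.handles {
            handle.join().expect("worker panicked");
        }
    }
}
```

A real pool would typically size itself from `std::thread::available_parallelism()` and pin each worker to a core through a platform-specific API or an external crate.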

@NKID00
Contributor Author

NKID00 commented Sep 4, 2024

Also, PositionWrite is not implemented for monoiofs.
