Dokka spends 35%+ of its time in GC on kotlinx.coroutines #2729

Closed
qwwdfsad opened this issue Oct 26, 2022 · 2 comments
Labels
perf Performance related

Comments

@qwwdfsad
Contributor

qwwdfsad commented Oct 26, 2022

Hi, I did a short profiling session of Dokka in kotlinx.coroutines.
The methodology was straightforward: warm up the Gradle daemon, run ./gradlew cleanDokkaHtmlMultiModule cleanDokkaHtmlPartial dokkaHtmlMultiModule, and profile it with async-profiler in CPU and alloc modes. The corresponding flame graphs are attached.

The root cause is quite straightforward: for text-based HTML blocks Dokka invokes parseWithNormalisedSpaces, which looks harmless but unconditionally invokes the Jsoup parser under the hood. That is not only a slow operation by itself, it also allocates a 64K temporary buffer for each text element.

PR #2730 invokes Jsoup conditionally: it optimistically looks for & in the text first and applies Jsoup only when necessary (decoding the entities manually is probably worth doing anyway, but that is beyond the scope of my change).
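To illustrate the idea, here is a minimal sketch of such a fast path. It is not the code from the PR: the class name, `decodeIfNeeded`, and `slowDecode` are hypothetical, and `slowDecode` is only a stand-in for the Jsoup-based step so that the example runs without extra dependencies.

```java
import java.util.function.UnaryOperator;

final class HtmlTextFastPath {
    // Counts how often the expensive path runs (for illustration only).
    static int slowPathCalls = 0;

    // Hypothetical stand-in for the expensive Jsoup-based decoding.
    // It handles just four entities; "&amp;" is replaced last so that
    // input such as "&amp;lt;" is not decoded twice.
    static String slowDecode(String text) {
        slowPathCalls++;
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    // Fast path: text without '&' cannot contain an HTML entity,
    // so it is returned as is and the decoder is never invoked.
    static String decodeIfNeeded(String text, UnaryOperator<String> htmlDecoder) {
        if (text.indexOf('&') < 0) {
            return text;
        }
        return htmlDecoder.apply(text);
    }

    public static void main(String[] args) {
        System.out.println(decodeIfNeeded("plain text", HtmlTextFastPath::slowDecode));
        System.out.println(decodeIfNeeded("a &lt; b &amp;&amp; c", HtmlTextFastPath::slowDecode));
        System.out.println("slow path calls: " + slowPathCalls);
    }
}
```

The check is a single scan over the string and allocates nothing, so text elements without entities skip the parser and its temporary buffer entirely.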

It's quite hard to measure the impact of the change on coroutines because there are a lot of other tasks happening (ktor is probably a better candidate to test against; GC there takes 90% of the CPU time).

My numbers are the following:

  • No outstanding char[] allocations in the profile (new hotspots are identified)
  • 10-15 percentage points less GC in the profile (35% -> ~20-25% of all execution time)
  • In no-daemon mode, it saves ~15% of CPU time and lowers peak CPU consumption by ~200% (on a 16-core OS X machine)

[Attached flame graphs: dokka, dokka2]

@qwwdfsad
Contributor Author

The load average dropped from roughly 6 to 5.
It should probably be tested on a less powerful machine or with a restricted number of cores (OS X has no way to do that, but Linux does via taskset).

@vmishenev vmishenev added the perf Performance related label Dec 14, 2022
qwwdfsad added a commit that referenced this issue Mar 8, 2023
…nally invoked for each HTML text element (#2730)

Addresses #2729
@qwwdfsad
Contributor Author

qwwdfsad commented Mar 8, 2023

Closing this one as it is superseded by #2903

@qwwdfsad qwwdfsad closed this as completed Mar 8, 2023