Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: disable thp to see if it helps OOMS in costfuzz #91062

Closed
wants to merge 1 commit into from

Conversation

cucaroach
Copy link
Contributor

See #90683

Release note: None

@cucaroach cucaroach requested a review from a team as a code owner November 1, 2022 14:52
@cucaroach cucaroach requested review from srosenberg and smg260 and removed request for a team November 1, 2022 14:52
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @smg260)


pkg/roachprod/install/cockroach.go line 421 at r1 (raw file):
From what I recall, jemalloc is pretty conservative wrt THP. When debugging another issue we used strace to trace allocations; only one MADV_HUGEPAGE was issued by jemalloc [1]. The more likely culprit is go's GC which according to [2] doesn't call MADV_NOHUGEPAGE to break up allocations,

The result of all this is that Linux may be keeping around huge pages transparently using up a lot more memory than intended, and reducing the effectiveness of the scavenger.

[1] #83790 (comment)
[2] golang/go#55328

@renatolabs
Copy link
Contributor

I'd also suggest we only use this setting in the costfuzz roachtest instead of enabling it in all of them.

@srosenberg
Copy link
Member

srosenberg commented Nov 1, 2022

I'd also suggest we only use this setting in the costfuzz roachtest instead of enabling it in all of them.

Good catch, I didn't realize this was disabling it for everything; that's highly unadvisable.

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @smg260)

@cucaroach
Copy link
Contributor Author

I'd also suggest we only use this setting in the costfuzz roachtest instead of enabling it in all of them.

Good catch, I didn't realize this was disabling it for everything; that's highly unadvisable.

Why? Just curious, I can make it only take affect for costfuzz but my theory is that THP is likely always bad for jemalloc. Also because we use jemalloc for large block allocations I don't think THP buys us much performance wise. Note that setting MALLOC_CONF only affects jemalloc and doesn't affect the go heap. Making it specific to costfuzz can be done and if we want to start with that more limited in scope experiment thats fine with me. Maybe instead of this experiment what I should be doing is adding code to determine where the THP is kicking in. If we dumped the jemalloc arenas and the proc/smaps we could make that determination. I'll give that a go...

@srosenberg
Copy link
Member

Why? Just curious, I can make it only take affect for costfuzz but my theory is that THP is likely always bad for jemalloc.

We don't know the side-effects of this change. This could potentially affect multiple roachtest benchmarks, causing a regression; other roachtests could also misbehave in unexpected ways. It'd be safer to test this out in an isolated manner first.

Also because we use jemalloc for large block allocations I don't think THP buys us much performance wise. Note that setting MALLOC_CONF only affects jemalloc and doesn't affect the go heap.

Right, the only consumer of this change is pebble's block cache.

Making it specific to costfuzz can be done and if we want to start with that more limited in scope experiment thats fine with me. Maybe instead of this experiment what I should be doing is adding code to determine where the THP is kicking in. If we dumped the jemalloc arenas and the proc/smaps we could make that determination. I'll give that a go...

Seems like a more fruitful next step. My hunch it isn't jemalloc. Kernel is using madvise by default (for THP). You could confirm/refute by running strace (example in [1]).

@cucaroach cucaroach closed this Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants