
Conversation

Contributor

@kares kares commented Mar 23, 2021

What does this PR do?

This PR addresses memory leaks that manifest when running the rake/gradle task.

NOTE: this reflects only a partial understanding of what is going on, gained by following the "huge" objects in memory dumps.
There might be better ways to handle the issue without these patches (e.g. rewriting plugin generation to put each bundle resolution into a disposable runtime, or running it as a separate process).

Details on why each patch is relevant are to be found in code comments.

Why is it important/What is the impact to the user?

To be able to generate plugin docs we need to check available plugin versions.

How to test this PR locally

./gradlew generatePluginsVersion -Dorg.gradle.jvmargs="-XX:+HeapDumpOnOutOfMemoryError -Xmx2g"
should no longer end with an out-of-memory error

Tested with Bundler 2.2.14 as well as the latest 2.2.15.

Related issues

resolves #12758
related: #10942

@kares kares added the build label Mar 23, 2021
kares added a commit to elastic/docs-tools that referenced this pull request Mar 23, 2021
follow-up after elastic/logstash#12763 is merged (on master)

have successfully tested  `./gradlew generatePluginsVersion` with 2g of heap, but let's not get overly optimistic 😉
Member

@yaauie yaauie left a comment


Two very good finds. I've added a comment about a maybe-safer interception of the worker pool size. Feel free to accept or reject the suggestion.

# The compact index is built using multiple threads, and the count is currently hard-coded to 25:
# `Bundler::Worker.new(Bundler.current_ruby.rbx? ? 1 : 25, worker_name, func)`
# Each thread might build up a Bundler::Source::Rubygems object which retains more than 100MB.
# By limiting the worker thread count we make sure not to produce too many of these objects.
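
For context, here is a minimal sketch of this kind of interception, overriding Bundler::Fetcher::CompactIndex#bundle_worker so the pool is built with a small fixed size instead of the hard-coded 25. It is not the exact patch in this PR; the module name and the cap of 4 are illustrative assumptions.

require "bundler"
require "bundler/worker"
require "bundler/fetcher/compact_index"

module CappedCompactIndexWorkers
  MAX_WORKERS = 4 # assumed cap; small enough to bound the number of Bundler::Source::Rubygems objects

  def bundle_worker(func = nil)
    # Build (and memoize) the pool ourselves with a capped size; the real method
    # also derives the worker name from the remote URI, which is simplified away here.
    @bundle_worker ||= Bundler::Worker.new(MAX_WORKERS, "Compact Index (capped)", func)
  end
end

Bundler::Fetcher::CompactIndex.prepend(CappedCompactIndexWorkers)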
Member


This is a really good find.

I'm a bit wary of the Bundler::Fetcher::CompactIndex#bundle_worker implementation changing out from under us -- if it were to perform an operation with the side effect of starting the worker threads, our interception here would be too late to make a difference.

An alternate approach would be to patch Bundler::Worker to cap the size of the pool, something like:

Bundler::Worker.class_eval do
  alias_method :__initialize, :initialize
  # Cap the pool size regardless of what the caller asked for.
  def initialize(size, name, func)
    if size > 2
      $stderr.puts("Bundler::Worker: initializing limited worker pool (size: 2; requested: #{size})")
      size = 2
    end
    __initialize(size, name, func)
  end
end

Contributor Author

@kares kares Mar 24, 2021


Thanks, I understand the concern and have thought about patching the Worker as well.
The worker was introduced 4.5+ years ago with Bundler 2.2
(it had been used previously, but the code existed inline - there wasn't a bundle_worker method here).

I would say the main use-case for Bundler::Worker is parallel installation (from the installer); this can be disabled using the API if we choose to (the command-line --jobs option).

That being said, I would rather not cause future surprises by affecting bundle install unless we know it to be problematic. My general approach to patches is to keep them contained, with minimal effect on other places.

Instantiating Bundler::Worker does not spawn threads; an enq operation is needed for that (the original code changing to do so would show up as a change in the method's arguments).

Interestingly, the hard-coded 25 seems to have a life of its own; I will try to raise this upstream with a reproducer.
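
A quick way to check that claim locally is a small script along these lines (a sketch, not part of this PR; it assumes a Bundler 2.2.x install, and the doubling func is purely for illustration):

require "bundler"
require "bundler/worker"

func   = ->(obj, _index) { obj * 2 }   # the func receives the enqueued object and a worker index
worker = Bundler::Worker.new(25, "demo pool", func)

before = Thread.list.size   # no pool threads spawned yet, per the comment above
worker.enq(21)              # the first enq is what actually spawns the pool
result = worker.deq         # => 42
after  = Thread.list.size

puts "threads before enq: #{before}, after enq: #{after}, result: #{result}"
worker.stop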

@kares kares merged commit cc615da into elastic:master Mar 24, 2021
@kares kares added the v7.13.0 label Mar 24, 2021
kares added a commit to kares/logstash that referenced this pull request Mar 24, 2021
kares added a commit to kares/logstash that referenced this pull request Mar 24, 2021
@kares kares added the v7.12.1 label Mar 24, 2021
kares added a commit that referenced this pull request Mar 24, 2021
kares added a commit that referenced this pull request Mar 25, 2021
kares added a commit to elastic/docs-tools that referenced this pull request Jun 14, 2021
* reduce memory for plugin versions generation

follow-up after elastic/logstash#12763 is merged (on master)

have successfully tested  `./gradlew generatePluginsVersion` with 2g of heap, but let's not get overly optimistic 😉

* keep 12g for 6.8 on plugin version generation
