-
Notifications
You must be signed in to change notification settings - Fork 0
Support deployment mode #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Deployment mode comes from vendored mode + frozen bundle
Since the test forwarder is written in Ruby and spawned as a separate process, it would be subject to injection through RUBYOPT during injection itself, creating a recursion.
Understanding the state of `Gem.path`, `GEM_PATH`, and `GEM_HOME` is critical for debugging.
Evaluation on every fetch is costly, especially with a fork.
|
|
||
| mod = Module.new do | ||
| def kernel_exec(*args) | ||
| ENV['RUBYOPT'] = ENV['RUBYOPT'].gsub(%r{^(.*)(?:\s+|^)(-r(\s*)\S+/injector\.rb)(.*)$}, '\2 \1 \3') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this line attempting to accomplish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ensures the injector is required first by Ruby because Bundler itself massages RUBYOPT to make sure rubygems is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Minor: Would it be simpler to prepend the injector to RUBYOPT always unconditionally? That way we'd avoid the regular expression which also took me a bit to follow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer this way?
- delete any
-r<injector>reference - unconditionally prepend
-r<injector>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes me think that <injector> may or may not be equal to injector.rb because reasons.
Which means that the names differing implies it may be required twice? So this part is important:
delete any
-r<injector>reference
p-datadog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything that is wrong in the diff but I cannot evaluate it for correctness either. The regexp I commented on for example, I don't understand what it is actually doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a side comment that it's strange that Github does "diffs" like this... (i.e. documenting the change as "these Gemfiles have been renamed to the subsequent version", rather than "the 2.6.0 file changed, the 3.5.0+0 was added, and otherwise the files stayed the same")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that is mighty odd indeed; since (most of?) these files are essentially the same, the rename detector completely falls apart here.
sarahchen6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable!
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given it a big pass. In general... I find this really hard to follow. We're doing very specific things in a very specific order to target very specific behaviors and... I'm dearly missing notes explaining why.
Without such context, I'm just looking at path changes, env variable changes, etc, which make it really hard to review beyond going "well the minimal test is passing so I hope real applications aren't meaningfully different". In particular, it's hard to for instance figure out if there's any gaps in our approach without the context explaining why and what.
|
|
||
| def status | ||
| { | ||
| @status ||= { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since this object gets reused, perhaps worth sprinkling a bunch of .freeze to guard against oops?
| @level ||= case ENV['DD_INTERNAL_RUBY_INJECTOR_LOG_LEVEL'] | ||
| when 'DEBUG', 0 then DEBUG | ||
| when 'INFO', 1 then INFO | ||
| when 'WARN', 2 then WARN | ||
| when 'ERROR', 3 then ERROR | ||
| when 'FATAL', 4 then FATAL | ||
| else UNKNOWN | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly... weird? Specifically, ENV effectively behaves as a Hash[String, String?], it won't ever be 0/1/2/3/4. Maybe you meant for those to be strings?
But zooming out a bit, it seems to me the intention here is to have a mode where we log more, for debugging. So perhaps this can be simplified as a more true/false for that... 🤔
In particular, a simple variable would probably by simpler as well when asking customers to enable something to help us troubleshoot some issue...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it won't ever be 0/1/2/3/4. Maybe you meant for those to be strings?
You guessed right, I'll fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But zooming out a bit, it seems to me the intention here is to have a mode where we log more, for debugging. So perhaps this can be simplified as a more true/false for that...
The amount of intricate logs this thing can emit warrants the granularity.
when asking customers
This is absolutely not geared towards customers but to us, hence DD_INTERNAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's for internal use, I think the more the reason to keep the mechanism as simple as possible -- log everything as verbose as possible, or not. Not sure if the in-between states are worth the extra code/complexity?
| log.info { 'inject:patch' } | ||
|
|
||
| bundler = import 'bundler' | ||
|
|
||
| bundler.patch! | ||
| else | ||
| log.info { 'inject:skip' } | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads a bit surprising to me... Why would we patch when DD_INTERNAL_RUBY_INJECTOR == false? Perhaps I'm misunderstanding the semantics here (some comments in the source would be very helpful!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is indeed a bit lousy:
DD_INTERNAL_RUBY_INJECTOR=true=> perform the injection stepsDD_INTERNAL_RUBY_INJECTOR=false=> don't perform the injection steps
The "injection steps" are the "stage 1" injection of dependencies into the user's bundle; this is typically achieved when bundle exec is called. Once that stage 1 is done bundle exec performs a call to Kernel.exec, which starts a new process, which is intercepted by the C injector again, which adds the Ruby injector to RUBYOPT, and so it is invoked again, but this time we're under the correct bundler env as set up by the Ruby injector.
So DD_INTERNAL_RUBY_INJECTOR is set to false beforehand, and the second execution in the exec'd process won't start it all over again and just reuse the current env. As a bonus this also applies to any other spawn, system, exec... call, which will merely reuse the environment (including the new gemfile) and it will all Just Work (as it does for Bundler)
There is an exception to that "It Just Works": deployment mode is persisted in config or in the env, and to break through the grip of deployment mode (or rather vendored mode, which restricts visible paths to only one) we need to monkeypatch Bundler so that the processes do not see the deployment mode config being true.
So, yes, we have to patch Bundler when DD_INTERNAL_RUBY_INJECTOR=false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a comment in the code, not in the review! :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed... sort of! Rather, this should be documentation about how this thing works.
I tried to be detailed in commit messages as well.
Thanks for being my rubber duck / hallway tester about what is unclear to a newcomer! Being head-deep into this problem domain for a while I must admit a can lose sight of what is clear and obvious and what is not...
| if !status[:bundler][:use_system_gems] | ||
| result << { :name => 'bundler.use_system_gems', :reason => 'bundler.vendored' } | ||
| if !status[:fs][:writable] | ||
| result << { :name => 'fs.writable', :reason => 'fs.readonly' } | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was the only place using status[:bundler][:use_system_gems] so it can be removed now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly: I'd rather not remove it because of logging.
This allows examining relevant Bundler internals for diagnostic. Indeed that we don't use this state directly doesn't mean it doesn't have an influence on Bundler's behaviour. use_system_gems in particular is used in multiple Bundler conditionals that are really important to understand come debug time.
| if status[:bundler][:deployment] | ||
| result << { :name => 'bundler.deployment', :reason => 'bundler.deployment' } | ||
| if status[:bundler][:settings][:path] | ||
| result << { :name => 'bundler.path', :reason => 'bundler.vendored' } | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted there's still references to bundler.deployment in "report.rb" and "test.rb"... they can (should?) be removed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I will address.
| if context[:bundler][:deployment] | ||
| app_bundle_path = context[:bundler][:bundle_path] | ||
|
|
||
| ENV['DD_INTERNAL_RUBY_INJECTOR_PATCH'] = "mode=deployment,path=#{package_gem_home}:#{app_bundle_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... I see this being set here to this complex string and in main.rb we check if ENV['DD_INTERNAL_RUBY_INJECTOR_PATCH'] is set and... do nothing with it? What's up with that? I'm clearly missing something... 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not missing anything, it is an attempt at passing information from the first stage (bundle exec) to the second stage (the exec'd process).
Reason is:
- diagnostic logging and inspection from the second stage
- consistency/sanity checks
- full support for vendor mode and tackling read-only filesystems will require passing information from one stage to the other
| Gem.paths = { 'GEM_PATH' => "#{package_gem_home}:#{app_bundle_path}" } | ||
| ENV['GEM_PATH'] = Gem.path.join(File::PATH_SEPARATOR) | ||
| ENV['GEM_HOME'] = app_bundle_path | ||
|
|
||
| BUNDLER.patch! | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So similarly to my comment on bundler.rb, it's very hard to follow along why we're doing these changes -- e.g. if someone ever needs to investigate an issue around this code we'll need to effectively start from scratch in reverse engineering all of the logic here again.
This really needs a big comment explaining what we're doing and why...
Also -- why do we .patch! bundler here + again on main.rb? What's up with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed about the lack of docs.
Also -- why do we
.patch!bundler here + again onmain.rb? What's up with that?
This one is during the first stage (bundle exec), the one in main happens during the second stage. Since the second stage is essentially another process (albeit with the same pid because of exec), all patches have been forgotten.
| 'injection should abort', | ||
| 'app gemfile should not include datadog', | ||
| 'app lockfile should not include datadog', | ||
| # TODO: disabled due to race condition on naive deletion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a "naive deletion" in this context again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing some rm of the new Gemfile+lockfile, possibly even in an ensure.
e.g there's a risk that two Ruby processes ran at about the same time in the same bundle would attempt injection and walk upon each other while one is creating and the other removing files.
In that case there's no problem leaving the files around: a hypothetical rm would happen if injection failed, and so they would not be pointed to.
Similarly two concurrent processes simultaneously attempting and succeeding at injection, while wasteful, would produce the same content idempotently.
The problem only occurs when someone attempts to clean up, and doing the whole creation atomically is tough (e.g we can't operate in a temp folder if the original Gemfile contains relative file references; even moving both Gemfile+lockfile on success isn't atomic)
Something similar happened in testing which made the tests flaky (they would somehow incorrectly pass).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand any of these fixture changes -- why did we need the many files with a single puts, and why do we now need to have them being copy/pasted copies of all of this info?
(It's very hard to follow the why on all these changes...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These stubs are the target Ruby programs that get ran during testing, to be injected into.
Initially they were a mere puts so that there was output visually confirming they were being ran.
Then as I needed more diagnostic output to understand and/or confirm behaviour, each stub got an increasing amount of information. This information is as of today only consumed by humans but ultimately the intent is to have this information captured and checked by tests.
There is one per fixture directory because when tests are ran the fixture is being copied to a unique temporary folder from which things are run. Each directory being self-contained is much easier to understand, manage, and copy around: devising a mechanism to have them shared while being a) copied and b) ran inside of various containers is non-trivial and adds a fair amount of complexity.
In addition it is not inconceivable that the stub contents may need to be specialised depending on the fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just generate this instead? E.g. have a "base_fixture.rb" that gets copied and renamed as needed? It seems a lot of copy-paste is going around here
| # To ensure no crash happens on Ruby 2.6 we package the corresponding | ||
| # `did_you_mean` version. | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment -- specifically, it's not clear if the impact here is in our test code or on in the code in production?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both.
The packages/<foo> directory contains an injection "package" (for lack of a better word), IOW the dependencies that we want injected into the applications.
The injector code is intended to be largely Datadog-agnostic, so the simple package is a minimal one without the datadog baggage and assumptions.
The datadog package mimics what is packaged by our OCI package builder, only minimally so for testing purposes.
This change needs to be reflected (and is, here) in the actual package building process.
Vendored mode (`BUNDLE_PATH` / `use_system_gems?` => `false`) removes all paths but the vendor path from the Gem path list. This effectively hides all other gems, making any injection impossible. Massage `GEM_PATH` and `GEM_HOME` to set them to values that: - are identical in behaviour with `BUNDLE_PATH='vendor/bundle'` - allow injection once vendored mode is relaxed Note: this does not take an arbitrary `BUNDLE_PATH` into account yet, instead focusing on the default for deployment mode only (`vendor/bundle`).
When deployment mode is detected we patch Bundler to act as if it wasn't set. This: - makes Bundler not set `use_system_gems?` to `false` - makes Bundler not set the vendored path to `vendor/bundle` But thanks to the `GEM_HOME` and `GEM_PATH` variables that have been set beforehand, gems will be looked up in the appropriate location, all the while having broken out of being able to see _only_ vendored gems. Ergo, gem injection can proceed.
Since Ruby 1.9 `rubygems` is automatically required default. Ruby and `rubygems` contain mechanisms that enforce rubygems is not just loaded, but that it is being loaded *first*. Right before executing the Ruby process we make sure that the Ruby injector is appearing first, and whatever else is present still appears too, but afterwards.
It turns out rubygems bootstraps bundler straight from `rubygems` code. This is used so that once, say, `bundle exec` completes, running ruby programs is also executed in that bundler context. Indeed withotut this `bundle exec foo` itself would not work! As soon as it `exec`s to `foo`, any previous Ruby context would be lost. But it also means that Ruby loads bundler stuff too early. Unset the undocumented `BUNDLER_SETUP`.
When `bundle exec` happens, it proceeds with processing Bundler things, then `exec`s into the actual Ruby program to execute. When it does so, this is a new process, hence any patch that we have applied goes away, which results in deployment mode being re-armed. Apply the `bundler` patches to override Bundler behaviours. We can safely load `bundler` withotu forking in that case since we are in a bundled case.
Both threads and pipes were otherwise leaking.
When bundle is unlocked `bundle exec` (obviously) fails, and thus makes the test fail due to the exit code. Instead, when unlocked, run the stub directly to test guardrails are in effect. Note: an extension of this change would be to: - still run bundle exec and ensure it exits with a non-zero status, behaving as expected. - run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS` unset (as of this commit). - run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS` set to `-` and/or the path to the fixture `Gemfile`.
- `-r` is not supposed to have a space even though it can in some later Ruby versions - Before 1.9 `Gem` isn't present until `rubygems` is required
There's a discrepancy when e.g. `BUNDLE_PATH` is set.
When `BUNDLE_PATH` is set to `/bundle` the result of `bundle install` is lost, so the bundle is empty come test time. Store this path in a volume.
There are two cases we don't handle right: - vendor path by itself: we only patch Bundler to ignore deployment mode - deployment mode non-default vendor path: we hardcode the path
Bundler doesn't define `Bundler::CLI` outside of `lib/bundler/cli.rb` and defines commands as e.g `class CLI::Exec` to save nesting. This makes requiring `lib/bundler/cli/exec.rb` standalone crash.
Test packages were pinned to a problematic version of `libddwaf` that causes misresolutions due to multiple overlapping binary gem platforms being available.
Ruby 2.6 decided to activate the default gem via `Kernel.gem` which makes it subject to isolation, hence breaking under vendored mode. This `Kernel.gem` activation happens in `gem_prelude` and can only be skipped with the `--disable=did_you_mean` CLI flag. Ruby 2.7 reverted that problematic behaviour, instead resorting to a plain `require` which will either simply load from `$LOAD_PATH` or use a bundled version. To ensure no crash happens on Ruby 2.6 we package the corresponding `did_you_mean` version. See: - gem_prelude.rb calls `Kernel.gem` on 2.6, but not 2.7: https://github.com/ruby/ruby/blob/ruby_2_6/gem_prelude.rb#L3-L7 https://github.com/ruby/ruby/blob/ruby_2_7/gem_prelude.rb#L2 - gem_prelude.rb is included as a prelude script https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L158-L161 - prelude scripts get compiled in prelude.c: https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1050-L1059 - prelude.c is generated from a template: https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L189 - the template embeds ISeqs of scripts https://github.com/ruby/ruby/blob/ruby_2_6/template/prelude.c.tmpl#L167-L168 - prelude targets are just prelude.c: https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1081-L1082
07e0ecd to
6a1e55d
Compare
Why?
Deployment mode is one of the major use cases in the wild.
What does this PR do?
Beat
bundlerandrubygemsinto submission via a two-stage injector:GEM_HOMEandGEM_PATHenvironment variables in ways compatible withBUNDLE_PATHbundlerto ignore deployment mode and not reset our modificationsrubygemsto not callBundler.setupbefore we doSince vendored mode is not set, Bundler code that filters out other paths will be able to consider additional paths we add to gem paths.
How to test the change?
CI
Additional Notes:
A tough nut to crack. "Pure" deployment mode is supported (via
BUNDLE_DEPLOYMENT=true), but not yet "standalone" vendored mode (e.g viaBUNDLE_PATH=/some/where, optionally combined withBUNDLE_FROZEN=truefor the same effect as deployment mode)