Support deployment mode #28

lloeki · 2025-11-03T16:27:18Z

Why?

Deployment mode is one of the major use cases in the wild.

What does this PR do?

Beat bundler and rubygems into submission via a two-stage injector:

- set up GEM_HOME and GEM_PATH environment variables in ways compatible with BUNDLE_PATH
- patch bundler to ignore deployment mode and not reset our modifications
- patch rubygems to not call Bundler.setup before we do

Since vendored mode is not set, Bundler code that filters out other paths will be able to consider additional paths we add to gem paths.
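
A minimal sketch of the first step, assuming hypothetical `package_gem_home` and `app_bundle_path` values (the names mirror the injector.rb hunk discussed later in this PR; the helper `gem_env_for` itself is made up for illustration). It only computes the two variables and does not touch `ENV` or Bundler:

```ruby
# Compute GEM_HOME / GEM_PATH so that the app's vendored gems stay visible
# while the injected package's gems become visible as well.
# Both arguments are illustrative; the real injector derives them from its context.
def gem_env_for(package_gem_home, app_bundle_path)
  {
    'GEM_HOME' => app_bundle_path,
    'GEM_PATH' => [package_gem_home, app_bundle_path].join(File::PATH_SEPARATOR)
  }
end
```

Keeping `GEM_HOME` on the application's bundle path is what preserves the behaviour of vendored mode, while the extra `GEM_PATH` entry is what makes the injected gems reachable once Bundler no longer filters paths.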

How to test the change?

CI

Additional Notes:

A tough nut to crack. "Pure" deployment mode is supported (via BUNDLE_DEPLOYMENT=true), but "standalone" vendored mode is not yet supported (e.g. via BUNDLE_PATH=/some/where, optionally combined with BUNDLE_FROZEN=true for the same effect as deployment mode).

p-datadog · 2025-11-18T12:40:09Z

src/mod/bundler.rb

+
+    mod = Module.new do
+      def kernel_exec(*args)
+        ENV['RUBYOPT'] = ENV['RUBYOPT'].gsub(%r{^(.*)(?:\s+|^)(-r(\s*)\S+/injector\.rb)(.*)$}, '\2 \1 \3')


What is this line attempting to accomplish?

It ensures the injector is required first by Ruby because Bundler itself massages RUBYOPT to make sure rubygems is required.

(Minor: Would it be simpler to prepend the injector to RUBYOPT always unconditionally? That way we'd avoid the regular expression which also took me a bit to follow)

Would it be clearer this way?

- delete any -r<injector> reference
- unconditionally prepend -r<injector>

Makes me think that <injector> may or may not be equal to injector.rb because reasons.

Which means that the names differing implies it may be required twice? So this part is important:

> delete any -r<injector> reference
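
The two-step alternative floated in this thread (delete any existing reference, then prepend unconditionally) could look roughly like the sketch below. This is not the code in this PR: `prepend_injector` and the example path are hypothetical, and the pattern assumes the injector is always referenced as `-r<something>/injector.rb`.

```ruby
# Remove every `-r.../injector.rb` option (with or without a space after -r),
# then put a single reference at the front so Ruby requires it first.
def prepend_injector(rubyopt, injector_path)
  cleaned = rubyopt.to_s.gsub(%r{(?:\A|\s+)-r\s*\S*/injector\.rb(?=\s|\z)}, ' ')
  (["-r#{injector_path}"] + cleaned.split).join(' ')
end
```

For example, `prepend_injector('-rbundler/setup -r/opt/dd/injector.rb -W0', '/opt/dd/injector.rb')` returns `-r/opt/dd/injector.rb -rbundler/setup -W0`. Applying it a second time leaves the string unchanged, which addresses the "required twice" concern as long as the file name matches the pattern.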

p-datadog

I don't see anything that is wrong in the diff but I cannot evaluate it for correctness either. The regexp I commented on for example, I don't understand what it is actually doing.

sarahchen6 · 2025-11-19T15:27:25Z

test/packages/datadog/ruby/2.7.0/Gemfile

Just a side comment that it's strange that GitHub does "diffs" like this... (i.e. documenting the change as "these Gemfiles have been renamed to the subsequent version", rather than "the 2.6.0 file changed, the 3.5.0+0 was added, and otherwise the files stayed the same")

Wow that is mighty odd indeed; since (most of?) these files are essentially the same, the rename detector completely falls apart here.

sarahchen6

Looks reasonable!

ivoanjo

I've given it a big pass. In general... I find this really hard to follow. We're doing very specific things in a very specific order to target very specific behaviors and... I'm dearly missing notes explaining why.

Without such context, I'm just looking at path changes, env variable changes, etc, which makes it really hard to review beyond going "well the minimal test is passing so I hope real applications aren't meaningfully different". In particular, it's hard to figure out, for instance, whether there are any gaps in our approach without context explaining the why and the what.

ivoanjo · 2025-11-19T14:40:38Z

src/mod/context.rb


  def status
-    {
+    @status ||= {


Minor: Since this object gets reused, perhaps worth sprinkling a bunch of .freeze to guard against oops?
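
A sketch of what that suggestion might look like, under stated assumptions: `StatusHolder`, `compute_status`, and `deep_freeze` are hypothetical names, and the hash contents are placeholders rather than the real status structure.

```ruby
# Recursively freeze nested hashes and arrays so a memoized value
# cannot be mutated by accident after the first call.
def deep_freeze(obj)
  case obj
  when Hash  then obj.each { |k, v| deep_freeze(k); deep_freeze(v) }
  when Array then obj.each { |v| deep_freeze(v) }
  end
  obj.freeze
end

class StatusHolder
  # Memoized: computed once, then the same frozen object is returned.
  def status
    @status ||= deep_freeze(compute_status)
  end

  private

  # Placeholder data standing in for the real (costly) evaluation.
  def compute_status
    { :bundler => { :deployment => true }, :fs => { :writable => false } }
  end
end
```

Any later attempt to modify the returned hash, at any nesting level, raises instead of silently corrupting the shared state.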

ivoanjo · 2025-11-19T14:48:49Z

src/mod/log.rb

+    @level ||= case ENV['DD_INTERNAL_RUBY_INJECTOR_LOG_LEVEL']
+               when 'DEBUG', 0 then DEBUG
+               when 'INFO',  1 then INFO
+               when 'WARN',  2 then WARN
+               when 'ERROR', 3 then ERROR
+               when 'FATAL', 4 then FATAL
+               else UNKNOWN
+               end


This is slightly... weird? Specifically, ENV effectively behaves as a Hash[String, String?], it won't ever be 0/1/2/3/4. Maybe you meant for those to be strings?

But zooming out a bit, it seems to me the intention here is to have a mode where we log more, for debugging. So perhaps this can be simplified as a more true/false for that... 🤔

In particular, a simple variable would probably be simpler as well when asking customers to enable something to help us troubleshoot some issue...?

> it won't ever be 0/1/2/3/4. Maybe you meant for those to be strings?

You guessed right, I'll fix that.

> But zooming out a bit, it seems to me the intention here is to have a mode where we log more, for debugging. So perhaps this can be simplified as a more true/false for that...

The amount of intricate logs this thing can emit warrants the granularity.

> when asking customers

This is absolutely not geared towards customers but to us, hence DD_INTERNAL.

If it's for internal use, I think the more the reason to keep the mechanism as simple as possible -- log everything as verbose as possible, or not. Not sure if the in-between states are worth the extra code/complexity?
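
For reference, the string-based fix agreed on in this thread could be sketched as follows. `LogLevel.parse` is a hypothetical wrapper, and the numeric constant values are an assumption that mirrors the standard library's `Logger::Severity`:

```ruby
module LogLevel
  # Assumed values, same ordering as Logger::Severity in the standard library.
  DEBUG   = 0
  INFO    = 1
  WARN    = 2
  ERROR   = 3
  FATAL   = 4
  UNKNOWN = 5

  # ENV values are always Strings (or nil), so match '0'..'4', not 0..4.
  def self.parse(value)
    case value
    when 'DEBUG', '0' then DEBUG
    when 'INFO',  '1' then INFO
    when 'WARN',  '2' then WARN
    when 'ERROR', '3' then ERROR
    when 'FATAL', '4' then FATAL
    else UNKNOWN
    end
  end
end
```

It would be called as `LogLevel.parse(ENV['DD_INTERNAL_RUBY_INJECTOR_LOG_LEVEL'])`; an unset or unrecognised value falls through to `UNKNOWN`, as in the original hunk.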

ivoanjo · 2025-11-19T14:52:32Z

src/mod/main.rb

+      log.info { 'inject:patch' }
+
+      bundler = import 'bundler'
+
+      bundler.patch!
+    else
+      log.info { 'inject:skip' }
+    end


This reads a bit surprising to me... Why would we patch when DD_INTERNAL_RUBY_INJECTOR == false? Perhaps I'm misunderstanding the semantics here (some comments in the source would be very helpful!)

The name is indeed a bit lousy:

DD_INTERNAL_RUBY_INJECTOR=true => perform the injection steps

DD_INTERNAL_RUBY_INJECTOR=false => don't perform the injection steps

The "injection steps" are the "stage 1" injection of dependencies into the user's bundle; this is typically achieved when bundle exec is called. Once that stage 1 is done bundle exec performs a call to Kernel.exec, which starts a new process, which is intercepted by the C injector again, which adds the Ruby injector to RUBYOPT, and so it is invoked again, but this time we're under the correct bundler env as set up by the Ruby injector.

So DD_INTERNAL_RUBY_INJECTOR is set to false beforehand, and the second execution in the exec'd process won't start it all over again and just reuse the current env. As a bonus this also applies to any other spawn, system, exec... call, which will merely reuse the environment (including the new gemfile) and it will all Just Work (as it does for Bundler)

There is an exception to that "It Just Works": deployment mode is persisted in config or in the env, and to break through the grip of deployment mode (or rather vendored mode, which restricts visible paths to only one) we need to monkeypatch Bundler so that the processes do not see the deployment mode config being true.

So, yes, we have to patch Bundler when DD_INTERNAL_RUBY_INJECTOR=false.

This should be a comment in the code, not in the review! :P

Agreed... sort of! Rather, this should be documentation about how this thing works.

I tried to be detailed in commit messages as well.

Thanks for being my rubber duck / hallway tester about what is unclear to a newcomer! Having been head-deep in this problem domain for a while, I must admit I can lose sight of what is clear and obvious and what is not...
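
The decision described in this thread can be condensed into a small sketch. It is an approximation, not the contents of main.rb: `injector_action` is a hypothetical name, treating an unset `DD_INTERNAL_RUBY_INJECTOR` as "stage 1" is an assumption, and the `DD_INTERNAL_RUBY_INJECTOR_PATCH` check is inferred from the injector.rb discussion later in this review.

```ruby
# Decide what the Ruby injector should do in the current process.
#   :inject => stage 1, inject dependencies into the bundle (typically under `bundle exec`)
#   :patch  => stage 2, exec'd process: only re-apply the Bundler patches
#   :skip   => stage 2, nothing to re-apply
def injector_action(env)
  if env['DD_INTERNAL_RUBY_INJECTOR'] != 'false'
    :inject
  elsif env['DD_INTERNAL_RUBY_INJECTOR_PATCH']
    :patch
  else
    :skip
  end
end
```

Passing the environment in as an argument (for example `injector_action(ENV)`) keeps the decision testable without mutating the real process environment.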

ivoanjo · 2025-11-19T15:01:23Z

src/mod/guard.rb

-    if !status[:bundler][:use_system_gems]
-      result << { :name => 'bundler.use_system_gems', :reason => 'bundler.vendored' }
+    if !status[:fs][:writable]
+      result << { :name => 'fs.writable', :reason => 'fs.readonly' }
    end


I believe this was the only place using status[:bundler][:use_system_gems] so it can be removed now :)

Not directly: I'd rather not remove it because of logging.

This allows examining relevant Bundler internals for diagnostics. The fact that we don't use this state directly doesn't mean it has no influence on Bundler's behaviour. use_system_gems in particular is used in multiple Bundler conditionals that are really important to understand at debug time.

ivoanjo · 2025-11-19T15:03:18Z

src/mod/guard.rb

-    if status[:bundler][:deployment]
-      result << { :name => 'bundler.deployment', :reason => 'bundler.deployment' }
+    if status[:bundler][:settings][:path]
+      result << { :name => 'bundler.path', :reason => 'bundler.vendored' }
    end


I spotted there are still references to bundler.deployment in "report.rb" and "test.rb"... they can (should?) be removed, right?

Good catch! I will address.
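
Put together, the two guard.rb hunks above amount to something like the following sketch. The `guard_reasons` wrapper and the exact shape of `status` are assumptions; only the two conditions and the result entries come from the diff.

```ruby
# Collect the reasons for which injection should be refused.
# An empty array means no guardrail was triggered.
def guard_reasons(status)
  result = []

  # A read-only filesystem prevents writing the new Gemfile and lockfile.
  unless status[:fs][:writable]
    result << { :name => 'fs.writable', :reason => 'fs.readonly' }
  end

  # A manually specified bundle path means "standalone" vendored mode,
  # which is not handled yet.
  if status[:bundler][:settings][:path]
    result << { :name => 'bundler.path', :reason => 'bundler.vendored' }
  end

  result
end
```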

ivoanjo · 2025-11-19T15:37:38Z

src/mod/injector.rb

+    if context[:bundler][:deployment]
+      app_bundle_path = context[:bundler][:bundle_path]
+
+      ENV['DD_INTERNAL_RUBY_INJECTOR_PATCH'] = "mode=deployment,path=#{package_gem_home}:#{app_bundle_path}"


So... I see this being set here to this complex string and in main.rb we check if ENV['DD_INTERNAL_RUBY_INJECTOR_PATCH'] is set and... do nothing with it? What's up with that? I'm clearly missing something... 👀

You're not missing anything, it is an attempt at passing information from the first stage (bundle exec) to the second stage (the exec'd process).

Reason is:

- diagnostic logging and inspection from the second stage
- consistency/sanity checks
- full support for vendor mode and tackling read-only filesystems will require passing information from one stage to the other
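
If the second stage were to consume that value, decoding it might look like this sketch. `parse_patch_info` is hypothetical (as noted above, nothing reads the value at this point), and the `key=value,key=value` format is inferred solely from the string built in the hunk:

```ruby
# Decode "mode=deployment,path=/a:/b" into a Hash.
# The `path` entry is split on ':' since the hunk joins the two paths with it.
def parse_patch_info(value)
  value.to_s.split(',').each_with_object({}) do |pair, info|
    key, val = pair.split('=', 2)
    next if key.nil? || key.empty?

    info[key] = key == 'path' ? val.to_s.split(':') : val
  end
end
```

Note that this naive format breaks if a path ever contains a comma or a colon, which is one reason to treat it as a diagnostic aid rather than a stable interface.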

ivoanjo · 2025-11-19T15:40:07Z

src/mod/injector.rb

+      Gem.paths = { 'GEM_PATH' => "#{package_gem_home}:#{app_bundle_path}" }
+      ENV['GEM_PATH'] = Gem.path.join(File::PATH_SEPARATOR)
+      ENV['GEM_HOME'] = app_bundle_path
+
+      BUNDLER.patch!
+    else


So similarly to my comment on bundler.rb, it's very hard to follow along why we're doing these changes -- e.g. if someone ever needs to investigate an issue around this code we'll need to effectively start from scratch in reverse engineering all of the logic here again.

This really needs a big comment explaining what we're doing and why...

Also -- why do we .patch! bundler here + again on main.rb? What's up with that?

Agreed about the lack of docs.

> Also -- why do we .patch! bundler here + again on main.rb? What's up with that?

This one is during the first stage (bundle exec), the one in main happens during the second stage. Since the second stage is essentially another process (albeit with the same pid because of exec), all patches have been forgotten.
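
The "all patches have been forgotten" point can be illustrated with a small, self-contained sketch. It uses a child process started with `IO.popen` rather than an actual `exec` (so the parent can observe the result), and the global variable and `DEMO_FLAG` are stand-ins invented for the example:

```ruby
require 'rbconfig'

# Returns what a freshly started Ruby program sees of the parent's state:
# in-process state (here a global) is gone, the environment is inherited.
def child_view
  $demo_patched = true     # stands in for an in-process monkeypatch
  ENV['DEMO_FLAG'] = '1'   # stands in for state carried through the env
  script = 'print([$demo_patched.inspect, ENV["DEMO_FLAG"]].join(","))'
  IO.popen([RbConfig.ruby, '-e', script]) { |io| io.read }
end
```

The child reports `nil` for the global and `1` for the environment variable, which is why anything that must survive into the second stage has to travel through the environment, and why the Bundler patches have to be applied again there.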

ivoanjo · 2025-11-19T15:44:23Z

test/bin/test.rb

+        'injection should abort',
+        'app gemfile should not include datadog',
+        'app lockfile should not include datadog',
+      # TODO: disabled due to race condition on naive deletion


What's a "naive deletion" in this context again?

Doing some rm of the new Gemfile+lockfile, possibly even in an ensure.

e.g. there's a risk that two Ruby processes running at about the same time in the same bundle would both attempt injection and step on each other while one is creating files and the other is removing them.

In that case there's no problem leaving the files around: a hypothetical rm would happen if injection failed, and so they would not be pointed to.

Similarly two concurrent processes simultaneously attempting and succeeding at injection, while wasteful, would produce the same content idempotently.

The problem only occurs when someone attempts to clean up, and doing the whole creation atomically is tough (e.g. we can't operate in a temp folder if the original Gemfile contains relative file references; even moving both Gemfile+lockfile on success isn't atomic).

Something similar happened in testing which made the tests flaky (they would somehow incorrectly pass).
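
To make the atomicity limitation concrete, here is a sketch of the usual single-file technique (write to a temporary name in the same directory, then rename). `write_atomically` is a hypothetical helper, not part of this PR. On POSIX a rename within one filesystem replaces the target atomically, but that guarantee covers one file at a time, so a Gemfile+lockfile pair can still be observed half-updated:

```ruby
# Write `content` to `path` so that readers see either the old file or the
# complete new one, never a partial write. Only atomic per file.
def write_atomically(path, content)
  tmp = "#{path}.#{Process.pid}.tmp"   # same directory, hence same filesystem
  File.open(tmp, 'w') { |f| f.write(content) }
  File.rename(tmp, path)
ensure
  # Only reached with a leftover temp file if the write or rename failed.
  File.delete(tmp) if tmp && File.exist?(tmp)
end
```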

ivoanjo · 2025-11-19T15:52:19Z

test/fixtures/unbundled/stub.rb

I don't understand any of these fixture changes -- why did we need the many files with a single puts, and why do we now need to have them being copy/pasted copies of all of this info?

(It's very hard to follow the why on all these changes...)

These stubs are the target Ruby programs that get run during testing, to be injected into.

Initially they were a mere puts so that there was output visually confirming they were being run.

Then as I needed more diagnostic output to understand and/or confirm behaviour, each stub got an increasing amount of information. This information is as of today only consumed by humans but ultimately the intent is to have this information captured and checked by tests.

There is one per fixture directory because when tests are run the fixture is copied to a unique temporary folder from which things are run. Each directory being self-contained is much easier to understand, manage, and copy around: devising a mechanism to have them shared while being a) copied and b) run inside of various containers is non-trivial and adds a fair amount of complexity.

In addition it is not inconceivable that the stub contents may need to be specialised depending on the fixture.

Can we just generate this instead? E.g. have a "base_fixture.rb" that gets copied and renamed as needed? It seems a lot of copy-paste is going around here

ivoanjo · 2025-11-19T15:56:11Z

test/packages/datadog/ruby/2.6.0/Gemfile

+# To ensure no crash happens on Ruby 2.6 we package the corresponding
+# `did_you_mean` version.
+#


I don't understand this comment -- specifically, it's not clear if the impact here is in our test code or in the code in production?

Both.

The packages/<foo> directory contains an injection "package" (for lack of a better word), IOW the dependencies that we want injected into the applications.

The injector code is intended to be largely Datadog-agnostic, so the simple package is a minimal one without the datadog baggage and assumptions.

The datadog package mimics what is packaged by our OCI package builder, only minimally so for testing purposes.

This change needs to be reflected (and is, here) in the actual package building process.

Vendored mode (`BUNDLE_PATH` / `use_system_gems?` => `false`) removes all paths but the vendor path from the Gem path list. This effectively hides all other gems, making any injection impossible.

Massage `GEM_PATH` and `GEM_HOME` to set them to values that:
- are identical in behaviour with `BUNDLE_PATH='vendor/bundle'`
- allow injection once vendored mode is relaxed

Note: this does not take an arbitrary `BUNDLE_PATH` into account yet, instead focusing on the default for deployment mode only (`vendor/bundle`).

When deployment mode is detected we patch Bundler to act as if it wasn't set. This:
- makes Bundler not set `use_system_gems?` to `false`
- makes Bundler not set the vendored path to `vendor/bundle`

But thanks to the `GEM_HOME` and `GEM_PATH` variables that have been set beforehand, gems will be looked up in the appropriate location, all the while having broken out of being able to see _only_ vendored gems. Ergo, gem injection can proceed.

Since Ruby 1.9 `rubygems` is automatically required by default. Ruby and `rubygems` contain mechanisms that enforce that rubygems is not just loaded, but loaded *first*. Right before executing the Ruby process we make sure that the Ruby injector appears first, and that whatever else is present still appears too, but afterwards.

It turns out rubygems bootstraps bundler straight from `rubygems` code. This is used so that once, say, `bundle exec` completes, any Ruby program it runs is also executed in that bundler context. Indeed without this `bundle exec foo` itself would not work! As soon as it `exec`s to `foo`, any previous Ruby context would be lost. But it also means that Ruby loads bundler stuff too early. Unset the undocumented `BUNDLER_SETUP`.

When `bundle exec` happens, it proceeds with processing Bundler things, then `exec`s into the actual Ruby program to execute. When it does so, this is a new process, hence any patch that we have applied goes away, which results in deployment mode being re-armed. Apply the `bundler` patches to override Bundler behaviours. We can safely load `bundler` without forking in that case since we are in a bundled case.

When bundle is unlocked `bundle exec` (obviously) fails, and thus makes the test fail due to the exit code. Instead, when unlocked, run the stub directly to test guardrails are in effect.

Note: an extension of this change would be to:
- still run bundle exec and ensure it exits with a non-zero status, behaving as expected.
- run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS` unset (as of this commit).
- run without bundle exec in a locked bundle with `RUBYGEMS_GEMDEPS` set to `-` and/or the path to the fixture `Gemfile`.

Ruby 2.6 decided to activate the default gem via `Kernel.gem` which makes it subject to isolation, hence breaking under vendored mode. This `Kernel.gem` activation happens in `gem_prelude` and can only be skipped with the `--disable=did_you_mean` CLI flag.

Ruby 2.7 reverted that problematic behaviour, instead resorting to a plain `require` which will either simply load from `$LOAD_PATH` or use a bundled version.

To ensure no crash happens on Ruby 2.6 we package the corresponding `did_you_mean` version.

See:
- gem_prelude.rb calls `Kernel.gem` on 2.6, but not 2.7:
  https://github.com/ruby/ruby/blob/ruby_2_6/gem_prelude.rb#L3-L7
  https://github.com/ruby/ruby/blob/ruby_2_7/gem_prelude.rb#L2
- gem_prelude.rb is included as a prelude script:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L158-L161
- prelude scripts get compiled in prelude.c:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1050-L1059
- prelude.c is generated from a template:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L189
- the template embeds ISeqs of scripts:
  https://github.com/ruby/ruby/blob/ruby_2_6/template/prelude.c.tmpl#L167-L168
- prelude targets are just prelude.c:
  https://github.com/ruby/ruby/blob/ruby_2_6/common.mk#L1081-L1082

lloeki added 10 commits November 3, 2025 16:42

Enable tests for vendored mode and deployment mode

adaaeba

Deployment mode comes from vendored mode + frozen bundle

Unguard deployment mode and vendored mode

110152a

Run tests using RUBYOPT and bundle exec

b1bcb28

Protect forwarder against RUBYOPT

85bd7e7

Since the test forwarder is written in Ruby and spawned as a separate process, it would be subject to injection through RUBYOPT during injection itself, creating a recursion.

Handle forwarder failure

ee7d1db

Add rubygems path info to context

9c449ac

Understanding the state of `Gem.path`, `GEM_PATH`, and `GEM_HOME` is critical for debugging.

Pass (and log) context to injector

3ba522c

Improve error and fatal logging

1861266

Add state output information to stubs

61300e7

Memoize status

3d0795c

Evaluation on every fetch is costly, especially with a fork.

lloeki marked this pull request as ready for review November 14, 2025 15:43

lloeki requested a review from a team as a code owner November 14, 2025 15:43

p-datadog reviewed Nov 18, 2025

p-datadog approved these changes Nov 18, 2025

sarahchen6 reviewed Nov 19, 2025

sarahchen6 approved these changes Nov 19, 2025

ivoanjo reviewed Nov 19, 2025

lloeki added 13 commits November 20, 2025 11:12

Wrap run output for readability

073ee35

Transform select values for test filtering

5deda6e

Group tests by execution context commonality

50c4b67

Improve test output clarity

baddf02

Handle exception in test case

ab19c8c

Remove dead comments

683b8bc

Ensure IO thread completion

4ccba72

Both threads and pipes were otherwise leaking.

lloeki added 15 commits November 20, 2025 11:13

Fix testing on Ruby 1.8

d9c54ac

- `-r` is not supposed to have a space even though it can in some later Ruby versions
- Before 1.9 `Gem` isn't present until `rubygems` is required

Pass test case env to bundle install

b94ee03

There's a discrepancy when e.g. `BUNDLE_PATH` is set.

Persist /bundle across runs

b60a06e

When `BUNDLE_PATH` is set to `/bundle` the result of `bundle install` is lost, so the bundle is empty come test time. Store this path in a volume.

Add vendored fixture

f8b3521

Guard against manually specified vendored path

9a8a169

There are two cases we don't handle right:
- vendor path by itself: we only patch Bundler to ignore deployment mode
- deployment mode non-default vendor path: we hardcode the path

Use gemfile+lockfile from context

1ab0649

Fix exception logging

f29a590

Use proper vendored bundle path from context

ec01210

Add missing ruby 3.5 test coverage

2997137

Fix Ruby 3.5.0

d832620

Require necessary Bundler::CLI when patching

66e0e40

Bundler doesn't define `Bundler::CLI` outside of `lib/bundler/cli.rb` and defines commands as e.g `class CLI::Exec` to save nesting. This makes requiring `lib/bundler/cli/exec.rb` standalone crash.

Allow log level control

1e79de6

Update packages

26fbb5a

Test packages were pinned to a problematic version of `libddwaf` that causes misresolutions due to multiple overlapping binary gem platforms being available.

Pin Docker version

6a1e55d

lloeki force-pushed the lloeki/deployment-mode branch from 07e0ecd to 6a1e55d November 20, 2025 10:13
