Improve Sandbox "owner exited" FAQ #694

tiagoefmoraes · 2025-10-13T16:43:13Z

TLDR

Started processes need to be terminated while the test process is still running to avoid "owner exited" errors.

More details

I was still running into "owner exited" errors even after following the FAQ's instructions. After some investigation, I found that both the automatic termination of processes started with start_supervised!/2 and the execution of on_exit/1 callbacks happen after the test process has terminated. This creates a (small) window of time in which the test process is already dead while the started processes are still running.

ExUnit's docs state that processes started with start_supervised!/2 are "guaranteed to exit before the next test starts" and that on_exit/2 callbacks are run "once the test exits".

To completely avoid these errors, we need to stop the started processes from within the test process (maybe ExUnit is missing a before_exit/1 callback).
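The timing described above can be checked with a minimal sketch (not part of the original report; the module name Demo.OnExitTimingTest is hypothetical). It assumes only that ExUnit waits for the test process to exit before running on_exit/1 callbacks in a separate process, so inside the callback the test pid is no longer alive.

```elixir
ExUnit.start(autorun: false)

defmodule Demo.OnExitTimingTest do
  use ExUnit.Case, async: true

  test "on_exit/1 runs after the test process has exited" do
    test_pid = self()

    on_exit(fn ->
      # The callback runs in a different process...
      refute self() == test_pid
      # ...and only after the test process has already terminated.
      refute Process.alive?(test_pid)
    end)
  end
end

result = ExUnit.run()
IO.inspect(result, label: "ExUnit result")
```

Anything that still needs the test process (for example a process checking `Process.alive?(test_pid)`, as in the script below) therefore has to be stopped in the test body itself.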

The issues can be simulated with the following script. To check the suggested fixes, uncomment the corresponding code in each test:


defmodule MyApp.SlowTerminate do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl GenServer
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl GenServer
  def terminate(reason, _state) do
    IO.puts("Terminating #{__MODULE__}: #{reason}...")
    Process.sleep(100)
    IO.puts("Terminated #{__MODULE__}: #{reason}.")
    :ok
  end
end

defmodule MyApp.DelayedAction do
  use GenServer, restart: :temporary

  def start_link(test_pid), do: GenServer.start_link(__MODULE__, test_pid)

  @impl GenServer
  def init(test_pid) do
    Process.flag(:trap_exit, true)
    Process.send_after(self(), :work, 50)
    {:ok, test_pid}
  end

  @impl GenServer
  def handle_info(:work, test_pid) do
    IO.puts("Working")

    if Process.alive?(test_pid) do
      send(test_pid, :hello)

      # {:noreply, test_pid}
      {:stop, :normal, test_pid}
    else
      raise "test is dead"
    end
  end

  @impl GenServer
  def terminate(reason, _state) do
    IO.puts("Terminated #{__MODULE__}: #{inspect(reason)}.")
    :ok
  end
end

ExUnit.start(autorun: false)

defmodule MyApp.PeriodicallyTest do
  use ExUnit.Case, async: true

  test "started in test" do
    start_supervised!({MyApp.DelayedAction, self()})
    # Simulate slow terminate to make the issue deterministic
    start_supervised!(MyApp.SlowTerminate)
    # uncomment stop_supervised!/1 to solve the issue
    # stop_supervised!(MyApp.DelayedAction)
    IO.inspect(:test_end)
  end

  test "started in a Task.Supervisor" do
    # Simulate Task.Supervisor in application
    {:ok, pid} = Task.Supervisor.start_link(name: MyApp.Task.Supervisor)

    on_exit(fn ->
      IO.puts("Terminating Task.Supervisor")
      Process.exit(pid, :kill)
    end)

    Process.unlink(pid)
    # Test start
    on_exit(fn ->
      for pid <- Task.Supervisor.children(MyApp.Task.Supervisor) do
        IO.puts("Waiting #{inspect(pid)} to terminate...")
        ref = Process.monitor(pid)
        assert_receive {:DOWN, ^ref, _, _, _}
      end
    end)

    test_pid = self()

    Task.Supervisor.start_child(MyApp.Task.Supervisor, fn ->
      # Simulate slow processing to make issue deterministic
      Process.sleep(50)

      if Process.alive?(test_pid) do
        send(test_pid, :hello)
      else
        raise "test is dead"
      end
    end)

    # uncomment `for` to solve the issue
    # for pid <- Task.Supervisor.children(MyApp.Task.Supervisor) do
    #   IO.puts("Waiting #{inspect(pid)} to terminate...")
    #   ref = Process.monitor(pid)
    #   assert_receive {:DOWN, ^ref, _, _, _}
    # end

    IO.inspect(:test_end)
  end

  test "started in a DynamicSupervisor" do
    # Simulate DynamicSupervisor in application
    {:ok, pid} = DynamicSupervisor.start_link(name: MyApp.DynamicSupervisor)

    on_exit(fn ->
      # Simulate slow stop to make the issue deterministic
      Process.sleep(100)
      IO.puts("Terminating DynamicSupervisor")
      DynamicSupervisor.stop(pid)
    end)

    Process.unlink(pid)
    # Test start
    on_exit(fn ->
      for {_, pid, _, _} <- DynamicSupervisor.which_children(MyApp.DynamicSupervisor) do
        IO.puts("Waiting #{inspect(pid)} to terminate...")
        ref = Process.monitor(pid)
        assert_receive {:DOWN, ^ref, _, _, _}
      end
    end)

    DynamicSupervisor.start_child(MyApp.DynamicSupervisor, {MyApp.DelayedAction, self()})
    
    # uncomment `for` to solve the issue
    # for {_, pid, _, _} <- DynamicSupervisor.which_children(MyApp.DynamicSupervisor) do
    #   IO.puts("Waiting #{inspect(pid)} to terminate...")
    #   ref = Process.monitor(pid)
    #   assert_receive {:DOWN, ^ref, _, _, _}
    # end

    IO.inspect(:test_end)
  end
end

ExUnit.run()

josevalim · 2025-10-14T07:19:25Z

Thank you @tiagoefmoraes! Can you please double-check that you are using start_owner! to start the sandbox? If you use this version, you should be fine without stop_supervised!:

ecto_sql/lib/ecto/adapters/sql/sandbox.ex, lines 48 to 49 in 8297802:

pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Repo)
on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

WDYT?

tiagoefmoraes · 2025-10-15T12:55:27Z

@josevalim, thanks for the kind response.

I have this in my DataCase:

pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

I can see how that solves what I described: the connection stays available until on_exit runs, which is the last thing to happen.

After revisiting the problem in my real project, I assumed the issue was caused by my test being async: true and by my allowing the process with the test pid as the parent (Ecto.Adapters.SQL.Sandbox.allow(MyApp.Repo, test_pid, pid)). That is not the cause either: because of start_owner!, the spawned process can still use the connection even after the test process dies.

Then I realized the original error message had changed to [error] Postgrex.Protocol (#PID<0.8771.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.9028.0> exited. I had kept assuming it was caused by the same problem, but it is a completely different error.

This one happens because my GenServer does not trap exits: when the test supervisor sends it the exit signal, it exits immediately, and if it has a connection checked out at that moment, the error occurs. Calling stop_supervised! only reduced the chances of the issue happening.
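The difference trapping exits makes can be seen in a minimal, self-contained sketch (not from the thread; Demo.TrapWorker and Demo.ShutdownCheck are hypothetical names, and terminate/2 merely stands in for cleanup such as checking a connection back in). The worker is started under a Supervisor, and stopping the supervisor sends it an exit signal with reason :shutdown, similar to what the test supervisor does. Only the worker that traps exits gets to run terminate/2.

```elixir
defmodule Demo.TrapWorker do
  # Hypothetical worker; terminate/2 stands in for "check the connection back in".
  use GenServer

  def start_link({trap?, notify}), do: GenServer.start_link(__MODULE__, {trap?, notify})

  @impl GenServer
  def init({trap?, notify}) do
    Process.flag(:trap_exit, trap?)
    {:ok, notify}
  end

  @impl GenServer
  def terminate(reason, notify) do
    send(notify, {:terminate_called, reason})
    :ok
  end
end

defmodule Demo.ShutdownCheck do
  # Starts the worker under a supervisor, then stops the supervisor, which
  # sends the worker an exit signal with reason :shutdown.
  # Returns whether the worker's terminate/2 callback ran.
  def terminate_runs?(trap?) do
    {:ok, sup} =
      Supervisor.start_link([{Demo.TrapWorker, {trap?, self()}}], strategy: :one_for_one)

    :ok = Supervisor.stop(sup)

    receive do
      {:terminate_called, :shutdown} -> true
    after
      100 -> false
    end
  end
end

IO.inspect(Demo.ShutdownCheck.terminate_runs?(false), label: "without trap_exit")
IO.inspect(Demo.ShutdownCheck.terminate_runs?(true), label: "with trap_exit")
```

This prints `without trap_exit: false` followed by `with trap_exit: true`: a worker that does not trap exits is killed by the :shutdown signal before any cleanup can run.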

The solution was to gracefully stop the server in the tests with:

{:ok, pid} = MyGenServer.start_link([])
Process.unlink(pid)
on_exit(fn -> GenServer.stop(pid) end)

That allows the process to check the connection back in before exiting, and avoids the need to stop the server explicitly in each test.
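The pattern above can be exercised in a standalone ExUnit script. This is a sketch under assumptions: Demo.CheckoutWorker is a hypothetical stand-in for a GenServer that does not trap exits, and its terminate/2 only prints where a real one would check the connection back in.

```elixir
ExUnit.start(autorun: false)

defmodule Demo.CheckoutWorker do
  # Hypothetical GenServer that does NOT trap exits.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl GenServer
  def init(opts), do: {:ok, opts}

  @impl GenServer
  def terminate(reason, _state) do
    # Placeholder for cleanup such as checking a connection back in.
    IO.puts("terminate/2 called with #{inspect(reason)}")
    :ok
  end
end

defmodule Demo.CheckoutWorkerTest do
  use ExUnit.Case, async: true

  test "worker outlives the test process and is stopped gracefully" do
    {:ok, pid} = Demo.CheckoutWorker.start_link([])
    # Without this, the test process's exit would take the linked worker
    # down immediately, skipping terminate/2.
    Process.unlink(pid)

    on_exit(fn ->
      # on_exit/1 runs after the test process has exited; the worker is
      # still alive because it was unlinked.
      assert Process.alive?(pid)
      ref = Process.monitor(pid)
      # GenServer.stop/1 runs terminate/2 even though the worker does not trap exits.
      :ok = GenServer.stop(pid)
      assert_receive {:DOWN, ^ref, :process, ^pid, :normal}
    end)

    assert Process.alive?(pid)
  end
end

result = ExUnit.run()
IO.inspect(result, label: "ExUnit result")
```

The design relies on two facts: unlinking keeps the worker alive past the test process, and GenServer.stop/1 goes through the normal shutdown path, so terminate/2 runs regardless of the trap_exit flag.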

Commit 8297802: improve Sandbox "owner exited" FAQ

tiagoefmoraes closed this Oct 15, 2025
