tiagoefmoraes

TLDR

Started processes need to be terminated while the test process is still running to avoid "owner exited" errors.

More details

I was still running into "owner exited" errors even after following the FAQ's instructions. After some investigation, I found that both the automatic termination of processes started with start_supervised!/2 and the execution of on_exit/1 callbacks happen after the test process has terminated. This creates a (small) window of time in which the test process is already dead while the started processes are still running.

ExUnit's docs state that processes started with start_supervised!/2 are "guaranteed to exit before the next test starts" and that on_exit/2 callbacks are run "once the test exits".
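
The timing described above can be observed with a small script. This is only a sketch: the module name OnExitDemoTest and the registered name :on_exit_demo are made up for illustration. It reports whether the on_exit callback runs in a process other than the test process, and whether the test process is still alive at that point. The second value is printed rather than asserted, since it depends on ExUnit's internal ordering.

```elixir
ExUnit.start(autorun: false)

# Register the script process so the callback can report back to it.
# :on_exit_demo is a hypothetical name used only in this sketch.
Process.register(self(), :on_exit_demo)

defmodule OnExitDemoTest do
  use ExUnit.Case

  test "on_exit runs outside the test process" do
    test_pid = self()

    on_exit(fn ->
      # Report whether we are in a different process and whether the
      # test process is still alive when the callback runs.
      send(:on_exit_demo, {:on_exit, self() != test_pid, Process.alive?(test_pid)})
    end)
  end
end

ExUnit.run()

{different_process?, test_alive?} =
  receive do
    {:on_exit, different, alive} -> {different, alive}
  after
    1_000 -> raise "on_exit callback did not report back"
  end

IO.inspect(different_process?, label: "callback ran in another process")
IO.inspect(test_alive?, label: "test process alive during on_exit")
```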

To completely avoid these errors we need to stop the started processes from within the test process (maybe we're missing a before_exit/1 callback in ExUnit).

The issues can be reproduced with the following script. To check the suggested fixes, uncomment the corresponding code in each test:

Script
defmodule MyApp.SlowTerminate do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl GenServer
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl GenServer
  def terminate(reason, _state) do
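    # inspect/1 keeps the log line safe for non-atom reasons such as {:shutdown, term}
    IO.puts("Terminating #{__MODULE__}: #{inspect(reason)}...")
    Process.sleep(100)
    IO.puts("Terminated #{__MODULE__}: #{inspect(reason)}.")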
    :ok
  end
end

defmodule MyApp.DelayedAction do
  use GenServer, restart: :temporary

  def start_link(test_pid), do: GenServer.start_link(__MODULE__, test_pid)

  @impl GenServer
  def init(test_pid) do
    Process.flag(:trap_exit, true)
    Process.send_after(self(), :work, 50)
    {:ok, test_pid}
  end

  @impl GenServer
  def handle_info(:work, test_pid) do
    IO.puts("Working")

    if Process.alive?(test_pid) do
      send(test_pid, :hello)

      # {:noreply, test_pid}
      {:stop, :normal, test_pid}
    else
      raise "test is dead"
    end
  end

  @impl GenServer
  def terminate(reason, _state) do
    IO.puts("Terminated #{__MODULE__}: #{inspect(reason)}.")
    :ok
  end
end

ExUnit.start(autorun: false)

defmodule MyApp.PeriodicallyTest do
  use ExUnit.Case, async: true

  test "started in test" do
    start_supervised!({MyApp.DelayedAction, self()})
    # Simulate slow terminate to make the issue deterministic
    start_supervised!(MyApp.SlowTerminate)
    # uncomment stop_supervised!/1 to solve the issue
    # stop_supervised!(MyApp.DelayedAction)
    IO.inspect(:test_end)
  end

  test "started in a Task.Supervisor" do
    # Simulate Task.Supervisor in application
    {:ok, pid} = Task.Supervisor.start_link(name: MyApp.Task.Supervisor)

    on_exit(fn ->
      IO.puts("Terminating Task.Supervisor")
      Process.exit(pid, :kill)
    end)

    Process.unlink(pid)
    # Test start
    on_exit(fn ->
      for pid <- Task.Supervisor.children(MyApp.Task.Supervisor) do
        IO.puts("Waiting #{inspect(pid)} to terminate...")
        ref = Process.monitor(pid)
        assert_receive {:DOWN, ^ref, _, _, _}
      end
    end)

    test_pid = self()

    Task.Supervisor.start_child(MyApp.Task.Supervisor, fn ->
      # Simulate slow processing to make issue deterministic
      Process.sleep(50)

      if Process.alive?(test_pid) do
        send(test_pid, :hello)
      else
        raise "test is dead"
      end
    end)

    # uncomment `for` to solve the issue
    # for pid <- Task.Supervisor.children(MyApp.Task.Supervisor) do
    #   IO.puts("Waiting #{inspect(pid)} to terminate...")
    #   ref = Process.monitor(pid)
    #   assert_receive {:DOWN, ^ref, _, _, _}
    # end

    IO.inspect(:test_end)
  end

  test "started in a DynamicSupervisor" do
    # Simulate DynamicSupervisor in application
    {:ok, pid} = DynamicSupervisor.start_link(name: MyApp.DynamicSupervisor)

    on_exit(fn ->
      # Simulate slow stop to make the issue deterministic
      Process.sleep(100)
      IO.puts("Terminating DynamicSupervisor")
      DynamicSupervisor.stop(pid)
    end)

    Process.unlink(pid)
    # Test start
    on_exit(fn ->
      for {_, pid, _, _} <- DynamicSupervisor.which_children(MyApp.DynamicSupervisor) do
        IO.puts("Waiting #{inspect(pid)} to terminate...")
        ref = Process.monitor(pid)
        assert_receive {:DOWN, ^ref, _, _, _}
      end
    end)

    DynamicSupervisor.start_child(MyApp.DynamicSupervisor, {MyApp.DelayedAction, self()})
    
    # uncomment `for` to solve the issue
    # for {_, pid, _, _} <- DynamicSupervisor.which_children(MyApp.DynamicSupervisor) do
    #   IO.puts("Waiting #{inspect(pid)} to terminate...")
    #   ref = Process.monitor(pid)
    #   assert_receive {:DOWN, ^ref, _, _, _}
    # end

    IO.inspect(:test_end)
  end
end

ExUnit.run()
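
The commented-out `for` loops in the script above all follow the same pattern: monitor each child and wait for its :DOWN message before the test body returns. A minimal sketch of that pattern as a reusable helper follows. Demo.Await and await_down/2 are hypothetical names, not part of ExUnit, and the timeout applies to each process separately.

```elixir
defmodule Demo.Await do
  # Hypothetical helper, not part of ExUnit: blocks the caller until every
  # given pid is down, or raises if one takes longer than `timeout` ms.
  def await_down(pids, timeout \\ 1_000) do
    pids
    |> Enum.map(&Process.monitor/1)
    |> Enum.each(fn ref ->
      receive do
        {:DOWN, ^ref, :process, _pid, _reason} -> :ok
      after
        timeout -> raise "process did not terminate within #{timeout}ms"
      end
    end)
  end
end

# Usage outside of ExUnit, with a short-lived task standing in for real work.
{:ok, sup} = Task.Supervisor.start_link()
{:ok, task_pid} = Task.Supervisor.start_child(sup, fn -> Process.sleep(50) end)

:ok = Demo.Await.await_down(Task.Supervisor.children(sup))
all_down? = not Process.alive?(task_pid)
IO.inspect(all_down?, label: "task finished before continuing")
```

In a test, the call would go at the end of the test body, so that the children finish while the test process is still alive.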

@josevalim
Member

josevalim commented Oct 14, 2025

Thank you @tiagoefmoraes! Can you please double check that you are using start_owner! to start the sandbox? If you use this version, you should be fine without stop_supervised!:

pid = Ecto.Adapters.SQL.Sandbox.start_owner!(Repo)
on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

WDYT?

@tiagoefmoraes
Author

@josevalim, thanks for the kind response.

I have this in my DataCase:

pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

And I see how that solves what I described: the connection stays available until on_exit runs, which is the last thing to happen.

After revisiting the problem in my real project, I assumed the issue occurred because my test is async: true and I was allowing the process with the test pid as the parent (Ecto.Adapters.SQL.Sandbox.allow(MyApp.Repo, test_pid, pid)). That's not true either: because of start_owner!, the spawned process can still use the connection even after the test dies.

Then I realized the original error message had changed to: [error] Postgrex.Protocol (#PID<0.8771.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.9028.0> exited. I was still assuming it was caused by the same problem, but this is a totally different error.

This one happens because my GenServer does not trap exits, so it exits immediately when the test supervisor sends the exit signal. If it has a connection checked out at that moment, the error occurs. Calling stop_supervised! only reduced the chances of the issue happening.
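
The difference between trapping and not trapping exits can be shown without a database. In this sketch, Demo.Worker is a hypothetical module and a plain Supervisor stands in for the test supervisor. Both workers receive the same :shutdown signal when the supervisor stops, but only the one that traps exits gets to run terminate/2.

```elixir
defmodule Demo.Worker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl GenServer
  def init(opts) do
    # Only the trapping variant turns exit signals into messages.
    if opts[:trap_exit], do: Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl GenServer
  def terminate(reason, opts) do
    # Stand-in for cleanup work such as checking a connection back in.
    send(opts[:report_to], {:terminated, opts[:name], reason})
    :ok
  end
end

children = [
  Supervisor.child_spec(
    {Demo.Worker, [name: :trapping, trap_exit: true, report_to: self()]},
    id: :trapping
  ),
  Supervisor.child_spec(
    {Demo.Worker, [name: :plain, trap_exit: false, report_to: self()]},
    id: :plain
  )
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
:ok = Supervisor.stop(sup)

trapping_reason =
  receive do
    {:terminated, :trapping, reason} -> reason
  after
    1_000 -> :not_called
  end

plain_reason =
  receive do
    {:terminated, :plain, reason} -> reason
  after
    200 -> :not_called
  end

IO.inspect(trapping_reason, label: "terminate/2 reason when trapping exits")
IO.inspect(plain_reason, label: "terminate/2 reason when not trapping exits")
```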

The solution was to gracefully stop the server in the tests with:

{:ok, pid} = MyGenServer.start_link([])
Process.unlink(pid)
on_exit(fn -> GenServer.stop(pid) end)

That allows the process to check the connection back in before exiting, and avoids the need to stop the server in each test.
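
A minimal sketch of why the graceful stop helps, assuming a hypothetical Demo.Resource server that does not trap exits: GenServer.stop/1 asks the server to terminate itself, so terminate/2 still runs and cleanup can happen before the process exits. Here the cleanup is simulated by a message sent back to the caller.

```elixir
defmodule Demo.Resource do
  use GenServer

  def start_link(report_to), do: GenServer.start_link(__MODULE__, report_to)

  # Note: no Process.flag(:trap_exit, true) here.
  @impl GenServer
  def init(report_to), do: {:ok, report_to}

  @impl GenServer
  def terminate(reason, report_to) do
    # Stand-in for releasing a resource, e.g. a checked-out connection.
    send(report_to, {:cleaned_up, reason})
    :ok
  end
end

{:ok, pid} = Demo.Resource.start_link(self())
Process.unlink(pid)

# GenServer.stop/1 blocks until the server has terminated.
:ok = GenServer.stop(pid)

cleaned_up? =
  receive do
    {:cleaned_up, :normal} -> true
  after
    1_000 -> false
  end

IO.inspect(cleaned_up?, label: "terminate/2 ran on graceful stop")
```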
