
Postgres memory leak/high usage triggered by Ecto connection pool #3284

Closed
sezaru opened this issue Apr 17, 2020 · 5 comments
sezaru commented Apr 17, 2020

Hello,

I'm having an issue where memory usage keeps climbing until the process is killed by the OOM killer, regardless of how much memory the machine has (64GB, in my case).

I first thought it was a Postgres issue, but I can trigger it with MySQL as well. At the same time, I cannot trigger it with identical queries from a Python implementation, or from an Elixir implementation that uses Postgrex directly.

It only happens when using the Ecto connection pool, even if I limit the pool to a single connection.

Environment

  • Elixir version (elixir -v): 1.10.2
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): PostgreSQL 12.2 (but I tested with other versions too and with and without docker).
  • Ecto version (mix deps): 3.3.4
  • Database adapter and version (mix deps): Postgrex 0.15.3
  • Operating system: Ubuntu 19.04 and Gentoo

Current behavior

So, the issue happens when I insert data into many different tables; in my case, I have more than 12k tables. Each time I run an insert query against a new table (new in the sense that the current connection hasn't touched it yet), the connection's memory usage increases and is never released until the connection is closed.

To help diagnose it, I created some self-contained Elixir scripts that reproduce the issue. You only need PostgreSQL installed and, obviously, the Ecto and Postgrex dependencies. You also need to create the database beforehand with create database test_dev; via psql or another Postgres client.

The first script creates the tables:

tables = 1..800 |> Enum.map(fn i -> "table_#{i}" end)

{:ok, pid} =
  Postgrex.start_link(
    hostname: "localhost",
    username: "postgres",
    password: "postgres",
    database: "test_dev"
  )

Enum.each(tables, fn table ->
  query = """
  create table #{table} (
  id integer not null,
  value1 float not null,
  value2 float not null,
  timestamp timestamp not null
  )
  """
  
  Postgrex.query!(pid, query, [])
end)

The second script triggers the issue:

defmodule Repo do
  use Ecto.Repo,
    otp_app: :db,
    adapter: Ecto.Adapters.Postgres
end

defmodule Table do
  alias Ecto.{Changeset, Schema}

  use Schema

  @primary_key false

  schema "table" do
    field :id, :integer

    field :timestamp, :utc_datetime_usec

    field :value1, :float
    field :value2, :float
  end
end

steps = 16_383

tables = 1..800 |> Enum.map(fn i -> "table_#{i}" end)

{:ok, repo_pid} = Repo.start_link(
  name: :db,
  database: "test_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  timeout: 300_000,
  pool_size: 1,
  log: false)

(
Stream.iterate(1, fn id -> id + 1 end)
|> Stream.chunk_every(steps)
|> Stream.map(fn ids ->
  IO.puts("ids: #{hd(ids)}")

  Enum.map(ids, fn id ->
    %{id: id, value1: 1.1, value2: 1.1, timestamp: DateTime.utc_now()}
  end)
end)
|> Stream.map(fn values ->
  [table] = Enum.take_random(tables, 1)
  Ecto.Repo.Schema.insert_all(Repo, repo_pid, {table, Table}, values, [])
end)
|> Enum.to_list()
)

The third script runs code equivalent to the second one, but uses Postgrex directly; it doesn't trigger the issue:

steps = 16_383

tables = 1..800 |> Enum.map(fn i -> "table_#{i}" end)

{:ok, pid} = Postgrex.start_link(
  hostname: "localhost",
  username: "postgres",
  password: "postgres",
  database: "test_dev")

(
Stream.iterate(1, fn id -> id + 1 end)
|> Stream.chunk_every(steps)
|> Stream.map(fn ids ->
  IO.puts("ids: #{hd(ids)}")
  
  ids
  |> Enum.map(fn id -> "(#{id}, 1.1, 1.1, now())" end)
  |> Enum.join(", ")
end)
|> Stream.map(fn values ->
  [table] = Enum.take_random(tables, 1)

  query = "insert into #{table} (id, value1, value2, timestamp) values #{values}"
  
  {:ok, _} = Postgrex.query(pid, query, [])
end)
|> Enum.to_list()
)

As you can see from the code, each iteration builds a large batch of rows, picks one of the 800 tables at random, and inserts the batch into it. The scripts run indefinitely until the user stops them or, in the case of the second script, until the machine runs out of memory.

To see the memory usage, you can simply use htop filtered to show only postgres processes; there you can see the backend process for the Elixir connection and how its memory increases.
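If you prefer watching from inside Postgres, the backend PID for the Elixir connection can also be looked up directly; a minimal query, assuming only the database name used by the scripts above:

select pid, backend_start, state
from pg_stat_activity
where datname = 'test_dev';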

Also, for completeness, I ran a small Python tool called psrecord, which logs the memory usage of a process. These are the results I got (note that I stopped the run once it reached 2 GB):

Running the second script:

# Elapsed time   CPU (%)     Real (MB)   Virtual (MB)
       0.000        0.000      387.754     1746.008
       5.045        0.000      387.754     1746.008
      10.093        0.000      543.008     2108.859
      15.149        0.000      725.574     2309.859
      20.205        0.000      930.910     2544.359
      25.271        0.000     1111.582     2745.359
      30.328        0.000     1287.168     2945.863
      35.394        0.000     1474.262     3162.113
      40.452        0.000     1635.703     3347.113
      45.511        0.000     1806.688     3547.863
      50.579        0.000     1923.895     3681.613
      55.641        0.000     2083.008     3864.367

Running the third script:

# Elapsed time   CPU (%)     Real (MB)   Virtual (MB)
       0.000        0.000      358.648     1534.094
      15.138        0.000      423.621     1746.820
      30.303        0.000      497.297     1779.320
      45.477        0.000      540.789     1794.883
      60.629        0.000      542.285     1794.883
      75.782        0.000      533.785     1778.883
      90.943        0.000      534.137     1778.883
     106.101        0.000      534.145     1778.883
     121.251        0.000      534.148     1778.883
     136.407        0.000      547.562     1794.883
     151.573        0.000      534.168     1778.883
     166.726        0.000      534.172     1778.883
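Reading the two logs side by side: under the Ecto pool the backend grows by roughly 150-200 MB every 5 seconds with no sign of plateauing, while the direct Postgrex connection levels off at around 535 MB and stays flat.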

Thanks for the help!

josevalim (Member) commented

This is caused by our query cache which is... surprise, surprise, per table! Given you have so many tables, you may want to disable it. I will work on something.

josevalim (Member) commented

Set cache_statement: :per_operation in Ecto.SQL master and your problems should be solved.
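For anyone following along, a sketch of where such an option would go, using the option name exactly as written in the comment above and the connection settings from the second reproduction script (note the follow-up below: this change was later reverted in favor of prepare: :unnamed):

{:ok, repo_pid} =
  Repo.start_link(
    name: :db,
    database: "test_dev",
    username: "postgres",
    password: "postgres",
    hostname: "localhost",
    pool_size: 1,
    # Option name taken verbatim from the comment above; it was later
    # reverted, so this is illustrative only.
    cache_statement: :per_operation
  )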


sezaru commented Apr 30, 2020

Hey Jose, I just saw that commit 377fdbacbe3170039bfc1806881f9650c16251b6 reverted the cache_write_statements changes.

Why was that? Without being able to set cache_write_statements to :per_operation, I'm back to my Postgres connections consuming all my RAM and crashing with OOM :(

josevalim (Member) commented

Hi @sezaru! That's because prepare: :unnamed also disables cache_write_statements altogether. So I thought that, if you are using prepare: :unnamed, you don't need cache_write_statements. Did I misunderstand? :)
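In other words, the surviving workaround is prepare: :unnamed, a regular repo/Postgrex option; a minimal sketch reusing the connection settings from the second reproduction script:

# With `prepare: :unnamed`, Postgrex uses the unnamed prepared statement
# for every query, so the connection never accumulates one cached
# statement per table and its memory stays bounded.
{:ok, repo_pid} =
  Repo.start_link(
    name: :db,
    database: "test_dev",
    username: "postgres",
    password: "postgres",
    hostname: "localhost",
    pool_size: 1,
    prepare: :unnamed,
    log: false
  )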


sezaru commented Apr 30, 2020

Oh, indeed! You are absolutely right.

Sorry, I didn't realise that.
