
Postgres high memory usage by select queries in high number of tables #3286

Closed
sezaru opened this issue Apr 20, 2020 · 9 comments

@sezaru

sezaru commented Apr 20, 2020

Hello,

First of all, let me apologize if this is not the best place to ask/report this. I see this issue as a "follow-up" to my other issue #3284, which is why I'm reporting it here.

So, basically, as in #3284, my Postgres keeps increasing its memory usage until OOM kills it.

Since insertions no longer increase memory usage after the cache_write_statements fix, I suspect my select queries (my backend runs only one type of insertion query and multiple select queries).

To test that, I disabled my insertion query and kept only the select ones, and Postgres still keeps increasing its memory usage until OOM.

Just as a reminder, my backend has more than 12k tables, but they are all very simple, without associations, etc.

To me it seems like an issue similar to #3284: memory in Postgres increases as I run more and more queries against these tables.

So, does Ecto do some kind of caching for select queries too, similar to what is done for write operations? If so, is it possible to configure/disable it?

Thanks for the help!

Environment

  • Elixir version (elixir -v): 1.10.2
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): PostgreSQL 12.2
  • Ecto version (mix deps): master
  • Database adapter and version (mix deps): Postgrex 0.15.1
  • Operating system: 19.04

Current behavior

As the backend queries data from multiple different tables, Postgres memory usage keeps increasing and is never released, until OOM.

Expected behavior

Postgres uses memory to run the query, then releases it.

@josevalim
Member

How are you querying your tables?

@sezaru
Author

sezaru commented Apr 20, 2020

I'm not sure what you mean by how, so here are all the select queries I use:

@spec exists_by?(Meta.t(), Timeframe.t(), Keyword.t()) :: boolean
def exists_by?(meta, timeframe, timestamp: timestamp) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp == ^timestamp)
  |> Meta.repo(meta).exists?()
end

@spec count_range(range, Timeframe.t(), Meta.t()) :: integer
def count_range({from, to}, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp >= ^from and c.timestamp <= ^to)
  |> select([_], count("*"))
  |> repo.one()
end

@spec get_by(Meta.t(), Timeframe.t(), Keyword.t()) :: Candle.t() | nil
def get_by(%{repo: repo} = meta, timeframe, opts),
  do: meta |> Meta.candle_source(timeframe) |> repo.get_by(opts)

@spec get_closest(DateTime.t(), Timeframe.t(), Meta.t()) :: Candle.t() | nil
def get_closest(timestamp, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp <= ^timestamp)
  |> order_by([c], desc: c.timestamp)
  |> limit(1)
  |> repo.one()
end

@spec get_last(Timeframe.t(), Meta.t()) :: Candle.t() | nil
def get_last(timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> order_by([c], desc: c.timestamp)
  |> limit(1)
  |> repo.one()
end

@spec get_range(range, Timeframe.t(), Meta.t()) :: [Candle.t()]
def get_range({from, to}, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp >= ^from and c.timestamp <= ^to)
  |> order_by([c], asc: c.timestamp)
  |> repo.all()
end

@spec number_of_trades_from_date_onwards(DateTime.t(), Meta.t()) :: integer
def number_of_trades_from_date_onwards(date, %{repo: repo} = meta) do
  meta
  |> Meta.trade_source()
  |> where([t], t.timestamp >= ^date)
  |> select([t], count(t.id))
  |> repo.one()
end

@spec get_by(Meta.t(), Keyword.t()) :: Trade.t() | nil
def get_by(%{repo: repo} = meta, opts), do: meta |> Meta.trade_source() |> repo.get_by(opts)

@spec get_candle_trades(Candle.t(), Meta.t()) :: [Trade.t()]
def get_candle_trades(%{timestamp: from}, %{repo: repo} = meta) do
  to = DateTime.add(from, 60, :second)

  meta
  |> Meta.trade_source()
  |> where([t], t.timestamp >= ^from and t.timestamp < ^to)
  |> order_by([t], asc: t.id)
  |> repo.all()
end

Meta is a module that returns the repo and table source (since I have multiple tables with identical fields and I only want to write each query once for all of them).
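To make that concrete, here is a minimal sketch of the Meta idea (the names are simplified, not my exact code). It relies on Ecto's `{source, schema}` tuple, which runs a query against an arbitrary table name while reusing one schema's field definitions:

```elixir
# Simplified sketch: one shared Candle schema is queried against many
# identically-shaped tables by overriding the query source.
defmodule Meta do
  defstruct [:repo, :symbol]

  def new!(market, repo), do: %__MODULE__{repo: repo, symbol: market.symbol}

  # {source, schema} tells Ecto to target the given table name while
  # still using Candle's field definitions for casting and loading.
  def candle_source(%__MODULE__{symbol: symbol}, timeframe),
    do: {"candles_#{timeframe}_#{symbol}", Candle}

  def repo(%__MODULE__{repo: repo}), do: repo
end
```

The real module also handles trade sources and a few other details, but this captures the pattern.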

And here is my database config:

import Config

config :data_db, DataDb.Exchanges.Binance.Repo,
  database: "data_db_binance_prod",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  priv: "priv/repo/binance",
  log: false,
  timeout: 300_000,
  pool_size: 8,
  queue_target: 500,
  cache_write_statements: :per_operation

Hope this is somehow helpful.

@sezaru
Author

sezaru commented Apr 20, 2020

Oh, I forgot to mention: I cannot see 100% memory usage for some reason. Here is an htop screenshot, for example:

[htop screenshot]

And here from free -m:

[free -m screenshot]

As you can see, memory usage seems to be around 7GB and the rest is cache/buffers, but for some reason the system OOMs around that point (in this case it simply wouldn't let me run anything; it needed a hard reset).

Also, just to be sure it is not my Elixir backend consuming the memory, I'm running the backend on another computer.

Unfortunately, this time I'm only able to reproduce it when running the full backend; I tried to reproduce it with a script similar to the one in #3284 but couldn't make it trigger.

@sezaru
Author

sezaru commented Apr 21, 2020

Hey, some updates.

I ran a lot of tests to try to understand the problem better. As a result, I was able to create a minimal piece of code that triggers the issue, but only if the database is populated; I tried with an empty one and memory does increase, but not enough to cause an OOM.

So, basically this code triggers it:

Queries.Market.all()
|> Enum.with_index()
|> Enum.chunk_every(4)
|> Enum.map(fn markets ->
  Enum.map(markets, fn {market, index} ->
    Task.async(fn ->
      meta = Meta.new!(market, Repo)

      Enum.map(1..100, fn _id ->
        Enum.map(Timeframe.all(), fn timeframe ->
          Queries.Candle.exists_by?(meta, timeframe, timestamp: DateTime.utc_now())
        end)
      end)

      IO.puts("Finished market #{index} #{market.symbol}")
    end)
  end)
  |> Enum.map(& Task.await(&1, :infinity))
end)

So, basically, what this code does is run the exists_by? query against each of the 12,000 tables 100 times, 4 tables at a time. It just tries to simulate part of what is happening in my backend.

From my tests, it doesn't matter which select query I use, as long as it "touches" each table many times. About halfway through the run, it consumes all the memory and OOMs.

Also, I wrote similar code using Postgrex directly, to check whether the issue only happens with Ecto. Using Postgrex directly doesn't trigger it, which makes me even more suspicious that it's some kind of cache that Ecto uses. Here is the code anyway:

pids =
  for _ <- 1..8 do
    {:ok, pid} =
      Postgrex.start_link(
        hostname: "192.168.0.43",
        username: "postgres",
        password: "postgres",
        database: "data_db_prod"
      )

    pid
  end

Queries.Market.all()
|> Enum.with_index()
|> Enum.chunk_every(4)
|> Enum.map(fn markets ->
  markets
  |> Enum.with_index()
  |> Enum.map(fn {{market, index}, pid_index} ->
    Task.async(fn ->
      Enum.map(1..100, fn _id ->
        Enum.map(Timeframe.all(), fn timeframe ->
          [pid] = [Enum.at(pids, pid_index), Enum.at(pids, pid_index + 4)] |> Enum.take_random(1)

          query = "select true from candles_#{timeframe}_#{market.symbol} where timestamp = $1 limit 1"
          {:ok, _} = Postgrex.query(pid, query, [DateTime.utc_now()])
        end)
      end)

      IO.puts("Finished market #{index} #{market.symbol}")
    end)
  end)
  |> Enum.map(& Task.await(&1, :infinity))
end)

I also managed to capture some of the errors that Ecto raised when the OOM happens. They seem to be just standard Postgres errors, but maybe they can help somehow:

22:34:26.364 [error] #PID<0.836.0> 
↳ Task #PID<0.453.0> started from #PID<0.836.0> terminating
** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 284 in memory context "CacheMemoryContext".
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
Function: #Function<21.126501267/0 in :erl_eval.expr/5>
    Args: []
Bug Bug!! ** (EXIT from #PID<0.453.0>) shell process exited with reason: an exception was raised:
    ** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 284 in memory context "CacheMemoryContext".
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
00:26:45.389 [error] #PID<0.1322.0> 
↳ Task #PID<0.915.0> started from #PID<0.1322.0> terminating
** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 128 in memory context "ExecutorState".
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
    (elixir 1.10.2) lib/enum.ex:3371: Enum.reduce_range_inc/4
Function: #Function<21.126501267/0 in :erl_eval.expr/5>
    Args: [] 
Bug Bug!! ** (EXIT from #PID<0.915.0>) shell process exited with reason: an exception was raised:
    ** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 128 in memory context "ExecutorState".
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
        (elixir 1.10.2) lib/enum.ex:3371: Enum.reduce_range_inc/4

I hope this can help with some hint on what the issue is.

Feel free to ask me any more needed information.

@josevalim
Member

Oh, if you can reproduce it with only Postgrex and without cached statements, then the issue is not the statement cache. Honestly, this doesn't seem to be caused by Elixir/Postgrex at all; it seems to be a PostgreSQL issue. So I would consider investigating the PostgreSQL side of things instead.

@sezaru
Author

sezaru commented Apr 21, 2020

Ah, no, you got it inverted: I said I cannot reproduce it with plain Postgrex, only with the Ecto version.

@josevalim
Member

Oh, OK. Then set this in your repo configuration:

config :my_app, MyApp.Repo,
  prepare: :unnamed

This should stop prepared queries from being cached. Please let me know how it goes.

@sezaru
Author

sezaru commented Apr 21, 2020

That fixed it! In my tests the connections no longer grow in memory until OOM.

Thank you so much for the help!

As for the option, my guess is that it disables some optimizations, since it disables the use of Postgres's PREPARE, right?

I will test this in my full backend to see how it behaves there (so far I only tested with the example code above), but it should work fine, so I think we can close the issue :)

@josevalim
Member

We will still prepare queries, but they won't be cached.
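For reference, one rough way to see what accumulates server-side is the `pg_prepared_statements` view, which lists the prepared statements of the current session (including ones created via the wire protocol, not just SQL-level PREPARE). This is only an illustrative sketch; the connection details are placeholders, and you have to run the check on the same connection whose statements you want to inspect:

```elixir
# Illustrative: inspect server-side prepared statements for one session.
# pg_prepared_statements is per-session, so the view reflects only the
# statements prepared on this particular connection.
{:ok, conn} =
  Postgrex.start_link(
    hostname: "localhost",
    username: "postgres",
    password: "postgres",
    database: "data_db_prod"
  )

{:ok, result} =
  Postgrex.query(conn, "SELECT name, statement FROM pg_prepared_statements", [])

IO.inspect(result.rows, label: "server-side prepared statements")
```

With thousands of distinct tables, each distinct query text becomes a distinct statement, so per-connection cached statements (and their plans) can add up; `prepare: :unnamed` avoids that accumulation while still preparing each query before execution.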
