
Postgres high memory usage by select queries in high number of tables #3286

Closed
sezaru opened this issue Apr 20, 2020 · 9 comments

@sezaru

sezaru commented Apr 20, 2020

Hello,

First of all, let me apologize if this is not the best place to ask/report this. I see this issue as a "follow-up" to my other issue #3284, which is why I'm reporting it here.

So, basically, as in #3284, my Postgres keeps increasing its memory usage until OOM kills it.

Since insertions no longer increase memory usage after the cache_write_statements fix, I suspect my select queries (my backend runs only one type of insertion query and multiple select queries).

To test that, I disabled my insertion query and kept only the select ones, and Postgres still keeps increasing its memory usage until OOM.

Just as a reminder, my backend has more than 12k tables, but they are all very simple, without associations, etc.

To me it seems like an issue similar to #3284: memory in Postgres increases as I run more and more queries against these tables.

So, does Ecto do some kind of caching for select queries too, similar to what is done for write operations? If so, is it possible to configure/disable it?

Thanks for the help!

Environment

  • Elixir version (elixir -v): 1.10.2
  • Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.): PostgreSQL 12.2
  • Ecto version (mix deps): master
  • Database adapter and version (mix deps): Postgrex 0.15.1
  • Operating system: 19.04

Current behavior

As the backend queries data from multiple different tables, Postgres memory usage keeps increasing and is never released, until OOM.

Expected behavior

Postgres uses memory to run the query, then releases it.

@josevalim
Member

How are you querying your tables?

@sezaru
Author

sezaru commented Apr 20, 2020

I'm not sure what you mean by how, so here are all the select queries I use:

@spec exists_by?(Meta.t(), Timeframe.t(), Keyword.t()) :: boolean
def exists_by?(meta, timeframe, timestamp: timestamp) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp == ^timestamp)
  |> Meta.repo(meta).exists?()
end

@spec count_range(range, Timeframe.t(), Meta.t()) :: integer
def count_range({from, to}, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp >= ^from and c.timestamp <= ^to)
  |> select([_], count("*"))
  |> repo.one()
end

@spec get_by(Meta.t(), Timeframe.t(), Keyword.t()) :: Candle.t() | nil
def get_by(%{repo: repo} = meta, timeframe, opts),
  do: meta |> Meta.candle_source(timeframe) |> repo.get_by(opts)

@spec get_closest(DateTime.t(), Timeframe.t(), Meta.t()) :: Candle.t() | nil
def get_closest(timestamp, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp <= ^timestamp)
  |> order_by([c], desc: c.timestamp)
  |> limit(1)
  |> repo.one()
end

@spec get_last(Timeframe.t(), Meta.t()) :: Candle.t() | nil
def get_last(timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> order_by([c], desc: c.timestamp)
  |> limit(1)
  |> repo.one()
end

@spec get_range(range, Timeframe.t(), Meta.t()) :: [Candle.t()]
def get_range({from, to}, timeframe, %{repo: repo} = meta) do
  meta
  |> Meta.candle_source(timeframe)
  |> where([c], c.timestamp >= ^from and c.timestamp <= ^to)
  |> order_by([c], asc: c.timestamp)
  |> repo.all()
end

@spec number_of_trades_from_date_onwards(DateTime.t(), Meta.t()) :: integer
def number_of_trades_from_date_onwards(date, %{repo: repo} = meta) do
  meta
  |> Meta.trade_source()
  |> where([t], t.timestamp >= ^date)
  |> select([t], count(t.id))
  |> repo.one()
end

@spec get_by(Meta.t(), Keyword.t()) :: Trade.t() | nil
def get_by(%{repo: repo} = meta, opts), do: meta |> Meta.trade_source() |> repo.get_by(opts)

@spec get_candle_trades(Candle.t(), Meta.t()) :: [Trade.t()]
def get_candle_trades(%{timestamp: from}, %{repo: repo} = meta) do
  to = DateTime.add(from, 60, :second)

  meta
  |> Meta.trade_source()
  |> where([t], t.timestamp >= ^from and t.timestamp < ^to)
  |> order_by([t], asc: t.id)
  |> repo.all()
end

Meta is a module that returns the repo and table source (since I have multiple tables with identical fields and I only want to write each query once for all of them).
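To make that concrete, here is a minimal sketch of the Meta idea (the names are simplified, not my exact code). It relies on Ecto's `{source, schema}` tuple, which runs a query against an arbitrary table name while reusing one schema's field definitions:

```elixir
# Simplified sketch: one shared Candle schema is queried against many
# identically-shaped tables by overriding the query source.
defmodule Meta do
  defstruct [:repo, :symbol]

  def new!(market, repo), do: %__MODULE__{repo: repo, symbol: market.symbol}

  # {source, schema} tells Ecto to target the given table name while
  # still using Candle's field definitions for casting and loading.
  def candle_source(%__MODULE__{symbol: symbol}, timeframe),
    do: {"candles_#{timeframe}_#{symbol}", Candle}

  def repo(%__MODULE__{repo: repo}), do: repo
end
```

The real module also handles trade sources and a few other details, but this captures the pattern.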

And here is my database config:

import Config

config :data_db, DataDb.Exchanges.Binance.Repo,
  database: "data_db_binance_prod",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  priv: "priv/repo/binance",
  log: false,
  timeout: 300_000,
  pool_size: 8,
  queue_target: 500,
  cache_write_statements: :per_operation

Hope this is somehow helpful.

@sezaru
Author

sezaru commented Apr 20, 2020

Oh, I forgot to mention: I cannot see 100% memory usage for some reason. Here is an htop screenshot, for example:

[htop screenshot]

And here from free -m:

[free -m screenshot]

As you can see, memory usage seems to be around 7GB and the rest is cache/buffers, but for some reason the system OOMs around that point (in this case it simply wouldn't let me run anything; it needed a hard reset).

Also, just to be sure it is not my Elixir backend consuming the memory, I'm running the backend on another computer.

Unfortunately, this time I'm only able to reproduce it when running the full backend; I tried to reproduce it with a script similar to the one in #3284 but couldn't make it trigger.

@sezaru
Author

sezaru commented Apr 21, 2020

Hey, some updates.

I ran a lot of tests to try to understand the problem better. As a result, I was able to create a minimal piece of code that triggers the issue, but only if the database is populated; I tried with an empty one and memory does increase, but not enough to cause an OOM.

So, basically this code triggers it:

Queries.Market.all()
|> Enum.with_index()
|> Enum.chunk_every(4)
|> Enum.map(fn markets ->
  Enum.map(markets, fn {market, index} ->
    Task.async(fn ->
      meta = Meta.new!(market, Repo)

      Enum.map(1..100, fn _id ->
        Enum.map(Timeframe.all(), fn timeframe ->
          Queries.Candle.exists_by?(meta, timeframe, timestamp: DateTime.utc_now())
        end)
      end)

      IO.puts("Finished market #{index} #{market.symbol}")
    end)
  end)
  |> Enum.map(& Task.await(&1, :infinity))
end)

So, basically, what this code does is run the exists_by? query against each of the 12,000 tables 100 times, 4 tables at a time. It just tries to simulate part of what is happening in my backend.

From my tests, it doesn't matter which select query I use, as long as it "touches" each table many times. About halfway through the run, it consumes all the memory and OOMs.

Also, I wrote similar code using Postgrex directly, to check whether the issue only happens with Ecto. Using Postgrex directly doesn't trigger it, which makes me even more suspicious that it's some kind of cache that Ecto uses. Here is the code anyway:

pids =
  for _ <- 1..8 do
    {:ok, pid} =
      Postgrex.start_link(
        hostname: "192.168.0.43",
        username: "postgres",
        password: "postgres",
        database: "data_db_prod"
      )

    pid
  end

Queries.Market.all()
|> Enum.with_index()
|> Enum.chunk_every(4)
|> Enum.map(fn markets ->
  markets
  |> Enum.with_index()
  |> Enum.map(fn {{market, index}, pid_index} ->
    Task.async(fn ->
      Enum.map(1..100, fn _id ->
        Enum.map(Timeframe.all(), fn timeframe ->
          [pid] = [Enum.at(pids, pid_index), Enum.at(pids, pid_index + 4)] |> Enum.take_random(1)

          query = "select true from candles_#{timeframe}_#{market.symbol} where timestamp = $1 limit 1"
          {:ok, _} = Postgrex.query(pid, query, [DateTime.utc_now()])
        end)
      end)

      IO.puts("Finished market #{index} #{market.symbol}")
    end)
  end)
  |> Enum.map(& Task.await(&1, :infinity))
end)

I also managed to capture some of the errors that Ecto raised when the OOM happens. They seem to be just standard Postgres errors, but maybe they can help somehow:

22:34:26.364 [error] #PID<0.836.0> 
↳ Task #PID<0.453.0> started from #PID<0.836.0> terminating
** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 284 in memory context "CacheMemoryContext".
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
Function: #Function<21.126501267/0 in :erl_eval.expr/5>
    Args: []
Bug Bug!! ** (EXIT from #PID<0.453.0>) shell process exited with reason: an exception was raised:
    ** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 284 in memory context "CacheMemoryContext".
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
00:26:45.389 [error] #PID<0.1322.0> 
↳ Task #PID<0.915.0> started from #PID<0.1322.0> terminating
** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 128 in memory context "ExecutorState".
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
    (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
    (elixir 1.10.2) lib/enum.ex:3371: Enum.reduce_range_inc/4
Function: #Function<21.126501267/0 in :erl_eval.expr/5>
    Args: [] 
Bug Bug!! ** (EXIT from #PID<0.915.0>) shell process exited with reason: an exception was raised:
    ** (Postgrex.Error) ERROR 53200 (out_of_memory) out of memory

Failed on request of size 128 in memory context "ExecutorState".
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:601: Ecto.Adapters.SQL.raise_sql_call_error/1
        (ecto_sql 3.4.3) lib/ecto/adapters/sql.ex:534: Ecto.Adapters.SQL.execute/5
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
        (ecto 3.4.3) lib/ecto/repo/queryable.ex:95: Ecto.Repo.Queryable.exists?/3
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (elixir 1.10.2) lib/enum.ex:1400: anonymous fn/3 in Enum.map/2
        (elixir 1.10.2) lib/enum.ex:3371: Enum.reduce_range_inc/4

I hope this can help with some hint on what the issue is.

Feel free to ask me any more needed information.

@josevalim
Member

Oh, if you can reproduce it with only Postgrex and without cached statements, then the issue is not the statement cache. Honestly, this doesn't seem to be caused by Elixir/Postgrex at all; it seems to be a PostgreSQL issue. So I would consider investigating the PostgreSQL side of things instead.

@sezaru
Author

sezaru commented Apr 21, 2020

Ah, no, you got it inverted: I said I cannot reproduce it with plain Postgrex, only with the Ecto version.

@josevalim
Member

Oh, OK. Then set this in your repo configuration:

config :my_app, MyApp.Repo,
  prepare: :unnamed

This should stop prepared queries from being cached. Please let me know how it goes.

@sezaru
Author

sezaru commented Apr 21, 2020

That fixed it! In my tests the connections no longer grow in memory until OOM.

Thank you so much for the help!

As for the option, my guess is that it disables some optimizations, since it disables the use of Postgres's PREPARE, right?

I will test this in my full backend to see how it behaves there (so far I only tested with the example code above), but it should work fine, so I think we can close the issue :)

@josevalim
Member

We will still prepare queries, but they won't be cached.
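For reference, one rough way to see what accumulates server-side is the `pg_prepared_statements` view, which lists the prepared statements of the current session (including ones created via the wire protocol, not just SQL-level PREPARE). This is only an illustrative sketch; the connection details are placeholders, and you have to run the check on the same connection whose statements you want to inspect:

```elixir
# Illustrative: inspect server-side prepared statements for one session.
# pg_prepared_statements is per-session, so the view reflects only the
# statements prepared on this particular connection.
{:ok, conn} =
  Postgrex.start_link(
    hostname: "localhost",
    username: "postgres",
    password: "postgres",
    database: "data_db_prod"
  )

{:ok, result} =
  Postgrex.query(conn, "SELECT name, statement FROM pg_prepared_statements", [])

IO.inspect(result.rows, label: "server-side prepared statements")
```

With thousands of distinct tables, each distinct query text becomes a distinct statement, so per-connection cached statements (and their plans) can add up; `prepare: :unnamed` avoids that accumulation while still preparing each query before execution.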
