
Twemproxy going OOM #553

Open
auror opened this issue Apr 4, 2018 · 16 comments

auror commented Apr 4, 2018

Hi,

We're running twemproxy in front of a set of memcached servers. Below is the obfuscated config:

some_pool:
  listen: 127.0.0.1:11213
  hash: crc16
  distribution: ketama
  backlog: 4000
  timeout: 30
  preconnect: true
  server_connections: 200
  auto_eject_hosts: false
  server_retry_timeout: 2000
  servers:

   - some_host_port:1 k1
   - some_host_port:1 k2
   - some_host_port:1 k3
   - some_host_port:1 k4
   - some_host_port:1 k5
   - some_host_port:1 k6
   - some_host_port:1 k7
   - some_host_port:1 k8
   - some_host_port:1 k9
   - some_host_port:1 k10
   - some_host_port:1 k11
   - some_host_port:1 k12

Some details about the workload:

  1. Clients connected to twemproxy issue a single GET command.
  2. Throughput is constant at ~2.2K QPS.
  3. The number of clients connected to twemproxy is fairly low (< 100).
  4. The number of items in the in queue (and in_queue_bytes) is very low.

At some point twemproxy started using a very large amount of memory. We also saw connection timeouts around the same time ("close s: Connection timed out" in the debug logs).

Memory consumption keeps growing until the kernel OOM-kills the nutcracker instance.
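
For what it's worth, one way to see whether requests are piling up inside the proxy while memory grows is to poll twemproxy's stats port, which serves the counters as JSON (a sketch; 22222 is the default stats port, so use whatever -s / --stats-port was set to instead):

    # illustrative only: connect to the stats port and watch the per-server
    # queue counters while RSS grows; rising in_queue_bytes / out_queue_bytes
    # means requests or responses are accumulating inside twemproxy
    watch -n 5 "nc 127.0.0.1 22222 | python -m json.tool | grep -E 'queue_bytes|client_connections'"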

charsyam (Contributor) commented Apr 4, 2018

@auror Do you use pipeline mode with your client?

auror (Author) commented Apr 4, 2018

@charsyam No, we aren't.

charsyam (Contributor) commented Apr 4, 2018

What is the typical data size of an item for the get command?

auror (Author) commented Apr 4, 2018

Key size is 12 bytes and the value could be anywhere between 30 and 1024 bytes.

auror (Author) commented Apr 5, 2018

@charsyam

Observation:

I observed high Recv-Q values (from netstat) on the application's sockets to twemproxy when twemproxy's RSS started growing. I wanted to see if an application restart would bring the memory back down, but it stayed constant.

[Screenshot: memory usage graph, 2018-04-05 4:11 PM]

  1. Usual memory consumption: 10 - 15 MB
  2. Memory used started climbing at 15:50
  3. Application restarted at 15:55
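
A quick way to reproduce that Recv-Q observation from the command line (a sketch; 11213 is the twemproxy listen port from the config above):

    # per-socket Recv-Q / Send-Q for connections on the twemproxy listen port;
    # Recv-Q here is data sitting in the kernel that the local process has not
    # yet read
    netstat -tn | grep ':11213'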

charsyam (Contributor) commented Apr 5, 2018

@auror Thanks.
1] How much memory do you use on your server?
2] Could you show the options you run twemproxy with? Did you set mbuf-size? (the default mbuf-size is 16384)
for ex) ./src/nutcracker -c conf.yml
3] What client do you use with twemproxy?
4] Was there any other issue at 15:50? Another client?

I think 2.2K QPS * 16 KB mbufs is not that big.
I suspect there is some other issue, for example another client sending messages pipeline-style.

twemproxy expects the client to send requests sequentially. If a client does not wait for a response before sending the next request, it can exhaust all memory.
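
Purely as an illustration of what pipeline-style traffic looks like (hypothetical keys, against the pool's 127.0.0.1:11213 listener from the config above):

    # several GETs are written in one shot before any reply is read; twemproxy
    # has to buffer every outstanding request (and its response) in mbufs until
    # the backends answer, which is what can blow up memory usage
    printf 'get foo\r\nget bar\r\nget baz\r\nquit\r\n' | nc 127.0.0.1 11213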

auror (Author) commented Apr 5, 2018

  1. The machine's configured RAM is 32 GB; total usage is close to 20 GB.
  2. Command: /usr/local/sbin/nutcracker -d -o /var/log/twemproxy/error_11213.log -c conf/nutcracker_11213.yml -s 21213 --pid-file /var/run/nut_11213.pid (see the mbuf note below)
  3. Java client
  4. Didn't observe anything else, apart from the high memory consumption

Our application is the only one connected to this twemproxy instance, and AFAIK xmemcached doesn't pipeline requests internally.
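
Since the command in (2) has no -m / --mbuf-size flag, twemproxy is using the default 16384-byte mbuf chunk that charsyam mentioned. If per-connection memory is a concern, a smaller chunk size can be set at start-up; a sketch reusing the command above (512 is just an example value):

    # identical to the running command, plus an explicit smaller mbuf chunk
    # size; connection buffers are allocated in units of this size, so smaller
    # chunks cut per-connection memory at the cost of more chunks per large
    # request or response
    /usr/local/sbin/nutcracker -d -o /var/log/twemproxy/error_11213.log \
        -c conf/nutcracker_11213.yml -s 21213 --pid-file /var/run/nut_11213.pid \
        -m 512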

charsyam (Contributor) commented Apr 5, 2018

@auror Oh, I had assumed your backend was redis, but it's memcached.
Another question: on your server, is only twemproxy running, or is memcached running there as well?

auror (Author) commented Apr 5, 2018

It's actually Kyoto Tycoon, which also serves requests from clients that speak the memcached protocol. These Kyoto instances run on remote hosts.

charsyam (Contributor) commented Apr 5, 2018

@auror Did you see any performance degradation in Kyoto Tycoon?
If it returns responses late, that could explain this.

auror (Author) commented Apr 5, 2018

@charsyam Performance degradation as in high response times?

Yes, there has been a little degradation in Kyoto's performance recently. It's noticeable (but not very high) at the 90th percentile. Does that impact twemproxy?

But there's a 30 ms timeout configured in twemproxy. Doesn't that help? Can you please elaborate?

charsyam (Contributor) commented Apr 5, 2018

Actually, twemproxy can have only one request in flight per server connection. So if the backend server returns responses late, twemproxy has to keep the other requests in its own buffer, and it will use a lot of memory to hold the clients' requests.

But I don't know how Kyoto Tycoon works, so this is just a guess.

auror (Author) commented Apr 5, 2018

@charsyam Thanks

So when does twemproxy discard the outstanding buffered requests? And if one of the hosts becomes unresponsive or starts timing out, is ejecting it the only way?

auror (Author) commented Apr 16, 2018

@charsyam

Can you please have a look at my question above? Also, we've added more capacity and performance has come back to normal. Yet we're still observing high memory consumption and the eventual death of the twemproxy process (though not as often as before).

charsyam (Contributor) commented

@auror Sorry for the late reply.
Twemproxy will close the connection when a server sends an incorrect reply, and drop the buffer for it. Some time later, it will try to add the server back into use.
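
For reference on the ejection side of that question: which servers get temporarily dropped, and when they are retried, is governed by these pool options (a sketch based on the config at the top of this issue; auto_eject_hosts is currently false there, and server_failure_limit is an illustrative value):

    some_pool:
      listen: 127.0.0.1:11213
      timeout: 30                  # ms to wait on a server before failing the request
      auto_eject_hosts: true       # temporarily remove a failing server from the hash ring
      server_failure_limit: 3      # consecutive failures before ejection (example value)
      server_retry_timeout: 2000   # ms before an ejected server is retried, as in the original config
      # hash, distribution, servers, etc. unchanged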

auror (Author) commented Apr 16, 2018

No problem. So, how do we debug this? Do you need any additional info? Any clue or direction would help.
