Benchmarks OOM #2062

Open
carl-mastrangelo opened this issue Jul 18, 2016 · 3 comments

@carl-mastrangelo
Contributor

Running the FlowControlledMessagePerSecond benchmark with 4 forks causes out-of-memory exceptions.

@carl-mastrangelo
Contributor Author

Some thoughts:

  1. This shouldn't be possible, but here we are. The benchmark that triggers the OOM is the 4-channel case with 100 concurrent streams, in both the direct and default configurations. The message size is 10 bytes, the number of outstanding requests is 10, and maxDirectMemory is 2G. 100 * 4 * 10 * 10 bytes is nowhere near the direct memory limit, so it shouldn't be possible to OOM unless there is a leak.
  2. Raising the direct memory limit with -XX:MaxDirectMemorySize=16g alleviates the problem, though in testing the RSS only ever gets up to about 6g, so maybe 8g is enough (this is with the existing heap size of 2g). See the sketch after this list for where that flag would go.
  3. Profiling with the GC profiler shows 60 GCs per one-second iteration and Eden churn of about 2g/s. Old-gen and survivor churn are very low, at 40MB/s and 4MB/s. I read this as: we are creating a lot of short-lived garbage and discarding it very quickly, and I am likely hitting the memory-allocation throughput limit of my machine.
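
As a concrete illustration of point 2, this is roughly where such a flag would be applied if the forked JVM arguments are set through JMH's @Fork annotation. The class and method names below are hypothetical placeholders, not the actual benchmark source:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;

    public class FlowControlBenchmarkSketch {
      // Hypothetical benchmark method; only the fork configuration matters here.
      @Benchmark
      @Fork(value = 4, jvmArgsAppend = {"-Xmx2g", "-XX:MaxDirectMemorySize=16g"})
      public void sendMessage() {
        // benchmark body elided
      }
    }

Equivalently, the limit can be passed on the JMH command line via -jvmArgsAppend "-XX:MaxDirectMemorySize=16g".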

@carl-mastrangelo
Contributor Author

Another interesting thing: the ByteBuf-backed input stream used by the benchmarks causes a heap allocation and an extra copy due to a bad interaction with the MessageFramer. Here is the stack where the allocation happens:

Daemon Thread [CTF pool-5-4] (Suspended (breakpoint at line 27 in PooledUnsafeHeapByteBuf$1))   
    owns: Object  (id=82)   
    PooledUnsafeHeapByteBuf$1.newObject(Handle<PooledUnsafeHeapByteBuf>) line: 27   
    PooledUnsafeHeapByteBuf$1.newObject(Recycler$Handle) line: 24   
    PooledUnsafeHeapByteBuf$1(Recycler<T>).get() line: 107  
    PooledUnsafeHeapByteBuf.newUnsafeInstance(int) line: 32 
    PoolArena$HeapArena.newByteBuf(int) line: 661   
    PoolArena$HeapArena(PoolArena<T>).allocate(PoolThreadCache, int, int) line: 140 
    PooledByteBufAllocator.newHeapBuffer(int, int) line: 247    
    PooledByteBufAllocator(AbstractByteBufAllocator).heapBuffer(int, int) line: 160 
    PooledByteBufAllocator(AbstractByteBufAllocator).heapBuffer(int) line: 151  
    UnsafeByteBufUtil.getBytes(AbstractByteBuf, long, int, OutputStream, int) line: 601 
    PooledUnsafeDirectByteBuf.getBytes(int, OutputStream, int) line: 152    
    SlicedAbstractByteBuf(SlicedByteBuf).getBytes(int, OutputStream, int) line: 421 
    SlicedAbstractByteBuf(AbstractByteBuf).readBytes(OutputStream, int) line: 917   
    SimpleLeakAwareByteBuf(WrappedByteBuf).readBytes(OutputStream, int) line: 667   
    ByteBufInputStream.drainTo(OutputStream) line: 57   
    MessageFramer.writeToOutputStream(InputStream, OutputStream) line: 228  
    MessageFramer.writeKnownLength(InputStream, int, boolean) line: 193 
    MessageFramer.writeUncompressed(InputStream, int) line: 149 
    MessageFramer.writePayload(InputStream) line: 126   
    NettyClientTransport$2(AbstractStream<IdT>).writeMessage(InputStream) line: 172 
    DelayedStream$3.run() line: 201 
    DelayedClientTransport$PendingStream(DelayedStream).drainPendingCalls() line: 121   
    DelayedClientTransport$PendingStream(DelayedStream).setStream(ClientStream) line: 90    
    DelayedClientTransport$PendingStream.createRealStream(ClientTransport) line: 382    
    DelayedClientTransport$PendingStream.access$1(DelayedClientTransport$PendingStream, ClientTransport) line: 381  
    DelayedClientTransport$2.run() line: 261    
    MoreExecutors$DirectExecutor.execute(Runnable) line: 456    
    DelayedClientTransport.setTransportSupplier(Supplier<ClientTransport>) line: 258    
    DelayedClientTransport.setTransport(ClientTransport) line: 226  
    TransportSet$TransportListener.transportReady() line: 409   
    ClientTransportLifecycleManager.notifyReady() line: 58  
    NettyClientHandler$FrameListener.onSettingsRead(ChannelHandlerContext, Http2Settings) line: 581 
    DefaultHttp2ConnectionDecoder$FrameReadListener.onSettingsRead(ChannelHandlerContext, Http2Settings) line: 460  
    DefaultHttp2ConnectionDecoder$PrefaceFrameListener.onSettingsRead(ChannelHandlerContext, Http2Settings) line: 667   
    Http2InboundFrameLogger$1.onSettingsRead(ChannelHandlerContext, Http2Settings) line: 93 
    DefaultHttp2FrameReader.readSettingsFrame(ChannelHandlerContext, ByteBuf, Http2FrameListener) line: 516 
    DefaultHttp2FrameReader.processPayloadState(ChannelHandlerContext, ByteBuf, Http2FrameListener) line: 256   
    DefaultHttp2FrameReader.readFrame(ChannelHandlerContext, ByteBuf, Http2FrameListener) line: 155 
    Http2InboundFrameLogger.readFrame(ChannelHandlerContext, ByteBuf, Http2FrameListener) line: 41  
    DefaultHttp2ConnectionDecoder.decodeFrame(ChannelHandlerContext, ByteBuf, List<Object>) line: 113   
    Http2ConnectionHandler$FrameDecoder.decode(ChannelHandlerContext, ByteBuf, List<Object>) line: 333  
    Http2ConnectionHandler$PrefaceDecoder.decode(ChannelHandlerContext, ByteBuf, List<Object>) line: 214    
    NettyClientHandler(Http2ConnectionHandler).decode(ChannelHandlerContext, ByteBuf, List<Object>) line: 393   
    NettyClientHandler(ByteToMessageDecoder).callDecode(ChannelHandlerContext, ByteBuf, List<Object>) line: 411 
    NettyClientHandler(ByteToMessageDecoder).channelRead(ChannelHandlerContext, Object) line: 248   
    DefaultChannelHandlerContext(AbstractChannelHandlerContext).invokeChannelRead(Object) line: 354 
    AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext, Object) line: 340    
    DefaultChannelPipeline$HeadContext(AbstractChannelHandlerContext).fireChannelRead(Object) line: 332 
    DefaultChannelPipeline$HeadContext.channelRead(ChannelHandlerContext, Object) line: 1319    
    DefaultChannelPipeline$HeadContext(AbstractChannelHandlerContext).invokeChannelRead(Object) line: 354   
    AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext, Object) line: 340    
    DefaultChannelPipeline.fireChannelRead(Object) line: 904    
    NioSocketChannel$NioSocketChannelUnsafe(AbstractNioByteChannel$NioByteUnsafe).read() line: 123  
    NioEventLoop.processSelectedKey(SelectionKey, AbstractNioChannel) line: 571 
    NioEventLoop.processSelectedKeysOptimized(SelectionKey[]) line: 512 
    NioEventLoop.processSelectedKeys() line: 426    
    NioEventLoop.run() line: 398    
    SingleThreadEventExecutor$5.run() line: 805 
    DefaultThreadFactory$DefaultRunnableDecorator.run() line: 145   
    FastThreadLocalThread(Thread).run() line: 745   
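
For context on the MessageFramer interaction above: as the trace shows, the framer hands the message stream its own output stream via drainTo, so the number of copies is decided entirely by what the stream's drainTo does. The interface contract involved (io.grpc.Drainable) amounts to a single method, shown here for reference:

    import java.io.IOException;
    import java.io.OutputStream;

    /** A stream whose remaining content can be written out to a target in one call. */
    public interface Drainable {
      /** Writes the remaining bytes to {@code target}; returns the number of bytes written. */
      int drainTo(OutputStream target) throws IOException;
    }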

@carl-mastrangelo
Contributor Author

More notes:

NettyWritableBufferAllocator allocates a minimum of 4k per buffer, which is much larger than the small 13- and 14-byte buffers sent by the flow control benchmark. This makes the system run out of direct memory extremely fast, since it effectively costs 4k per RPC; at a million messages per second it is easy to exhaust the limit.
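
A back-of-the-envelope sketch of that cost, with an assumed round throughput figure of one million messages per second:

    public class DirectChurnEstimate {
      public static void main(String[] args) {
        long minBufferBytes = 4 * 1024;       // minimum NettyWritableBufferAllocator hands out per write
        long messagesPerSecond = 1_000_000;   // assumed order-of-magnitude benchmark rate
        long bytesPerSecond = minBufferBytes * messagesPerSecond;
        // ~4 GiB/s of direct-buffer demand against a 2 GiB MaxDirectMemorySize,
        // so the limit is exhausted unless buffers are released just as fast.
        System.out.printf("%.1f GiB/s%n", bytesPerSecond / (1024.0 * 1024 * 1024));
      }
    }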

carl-mastrangelo added a commit to carl-mastrangelo/grpc-java that referenced this issue Jul 23, 2016
MessageFramer calls Drainable.drainTo with a special output stream,
OutputStreamAdapter.  Currently, ByteBufInputStream writes to this output
stream by allocating a heapBuffer in UnsafeByteBufUtil.getBytes, copying
from the direct byte buffer of BBIS, and then copying to the direct byte
buffer from MessageFramer.writeRaw().

This change is an easy way to cut down on wasted memory, even though
ideally there would be some way to have fewer copies.  The actual data is
only around 10 bytes, but it causes tens of megabytes of allocation in the
heap pool.

For grpc#2062
carl-mastrangelo added a commit to carl-mastrangelo/grpc-java that referenced this issue Jul 26, 2016
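
The referenced change itself isn't reproduced here, but as an illustration of the idea in the commit message, a Drainable stream wrapping a Netty ByteBuf could bridge to the framer's output stream through a plain byte[] (or the buffer's own backing array) instead of going through ByteBuf.readBytes(OutputStream, int), which the trace shows allocating a pooled heap buffer for direct sources. The class below is a hypothetical sketch, not the benchmark's actual ByteBufInputStream:

    import io.grpc.Drainable;
    import io.netty.buffer.ByteBuf;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    final class ByteBufDrainableStream extends InputStream implements Drainable {
      private final ByteBuf buf;

      ByteBufDrainableStream(ByteBuf buf) {
        this.buf = buf;
      }

      @Override
      public int read() {
        // Fallback byte-at-a-time path for callers that don't use drainTo.
        return buf.isReadable() ? buf.readByte() & 0xff : -1;
      }

      @Override
      public int available() {
        return buf.readableBytes();
      }

      @Override
      public int drainTo(OutputStream target) throws IOException {
        int length = buf.readableBytes();
        if (buf.hasArray()) {
          // Heap buffer: write the backing array directly, no extra copy at all.
          target.write(buf.array(), buf.arrayOffset() + buf.readerIndex(), length);
        } else {
          // Direct buffer: one copy into a plain array sized to the payload
          // (~10 bytes here) rather than a pooled 4k heap ByteBuf.
          byte[] tmp = new byte[length];
          buf.getBytes(buf.readerIndex(), tmp);
          target.write(tmp, 0, length);
        }
        buf.readerIndex(buf.readerIndex() + length);
        return length;
      }
    }
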
@ejona86 added this to the Unscheduled milestone Jul 27, 2017