Skip to content

SearchStats counters prone to overflow #23675

@DrMurx

Description

@DrMurx

Elasticsearch version: 5.2.1

Plugins installed: -/-

JVM version: OpenJDK 1.8.0_111

OS version: Ubuntu 14.04, Kernel 4.4.0

Description of the problem including expected versus actual behavior:

In an environment with extensive use of scroll API, the signed long scrollTimeInMillis defined in org.elasticsearch.index.search.stats.SearchStats can overflow Long.MAX_VALUE, causing StreamOutput. writeVLong used in SearchStats$Stats.writeTo to throw java.lang.IllegalStateException: Negative longs unsupported, use writeLong or writeZLong.

This breaks the monitoring endpoints on data nodes.

Expected behavior is, of course, not to break. Time counters could be implemented as double, with the overhead of floating point arithmetic, and then just losing precision when the value becomes bigger. Or just don't not increment counters beyond Long.MAX_VALUE and return "inf" or "NaN" instead so you'd just lose that value for monitoring. I also doubt that millisecond precision is useful here, but changing it to seconds would only defer the problem.

Provide logs (if relevant):

[2017-03-21T13:32:56,002][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [elastic-node1] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [QjF2heCMSImu6bPfZ-HzIg]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$300(TransportNodesAction.java:160) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1024) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.lambda$handleException$17(TcpTransport.java:1411) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:109) [elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1409) [elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1401) [elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1345) [elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.2.1.jar:5.2.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:527) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.7.Final.jar:4.1.7.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111-internal]
Caused by: org.elasticsearch.transport.RemoteTransportException: [elastic-node2][a.b.c.d:9300][cluster:monitor/nodes/stats[n]]
Caused by: java.lang.IllegalStateException: Negative longs unsupported, use writeLong or writeZLong for negative numbers [-1426090323412]
	at org.elasticsearch.common.io.stream.StreamOutput.writeVLong(StreamOutput.java:219) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.index.search.stats.SearchStats$Stats.writeTo(SearchStats.java:211) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.index.search.stats.SearchStats.writeTo(SearchStats.java:353) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.common.io.stream.StreamOutput.writeOptionalStreamable(StreamOutput.java:723) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.admin.indices.stats.CommonStats.writeTo(CommonStats.java:255) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.indices.NodeIndicesStats.writeTo(NodeIndicesStats.java:175) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.admin.cluster.node.stats.NodeStats.writeTo(NodeStats.java:235) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.buildMessage(TcpTransport.java:1179) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1123) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1101) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:67) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:61) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:58) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:111) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:270) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:266) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1488) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.2.1.jar:5.2.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_111-internal]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_111-internal]
	... 1 more

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions