Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed accessing MessageImpl after it was enqueued on user queue #11824

Merged
merged 1 commit into from
Aug 28, 2021

Conversation

merlimat
Copy link
Contributor

Motivation

There is an exception that is thrown (and not shown, but that's a different issue) when using a consumer with message pooling enabled and consuming from a partitioned topic.
After a short amount of time the consumption silently stops.

The exception is:

java.lang.NullPointerException
	at org.apache.pulsar.client.impl.MessageImpl.size(MessageImpl.java:398)
	at org.apache.pulsar.client.impl.TopicMessageImpl.size(TopicMessageImpl.java:98)
	at org.apache.pulsar.client.impl.ConsumerBase.increaseIncomingMessageSize(ConsumerBase.java:974)
	at org.apache.pulsar.client.impl.ConsumerBase.enqueueMessageAndCheckBatchReceive(ConsumerBase.java:731)
	at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.messageReceived(MultiTopicsConsumerImpl.java:301)
	at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.lambda$receiveMessageFromConsumer$8(MultiTopicsConsumerImpl.java:255)
	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2144)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalReceiveAsync$3(ConsumerImpl.java:434)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:830)

The issue is easily reproducible by using pulsar-perf consume over a partitioned topic, since pulsar-perf is using message pooling by default.

The root cause is that we are accessing the message object instance after it was enqueued on the queue where the user will pick it up.

The user is doing something like:

while (true) {
   Message<?> msg = consumer.receive();
   ///
   consumer.acknowledge();
   msg.release();
}

When msg.release() gets called, the object fields are nulled and then the object goes back to the pool. For that, from inside the client lib, we cannot touch that object from the moment it gets put on the queue.

@merlimat merlimat added type/bug The PR fixed a bug or issue reported a bug doc-not-needed Your PR changes do not impact docs labels Aug 27, 2021
@merlimat merlimat added this to the 2.9.0 milestone Aug 27, 2021
@merlimat merlimat self-assigned this Aug 27, 2021
Copy link
Contributor

@315157973 315157973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

Copy link
Contributor

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Job!

@eolivelli
Copy link
Contributor

@MMirelli this may be a problem that should make some Fallout tests against current master to fail.
I wonder why we haven't caught this, can you please check?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

It would be good to have a test case that covers this problem

@merlimat merlimat merged commit 666ad3b into apache:master Aug 28, 2021
@merlimat merlimat deleted the error-pooled-messages branch August 28, 2021 16:26
hangc0276 pushed a commit that referenced this pull request Aug 29, 2021
@hangc0276 hangc0276 added cherry-picked/branch-2.8 Archived: 2.8 is end of life area/client labels Aug 29, 2021
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Sep 24, 2021
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/client cherry-picked/branch-2.8 Archived: 2.8 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants