Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ServerCnx] Improve error logging for topic not found #13950

Merged

Conversation

michaeljmarshall
Copy link
Member

Motivation

When trying to load a topic without auto topic creation enabled, the broker logs a large stack trace that isn't very helpful. The error isn't helpful because the stack trace is truncated and because it comes from calling .get on an empty optional

return getTopic(topic, isAllowAutoTopicCreation(topic)).thenApply(Optional::get);

Here is a sample stack trace:

22:57:37.017 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] ERROR org.apache.pulsar.broker.service.ServerCnx - [/192.168.49.41:46644] Failed to create topic persistent://public/default/test, producerId=1
java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:645) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?]
        at org.apache.pulsar.broker.service.BrokerService$3.openLedgerFailed(BrokerService.java:1286) ~[com.datastax.oss-pulsar-broker-2.8.0.1.1.14.jar:2.8.0.1.1.14]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.lambda$asyncOpen$8(ManagedLedgerFactoryImpl.java:403) ~[com.datastax.oss-managed-ledger-2.8.0.1.1.14.jar:2.8.0.1.1.14]
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$2.initializeFailed(ManagedLedgerFactoryImpl.java:398) ~[com.datastax.oss-managed-ledger-2.8.0.1.1.14.jar:2.8.0.1.1.14]
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$1.operationFailed(ManagedLedgerImpl.java:396) ~[com.datastax.oss-managed-ledger-2.8.0.1.1.14.jar:2.8.0.1.1.14]
        at org.apache.bookkeeper.mledger.impl.MetaStoreImpl.lambda$getManagedLedgerInfo$2(MetaStoreImpl.java:87) ~[com.datastax.oss-managed-ledger-2.8.0.1.1.14.jar:2.8.0.1.1.14]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714) [?:?]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
        at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [com.datastax.oss-bookkeeper-common-4.14.3.1.0.0.jar:4.14.3.1.0.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.NoSuchElementException: No value present
        at java.util.Optional.get(Optional.java:148) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) ~[?:?]
        ... 21 more

I would have preferred to modify the getOrCreateTopic method, but it seems that method won't be easy to modify without touching a bunch of tests. Instead, I update the exceptionally block to improve our handling of the failed topic load.

Modifications

  • Don't log the stack trace when a topic fails to load with a NoSuchElementException. This case happens when auto topic creation is turned off.

Verifying this change

This is a trivial change to a log line, so I do not have any tests.

Does this pull request potentially affect one of the following parts:

There are no breaking/public changes here.

Documentation

  • no-need-doc
    This update only affects a log line, so it doesn't need documentation.

ServiceUnitNotReadyException.class, ManagedLedgerException.class)) {
// Do not print stack traces for expected exceptions
log.error("[{}] Failed to create topic {}, producerId={}",
remoteAddress, topicName, producerId, exception);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still print the exception stack in exception ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still prints the exception stack here, but it doesn't for exceptions that have a cause of type NoSuchElementException. I relocated the comment because we're avoiding printing stack traces in this conditional block.

@@ -1273,12 +1273,13 @@ protected void handleProducer(final CommandProducer cmdProducer) {
return null;
}

// Do not print stack traces for expected exceptions
if (cause instanceof NoSuchElementException) {
cause = new TopicNotFoundException("Topic Not Found.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could also try to not create a 2nd exception here which is only used to do getMessage()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, although we also use the exception here: BrokerServiceException.getClientErrorCode(cause).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@merlimat - would you like me to introduce more complicated logic for getting the client error code, or is this sufficient?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@michaeljmarshall
Copy link
Member Author

Currently blocked by #13965. There is a chance that the cpp tests will succeed (it looks like a race?). Once #13965 is resolved, we should be able to get these tests passing.

@BewareMyPower
Copy link
Contributor

@michaeljmarshall The C++ tests have already been fixed by disabling subscriptionKeySharedUseConsistentHashing in #13963. You can rebase to master. However, I think the CI already passed.

@michaeljmarshall
Copy link
Member Author

@BewareMyPower - thanks, for your note--that last retry worked.

I am going to merge this now and we can revisit the exception later, if it's an issue.

This was referenced Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.