[SPARK-32033][SS][DSTREAMS] Use new poll API in Kafka connector executor side to avoid infinite wait #28871
Conversation
Test build #124278 has finished for PR 28871 at commit
HeartSaVioR left a comment
The code change looks good.
Since we're here and this changes the semantics of the timeout, could we double-check that the default timeout value is sufficient and that the documentation is correct?
I roughly skimmed it and found that the option follows spark.network.timeout as its default value, which is 120s by default, but the default value of kafkaConsumer.pollTimeoutMs in the integration doc is 512 ms. It would be nice to fix one of the two, and we would probably want the higher one, since the timeout now also counts metadata updates.
I guess it'd be nice to also update the doc as well as the migration guide (not 100% sure about this, so let's hear others' voices) to state that the timeout semantics now include metadata updates, so end users may want to set a higher value than before.
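For context, `kafkaConsumer.pollTimeoutMs` is a per-query source option in Spark's Kafka integration. A hedged sketch of setting it explicitly (PySpark; assumes a running SparkSession `spark`, and the broker address and topic name are placeholders):

```python
# Sketch only, not runnable standalone: requires a SparkSession and a reachable broker.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
      .option("subscribe", "events")                      # placeholder topic
      # Executor-side poll timeout in milliseconds; with the new
      # poll(Duration) API this bounds the whole call, including
      # the wait for metadata updates.
      .option("kafkaConsumer.pollTimeoutMs", "120000")
      .load())
```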
Honestly, I didn't check the doc and presumed it was set properly. Nice catch! Fixed it. :)
Regarding the migration guide, I don't think it's required. What would you write there that adds real value? I've seen setups that have enough headroom to handle such situations, and admins know this issue well enough that it doesn't need to be written down.
Test build #124288 has finished for PR 28871 at commit
dongjoon-hyun left a comment
+1, LGTM. Thank you, @gaborgsomogyi and @HeartSaVioR .
Merged to master.
Thank you @dongjoon-hyun and @HeartSaVioR for the quick action.
Closes apache#28871 from gaborgsomogyi/SPARK-32033. Authored-by: Gabor Somogyi <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Spark uses an old and deprecated API named `KafkaConsumer.poll(long)`, which never returns and stays in a livelock if metadata is not updated (for instance, when the broker disappears at consumer creation). Please see the [Kafka documentation](https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll-long-) and the [standalone test application](https://github.com/gaborgsomogyi/kafka-get-assignment) for further details.
In this PR I've applied the new `KafkaConsumer.poll(Duration)` API on the executor side. Please note the driver side still uses the old API, which will be fixed in SPARK-32032.
Why are the changes needed?
Infinite wait in `KafkaConsumer.poll(long)`.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests.
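The behavioral difference can be illustrated with a small Python analogy (this is not the actual Kafka client; `queue.Queue` merely stands in for the broker's response channel): the old `poll(long)` can block indefinitely while waiting for metadata, whereas the new `poll(Duration)` bounds the entire call and returns empty on timeout.

```python
import queue

def poll_duration_style(responses: queue.Queue, timeout_ms: int):
    """Analogy for KafkaConsumer.poll(Duration): the whole call is bounded
    by the timeout, so a broker that never answers cannot cause an
    infinite wait. The old poll(long) behavior would correspond to
    responses.get() with no timeout, which blocks forever here."""
    try:
        return responses.get(timeout=timeout_ms / 1000.0)
    except queue.Empty:
        return []  # timed out: return empty rather than blocking forever

# A "broker" that never answers (e.g. it disappeared at consumer creation):
silent_broker = queue.Queue()
print(poll_duration_style(silent_broker, timeout_ms=100))  # → []
```

The analogy's payoff is in the `except queue.Empty` branch: bounding the wait turns a potential livelock into an ordinary empty result the caller can retry.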