cherry-pick some necessary PRs#10

Merged
3AceShowHand merged 3 commits into pingcap:v1.41.2-pingcap from 3AceShowHand:first-round
Apr 17, 2026

@3AceShowHand
Collaborator

No description provided.

prestona and others added 3 commits April 17, 2026 17:20
The underlying cause was not waiting for the goroutine running the
`responseReceiver()` method to fully complete if SASL authentication
failed. This created a window where a further call to `Broker.Open()`
could overwrite the `Broker.done` channel value while the goroutine
still running `responseReceiver()` was trying to close the same channel.

Fixes: IBM#2382

Signed-off-by: Adrian Preston <PRESTONA@uk.ibm.com>

Related to:
- golang/go#13828
- IBM#1722

We're using https://github.com/Mongey/terraform-provider-kafka to manage
Kafka Topics with Terraform. Recently we've changed from Plaintext
communications to AWS IAM Authentication. When doing so, our provider
would sometimes hang indefinitely on some plans. We narrowed this down to the
`kafka.t3.small` cluster tiers, as these have several limitations,
including a maximum of 4 TCP connections per second.

While debugging the provider, we found that the call stack was
stuck on writing to the cluster, specifically on the very first
communication it attempted with the cluster. Reading
through the code, we found a very interesting comment on the Write
function of the TLS package.


https://github.com/golang/go/blob/go1.23.0/src/crypto/tls/conn.go#L1192-L1195
```
// As Write calls [Conn.Handshake], in order to prevent indefinite blocking a deadline
// must be set for both [Conn.Read] and Write before Write is called when the handshake
// has not yet completed. See [Conn.SetDeadline], [Conn.SetReadDeadline], and
// [Conn.SetWriteDeadline].
```
Based on this, TLS requires both Write and Read deadlines to be set,
because the Write function may perform the handshake on the first communication,
and the handshake both writes and reads.

I believe that in our case, since we are working with brokers that don't
have a very reliable network, sometimes the handshake would not progress
on the server side, and we would indefinitely wait for a Read that would
never come.

After implementing this change on our local workstation, instead of
hanging indefinitely, the program would finally report some
type of error:
```
Error: kafka: client has run out of available brokers to talk to: read tcp 10.xxx.xxx.xxx:59582->10.xxx.xxx.xxx:9098: i/o timeout
```

Signed-off-by: Bernardo Valente <bernardofvalente@gmail.com>

We should skip the metadata refresh if a broker that is still in its startup phase returns an empty broker list in its metadata response. The Java client does not update its metadata cache on such an empty response (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1149), and Sarama should have feature parity here.
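
The guard described above can be sketched as follows. The `metadataResponse` and `metadataCache` types and the `update` method are hypothetical and far simpler than Sarama's real client structures; the sketch only shows the idea of ignoring a response that lists no brokers so the previously cached state survives.

```go
package main

import "fmt"

// metadataResponse is a hypothetical, simplified metadata response.
type metadataResponse struct {
	Brokers []string
}

// metadataCache is a hypothetical, simplified client-side metadata cache.
type metadataCache struct {
	brokers []string
}

// update applies resp to the cache and reports whether it was applied.
// A response with no brokers is skipped, leaving the cached state untouched.
func (c *metadataCache) update(resp metadataResponse) bool {
	if len(resp.Brokers) == 0 {
		return false
	}
	// Copy the slice so later changes to resp do not affect the cache.
	c.brokers = append([]string(nil), resp.Brokers...)
	return true
}

func main() {
	cache := &metadataCache{}

	// A normal response replaces the cached broker list.
	fmt.Println(cache.update(metadataResponse{Brokers: []string{"b1:9092", "b2:9092"}}), cache.brokers)

	// An empty response is ignored and the previous broker list is kept.
	fmt.Println(cache.update(metadataResponse{}), cache.brokers)
}
```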

Fixes IBM#2664

Signed-off-by: Hao Sun <haos@uber.com>
@3AceShowHand 3AceShowHand merged commit e5b7e7d into pingcap:v1.41.2-pingcap Apr 17, 2026
10 checks passed

4 participants