
librdkafka v1.9.0

librdkafka v1.9.0 is a feature release:

Upgrade considerations

  • Consumer:
    rd_kafka_offsets_store() (et al.) will now return an error for any
    partition that is not currently assigned (through rd_kafka_*assign()).
    This prevents a race condition where an application would store offsets
    after the assigned partitions had been revoked (which resets the stored
    offset), which could cause these old stored offsets to be committed later
    when the same partitions were assigned to this consumer again, effectively
    overwriting any offsets committed by consumers that were previously
    assigned the same partitions. This would typically result in the offsets
    rewinding and messages being reprocessed.
    As an extra safeguard, the stored offset is now also reset when partitions
    are assigned (through rd_kafka_*assign()).
    Applications that explicitly call ..offset*_store() will now need
    to handle the case where RD_KAFKA_RESP_ERR__STATE is returned
    in the per-partition .err field, meaning the partition is no longer
    assigned to this consumer and the offset could not be stored for commit
    (see the sketch below).
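
For illustration, a minimal sketch (in C) of handling the new per-partition
error from rd_kafka_offsets_store(); the helper function, topic, partition and
offset values below are hypothetical:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Store an offset for a single partition and handle the case where the
 * partition is no longer assigned to this consumer (new in v1.9.0). */
static void store_offset(rd_kafka_t *rk, const char *topic,
                         int32_t partition, int64_t offset) {
        rd_kafka_topic_partition_list_t *offsets =
                rd_kafka_topic_partition_list_new(1);

        rd_kafka_topic_partition_list_add(offsets, topic, partition)->offset =
                offset;

        rd_kafka_offsets_store(rk, offsets);

        /* RD_KAFKA_RESP_ERR__STATE in the per-partition .err field means the
         * partition is not currently assigned and the offset was not stored
         * for commit. */
        if (offsets->elems[0].err == RD_KAFKA_RESP_ERR__STATE)
                fprintf(stderr, "%s [%d]: not assigned, offset %lld not stored\n",
                        topic, (int)partition, (long long)offset);
        else if (offsets->elems[0].err)
                fprintf(stderr, "%s [%d]: offset store failed: %s\n",
                        topic, (int)partition,
                        rd_kafka_err2str(offsets->elems[0].err));

        rd_kafka_topic_partition_list_destroy(offsets);
}
```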

Enhancements

  • Improved producer queue scheduling. Fixes the performance regression
    introduced in v1.7.0 for some produce patterns. (#3538, #2912)
  • Windows: Added native Win32 IO/Queue scheduling. This removes the
    internal TCP loopback connections that were previously used for timely
    queue wakeups.
  • Added socket.connection.setup.timeout.ms (default 30s).
    The maximum time allowed for broker connection setups (TCP connection as
    well as SSL and SASL handshakes) is now limited to this value.
    This fixes the issue with stalled broker connections in the case of network
    or load balancer problems.
    The Java clients have an exponential backoff on this timeout, capped by
    socket.connection.setup.timeout.max.ms; this was not implemented in
    librdkafka due to differences in connection handling and
    ERR__ALL_BROKERS_DOWN error reporting. Starting with a lower initial
    connection setup timeout and then increasing it for subsequent attempts
    could yield false-positive ERR__ALL_BROKERS_DOWN errors too early.
  • SASL OAUTHBEARER refresh callbacks can now be scheduled for execution
    on librdkafka's background thread. This solves the problem where an
    application with a custom SASL OAUTHBEARER refresh callback needed to
    call rd_kafka_poll() (et al.) at least once to trigger the
    refresh callback before being able to connect to brokers.
    With the new rd_kafka_conf_enable_sasl_queue() configuration API and
    rd_kafka_sasl_background_callbacks_enable() the refresh callbacks
    can now be triggered automatically on the librdkafka background thread
    (see the first sketch after this list).
  • rd_kafka_queue_get_background() now creates the background thread
    if not already created.
  • Added rd_kafka_consumer_close_queue() and rd_kafka_consumer_closed().
    These allow applications and language bindings to implement asynchronous
    consumer close (see the second sketch after this list).
  • Bundled zlib upgraded to version 1.2.12.
  • Bundled OpenSSL upgraded to 1.1.1n.
  • Added test.mock.broker.rtt to simulate RTT/latency for mock brokers.
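
For illustration, a minimal sketch (in C) of routing SASL OAUTHBEARER refresh
callbacks to the background thread using the new APIs; the refresh callback
body and the error handling shown are hypothetical:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Hypothetical application-provided OAUTHBEARER token refresh callback.
 * It would fetch a token and call rd_kafka_oauthbearer_set_token(), or
 * rd_kafka_oauthbearer_set_token_failure() on error. */
static void my_oauthbearer_refresh_cb(rd_kafka_t *rk,
                                      const char *oauthbearer_config,
                                      void *opaque) {
        /* ... token acquisition omitted ... */
}

static rd_kafka_t *create_client(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_set(conf, "security.protocol", "SASL_SSL",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "sasl.mechanism", "OAUTHBEARER",
                          errstr, sizeof(errstr));
        rd_kafka_conf_set_oauthbearer_token_refresh_cb(
                conf, my_oauthbearer_refresh_cb);

        /* Forward SASL events to a separate queue instead of the main queue
         * polled by rd_kafka_poll(). */
        rd_kafka_conf_enable_sasl_queue(conf, 1);

        rd_kafka_t *rk =
                rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
        if (!rk)
                return NULL;

        /* Have librdkafka's background thread trigger the refresh callback
         * automatically, so no rd_kafka_poll() call is needed to connect. */
        rd_kafka_error_t *error = rd_kafka_sasl_background_callbacks_enable(rk);
        if (error) {
                fprintf(stderr, "Enabling background SASL callbacks failed: %s\n",
                        rd_kafka_error_string(error));
                rd_kafka_error_destroy(error);
        }

        return rk;
}
```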
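
Similarly, a minimal sketch of an asynchronous consumer close built on
rd_kafka_consumer_close_queue() and rd_kafka_consumer_closed(); the event
handling is simplified and assumes an already running consumer:

```c
#include <librdkafka/rdkafka.h>

/* Initiate close and serve events from the consumer queue until the close
 * sequence has completed. In a real application the polling loop would
 * typically run on another thread or inside an existing event loop. */
static void async_close(rd_kafka_t *rk) {
        rd_kafka_queue_t *queue = rd_kafka_queue_get_consumer(rk);
        rd_kafka_error_t *error = rd_kafka_consumer_close_queue(rk, queue);

        if (error) {
                rd_kafka_error_destroy(error);
                rd_kafka_queue_destroy(queue);
                return;
        }

        while (!rd_kafka_consumer_closed(rk)) {
                rd_kafka_event_t *ev = rd_kafka_queue_poll(queue, 100);
                if (ev)
                        rd_kafka_event_destroy(ev); /* dispatch/handle as needed */
        }

        rd_kafka_queue_destroy(queue);
}
```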

Fixes

General fixes

  • Fix various 1 second delays due to internal broker threads blocking on IO
    even though there are events to handle.
    These delays could be seen randomly in any of the non produce/consume
    request APIs, such as commit_transaction(), list_groups(), etc.
  • Windows: some applications would crash with an error message like
    no OPENSSL_Applink() written to the console if ssl.keystore.location
    was configured.
    This regression was introduced in v1.8.0 due to the use of vcpkg and how
    the keystore file was read (#3554).
  • Windows 32-bit only: 64-bit atomic reads were in fact not atomic and could
    in rare circumstances yield incorrect values.
    One manifestation of this issue was the max.poll.interval.ms consumer
    timer expiring even though the application was polling as expected.
    Fixed by @WhiteWind (#3815).
  • rd_kafka_clusterid() would previously fail with a timeout if
    called on a cluster with no visible topics (#3620).
    The clusterid is now returned as soon as metadata has been retrieved.
  • Fix hang in rd_kafka_list_groups() if there are no available brokers
    to connect to (#3705).
  • Millisecond timeouts (timeout_ms) in various APIs, such as rd_kafka_poll(),
    were limited to roughly 36 hours before wrapping. (#3034)
  • If a metadata request triggered by rd_kafka_metadata() or by consumer group
    rebalancing encountered a non-retriable error, the error was not propagated
    to the caller, causing a stall or timeout; this has now been fixed.
    (@aiquestion, #3625)
  • AdminAPI DeleteGroups() and DeleteConsumerGroupOffsets():
    if the given coordinator connection was not up by the time these calls were
    initiated and the first connection attempt failed then no further connection
    attempts were performed, ultimately leading to the calls timing out.
    This is now fixed by retrying the group coordinator connection
    until the connection is successful or the call times out.
    Additionally, the coordinator is now re-queried once per second until
    it comes up or the call times out, to detect coordinator changes.
  • Mock cluster rd_kafka_mock_broker_set_down() would previously
    accept and then disconnect new connections; it now refuses new connections.

Consumer fixes

  • rd_kafka_offsets_store() (et al.) will now return an error for any
    partition that is not currently assigned (through rd_kafka_*assign()).
    See Upgrade considerations above for more information.
  • rd_kafka_*assign() will now reset/clear the stored offset.
    See Upgrade considerations above for more information.
  • seek() followed by pause() would overwrite the seeked offset when
    later calling resume(). This is now fixed. (#3471)
    Note: avoid storing offsets (offsets_store()) after calling
    seek(), as this may later interfere with resuming a paused partition;
    instead, store offsets prior to calling seek() (see the sketch at the
    end of this list).
  • An ERR_MSG_SIZE_TOO_LARGE consumer error would previously be raised
    if the consumer received a maximum sized FetchResponse only containing
    (transaction) aborted messages with no control messages. The fetching did
    not stop, but some applications would terminate upon receiving this error.
    No error is now raised in this case. (#2993)
    Thanks to @jacobmikesell for providing an application to reproduce the
    issue.
  • The consumer no longer backs off the next fetch request (default 500ms) when
    the parsed fetch response is truncated (which is a valid case).
    This should speed up the message fetch rate in case of maximum sized
    fetch responses.
  • Fix consumer crash (assert: rkbuf->rkbuf_rkb) when parsing
    malformed JoinGroupResponse consumer group metadata state.
  • Fix crash (cant handle op type) when using consume_batch_queue() (et al.)
    and an OAUTHBEARER refresh callback was set.
    The callback is now triggered by the consume call. (#3263)
  • Fix partition.assignment.strategy ordering when multiple strategies are
    configured. If there is more than one eligible strategy, preference is
    determined by the configured order of strategies; partitions are now
    assigned to group members according to that order of preference. (#3818)
  • Any form of unassign*() (absolute or incremental) is now allowed during
    consumer close rebalancing, and all are treated as absolute unassigns.
    (@kevinconaway)
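
As mentioned in the seek()/pause() note above, offsets should be stored before
seeking. A minimal sketch (in C) of that ordering; the topic, partition,
offsets and timeout below are hypothetical:

```c
#include <librdkafka/rdkafka.h>

/* Store the offset to be committed first, then seek; storing after seek()
 * may interfere with resuming a paused partition. */
static void store_then_seek(rd_kafka_t *rk, const char *topic,
                            int32_t partition, int64_t stored_offset,
                            int64_t seek_offset) {
        rd_kafka_topic_partition_list_t *parts =
                rd_kafka_topic_partition_list_new(1);

        /* 1. Store the offset for commit, prior to seeking. */
        rd_kafka_topic_partition_list_add(parts, topic, partition)->offset =
                stored_offset;
        rd_kafka_offsets_store(rk, parts);

        /* 2. Seek to the new fetch position. */
        parts->elems[0].offset = seek_offset;
        rd_kafka_error_t *error = rd_kafka_seek_partitions(rk, parts, 5000);
        if (error)
                rd_kafka_error_destroy(error);

        rd_kafka_topic_partition_list_destroy(parts);
}
```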

Transactional producer fixes

  • Fix message loss in idempotent/transactional producer.
    A corner case has been identified that may cause idempotent/transactional
    messages to be lost despite being reported as successfully delivered:
    During cluster instability a restarting broker may report existing topics
    as non-existent for some time before it is able to acquire up-to-date
    cluster and topic metadata.
    If an idempotent/transactional producer updates its topic metadata cache
    from such a broker the producer will consider the topic to be removed from
    the cluster and thus remove its local partition objects for the given topic.
    This also removes the internal message sequence number counter for the given
    partitions.
    If the producer later receives proper topic metadata for the cluster the
    previously "removed" topics will be rediscovered and new partition objects
    will be created in the producer. These new partition objects, with no
    knowledge of previous incarnations, would start counting partition messages
    at zero again.
    If new messages were produced for these partitions by the same producer
    instance, the same message sequence numbers would be sent to the broker.
    If the broker still maintains state for the producer's PID and epoch, it
    could deem that these messages with reused sequence numbers had already
    been written to the log and treat them as legitimate duplicates.
    To the producer, these new messages would appear to have been successfully
    written to the partition log by the broker, when they were in fact
    discarded as duplicates, leading to silent message loss.
    The fix included in this release is to save the per-partition idempotency
    state when a partition is removed, and then recover and use that saved
    state if the partition comes back at a later time.
  • The transactional producer would retry (re)initializing its PID if a
    PRODUCER_FENCED error was returned from the
    broker (added in Apache Kafka 2.8), which could cause the producer to
    seemingly hang.
    This error code is now correctly handled by raising a fatal error.
  • If the given group coordinator connection was not up by the time
    send_offsets_to_transaction() was called, and the first connection
    attempt failed, then no further connection attempts were performed,
    ultimately leading to send_offsets_to_transaction() timing out, and
    possibly also the transaction timing out on the transaction coordinator.
    This is now fixed by retrying the group coordinator connection
    until the connection is successful or the call times out.
    Additionally, the coordinator is now re-queried once per second until
    it comes up or the call times out, to detect coordinator changes.

Producer fixes

  • Improved producer queue wakeup scheduling. This should significantly
    decrease the number of wakeups and thus syscalls for high message rate
    producers. (#3538, #2912)
  • The logic for enforcing that message.timeout.ms is greater than
    an explicitly configured linger.ms was incorrect: instead of
    erroring out early, the linger time was automatically adjusted to the
    message timeout, ignoring the configured linger.ms.
    This has now been fixed so that an error is returned when instantiating the
    producer (see the sketch below). Thanks to @larry-cdn77 for analysis and
    test-cases. (#3709)
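
A minimal sketch (in C) of how the stricter validation surfaces, assuming a
deliberately conflicting configuration (linger.ms larger than
message.timeout.ms):

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Deliberately conflicting values for illustration. */
        rd_kafka_conf_set(conf, "linger.ms", "10000", errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "message.timeout.ms", "5000",
                          errstr, sizeof(errstr));

        /* As of v1.9.0 this configuration is rejected when the producer is
         * instantiated, instead of silently overriding linger.ms. */
        rd_kafka_t *rk =
                rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "Producer creation failed: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return 1;
        }

        rd_kafka_destroy(rk);
        return 0;
}
```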

Checksums

Release asset checksums:

  • v1.9.0.zip SHA256 a2d124cfb2937ec5efc8f85123dbcfeba177fb778762da506bfc5a9665ed9e57
  • v1.9.0.tar.gz SHA256 59b6088b69ca6cf278c3f9de5cd6b7f3fd604212cd1c59870bc531c54147e889