rewrite the consumer & source code; other bug fixes
This is a very large commit that aims to make a lot of things simpler to reason about in the consumer code. It also bundles in a few other bugfixes that I noticed along the way.

Before this, the consumer used a bunch of sequence numbers to track changes across assignments and similar events. These sequence numbers were then used to knife out stale partitions and to ignore in-progress work that was completing for stale partitions, while still accepting the non-stale results.

This sequence number tracking and partition knifing was really complicated and difficult to reason about, and I was never fully confident in it. I knew there was some bug somewhere, and getting it to work in the first place had me tracking down some really complicated bugs to begin with.

This mostly rewrites the source and consumer code to ideally simplify things. The state of partitions is no longer tracked through sequence numbers that can be bumped at any time; instead, we use consumer sessions that are stopped and started in full, and modifications happen while stopped.

This also attempts to simplify state changes from basically happening anywhere to happening in fewer functions, and then adds a lot of long-winded documentation for why the state changes are safe.

Some simplifications still remain, but this is passing integration tests right now. I'd like to clarify usedCursors, and to switch the useState in a cursor to only have two states rather than three.

I'll shortly be following this with some consumer group API changes as well, and will be simplifying the guts of a consumer group. Some changes were already made to make things clearer, and to have fewer blocking functions.

I noticed that fetch sessions had some lock inversion; that's fixed, and the mutex aspects of a source / cursor have been drastically simplified. The session also looked to have some flaws on unhappy path errors; ideally those have been fixed, but they're hard to test.
I noticed that sinks and sources never updated their broker pointer, which would be problematic if a broker changed but kept the same node ID. There may have been some stuff in the metadata that fixed this up, but regardless it was not the cleanest flow. The broker pointer was definitely never updated.

This commit changes that by putting sinks and sources in a dedicated nodeID => sink/source field in the client itself, and the sink/source looks up the broker pointer to use whenever issuing a request. This allows brokers to change at any moment. This also moves the sequenced async request from the broker to the sink itself, since the sink is the only one that needed it, and putting it on the sink allows brokers to change between requests.

This also removes the potential for a recBuf's sink or a cursor's source to ever be nil. Instead, we default to the first seed broker. This is an OK compromise because it allows us to start sending (and receiving errors) immediately, while we expect to always have the relevant leader broker loaded. This simplifies using a sink or a source, and also allows things to begin notifying of load errors.
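The lookup-per-request idea above can be sketched roughly as follows. All types and names here (`client`, `broker`, `sink`, `sinkFor`, `updateBroker`, the `-1` seed node ID) are hypothetical and simplified for illustration, and the seed fallback is modeled inside the broker lookup as an assumption rather than as this commit does it.

```go
package main

import (
	"fmt"
	"sync"
)

// broker is a hypothetical stand-in for a broker connection.
type broker struct {
	nodeID int32
	addr   string
}

// client owns the brokers and a nodeID => sink map (both hypothetical).
type client struct {
	mu      sync.RWMutex
	seeds   []*broker
	brokers map[int32]*broker
	sinks   map[int32]*sink
}

// sink stores only a node ID, never a broker pointer.
type sink struct {
	cl     *client
	nodeID int32
}

func newClient(seedAddr string) *client {
	return &client{
		seeds:   []*broker{{nodeID: -1, addr: seedAddr}}, // -1 is an illustrative seed ID
		brokers: make(map[int32]*broker),
		sinks:   make(map[int32]*sink),
	}
}

// updateBroker replaces the broker for a node ID, e.g. after a metadata update.
func (cl *client) updateBroker(nodeID int32, addr string) {
	cl.mu.Lock()
	defer cl.mu.Unlock()
	cl.brokers[nodeID] = &broker{nodeID: nodeID, addr: addr}
}

// sinkFor returns the single sink for a node ID, creating it on first use.
func (cl *client) sinkFor(nodeID int32) *sink {
	cl.mu.Lock()
	defer cl.mu.Unlock()
	s, ok := cl.sinks[nodeID]
	if !ok {
		s = &sink{cl: cl, nodeID: nodeID}
		cl.sinks[nodeID] = s
	}
	return s
}

// broker resolves the broker at request time, falling back to the first
// seed broker so that the result is never nil.
func (s *sink) broker() *broker {
	s.cl.mu.RLock()
	defer s.cl.mu.RUnlock()
	if b, ok := s.cl.brokers[s.nodeID]; ok {
		return b
	}
	return s.cl.seeds[0]
}

// produce stands in for issuing a request through the current broker.
func (s *sink) produce() string {
	return "produce via " + s.broker().addr
}

func main() {
	cl := newClient("seed:9092")
	s := cl.sinkFor(1)
	fmt.Println(s.produce()) // node 1 not loaded yet: uses the seed broker
	cl.updateBroker(1, "a:9092")
	fmt.Println(s.produce())
	cl.updateBroker(1, "b:9092") // same node ID, different broker
	fmt.Println(s.produce())
}
```

Since the sink holds a node ID instead of a pointer, a broker that changes while keeping its node ID is picked up on the very next request without the sink being rebuilt.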