fix(metrics): fix race when accessing metric registry (#2409)
A race condition was introduced in
5b04c98 (feat(metrics): track
consumer-fetch-response-size) when passing the metric registry around to
get additional metrics. Notably, `handleResponsePromise()` could access
the registry after the broker has been closed and is potentially being
reopened. This triggers a data race because `b.metricRegistry` is
set during `Open()` (as it is part of the configuration).
We fix this by reverting the addition of `handleResponsePromise()` as a
method on `Broker`. Instead, we pass it the metric registry as
an argument. An alternative would have been to get the metric registry
before the `select` call. However, removing it as a method makes it
clearer that this function is not allowed to access the broker internals,
as they are not protected by the lock and the broker may not be alive
anymore.
All the following calls to `b.metricRegistry` are done while the lock is
held:
- inside `Open()`, the lock is held, including inside the goroutine
- inside `Close()`, the lock is held
- `AsyncProduce()` has a contract that it must be called while the broker
is open; we keep a copy of the metric registry to use inside the callback
- `sendInternal()` has a contract that the lock should be held
- `authenticateViaSASLv1()` is called from `Open()` and
`sendWithPromise()`, both of them holding the lock
- `sendAndReceiveSASLHandshake()` is called from
`authenticateViaSASLv0/v1()`, which are called from `Open()` and
`sendWithPromise()`
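The `AsyncProduce()` point can be illustrated with a hypothetical, stdlib-only sketch. The registry is copied into a local variable before the callback goroutine starts. A later `Open()` that swaps `b.metricRegistry` therefore neither affects nor races with the pending callback. Taking `b.lock` around the copy is an assumption of this sketch, and the `produce-bytes` counter is made up.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical, minimal stand-in for a metrics registry.
type Registry struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewRegistry() *Registry { return &Registry{counts: map[string]int{}} }

// Add increments the named counter; safe for concurrent use.
func (r *Registry) Add(name string, n int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[name] += n
}

// Get returns the named counter's current value.
func (r *Registry) Get(name string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.counts[name]
}

// Broker is simplified: metricRegistry is (re)assigned by Open() under b.lock.
type Broker struct {
	lock           sync.Mutex
	metricRegistry *Registry
}

func (b *Broker) Open(reg *Registry) {
	b.lock.Lock()
	defer b.lock.Unlock()
	b.metricRegistry = reg
}

// AsyncProduce copies b.metricRegistry while holding the lock (an assumption
// of this sketch) and only uses that copy inside the asynchronous callback.
func (b *Broker) AsyncProduce(payload []byte, cb func(n int)) {
	b.lock.Lock()
	metricRegistry := b.metricRegistry
	b.lock.Unlock()

	go func() {
		// The callback never reads b.metricRegistry, so a concurrent
		// Close()/Open() cannot race with it.
		metricRegistry.Add("produce-bytes", len(payload))
		cb(len(payload))
	}()
}

func main() {
	first := NewRegistry()
	b := &Broker{}
	b.Open(first)

	done := make(chan int, 1)
	b.AsyncProduce([]byte("abc"), func(n int) { done <- n })

	// Simulate the broker being reopened with a new registry while the
	// callback may still be pending.
	second := NewRegistry()
	b.Open(second)

	n := <-done
	fmt.Println(n, first.Get("produce-bytes"), second.Get("produce-bytes")) // 3 3 0
}
```

The callback updates the registry that was current when `AsyncProduce()` was called, not the one installed by the later `Open()`.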
I am unsure about `responseReceiver()`; however, it also calls
`b.readFull()`, which accesses `b.conn`, so I suppose it is safe.
This leaves `sendAndReceive()`, which calls `send()`, which in turn
calls `sendWithPromise()`, which takes the lock. We move the lock to
`sendAndReceive()` instead. `send()` is only called from
`sendAndReceive()`, and we add a lock around the other caller of
`sendWithPromise()`.
The test has been stolen from #2393 by @samuelhewitt. #2393 is an
alternative proposal that uses a RW lock to protect `b.metricRegistry`.
Fix #2320