Skip to content

Cherry-pick PRs for v2.10.14-RC.2#5278

Merged
derekcollison merged 33 commits intorelease/v2.10.14from
neil/release/v2.10.14
Apr 4, 2024
Merged

Cherry-pick PRs for v2.10.14-RC.2#5278
derekcollison merged 33 commits intorelease/v2.10.14from
neil/release/v2.10.14

Conversation

@neilalexander
Copy link
Copy Markdown
Member

@neilalexander neilalexander commented Apr 4, 2024

@neilalexander neilalexander requested a review from a team as a code owner April 4, 2024 18:59
@neilalexander neilalexander force-pushed the neil/release/v2.10.14 branch from c204c49 to f7f7a09 Compare April 4, 2024 20:47
depthlending and others added 28 commits April 4, 2024 21:50
Signed-off-by: depthlending <bikangning@outlook.com>
…dexCacheBuf.

Signed-off-by: Derek Collison <derek@nats.io>
The code was just checking if max_deliver is >= than backoff,
however infinite max_deliver is represented by -1, so it was erroring.
This fixes that behavior.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Signed-off-by: Waldemar Quevedo <wally@nats.io>
Signed-off-by: Waldemar Quevedo <wally@nats.io>
The main issue for router/gateway connections were that they would
be registered with the global account but never removed from there,
which would cause the connection to be retained. The memory leak
was apparent in one user setup with large amount of subscriptions
that were transfered to a slow gateway, causing accumulation of
partial buffers before the connection was dropped.

For gateway connections, we also needed to clean the outbound's
map since it can contain sublist referencing its own connection.
Same for subscriptions in c.subs.

Another found cause for retained connection is with leafnode where
we are using a subscription referencing the leafnode connection
that was globally registered in a global sublist but never removed.

Also the connection's c.out.sg (a `*Cond`) needed to be set to `nil`
when the connection is closed. Without that, it seems that the connection
would not be released (at least based on experimentation). We now
make sure that c.out.sg is not nil before sending a signal.

The bottom line is that it looks like having an object referencing
itself in some way, prevents the GC from releasing the object, even
if the "top level" object is no longer reachable. For instance,
suppose `obj.someField = obj` seem to prevent the GC from releasing
`obj`, even if `obj` itself is no longer referenced.

The original issue/PR (#5212) was simply cleaning the c.out.wnb,
but the way it was done was racy since c.out.wnb is used without
lock in flushOutbound. Once the retain/release issue is fixed,
cleaning this buffer is not really required (but good practice
especially if not running with GOMEMLIMIT), so we take care
of cleaning this up, but under the protection of the flushOutbound
flag. If set, flushAndClose() will not do the cleaning, flushOutbound
will do it.

Relates to #5212

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Failed reads here might result in duplicate nonces.

Signed-off-by: Neil Twigg <neil@nats.io>
Otherwise we might not overwrite the message with random
bytes properly.

Signed-off-by: Neil Twigg <neil@nats.io>
In this case, an empty-but-non-nil map would not be copied, which
could result in the clone referring to the original map.

Also implement a deep copy of `AllowedConnectionTypes` for `NkeyUser`,
like `User` already has.

Signed-off-by: Neil Twigg <neil@nats.io>
…r stream and has a max messages limit and has discard policy new. The stream should never go over the limit but should still get all the messages in the stream being sourced (i.e. getting more messages as they are consumed from the WQ stream)

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
Signed-off-by: Neil Twigg <neil@nats.io>
Tests were executed against the parent commit of when they were
added to make sure that the tuning is not affecting the reproducibility
of the test. Note that some of the tests were sometimes not detecting
the bug, even when running with `-count=xxx`, so in that case, it
is difficult to know if the tuning would prevent detection if the
bug were to be reintroduced.

The FileStore and MsgTrace tests were also added as a matrix run
in Travis/GA.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Excluded the files that were auto-generated (with a do not edit
comment).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
…store if upto sequence was not found, e.g. deleted already.

Signed-off-by: Derek Collison <derek@nats.io>
We were getting partial or no cache errors when hitting it from multiple go routines in quick succession.

Signed-off-by: Derek Collison <derek@nats.io>
…valid.

If we think we need to calculate a new first sequence but the one we have is still valid, this could lead to messages being skipped and not sent to a consumer.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
…raddles multiple allocations in go 1.22.1 on darwin.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Specifically around interest and limited streams that are clustered. Since we operate consumers and streams independently, when a stream is removing msgs when acked and acks are coming into different servers at different times, etc. Messages can get stranded.

We introduce a better check of the interest state based on a collective low ack floor that will purge our copy of the stream to catch up.
We run this periodically in monitorStream and during certain conditions.

Thanks to Wally for introducing the test that stresses this part of the system.

Signed-off-by: Derek Collison <derek@nats.io>
For the example hardened systemd file, which is derived from config files which
Synadia uses in production, add more examples:

 * Show setting GOMEMLIMIT env var
   + but also nudge towards using the external environment file for it
 * Use an external environment config file if it exists
 * Show mount-point dependency for JetStream
 * Configure automatic restarts
 * Show using InaccessiblePaths to lock down FS access even further
 * Setup a service alias of simply 'nats'.
Signed-off-by: Neil Twigg <neil@nats.io>
derekcollison and others added 4 commits April 4, 2024 21:50
Signed-off-by: Derek Collison <derek@nats.io>
…ects.

Signed-off-by: Derek Collison <derek@nats.io>
It concerns me a little that by duplicating the same test for both
file and memory stores, that differences could slip in later that
mean we are no longer testing for consistency between the two stores.

This approach allows us to write tests that only use the `StreamStore`
API and have them tested against all store permutations.

Signed-off-by: Neil Twigg <neil@nats.io>
Signed-off-by: Neil Twigg <neil@nats.io>
@neilalexander neilalexander force-pushed the neil/release/v2.10.14 branch from f7f7a09 to e3c9236 Compare April 4, 2024 20:50
… unprotected.

Signed-off-by: Derek Collison <derek@nats.io>
Copy link
Copy Markdown
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit 8cafe5b into release/v2.10.14 Apr 4, 2024
@derekcollison derekcollison deleted the neil/release/v2.10.14 branch April 4, 2024 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants