Fix sqlite tests #2636
Conversation
|
@dmytro-pryvedeniuk I'm good with these changes for now. In retrospect, it really doesn't make any sense to even have the locking on sqlite, as you can't run it in a cluster anyway, right? @mysticmind, what do you think? I'm going to hold off on this until after the next release, though, just to get bug fixes out. |
|
@jeremydmiller I read https://wolverinefx.net/guide/durability/sqlite.html#sqlite-messaging-transport as "multiple processes can use the same DB, taking the SQLite single-writer limitation into account". So the locking is needed for migration at least (or not, if the lock is ignored after retries anyway). I'm not sure about other use cases (current or future), though. Doesn't this (https://github.com/JasperFx/wolverine/blob/e3caa7d614f3dd0ff01cbbf3c85bb831ea2d3bdd/src/Persistence/SqliteTests/message_store_initialization_and_configuration.cs) mean that multiple nodes are possible? |
|
@dmytro-pryvedeniuk I'm going to punt a bit and ask @mysticmind to review this as he has vastly more Sqlite experience than I do. Babu, thank you in advance! |
|
I haven't had a chance to look at this; I will do so in the coming week and get back.
The docs clearly state that this is for single-node usage. You can't scale it; it doesn't work for multi-node usage. Using a single process is the right usage for SQLite. |
|
Hi @dmytro-pryvedeniuk, really appreciated this PR. Your analysis pinned down several real bugs, and the test that motivated it. I've taken your work as the starting point and pushed an alternative approach in a separate PR (link to follow): fix(sqlite): use BEGIN EXCLUSIVE for migration lock. The key difference is on bug no. 2 (the `wolverine_locks` table being created by the very migration it is supposed to protect): instead of creating the table up front from the lock path, migration now uses SQLite's own `BEGIN EXCLUSIVE` transaction lock, which does not depend on any application table existing.

What I kept directly from your PR (with credit in the commit body):
- `SqliteAdvisoryLock.TryAttainLockAsync` made idempotent via `HasLock`
- the `SqliteAdvisoryLock.DisposeAsync` NRE fix (dropped the duplicate close+dispose)
- `SqliteMessageStore` polling delegating to `SqliteAdvisoryLock` (`TryAttainLockAsync` checks affected rows; `ReleaseLockAsync` actually deletes the row)
- dropping the racy "not yet delivered" assertion in the scheduled-tenant test

Tests: 4 focused migration-lock tests added, and `can_send_from_one_node_to_another_by_destination` is skipped on the SQLite local fixture (single host, no second node).

Open question Q4 (long-lived vs per-call connection in `SqliteAdvisoryLock`) is still open.

@jeremydmiller In retrospect, I am also thinking that the SQLite provider may not be used much in production, and the messaging layer does not come into play in really small apps. We may have to deprecate SQLite support altogether to reduce the time and effort spent maintaining this. |
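For reference, the bootstrap property that makes `BEGIN EXCLUSIVE` attractive can be shown against the SQLite engine directly. This is a minimal Python `sqlite3` sketch, not Wolverine code; it only demonstrates that the exclusive transaction lock exists before any application table (such as `wolverine_locks`) does:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "wolverine.db")

# Holder: BEGIN EXCLUSIVE takes SQLite's write lock immediately.
# Crucially, it needs no application table to exist, so it can
# protect the very migration that creates one.
holder = sqlite3.connect(db, isolation_level=None)
holder.execute("BEGIN EXCLUSIVE")

# Peer: any write attempt is rejected while the exclusive
# transaction is open, even a schema change.
peer = sqlite3.connect(db, isolation_level=None, timeout=0.1)
blocked = False
try:
    peer.execute("CREATE TABLE wolverine_locks (id INTEGER PRIMARY KEY)")
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

holder.execute("COMMIT")  # closing the connection would release it too
peer.execute("CREATE TABLE wolverine_locks (id INTEGER PRIMARY KEY)")
print(blocked)  # → True
```

Because the lock lives in the transaction, SQLite releases it automatically if the holder's connection dies, which is exactly the stale-lock property a row-based scheme lacks.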
|
@mysticmind Your solution is better; EF Core also uses an exclusive lock for migration. What about non-migration locks? They can still be left stale, right? Regarding the single-node restriction: does it mean the same as "single process"? |
|
Good catches; you're right on both.

Stale non-migration locks: yes, the row-based `wolverine_locks` scheme can leave stale rows on a hard crash. The row isn't tied to the writing connection (unlike the new `BEGIN EXCLUSIVE` migration lock, which SQLite tears down with the connection). I just pushed a fix on fix/sqlite-migration-lock (ff0f3f8) that adds:
- a TTL sweep: rows older than the TTL are reaped on every attain attempt
- an implicit heartbeat: a live holder refreshes `acquired_at` on its row each time it re-attains on a poll tick

The default TTL is 2 min, comfortably above the 10 s heartbeat cadence. Dead holders unblock their peers within one TTL window of the next attempt.

"Single node" vs "single process": for the SQLite provider they're effectively the same. SQLite the engine permits multiple processes on one .db file, but Wolverine on top of it assumes one host per file. So: one Wolverine process per file. For multi-process or true multi-node setups, the docs steer you to Postgres / SQL Server. |
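For what it's worth, the sweep-plus-refresh scheme reads roughly like this. This is a Python `sqlite3` sketch, not the actual C# implementation; the `holder` column and the exact schema are assumptions for illustration (only the `wolverine_locks` table name and `acquired_at` column come from the thread):

```python
import sqlite3
import time

TTL_SECONDS = 120  # assumed default: the 2-minute TTL from the fix

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """CREATE TABLE wolverine_locks (
           lock_id     INTEGER PRIMARY KEY,
           holder      TEXT NOT NULL,
           acquired_at REAL NOT NULL
       )""")

def try_attain(lock_id: int, holder: str) -> bool:
    now = time.time()
    # 1. Sweep: reap rows whose holder missed a whole TTL window.
    conn.execute("DELETE FROM wolverine_locks WHERE acquired_at < ?",
                 (now - TTL_SECONDS,))
    # 2. Implicit heartbeat: a live holder re-attaining on its poll
    #    tick refreshes acquired_at on its own row.
    cur = conn.execute(
        "UPDATE wolverine_locks SET acquired_at = ? "
        "WHERE lock_id = ? AND holder = ?",
        (now, lock_id, holder))
    if cur.rowcount == 1:
        return True
    # 3. Attain: only the affected-row count says whether we won.
    cur = conn.execute(
        "INSERT OR IGNORE INTO wolverine_locks VALUES (?, ?, ?)",
        (lock_id, holder, now))
    return cur.rowcount == 1

assert try_attain(1, "node-a")      # fresh lock
assert not try_attain(1, "node-b")  # held by a live peer
# Simulate node-a dying and a TTL window elapsing:
conn.execute("UPDATE wolverine_locks SET acquired_at = acquired_at - ?",
             (TTL_SECONDS + 1,))
assert try_attain(1, "node-b")      # stale row swept, lock re-attained
```

Because live holders re-attempt on every poll tick anyway, no separate heartbeat timer is needed; refreshing `acquired_at` on re-attain is the heartbeat.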
|
Nice, "one file, one process" is simpler and safer since writes are needed. Do you think we can make this clearer in the documentation? I see `NodeReassignmentPollingTime` in a sample, the dbcontrol queue, and the sqlite transport as a feature. All of this can make someone think that at least multiple processes per file are supported. |
|
Sure, will recheck the samples and sort it out. |
…tbeat for non-migration advisory locks (#2666)

* fix(sqlite): use BEGIN EXCLUSIVE for migration lock

  The row-based wolverine_locks scheme couldn't serialize migration: the table is created by the migration it's supposed to protect, so the first migration per tenant burned ~5.5s of failed lock retries before running unprotected. Migration now uses BEGIN EXCLUSIVE; polling keeps the row lock (by then the table exists).

  From #2636 (dmytro-pryvedeniuk):
  - SqliteAdvisoryLock.TryAttainLockAsync now idempotent via HasLock
  - SqliteAdvisoryLock.DisposeAsync NRE: dropped duplicate close+dispose
  - SqliteMessageStore polling overrides delegate to SqliteAdvisoryLock (TryAttain checks affected rows; Release actually deletes the row)
  - Dropped racy "not yet delivered" assertion in scheduled-tenant test

  Not picked: CreateLocksTableIfMissing in the lock hot path -- unneeded once migration uses BEGIN EXCLUSIVE.

  Also skips can_send_from_one_node_to_another_by_destination on the SQLite local fixture (single-host, no second node) and adds 4 focused migration-lock tests.

* fix(sqlite): TTL sweep + heartbeat for non-migration advisory locks

  wolverine_locks rows aren't bound to the writing connection, so a hard-killed holder leaves a row no peer reaps. Sweep stale rows on attempt; refresh acquired_at when the live holder re-attains. Live holders re-attempt on every poll tick (HealthCheck/ScheduledJob), so the heartbeat is implicit. Default TTL 2m.

  Split sqlite_migration_lock.cs: migration-lock tests stay; advisory-lock tests (idempotency, release, TTL/heartbeat) move to a new sqlite_advisory_lock.cs.
This PR fixes sqlite tests.
In `scheduled_messages_are_processed_in_tenant_files` we schedule messages with a 2 s delay, wait 300 ms, and check whether the messages have been received. It turns out that by the time the sending code gets to the messages, they are considered not "scheduled for a later time" but "to be sent immediately", as the scheduled time is before the current time. The simple fix would be to just get rid of this check or increase the schedule delay. For this test, the main interest should be that the messages are received by the correct tenants.
On the other hand, the sqlite advisory lock is created after several delayed retries to attain the migration lock. These attempts fail because the `wolverine_locks` table does not exist before migration. In the end, migration is executed without the lock, and the table is created only for the default tenant. As a fix, I moved table creation into `SqliteAdvisoryLock`; it's executed before an attempt to attain the lock (for each tenant).

Other fixed issues:
- `SqliteMessageStore.TryAttainLockAsync` wrongly assumes that if the SQL command executed, the lock was attained; the SQL is actually `INSERT OR IGNORE`, so the affected-record count must be checked. Fixed by delegating locking to the `SqliteAdvisoryLock` that is already in use. The downside is that the passed connection is ignored, as `SqliteAdvisoryLock` has its own connection. Can this be a problem?
- `SqliteMessageStore` does not release the lock acquired in `TryAttainLockAsync`, which means the migration lock remains in the table. Fixed by implementing `ReleaseLockAsync`. Again, the passed connection is ignored, as is the cancellation token.
- `SqliteAdvisoryLock.TryAttainLockAsync` is not idempotent: it returns `false` the second time even though `HasLock` is `true`. Fixed by calling `HasLock` from `TryAttainLockAsync`.
- `SqliteAdvisoryLock.DisposeAsync` throws a `NullReferenceException`, because `ReleaseLockAsync`, called just above, nulls the connection once no lock is held. Fixed by removing the failing code (the `finally` section already handles it).

All this unblocks the sqlite tests and makes them faster (~1 min instead of ~2.7 min).
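The `INSERT OR IGNORE` pitfall above can be reproduced at the engine level: the statement "succeeds" either way, and only the affected-row count reveals whether the lock row was actually inserted. A minimal Python `sqlite3` sketch (illustrative, not Wolverine code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wolverine_locks (id INTEGER PRIMARY KEY)")

def try_attain(lock_id: int) -> bool:
    # The statement never throws on a conflict; checking rowcount
    # (affected records) is the only way to know who won the row.
    cur = conn.execute(
        "INSERT OR IGNORE INTO wolverine_locks (id) VALUES (?)",
        (lock_id,))
    return cur.rowcount == 1

print(try_attain(42))  # → True: row inserted, lock attained
print(try_attain(42))  # → False: statement ran fine, 0 rows affected
```

Treating "the command executed" as "the lock was attained" therefore lets every caller believe it holds the lock.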
@jeremydmiller There are open questions:

- Is it a problem that `SqliteAdvisoryLock` ignores the passed connection?
- See the `should_not_attain_lock_when_previous_owner_crashes_without_releasing` test. It shows that if the app crashes, the lock remains. Maybe use some file-based locks instead (or in addition), or a background cleaner based on the node id?
- `SqliteAdvisoryLock` uses a db connection that is kept open while it holds any lock, and the state of the connection is used for decision making (e.g. `TryAttainLockAsync` returns `false` if the connection is closed). I'm not sure how long a lock is supposed to stay in use. IMO it would be better to open a new connection each time, use it, and dispose of it, returning it to the connection pool. Does that make sense?
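The crash scenario in the second question can be reproduced against the SQLite engine directly. A Python `sqlite3` sketch (illustrative only; closing a connection stands in for a crash): a lock row is plain data and outlives its owner, while a transaction-level lock is torn down with the connection:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "locks.db")

# Row-based lock: the row is ordinary data, so it survives its owner.
owner = sqlite3.connect(db, isolation_level=None)
owner.execute("CREATE TABLE wolverine_locks (id INTEGER PRIMARY KEY)")
owner.execute("INSERT INTO wolverine_locks (id) VALUES (1)")
owner.close()  # stands in for a hard crash

peer = sqlite3.connect(db, isolation_level=None, timeout=0.1)
stale = peer.execute("SELECT COUNT(*) FROM wolverine_locks").fetchone()[0]
print(stale)  # → 1: nobody will ever release this row

# Transaction-level lock: released with the connection, no residue.
owner2 = sqlite3.connect(db, isolation_level=None)
owner2.execute("BEGIN EXCLUSIVE")
owner2.close()  # "crash": SQLite rolls back and releases the lock
peer.execute("INSERT INTO wolverine_locks (id) VALUES (2)")  # not blocked
```

This is the trade-off behind the two cleanup ideas above: a connection-bound lock needs no cleaner, while a row-based lock needs something (file lock, node-id cleaner, or TTL) to reap rows left by dead holders.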