Skip to content

Conversation

@tiensi
Copy link
Contributor

@tiensi tiensi commented Nov 18, 2025

Summary

There seems to be a WAL race condition within the builder.rs file where a session is created and get session is called immediately after, which in some instances can fail because the get_session call is looking at an old version of the sessions db + WAL from one thread before the create_session changes finishes propagating.

I believe the culprit to be the lack of an explicit commit transaction with WAL enabled. In the concurrency section of sqlite documentation, When a read operation begins on a WAL-mode database, it first remembers the location of the last valid commit record in the WAL. So even though we were relying on concurrency through await?; create_session never explicitly called commit, which possibly resulted in get_session "misses" on an old version of the database.

This could also explain why the Pragma wal_checkpoint approach didn't work as the checkpoint didn't have a completed commit to apply WAL file changes to the database.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

I was unable to reproduce on multiple linux docker images, I went ahead and reproduced the "bug" (create_session and get_session race condition) by writing concurrent create_session -> get_session race condition tests and ran them a few thousand times.

Ran a timing check on an existing concurrency test. time cargo test test_concurrent_session_creation --release -- --test-threads=1;

Eyeballing the results, WAL takes about .763s and without takes .848s, about a 10% improvement.

Related Issues

Relates to #5197
Discussion:

@tiensi tiensi changed the title Re-enabled wal with proper commit transaction management Re-enabled WAL with commit transaction management Nov 18, 2025
Signed-off-by: Vincent Huang <[email protected]>
@tiensi tiensi changed the title Re-enabled WAL with commit transaction management [Verification Requested] Re-enabled WAL with commit transaction management Nov 18, 2025
@tiensi tiensi marked this pull request as ready for review November 18, 2025 16:56
Copilot AI review requested due to automatic review settings November 18, 2025 16:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR re-enables SQLite WAL (Write-Ahead Logging) mode with explicit transaction management to fix a race condition where get_session could read stale data immediately after create_session. The solution adds explicit tx.commit() calls to ensure WAL changes are visible to concurrent readers.

Key changes:

  • Enabled WAL journal mode in SQLite connection options
  • Added explicit transaction management with commit calls to write operations
  • Ensures WAL checkpoints occur at transaction boundaries for consistent reads

Comment on lines +651 to 655
tx.commit().await?;

if let Some(conversation) = &session.conversation {
self.replace_conversation(&session.id, conversation).await?;
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction commits before calling replace_conversation, breaking atomicity of the import. If replace_conversation fails, the session will be imported without messages. Consider including the conversation import in the same transaction, or document why this split is intentional.

Suggested change
tx.commit().await?;
if let Some(conversation) = &session.conversation {
self.replace_conversation(&session.id, conversation).await?;
}
if let Some(conversation) = &session.conversation {
self.replace_conversation_tx(&session.id, conversation, &mut tx).await?;
}
tx.commit().await?;

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence with changing this, replace_conversation already does it's own commit independent of import_session. Easiest solution would be to unwrap replace_conversation inside of import_session.

If we move towards foreign keys then messages should not be session orphaned by default and can be addressed there.

@tiensi tiensi changed the title [Verification Requested] Re-enabled WAL with commit transaction management feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) Nov 18, 2025
@jamadeo jamadeo self-requested a review November 19, 2025 00:27
@jamadeo jamadeo self-assigned this Nov 19, 2025
Copy link
Collaborator

@jamadeo jamadeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thank you @tiensi . I was able to test this on @DOsinga 's machine, the same environment that suffered the original bug, and this does indeed fix it.

@jamadeo jamadeo merged commit 7a3725d into block:main Nov 19, 2025
22 checks passed
katzdave added a commit that referenced this pull request Nov 19, 2025
…xt-test

* 'main' of github.com:block/goose:
  chore: Add Adrian Cole to Maintainers (#5815)
  [MCP-UI] Proxy and Better Message Handling (#5487)
  Release 1.15.0
  Document New Window menu in macOS dock (#5811)
  Catch cron errors (#5707)
  feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793)
  chore: remove autopilot experimental feature (#5781)
  Read paths from an interactive & login shell (#5774)
  docs: acp clients (#5800)
michaelneale added a commit that referenced this pull request Nov 19, 2025
* main:
  feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793)
  chore: remove autopilot experimental feature (#5781)
  Read paths from an interactive & login shell (#5774)
  docs: acp clients (#5800)
  Provider error proxy for simulating various types of errors (#5091)
  chore: Add links to maintainer profiles (#5788)
  Quick fix for community all stars script (#5798)
  Document Mistral AI provider (#5799)
  docs: Add Community Stars recipe script and txt file (#5776)
wpfleger96 added a commit that referenced this pull request Nov 20, 2025
* main: (33 commits)
  fix: support Gemini 3's thought signatures (#5806)
  chore: Add Adrian Cole to Maintainers (#5815)
  [MCP-UI] Proxy and Better Message Handling (#5487)
  Release 1.15.0
  Document New Window menu in macOS dock (#5811)
  Catch cron errors (#5707)
  feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793)
  chore: remove autopilot experimental feature (#5781)
  Read paths from an interactive & login shell (#5774)
  docs: acp clients (#5800)
  Provider error proxy for simulating various types of errors (#5091)
  chore: Add links to maintainer profiles (#5788)
  Quick fix for community all stars script (#5798)
  Document Mistral AI provider (#5799)
  docs: Add Community Stars recipe script and txt file (#5776)
  chore: incorporate LF feedback (#5787)
  docs: quick launcher (#5779)
  Bump auto scroll threshold (#5738)
  fix: add one-time cleanup for linux hermit locking issues (#5742)
  Don't show update tray icon if GOOSE_VERSION is set (#5750)
  ...
BlairAllan pushed a commit to BlairAllan/goose that referenced this pull request Nov 29, 2025
…ification Requested) (block#5793)

Signed-off-by: Vincent Huang <[email protected]>
Signed-off-by: Blair Allan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants