@jasagredo
Contributor

The backing store tracer

Because lgrTracer was lazy and >$< is also lazy, bsTracer remained a thunk that retained the LedgerDbArgs, which contain the Genesis ledger state.

The ProtocolInfo

Because pInfoConfig is lazy, the codecConfig binding remained a thunk and retained the whole ProtocolInfo, which contains the Genesis ledger state.

Contributor

@bladyjoker bladyjoker left a comment

Wow! I want to see you debug these issues live some day, quite cool you were able to find where the issue was.

@jasagredo
Contributor Author

jasagredo commented Oct 24, 2025

This comment explains how I found this issue and why I think it has been solved by these changes.

The initial problem statement was that when benchmarking the node in the workbench there was not a big observable difference between the InMemory and LMDB backends. This is counterintuitive because the LMDB backend should flush all UTxOs to the backing store, so they should be gone from the heap.

Thanks to @fmaste, I got a first heap snapshot using ghc-debug, which already narrowed down the problem. Thanks to @mgmeier, I was able to get a workbench profile in which the nodes run for a long time with the LMDB backend while doing "nothing" (no transactions in the network). To instrument the node, it is sufficient to add:

+import GHC.Debug.Stub

-main = do
+main = withGhcDebug $ do

and

executable cardano-node
  ...
-  build-depends:
+  build-depends: ghc-debug-stub,

For this to be useful, the libraries to be inspected need to be compiled with info-tables:

package ouroboros-consensus
  ghc-options: -finfo-table-map -fdistinct-constructor-tables

package ouroboros-consensus-diffusion
  ghc-options: -finfo-table-map -fdistinct-constructor-tables

Un-retaining the LedgerDbArgs

Once the node starts, we can connect to it with ghc-debug-brick and pause the process to inspect the heap. Searching for retainers of TxIn (Ctrl+e) produces the following view:
[screenshot]
This already points to the LedgerDbArgs being retained, and in particular to lgrGenesis (see the entries above the highlighted one).

At first I thought about reworking that field to get it out of the LedgerDbArgs, but those args should "dissolve" once we have opened the database, so the real issue is that the arguments themselves are retained.

Looking at the entry just below the highlighted one, we see the following description (see top of the screenshot):
[screenshot]
That points to the bsTracer. Adding a bang there did not fix it, and neither did adding bangs to both lgrTracer and bsTracer; I suspect this is because >$< might also introduce undesired laziness. Forcing lgrTracer with the !tr binding fixes the issue.
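The shape of this fix can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual ouroboros-consensus code: `Tracer`, `Args`, `Env` and both `mkEnv*` functions are hypothetical stand-ins for the real tracer type, `LedgerDbArgs` and the code that builds `bsTracer`. Since heap retention is hard to observe from inside a program, the sketch uses a proxy: it passes a bottom value as the arguments and checks whether construction touches them. A version that does not touch them has to keep a reference to them inside the thunk.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)
import Data.Functor.Contravariant (Contravariant (..), (>$<))

-- Hypothetical stand-in for a contravariant tracer.
newtype Tracer a = Tracer (a -> IO ())

instance Contravariant Tracer where
  contramap f (Tracer g) = Tracer (g . f)

-- Hypothetical stand-in for LedgerDbArgs.
data Args = Args
  { argsTracer  :: Tracer String -- lazy field
  , argsGenesis :: [Int]         -- stands for the large Genesis state
  }

-- Hypothetical stand-in for the environment that keeps the derived tracer.
data Env = Env { envTracer :: Tracer Int } -- lazy field

-- The field is a thunk for `show >$< argsTracer args`, which references
-- the whole `args` record until someone forces the tracer.
mkEnvLazy :: Args -> Env
mkEnvLazy args = Env { envTracer = show >$< argsTracer args }

-- The bang forces the projection before `Env` is built, so the field thunk
-- only references the tracer itself and `args` can be collected.
mkEnvStrict :: Args -> Env
mkEnvStrict args =
  let !tr = argsTracer args
   in Env { envTracer = show >$< tr }

-- Proxy check: does building the Env (to WHNF) already read the arguments?
readsArgsAtConstruction :: (Args -> Env) -> IO Bool
readsArgsAtConstruction mk = do
  r <- try (evaluate (mk (error "args forced"))) :: IO (Either ErrorCall Env)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  lazy   <- readsArgsAtConstruction mkEnvLazy
  strict <- readsArgsAtConstruction mkEnvStrict
  putStrLn ("lazy version reads args at construction:   " ++ show lazy)   -- False
  putStrLn ("strict version reads args at construction: " ++ show strict) -- True
```

Note that in this sketch a bang on the `envTracer` field alone would only force the `>$<` application, which is consistent with the observation above that the projection itself had to be forced.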

Un-retaining the ProtocolInfo

If we run the node again and search for TxIn retainers, we now see the following heap:
[screenshot]
This shows another retainer chain, this time through ProtocolInfo. The entries above the highlighted one show that the field retaining the Genesis is pInfoInitLedger.

But once again, the ProtocolInfo should dissolve once we have started the node. This time, looking at the entries below ProtocolInfo, we can see that the CodecConfig is being retained:
[screenshot]

This time pInfoConfig is lazy, so adding a bang there and also in the codecConfig binding solves the issue.
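As a rough sketch of what "a bang on the field and a bang on the binding" means, assuming simplified hypothetical types (`Info`, `Config`, `Codec`, `Kernel` and the `start*` functions are illustrative names, not the real ProtocolInfo API): the strict field ensures the config is evaluated when the record is built, and the strict binding ensures the codec is projected out at startup instead of leaving a thunk that points back at the whole record.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-ins for CodecConfig and the top-level config.
data Codec = Codec
data Config = Config { configCodec :: Codec }

-- Hypothetical stand-in for ProtocolInfo.
data Info = Info
  { infoConfig     :: !Config -- strict field: evaluated when Info is built
  , infoInitLedger :: [Int]   -- stands for the Genesis ledger state
  }

-- Hypothetical long-lived structure that stores the codec.
data Kernel = Kernel { kernelCodec :: Codec } -- lazy field

-- The field is a thunk for `configCodec (infoConfig info)` and therefore
-- keeps `info` (and with it the initial ledger) reachable until forced.
startLazy :: Info -> Kernel
startLazy info = Kernel { kernelCodec = configCodec (infoConfig info) }

-- The bang projects the codec out immediately; the Kernel then only
-- references the codec value, not the surrounding record.
startStrict :: Info -> Kernel
startStrict info =
  let !codec = configCodec (infoConfig info)
   in Kernel { kernelCodec = codec }

-- Proxy check: does building the Kernel (to WHNF) already read the Info?
readsInfoAtStart :: (Info -> Kernel) -> IO Bool
readsInfoAtStart start = do
  r <- try (evaluate (start (error "info forced"))) :: IO (Either ErrorCall Kernel)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  lazy   <- readsInfoAtStart startLazy
  strict <- readsInfoAtStart startStrict
  putStrLn ("lazy binding reads info at start:   " ++ show lazy)   -- False
  putStrLn ("strict binding reads info at start: " ++ show strict) -- True
```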

Note: if we let the node run until it connects to other peers, it will eventually force the CodecConfig, so the ProtocolInfo (and with it the Genesis, and with it all the TxIns) will disappear, as can be seen in this heap from a node that has already established connections:
[screenshot]
In any case, I think it is worth forcing this value just in case.

@jasagredo jasagredo force-pushed the js/bangs branch 2 times, most recently from 85e9af5 to 06ddffa Compare October 24, 2025 14:16
@jasagredo jasagredo mentioned this pull request Oct 24, 2025
9 tasks
@jasagredo jasagredo added this pull request to the merge queue Oct 29, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 29, 2025
#1731)
@jasagredo jasagredo removed this pull request from the merge queue due to a manual request Oct 29, 2025
@jasagredo jasagredo enabled auto-merge October 29, 2025 12:05
@jasagredo jasagredo self-assigned this Oct 29, 2025
@jasagredo jasagredo moved this to 👀 In review in Consensus Team Backlog Oct 29, 2025
@jasagredo jasagredo added this pull request to the merge queue Oct 29, 2025
Merged via the queue into main with commit 91c8a1b Oct 29, 2025
22 of 33 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Consensus Team Backlog Oct 29, 2025
@jasagredo jasagredo deleted the js/bangs branch October 29, 2025 19:41