Skip to content

test_bank_forks_new_rw_arc_memory_leak should test hard fork count, rather than looping until OOM#8585

Merged
roryharr merged 2 commits intoanza-xyz:masterfrom
roryharr:test_bank_forks_new_rw_arc_memory_leak_test_update
Oct 21, 2025
Merged

test_bank_forks_new_rw_arc_memory_leak should test hard fork count, rather than looping until OOM#8585
roryharr merged 2 commits intoanza-xyz:masterfrom
roryharr:test_bank_forks_new_rw_arc_memory_leak_test_update

Conversation

@roryharr
Copy link
Copy Markdown

@roryharr roryharr commented Oct 20, 2025

Problem

test_bank_forks_new_rw_arc_memory_leak is attempting to loop until OOM by creating bank_forks. This is indeterminate based on system settings. Tested the original commit (prior to the test introduction) and did not OOM on my system.

Summary of Changes

  • Change test to count the hard forks of the data structure. This test failed on the original commit (IE It reproduced the issue this test is trying to catch).
  • Test now runs in 100ms instead of 112s and is more effective at catching the intended issue

Note: this test + one other cause the runtime tests to take 120s instead of 11s, impeding local development

Fixes #

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.1%. Comparing base (5565013) to head (1dfd751).
⚠️ Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8585   +/-   ##
=======================================
  Coverage    83.1%    83.1%           
=======================================
  Files         847      847           
  Lines      368907   368906    -1     
=======================================
+ Hits       306810   306825   +15     
+ Misses      62097    62081   -16     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@roryharr roryharr requested a review from brooksprumo October 21, 2025 00:58
@roryharr roryharr marked this pull request as ready for review October 21, 2025 00:58
@roryharr
Copy link
Copy Markdown
Author

I'm not sure if there is anyone else who should review this one. Test was originally added by a public contribution.

@roryharr roryharr requested a review from HaoranYi October 21, 2025 01:00
@brooksprumo
Copy link
Copy Markdown

brooksprumo commented Oct 21, 2025

Can you share the PR that added this test originally, please?

Edit: original PR is #1893

@brooksprumo
Copy link
Copy Markdown

  • Change test to count the hard forks of the data structure. This test failed on the original commit (IE It reproduced the issue this test is trying to catch).

What did the failure on the original commit look like? Assert? OOM? Something else?

@brooksprumo
Copy link
Copy Markdown

For this PR, is the problem that the test is slow? Or that the test is flaky? Or the test isn't actually practically testing anything?

@roryharr
Copy link
Copy Markdown
Author

roryharr commented Oct 21, 2025

  • Change test to count the hard forks of the data structure. This test failed on the original commit (IE It reproduced the issue this test is trying to catch).

What did the failure on the original commit look like? Assert? OOM? Something else?

The test as written does not fail on the commit prior to the fix. Likely the system that I have is different than the system that the original test was run on (it was expecting OOM).
The modified version does fail on the commit prior to the fix for the assert on reference count

@roryharr
Copy link
Copy Markdown
Author

roryharr commented Oct 21, 2025

For this PR, is the problem that the test is slow? Or that the test is flaky? Or the test isn't actually practically testing anything?

The test (prior to changes) doesn't actually fail prior to the original fix on my system. So the test is not catching anything.

The test as modified is significantly faster, and actually catches the issue that it was intended to test. So both.

@HaoranYi
Copy link
Copy Markdown

The change is correct and improves the test significantly. However, the test needs a comment explaining what it's testing for, since the memory leak issue is not immediately obvious.

Suggestion: Add a comment to clarify the test's purpose

  #[test]
  fn test_bank_forks_new_rw_arc_memory_leak() {
      // This test verifies that BankForks::new_rw_arc() doesn't create a reference cycle.
      //
      // Before PR #1893, there was a cycle:
      //   Arc<RwLock<BankForks>> → Bank → ProgramCache → Arc<RwLock<BankForks>>
      //
      // This happened because new_rw_arc() called:
      //   root_bank.set_fork_graph_in_program_cache(bank_forks.clone())
      //
      // The fix changed it to use a Weak reference:
      //   root_bank.set_fork_graph_in_program_cache(Arc::downgrade(&bank_forks))
      //
      // Breaking the cycle:
      //   Arc<RwLock<BankForks>> → Bank → ProgramCache → Weak<RwLock<BankForks>>
      //
      // Without the fix: strong_count == 2 (the clone creates an extra strong ref)
      // With the fix: strong_count == 1 (only the returned Arc holds a strong ref)
      let GenesisConfigInfo { genesis_config, .. } = create_genesis_config(10_000);
      let bank_forks = BankForks::new_rw_arc(Bank::new_for_tests(&genesis_config));
      assert_eq!(Arc::strong_count(&bank_forks), 1);
  }

This makes it clear to future readers what the test is checking and why it matters, without needing to dig through git history.

@roryharr
Copy link
Copy Markdown
Author

The change is correct and improves the test significantly. However, the test needs a comment explaining what it's testing for, since the memory leak issue is not immediately obvious.

Suggestion: Add a comment to clarify the test's purpose

  #[test]
  fn test_bank_forks_new_rw_arc_memory_leak() {
      // This test verifies that BankForks::new_rw_arc() doesn't create a reference cycle.
      //
      // Before PR #1893, there was a cycle:
      //   Arc<RwLock<BankForks>> → Bank → ProgramCache → Arc<RwLock<BankForks>>
      //
      // This happened because new_rw_arc() called:
      //   root_bank.set_fork_graph_in_program_cache(bank_forks.clone())
      //
      // The fix changed it to use a Weak reference:
      //   root_bank.set_fork_graph_in_program_cache(Arc::downgrade(&bank_forks))
      //
      // Breaking the cycle:
      //   Arc<RwLock<BankForks>> → Bank → ProgramCache → Weak<RwLock<BankForks>>
      //
      // Without the fix: strong_count == 2 (the clone creates an extra strong ref)
      // With the fix: strong_count == 1 (only the returned Arc holds a strong ref)
      let GenesisConfigInfo { genesis_config, .. } = create_genesis_config(10_000);
      let bank_forks = BankForks::new_rw_arc(Bank::new_for_tests(&genesis_config));
      assert_eq!(Arc::strong_count(&bank_forks), 1);
  }

This makes it clear to future readers what the test is checking and why it matters, without needing to dig through git history.

Makes sense! Added.

Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@roryharr roryharr enabled auto-merge October 21, 2025 17:21
@roryharr roryharr added this pull request to the merge queue Oct 21, 2025
Merged via the queue into anza-xyz:master with commit 94ad644 Oct 21, 2025
43 checks passed
@roryharr roryharr deleted the test_bank_forks_new_rw_arc_memory_leak_test_update branch October 21, 2025 17:51
rustopian pushed a commit to rustopian/agave that referenced this pull request Nov 20, 2025
…ather than looping until OOM (anza-xyz#8585)

* test_bank_forks_new_rw_arc_memory_leak should test hard fork count, rather than looping until OOM

* Updating test description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants