Make database cleaner active record seeded deletion start/clean more threadsafe #9672

jrafanie · 2025-10-17T18:53:03Z

~~[WIP] I'm still investigating some local failures when I run it a bunch of times.~~

EDIT: The above failures appear to be sporadic failures in other specs and happen on master too.

Prevent other threads from modifying the tables we're deleting from
Don't allow changes to table_max_id_cache by other threads while we're accessing it
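
As a rough illustration of the second point, here is a minimal plain-Ruby sketch of guarding a class-level cache with a `Mutex`. Only the names `table_max_id_cache` and `@mutex` come from this PR's diff. The class name `SeededDeletionSketch` and the `record_max_id` helper are hypothetical and exist only so the example runs without a database.

```ruby
# Hypothetical stand-in for the strategy class; not the actual implementation.
class SeededDeletionSketch
  @mutex = Mutex.new
  @table_max_id_cache = nil

  class << self
    # Read under the lock so a reader never sees a half-finished update.
    def table_max_id_cache
      @mutex.synchronize { @table_max_id_cache }
    end

    # Write under the same lock.
    def table_max_id_cache=(table_id_hash)
      @mutex.synchronize { @table_max_id_cache = table_id_hash }
    end

    # Hypothetical helper: the whole read-modify-write happens in one
    # critical section, so concurrent callers cannot interleave.
    def record_max_id(table, id)
      @mutex.synchronize do
        @table_max_id_cache ||= {}
        current = @table_max_id_cache[table] || 0
        @table_max_id_cache[table] = [current, id].max
      end
    end
  end
end

# Eight threads record ids for the same table concurrently.
threads = 8.times.map do |t|
  Thread.new do
    1.upto(1_000) { |i| SeededDeletionSketch.record_max_id("users", t * 1_000 + i) }
  end
end
threads.each(&:join)

puts SeededDeletionSketch.table_max_id_cache["users"] # prints 8000
```

Note that Ruby's `Mutex` is not reentrant, which is why the helper touches the instance variable directly instead of calling the synchronized getter from inside the lock.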

EDIT: For now, I disabled the changes to the tenant test until this PR can be merged: #9684

Reverted "Rollback the tenant test changes due to timing issue failures" (#9684) in the second commit to verify that it works. I can extract it separately if needed.

I've been running the following in a loop, and before this PR it failed roughly once every 8-10 runs. I noticed a BEGIN in a thread that never completed, so it looks like it was waiting on a lock. In the failures I debugged, an API request was being processed while the database tables were being cleaned. I suspect the UI had set up periodic API requests for notifications, so even though the test was not driving the page at that moment, other requests could still be handled in other threads.

With the PR changes, I've run it probably 30-40 times and have not seen the error return.

for x in 1 2 3 4 5 6 7 8 9; do date; export CYPRESS=true; bundle exec rake spec:cypress; done

Details

Example error 1

from:
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18583984959/job/52984103601 and
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18568245125/job/52935140469

      Validate Manage Quotas in parent tenant
        ✓ Validate Reset & Cancel buttons in Manage Quotas form (3907ms)
        ✓ Validate Manage Quotas function (3855ms)
    Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
      Validate Add child tenant function
        ✖(Attempt 1 of 10) Validate Add child tenant form elements
        ✖(Attempt 2 of 10) Validate Add child tenant form elements
        ✖(Attempt 3 of 10) Validate Add child tenant form elements
        ✖(Attempt 4 of 10) Validate Add child tenant form elements
        ✖(Attempt 5 of 10) Validate Add child tenant form elements
        ✖(Attempt 6 of 10) Validate Add child tenant form elements
        ✖(Attempt 7 of 10) Validate Add child tenant form elements
        ✖(Attempt 8 of 10) Validate Add child tenant form elements
        ✖(Attempt 9 of 10) Validate Add child tenant form elements
        ✖(Attempt 10 of 10) "before each" hook for "Validate Add child tenant form elements"
        ✖ "before each" hook for "Validate Add child tenant form elements" (829ms)


  6 passing (2m)
  1 failing

  1) Automate Tenant form operations: Settings > Application Settings > Access Control > Tenants
       Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
         Validate Add child tenant function
           "before each" hook for "Validate Add child tenant form elements":
     CypressError: `cy.visit()` failed trying to load:

http://localhost:3000/

We attempted to make an http request to this URL but the request failed without a response.

We received this error at the network level:

  > Error: connect ECONNREFUSED 127.0.0.1:3000

Common situations why this would fail:
  - you don't have internet access
  - you forgot to run / boot your web server
  - your web server isn't accessible
  - you have weird network configuration settings on your computer

Because this error occurred during a `before each` hook we are skipping the remaining tests in the current suite: `Automate Tenant form operat...`
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135083:74)
      at visitFailedByErr (http://localhost:3000/__cypress/runner/cypress_runner.js:134637:12)
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135082:11)
      at tryCatcher (http://localhost:3000/__cypress/runner/cypress_runner.js:1777:23)
      at Promise._settlePromiseFromHandler (http://localhost:3000/__cypress/runner/cypress_runner.js:1489:31)
      at Promise._settlePromise (http://localhost:3000/__cypress/runner/cypress_runner.js:1546:18)
      at Promise._settlePromise0 (http://localhost:3000/__cypress/runner/cypress_runner.js:1591:10)
      at Promise._settlePromises (http://localhost:3000/__cypress/runner/cypress_runner.js:1667:18)
      at _drainQueueStep (http://localhost:3000/__cypress/runner/cypress_runner.js:2377:12)
      at _drainQueue (http://localhost:3000/__cypress/runner/cypress_runner.js:2370:9)
      at Async._drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2386:5)
      at Async.drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2256:14)
  From Your Spec Code:
      at Context.eval (webpack://manageiq-ui-classic/./cypress/support/commands/login.js:6:5)
      at wrapped (http://localhost:3000/__cypress/runner/cypress_runner.js:141610:43)
  
  From Node.js Internals:
    Error: connect ECONNREFUSED 127.0.0.1:3000
        at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1636:16)

Example error 2

from: https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18553317637/job/52884994391?pr=9535

      Validate Manage Quotas in parent tenant
        ✓ Validate Reset & Cancel buttons in Manage Quotas form (3879ms)
        ✓ Validate Manage Quotas function (3671ms)
    Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
      Validate Add child tenant function
        ✖(Attempt 1 of 10) Validate Add child tenant form elements
        ✖(Attempt 2 of 10) Validate Add child tenant form elements
        ✖(Attempt 3 of 10) Validate Add child tenant form elements
        ✖(Attempt 4 of 10) Validate Add child tenant form elements
        ✖(Attempt 5 of 10) Validate Add child tenant form elements
        ✖(Attempt 6 of 10) Validate Add child tenant form elements
        ✖(Attempt 7 of 10) Validate Add child tenant form elements
        ✖(Attempt 8 of 10) Validate Add child tenant form elements
        ✖(Attempt 9 of 10) Validate Add child tenant form elements
        ✖(Attempt 10 of 10) "before each" hook for "Validate Add child tenant form elements"
        ✖ "before each" hook for "Validate Add child tenant form elements" (30354ms)


  6 passing (4m)
  1 failing

  1) Automate Tenant form operations: Settings > Application Settings > Access Control > Tenants
       Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
         Validate Add child tenant function
           "before each" hook for "Validate Add child tenant form elements":
     CypressError: `cy.visit()` failed trying to load:

http://localhost:3000/

We attempted to make an http request to this URL but the request failed without a response.

We received this error at the network level:

  > Error: ESOCKETTIMEDOUT

Common situations why this would fail:
  - you don't have internet access
  - you forgot to run / boot your web server
  - your web server isn't accessible
  - you have weird network configuration settings on your computer

Because this error occurred during a `before each` hook we are skipping the remaining tests in the current suite: `Automate Tenant form operat...`
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135083:74)
      at visitFailedByErr (http://localhost:3000/__cypress/runner/cypress_runner.js:134637:12)
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135082:11)
      at tryCatcher (http://localhost:3000/__cypress/runner/cypress_runner.js:1777:23)
      at Promise._settlePromiseFromHandler (http://localhost:3000/__cypress/runner/cypress_runner.js:1489:31)
      at Promise._settlePromise (http://localhost:3000/__cypress/runner/cypress_runner.js:1546:18)
      at Promise._settlePromise0 (http://localhost:3000/__cypress/runner/cypress_runner.js:1591:10)
      at Promise._settlePromises (http://localhost:3000/__cypress/runner/cypress_runner.js:1667:18)
      at _drainQueueStep (http://localhost:3000/__cypress/runner/cypress_runner.js:2377:12)
      at _drainQueue (http://localhost:3000/__cypress/runner/cypress_runner.js:2370:9)
      at Async._drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2386:5)
      at Async.drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2256:14)
  From Your Spec Code:
      at Context.eval (webpack://manageiq-ui-classic/./cypress/support/commands/login.js:6:5)
      at wrapped (http://localhost:3000/__cypress/runner/cypress_runner.js:141610:43)
  
  From Node.js Internals:
    Error: ESOCKETTIMEDOUT
        at ClientRequest.<anonymous> (<embedded>:290:115570)
        at Object.onceWrapper (node:events:632:28)
        at ClientRequest.emit (node:events:518:28)
        at Socket.emitRequestTimeout (node:_http_client:863:9)
        at Object.onceWrapper (node:events:632:28)
        at Socket.emit (node:events:530:35)
        at Socket._onTimeout (node:net:609:8)
        at listOnTimeout (node:internal/timers:588:17)
        at process.processTimers (node:internal/timers:523:7)

Fryguy · 2025-10-17T19:26:19Z

lib/extensions/database_cleaner-activerecord-seeded_deletion.rb


      def start
-        Rails.logger.info "SeededDeletion strategy start"
+        Rails.logger.info "SeededDeletion strategy start" if defined?(Rails) && Rails.logger


Curious why you need the Rails.logger check - do we actually have cases with no logger present?

I don't think so. I removed it.

jrafanie · 2025-10-24T21:12:20Z

lib/extensions/database_cleaner-activerecord-seeded_deletion.rb

-        connection.transaction(:requires_new => true) do
+        # Use a transaction with serializable isolation to prevent other threads
+        # from modifying the tables during deletion
+        connection.transaction(:requires_new => true, :isolation => :read_committed) do


Switched to read_committed... it seems to prevent the errors we were seeing locally.

* Prevent other threads from modifying the tables we're deleting from
  * Used read_committed since it's the lowest level that works
  * Tested various isolations:
    * read_uncommitted - works, but PG treats it as read_committed[1]
    * read_committed - works
    * repeatable_read - skipped
    * serializable - works (highest isolation level)
* Don't allow changes to table_max_id_cache by other threads while we're accessing it

[1] https://www.postgresql.org/docs/13/transaction-iso.html
"In PostgreSQL, you can request any of the four standard transaction isolation levels, but internally only three distinct isolation levels are implemented, i.e., PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL's multiversion concurrency control architecture."
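
The footnote about PostgreSQL collapsing Read Uncommitted into Read Committed can be written down as a small lookup. This is only a sketch of what the quoted documentation states. The constant `PG_EFFECTIVE_ISOLATION` and the method `effective_pg_isolation` are hypothetical names, not part of Rails, database_cleaner, or this PR.

```ruby
# Requested isolation level => level PostgreSQL actually applies,
# per the PostgreSQL 13 documentation quoted above.
# Both names below are hypothetical and for illustration only.
PG_EFFECTIVE_ISOLATION = {
  :read_uncommitted => :read_committed, # PG has no distinct read uncommitted mode
  :read_committed   => :read_committed,
  :repeatable_read  => :repeatable_read,
  :serializable     => :serializable
}.freeze

def effective_pg_isolation(requested)
  PG_EFFECTIVE_ISOLATION.fetch(requested) do
    raise ArgumentError, "unknown isolation level: #{requested.inspect}"
  end
end

PG_EFFECTIVE_ISOLATION.each_key do |level|
  puts "#{level} -> #{effective_pg_isolation(level)}"
end
```

Under this mapping, the read_uncommitted and read_committed test runs listed above exercised the same effective level on PostgreSQL.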

This reverts commit 9c20c47. This puts back the changes reverted in ManageIQ#9684. This should allow us to verify the db setup/teardown in cypress on rails is no longer causing sporadic test failures in this area of code.

jrafanie · 2025-10-28T18:22:08Z

cypress/e2e/ui/Settings/Application-Settings/tenant.cy.js

  );
 }

-// TODO: Aside from test that validates deletion, replace with a more reliable cleanup mechanism when ready


The changes in this file are reverts of #9684

Fryguy · 2025-10-28T18:31:36Z

lib/extensions/database_cleaner-activerecord-seeded_deletion.rb

      def self.table_max_id_cache=(table_id_hash)
-        @table_max_id_cache ||= table_id_hash
+        @mutex.synchronize do
+          @table_max_id_cache = table_id_hash


This would be cleaner as:

Suggested change

-          @table_max_id_cache = table_id_hash
+          @table_max_id_cache ||= {}
+          @table_max_id_cache.replace(table_id_hash)

The reason is that if anyone holds a reference to the "old" hash from the getter, this updates that existing hash in place instead of creating a new one, so their reference stays current. Not a huge deal, but it is a little safer.
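
The difference between reassigning and calling `Hash#replace` can be seen in a few lines of plain Ruby. This is a minimal sketch. The variable names are hypothetical and stand in for the cache and for a reference a caller obtained from the getter.

```ruby
# Case 1: reassignment. The variable points at a brand-new Hash,
# so a previously handed-out reference keeps the old contents.
original = { "users" => 10 }
stale_view = original
original = { "users" => 25 }

# Case 2: Hash#replace. The same Hash object gets new contents,
# so a previously handed-out reference sees the update.
shared = { "users" => 10 }
live_view = shared
shared.replace({ "users" => 25 })

puts stale_view["users"]      # prints 10
puts live_view["users"]       # prints 25
puts live_view.equal?(shared) # prints true
```

`Hash#replace` mutates the receiver, so it still needs to run inside the same lock as any other access to the cache.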

jrafanie requested a review from a team as a code owner October 17, 2025 18:53

jrafanie assigned Fryguy Oct 17, 2025

jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch 2 times, most recently from 66662c7 to 5047426 Compare October 17, 2025 19:01

jrafanie added cypress bug labels Oct 17, 2025

Fryguy reviewed Oct 17, 2025

Fryguy reviewed Oct 17, 2025

Fryguy reviewed Oct 17, 2025

Fryguy reviewed Oct 17, 2025

jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from 5047426 to dc70812 Compare October 24, 2025 21:11

jrafanie commented Oct 24, 2025

jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from dc70812 to d28c1f6 Compare October 24, 2025 21:13

jrafanie mentioned this pull request Oct 24, 2025

Rollback the tenant test changes due to timing issue failures #9684

Merged

jrafanie changed the title ~~Make database cleaner active record seeded deletion start/clean more threadsafe~~ [WIP] Make database cleaner active record seeded deletion start/clean more threadsafe Oct 24, 2025

jrafanie added the wip label Oct 24, 2025

jrafanie added 2 commits October 28, 2025 12:01

jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from d28c1f6 to d1d4cbe Compare October 28, 2025 16:02

jrafanie changed the title ~~[WIP] Make database cleaner active record seeded deletion start/clean more threadsafe~~ Make database cleaner active record seeded deletion start/clean more threadsafe Oct 28, 2025

jrafanie removed the wip label Oct 28, 2025

jrafanie commented Oct 28, 2025

Fryguy reviewed Oct 28, 2025

Fryguy approved these changes Oct 28, 2025

Fryguy merged commit c3a0928 into ManageIQ:master Oct 28, 2025
19 checks passed

jrafanie deleted the improve-database-cleaner-seeded-deletion-thread-safety branch October 28, 2025 18:38