@jrafanie jrafanie commented Oct 17, 2025

[WIP] I'm still investigating some local failures when I run it a bunch of times.

EDIT: The above failures appear to be sporadic failures in other specs and happen on master too.

  • Prevent other threads from modifying the tables we're deleting from
  • Don't allow changes to table_max_id_cache by other threads while we're accessing it
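
A minimal sketch of the second bullet, assuming a class-level Mutex guards every read and write of the cache. `SeededDeletionSketch` and `record_max_id` are hypothetical names used for illustration; only `table_max_id_cache` and the use of a mutex come from this PR.

```ruby
# Hypothetical stand-in for the strategy class. Everything except the
# table_max_id_cache name and the @mutex idea is an assumption.
class SeededDeletionSketch
  @mutex = Mutex.new
  @table_max_id_cache = {}

  class << self
    # Return a copy so callers cannot mutate the cache outside the lock.
    def table_max_id_cache
      @mutex.synchronize { @table_max_id_cache.dup }
    end

    def table_max_id_cache=(table_id_hash)
      @mutex.synchronize { @table_max_id_cache = table_id_hash }
    end

    # Read-modify-write done under a single lock acquisition so two
    # threads cannot interleave between the read and the write.
    def record_max_id(table, id)
      @mutex.synchronize do
        current = @table_max_id_cache[table] || 0
        @table_max_id_cache[table] = [current, id].max
      end
    end
  end
end

# Eight threads update the same key concurrently.
threads = 8.times.map do |t|
  Thread.new do
    1000.times { |i| SeededDeletionSketch.record_max_id("tenants", t * 1000 + i) }
  end
end
threads.each(&:join)

max_id = SeededDeletionSketch.table_max_id_cache["tenants"] # => 7999
```

Returning a copy from the getter is one design choice among several; it trades a small allocation for not exposing the shared Hash to unsynchronized mutation.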

EDIT: For now, I disabled the changes to the tenant test until this PR can be merged: #9684

I've been running the following in a loop, and before this PR it failed roughly once every 8-10 runs. I noticed a BEGIN in a thread that never completed, so it looks like it was waiting on a lock. In the failures I debugged, an API request was being processed while the database tables were being cleaned. I suspect the UI had set up periodic API requests for notifications, so even though the test was not driving the page at that moment, other requests could still be handled in other threads.

With the PR changes, I've run it probably 30-40 times and the error has yet to return.

for x in 1 2 3 4 5 6 7 8 9; do date; export CYPRESS=true; bundle exec rake spec:cypress; done
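
For a failure count rather than scanning the loop output by eye, the same repeat-run idea can be sketched in Ruby. `tally_failures` is a hypothetical helper, not part of this PR, and it assumes the command's exit status reflects pass or fail.

```ruby
# Hypothetical helper (not part of this PR): run a command repeatedly and
# count how many runs exit with a non-zero status.
def tally_failures(command, runs:, env: {})
  failures = 0
  runs.times do |i|
    puts "run #{i + 1}/#{runs} at #{Time.now}"
    # Kernel#system returns true on exit status 0, false on non-zero,
    # and nil if the command could not be executed at all.
    failures += 1 unless system(env, *command)
  end
  failures
end

# Example invocation mirroring the shell loop above (commented out so that
# loading this file does not start the suite):
# failed = tally_failures(%w[bundle exec rake spec:cypress], :runs => 9, :env => { "CYPRESS" => "true" })
# puts "#{failed} of 9 runs failed"
```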

Example error 1

from:
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18583984959/job/52984103601 and
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18568245125/job/52935140469

      Validate Manage Quotas in parent tenant
        ✓ Validate Reset & Cancel buttons in Manage Quotas form (3907ms)
        ✓ Validate Manage Quotas function (3855ms)
    Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
      Validate Add child tenant function
        ✖(Attempt 1 of 10) Validate Add child tenant form elements
        ✖(Attempt 2 of 10) Validate Add child tenant form elements
        ✖(Attempt 3 of 10) Validate Add child tenant form elements
        ✖(Attempt 4 of 10) Validate Add child tenant form elements
        ✖(Attempt 5 of 10) Validate Add child tenant form elements
        ✖(Attempt 6 of 10) Validate Add child tenant form elements
        ✖(Attempt 7 of 10) Validate Add child tenant form elements
        ✖(Attempt 8 of 10) Validate Add child tenant form elements
        ✖(Attempt 9 of 10) Validate Add child tenant form elements
        ✖(Attempt 10 of 10) "before each" hook for "Validate Add child tenant form elements"
        ✖ "before each" hook for "Validate Add child tenant form elements" (829ms)


  6 passing (2m)
  1 failing

  1) Automate Tenant form operations: Settings > Application Settings > Access Control > Tenants
       Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
         Validate Add child tenant function
           "before each" hook for "Validate Add child tenant form elements":
     CypressError: `cy.visit()` failed trying to load:

http://localhost:3000/

We attempted to make an http request to this URL but the request failed without a response.

We received this error at the network level:

  > Error: connect ECONNREFUSED 127.0.0.1:3000

Common situations why this would fail:
  - you don't have internet access
  - you forgot to run / boot your web server
  - your web server isn't accessible
  - you have weird network configuration settings on your computer

Because this error occurred during a `before each` hook we are skipping the remaining tests in the current suite: `Automate Tenant form operat...`
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135083:74)
      at visitFailedByErr (http://localhost:3000/__cypress/runner/cypress_runner.js:134637:12)
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135082:11)
      at tryCatcher (http://localhost:3000/__cypress/runner/cypress_runner.js:1777:23)
      at Promise._settlePromiseFromHandler (http://localhost:3000/__cypress/runner/cypress_runner.js:1489:31)
      at Promise._settlePromise (http://localhost:3000/__cypress/runner/cypress_runner.js:1546:18)
      at Promise._settlePromise0 (http://localhost:3000/__cypress/runner/cypress_runner.js:1591:10)
      at Promise._settlePromises (http://localhost:3000/__cypress/runner/cypress_runner.js:1667:18)
      at _drainQueueStep (http://localhost:3000/__cypress/runner/cypress_runner.js:2377:12)
      at _drainQueue (http://localhost:3000/__cypress/runner/cypress_runner.js:2370:9)
      at Async._drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2386:5)
      at Async.drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2256:14)
  From Your Spec Code:
      at Context.eval (webpack://manageiq-ui-classic/./cypress/support/commands/login.js:6:5)
      at wrapped (http://localhost:3000/__cypress/runner/cypress_runner.js:141610:43)
  
  From Node.js Internals:
    Error: connect ECONNREFUSED 127.0.0.1:3000
        at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1636:16)


Example error 2

from: https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18553317637/job/52884994391?pr=9535

      Validate Manage Quotas in parent tenant
        ✓ Validate Reset & Cancel buttons in Manage Quotas form (3879ms)
        ✓ Validate Manage Quotas function (3671ms)
    Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
      Validate Add child tenant function
        ✖(Attempt 1 of 10) Validate Add child tenant form elements
        ✖(Attempt 2 of 10) Validate Add child tenant form elements
        ✖(Attempt 3 of 10) Validate Add child tenant form elements
        ✖(Attempt 4 of 10) Validate Add child tenant form elements
        ✖(Attempt 5 of 10) Validate Add child tenant form elements
        ✖(Attempt 6 of 10) Validate Add child tenant form elements
        ✖(Attempt 7 of 10) Validate Add child tenant form elements
        ✖(Attempt 8 of 10) Validate Add child tenant form elements
        ✖(Attempt 9 of 10) Validate Add child tenant form elements
        ✖(Attempt 10 of 10) "before each" hook for "Validate Add child tenant form elements"
        ✖ "before each" hook for "Validate Add child tenant form elements" (30354ms)


  6 passing (4m)
  1 failing

  1) Automate Tenant form operations: Settings > Application Settings > Access Control > Tenants
       Validate Child Tenant operations: Add, Edit, Add Project, Manage Quotas
         Validate Add child tenant function
           "before each" hook for "Validate Add child tenant form elements":
     CypressError: `cy.visit()` failed trying to load:

http://localhost:3000/

We attempted to make an http request to this URL but the request failed without a response.

We received this error at the network level:

  > Error: ESOCKETTIMEDOUT

Common situations why this would fail:
  - you don't have internet access
  - you forgot to run / boot your web server
  - your web server isn't accessible
  - you have weird network configuration settings on your computer

Because this error occurred during a `before each` hook we are skipping the remaining tests in the current suite: `Automate Tenant form operat...`
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135083:74)
      at visitFailedByErr (http://localhost:3000/__cypress/runner/cypress_runner.js:134637:12)
      at <unknown> (http://localhost:3000/__cypress/runner/cypress_runner.js:135082:11)
      at tryCatcher (http://localhost:3000/__cypress/runner/cypress_runner.js:1777:23)
      at Promise._settlePromiseFromHandler (http://localhost:3000/__cypress/runner/cypress_runner.js:1489:31)
      at Promise._settlePromise (http://localhost:3000/__cypress/runner/cypress_runner.js:1546:18)
      at Promise._settlePromise0 (http://localhost:3000/__cypress/runner/cypress_runner.js:1591:10)
      at Promise._settlePromises (http://localhost:3000/__cypress/runner/cypress_runner.js:1667:18)
      at _drainQueueStep (http://localhost:3000/__cypress/runner/cypress_runner.js:2377:12)
      at _drainQueue (http://localhost:3000/__cypress/runner/cypress_runner.js:2370:9)
      at Async._drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2386:5)
      at Async.drainQueues (http://localhost:3000/__cypress/runner/cypress_runner.js:2256:14)
  From Your Spec Code:
      at Context.eval (webpack://manageiq-ui-classic/./cypress/support/commands/login.js:6:5)
      at wrapped (http://localhost:3000/__cypress/runner/cypress_runner.js:141610:43)
  
  From Node.js Internals:
    Error: ESOCKETTIMEDOUT
        at ClientRequest.<anonymous> (<embedded>:290:115570)
        at Object.onceWrapper (node:events:632:28)
        at ClientRequest.emit (node:events:518:28)
        at Socket.emitRequestTimeout (node:_http_client:863:9)
        at Object.onceWrapper (node:events:632:28)
        at Socket.emit (node:events:530:35)
        at Socket._onTimeout (node:net:609:8)
        at listOnTimeout (node:internal/timers:588:17)
        at process.processTimers (node:internal/timers:523:7)

@jrafanie jrafanie requested a review from a team as a code owner October 17, 2025 18:53
@jrafanie jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch 2 times, most recently from 66662c7 to 5047426 Compare October 17, 2025 19:01

 def start
-  Rails.logger.info "SeededDeletion strategy start"
+  Rails.logger.info "SeededDeletion strategy start" if defined?(Rails) && Rails.logger
Member

Curious why you need the Rails.logger check - do we actually have cases with no logger present?

Member Author

I don't think so. I removed it.

@jrafanie jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from 5047426 to dc70812 Compare October 24, 2025 21:11
-connection.transaction(:requires_new => true) do
+# Use a transaction with serializable isolation to prevent other threads
+# from modifying the tables during deletion
+connection.transaction(:requires_new => true, :isolation => :read_committed) do
Member Author

Switched to read_committed... it seems to prevent the errors we were seeing locally.

@jrafanie jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from dc70812 to d28c1f6 Compare October 24, 2025 21:13
@jrafanie jrafanie changed the title Make database cleaner active record seeded deletion start/clean more threadsafe [WIP] Make database cleaner active record seeded deletion start/clean more threadsafe Oct 24, 2025
@jrafanie jrafanie added the wip label Oct 24, 2025
* Prevent other threads from modifying the tables we're deleting from
  * Used read_committed since it's the lowest level that works
  * Tested various isolations:
    * read_uncommitted - works, but PG treats it as read_committed[1]
    * read_committed - works
    * repeatable_read - skipped
    * serializable - works (highest isolation level)
* Don't allow changes to table_max_id_cache by other threads while we're
accessing it

[1] https://www.postgresql.org/docs/13/transaction-iso.html
"In PostgreSQL, you can request any of the four standard transaction isolation
levels, but internally only three distinct isolation levels are implemented,
i.e., PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is
because it is the only sensible way to map the standard isolation levels to
PostgreSQL's multiversion concurrency control architecture."
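
The selection logic described in this commit message can be sketched as a small lookup. This is an illustration only: the constant and method names are hypothetical, the mapping encodes the PostgreSQL documentation quoted above, and the outcomes are the ones reported in the bullet list.

```ruby
# Requested isolation level => level PostgreSQL effectively applies,
# per the documentation quoted above.
PG_EFFECTIVE_ISOLATION = {
  :read_uncommitted => :read_committed,
  :read_committed   => :read_committed,
  :repeatable_read  => :repeatable_read,
  :serializable     => :serializable
}.freeze

# Standard levels ordered from weakest to strongest.
ISOLATION_ORDER = [:read_uncommitted, :read_committed, :repeatable_read, :serializable].freeze

# Outcomes as reported in the commit message above.
REPORTED_RESULTS = {
  :read_uncommitted => :works,
  :read_committed   => :works,
  :repeatable_read  => :skipped,
  :serializable     => :works
}.freeze

# Pick the weakest level that worked, expressed as the level PostgreSQL
# actually applies. Returns nil if nothing worked.
def lowest_working_level(results)
  ISOLATION_ORDER
    .select { |level| results[level] == :works }
    .map { |level| PG_EFFECTIVE_ISOLATION.fetch(level) }
    .first
end

chosen = lowest_working_level(REPORTED_RESULTS) # => :read_committed
```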
This reverts commit 9c20c47.

This puts back the changes reverted in ManageIQ#9684.  This should allow us to verify
the db setup/teardown in cypress on rails is no longer causing sporadic test failures
in this area of code.
@jrafanie jrafanie force-pushed the improve-database-cleaner-seeded-deletion-thread-safety branch from d28c1f6 to d1d4cbe Compare October 28, 2025 16:02
@jrafanie jrafanie changed the title [WIP] Make database cleaner active record seeded deletion start/clean more threadsafe Make database cleaner active record seeded deletion start/clean more threadsafe Oct 28, 2025
@jrafanie jrafanie removed the wip label Oct 28, 2025
);
}

// TODO: Aside from test that validates deletion, replace with a more reliable cleanup mechanism when ready
Member Author

The changes in this file are reverts of #9684.

 def self.table_max_id_cache=(table_id_hash)
-  @table_max_id_cache ||= table_id_hash
+  @mutex.synchronize do
+    @table_max_id_cache = table_id_hash
Member

This would be cleaner as

Suggested change
-@table_max_id_cache = table_id_hash
+@table_max_id_cache ||= {}
+@table_max_id_cache.replace(table_id_hash)

The reason is that if anyone is holding a reference to the "old" hash from the getter, this updates the hash they already hold instead of creating a new one that they would never see. Not a huge deal, but it is a little safer.
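
To make the difference concrete, here is a minimal sketch assuming a module-level cache. `CacheSketch`, `assign_cache`, and `replace_cache` are hypothetical names, and the mutex from the PR is left out to keep the focus on reference identity.

```ruby
# Hypothetical module for illustration; not the code under review.
module CacheSketch
  @table_max_id_cache = nil

  class << self
    def table_max_id_cache
      @table_max_id_cache ||= {}
    end

    # Plain reassignment: the instance variable now points at a new Hash,
    # so anyone holding the previous Hash keeps seeing the old contents.
    def assign_cache(table_id_hash)
      @table_max_id_cache = table_id_hash
    end

    # Suggested form: Hash#replace swaps the contents of the existing Hash
    # in place, so previously returned references see the new contents.
    def replace_cache(table_id_hash)
      @table_max_id_cache ||= {}
      @table_max_id_cache.replace(table_id_hash)
    end
  end
end

CacheSketch.assign_cache({ "tenants" => 10 })
held = CacheSketch.table_max_id_cache
CacheSketch.assign_cache({ "tenants" => 25 })
stale_value = held["tenants"] # => 10, held still points at the old Hash

CacheSketch.replace_cache({ "tenants" => 10 })
held = CacheSketch.table_max_id_cache
CacheSketch.replace_cache({ "tenants" => 25 })
fresh_value = held["tenants"] # => 25, same Hash object, new contents
```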

@Fryguy Fryguy merged commit c3a0928 into ManageIQ:master Oct 28, 2025
19 checks passed
@jrafanie jrafanie deleted the improve-database-cleaner-seeded-deletion-thread-safety branch October 28, 2025 18:38