High CPU utilization due to large number of DELETE statements #1412

Closed
crirusu opened this issue Aug 26, 2024 · 6 comments · Fixed by #1443


crirusu commented Aug 26, 2024

Symptoms

When using the PostgreSQL transport, the database used as the messaging infrastructure is put under heavy load by an excessive number of unnecessary DELETE statements, i.e. statements that result in 0 rows being affected.

[image: database metrics showing the DELETE call rate far exceeding rows affected]

Who's affected

All PostgreSQL transport users.

Root cause

The problem is caused by the peek statement overestimating the number of messages ready to be received. The query calculates the difference between the "oldest" and "newest" messages (based on a sequence number), including messages that are already being received. As a result, a single receive transaction that takes longer than the others can cause a significant overestimation of the number of available messages.
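For illustration, a minimal sketch of a peek that estimates queue depth this way (the table name `endpoint_queue` and column `seq` are assumptions, not the actual transport query):

```sql
-- Hypothetical sketch of a sequence-based peek. MAX(seq) - MIN(seq) spans
-- every message between the oldest and newest rows, including rows currently
-- locked by in-flight receive transactions, so one long-running receive
-- inflates the estimate.
SELECT COALESCE(MAX(seq) - MIN(seq) + 1, 0) AS estimated_message_count
FROM endpoint_queue;
```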

Patched version

Original content

Describe the bug

Description

We deployed 9 .NET Core services using NSB with PostgreSQL, backed by a database on an Amazon Aurora PostgreSQL Serverless v2 cluster with 48 ACUs.
According to Amazon, 1 ACU = 2 GB RAM:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html

Even with little data to process, CPU usage did not drop, and we were unable to keep up with processing.
I am attaching a picture showing a very high number of commits and rollbacks.
[image: Aurora PostgreSQL metrics showing a very high number of commits and rollbacks]

With the database for the queues on an MSSQL Server with 8 cores and 32 GB of RAM, we can run all services with less than 30% CPU.

Versions

Please list the version of the relevant packages or applications in which the bug exists.

Steps to reproduce

Observe how many requests are actually executed against the database.
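One way to check (a sketch assuming the `pg_stat_statements` extension is enabled on the database) is to compare the number of DELETE calls with the rows they actually affected:

```sql
-- Requires the pg_stat_statements extension. A healthy receive loop should
-- show calls roughly equal to rows; a large gap indicates many DELETE
-- statements that affected 0 rows.
SELECT query,
       calls,
       rows,
       round(rows::numeric / NULLIF(calls, 0), 2) AS rows_per_call
FROM pg_stat_statements
WHERE query ILIKE 'DELETE%'
ORDER BY calls DESC
LIMIT 20;
```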

@crirusu crirusu added the Bug label Aug 26, 2024
@SzymonPobiega
Member

Hi

For some reason I can't open the picture in full resolution, but as far as I can tell, you get 10-20 times more DELETE calls than rows affected. Is that correct?

Could you also describe the specifics of your deployment, i.e. how many instances there are of each logical endpoint? To keep that data confidential, please send an email to the support address.


crirusu commented Aug 29, 2024

Hi,

We have 9 services but some have 2 NSB endpoints

Service 1 - 1 NSB endpoint with NServiceBusMaxConcurrency = 20
Service 2 - 1 NSB endpoint with NServiceBusMaxConcurrency = 32 and 1 NSB endpoint with NServiceBusMaxConcurrency = 18
Service 3 - 1 NSB endpoint with NServiceBusMaxConcurrency = 16
Service 4 - 1 NSB endpoint with NServiceBusMaxConcurrency = 16 and 1 NSB endpoint with NServiceBusMaxConcurrency = 25
Service 5 - 1 NSB endpoint with NServiceBusMaxConcurrency = 18
Service 6 - 1 NSB endpoint with NServiceBusMaxConcurrency = 1 and 1 NSB endpoint with NServiceBusMaxConcurrency = 1
Service 7 - 1 NSB endpoint with NServiceBusMaxConcurrency = 16
Service 8 - 1 NSB endpoint with NServiceBusMaxConcurrency = 16
Service 9 - 1 NSB endpoint with NServiceBusMaxConcurrency = 20

I can send the picture again via email.
The problem is that there are too many commits and rollbacks.
We have now reverted to an MSSQL Server instance, and in terms of the number of transactions / database activity it is nowhere near what we saw in PostgreSQL.

@SzymonPobiega
Member

Yes, the number of commits is off, but so is the ratio of calls to rows affected. In the ideal scenario, when queues always have some messages, the number of DELETE calls should be equal to the number of rows affected.

I am especially concerned about the Sub_ExportData endpoint. Is it one of the scaled-out endpoints (services 2, 4, or 6)?

Does the rows/s value of 50-80 match the expected number of messages per second these endpoints process?


crirusu commented Sep 4, 2024

Hi,
I don't think we process that much. This is the picture from the MSSQL Server where we are running the same endpoints as we did in PostgreSQL.
[image: MSSQL Server metrics for the same endpoints]

@tmasternak
Member

@crirusu it seems that the excessive number of DELETE statements is due to overestimating the size of the input queue (the number of rows in the input table), which in turn is caused by the SQL statement not skipping rows that are locked (held by other receive transactions).

We are working on the fix in #1443.
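For illustration only, a minimal sketch of the skip-locked approach (assumed table/column names; not necessarily the exact shape of the final fix):

```sql
-- Count only rows that are not locked by other receive transactions.
-- "endpoint_queue" and "seq" are hypothetical names; FOR UPDATE SKIP LOCKED
-- makes the row-locking clause skip rows another transaction already holds.
SELECT COUNT(*) AS receivable_count
FROM (
    SELECT seq
    FROM endpoint_queue
    FOR UPDATE SKIP LOCKED
) AS unlocked_rows;
```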

@tmasternak tmasternak changed the title High CPU using NSB with Postgres High CPU utilization due to large number of DELETE statements Oct 16, 2024
@tmasternak tmasternak added this to the 8.1.5 milestone Oct 16, 2024
@tmasternak
Member

@crirusu FYI, the fix for PostgreSQL is out in version 8.1.5 of the package.
