
[4.x]: Internal server error – Could not acquire a mutex lock for the queue #13052

Closed
MoritzLost opened this issue Apr 5, 2023 · 12 comments

@MoritzLost
Contributor

MoritzLost commented Apr 5, 2023

What happened?

Description

We've started seeing an issue on one site where the Control Panel is not accessible at all, it just shows an internal server error. The logs show an error related to the queue:

Could not acquire a mutex lock for the queue (queue).

Full stack trace below. This came out of nowhere, and I have no idea where to start. This happens only in the live environment. It's a VPS controlled by Laravel Forge. There's only one queue runner, which is a simple Laravel Daemon (which is just a wrapper around Supervisor). Even stopping the queue runner completely does not remove this error. We're using the default File Cache.

I have tried to manually release all queue jobs (php craft queue/release all), but this returns the same error as mentioned above.

I tried clearing all caches, manually clearing the storage/runtime/mutex folder, and even restarting the server, but the error is still occurring. I have no idea what caused this or how to reproduce it – we're using the same setup for multiple sites without issues. The site has been working fine until yesterday. The only thing I can think of – there were tons of queue jobs for pending image transforms, probably around 50,000 or so. Should I manually truncate the queue jobs table in the database? But even this might only fix the issue temporarily.

*Edit: I've tried to truncate the queue table. This works, but I've noticed that it fills right back up again. For some reason, something is creating hundreds of queue jobs per second. After a couple of seconds, there are already thousands of entries, all trying to create image transforms:

+----+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+------------+-----+-------+----------+--------------+-------------+----------+---------------+---------+------+------------+-------+
| id | channel | job                                                                                                                                                                                                                                                                                                                                                                                | description                              | timePushed | ttr | delay | priority | dateReserved | timeUpdated | progress | progressLabel | attempt | fail | dateFailed | error |
+----+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+------------+-----+-------+----------+--------------+-------------+----------+---------------+---------+------+------------+-------+
|  1 | queue   | 0x4F3A33393A2263726166745C71756575655C6A6F62735C47656E6572617465496D6167655472616E73666F726D223A343A7B733A31313A226465736372697074696F6E223B4E3B733A33303A220063726166745C71756575655C426173654A6F62005F70726F6772657373223B693A303B733A33353A220063726166745C71756575655C426173654A6F62005F70726F67726573734C6162656C223B4E3B733A31313A227472616E73666F726D4964223B693A3131323B7D | t9n:["app","Generating image transform"] | 1680693249 | 300 |     0 |     2048 | NULL         |        NULL |        0 | NULL          |    NULL |    0 | NULL       | NULL  |
+----+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+------------+-----+-------+----------+--------------+-------------+----------+---------------+---------+------+------------+-------+

This happens even while the system is offline, so no requests can be hitting the frontend, and the queue runner is deactivated. Is something stuck in a loop that keeps generating those queue jobs? Any ideas what this might be?

Steps to reproduce

Not sure.

Stacktrace

2023-04-05 10:42:18 [web.ERROR] [yii\base\Exception] Could not acquire a mutex lock for the queue (queue). {"trace":[
    "#0 /path/to/site/vendor/craftcms/cms/src/queue/Queue.php(774): craft\\queue\\Queue->_lock()",
    "#1 /path/to/site/vendor/craftcms/cms/src/queue/Queue.php(503): craft\\queue\\Queue->_moveExpired()",
    "#2 /path/to/site/vendor/twig/twig/src/Extension/CoreExtension.php(1607): craft\\queue\\Queue->getJobInfo()",
    "#3 /path/to/site/vendor/craftcms/cms/src/helpers/Template.php(146): twig_get_attribute()",
    "#4 /path/to/site/storage/runtime/compiled_templates/5d/5d87b5cb3f058ac1b3b8c9e8f13aa349.php(58): craft\\helpers\\Template::attribute()",
    "#5 /path/to/site/vendor/twig/twig/src/Template.php(394): __TwigTemplate_58328c0ff15f838ec9e3684d1e06846b->doDisplay()",
    "#6 /path/to/site/vendor/twig/twig/src/Template.php(367): Twig\\Template->displayWithErrorHandling()",
    "#7 /path/to/site/storage/runtime/compiled_templates/cc/cce4c1710a2253807e6c4ee39a1a4702.php(49): Twig\\Template->display()",
    "#8 /path/to/site/vendor/twig/twig/src/Template.php(394): __TwigTemplate_aa458fcc040bca7c11f4b4cc8ac4fe45->doDisplay()",
    "#9 /path/to/site/vendor/twig/twig/src/Template.php(367): Twig\\Template->displayWithErrorHandling()",
    "#10 /path/to/site/vendor/twig/twig/src/Template.php(379): Twig\\Template->display()",
    "#11 /path/to/site/vendor/twig/twig/src/TemplateWrapper.php(40): Twig\\Template->render()",
    "#12 /path/to/site/vendor/twig/twig/src/Environment.php(277): Twig\\TemplateWrapper->render()",
    "#13 /path/to/site/vendor/craftcms/cms/src/web/View.php(456): Twig\\Environment->render()",
    "#14 /path/to/site/vendor/craftcms/cms/src/web/View.php(509): craft\\web\\View->renderTemplate()",
    "#15 /path/to/site/vendor/craftcms/cms/src/web/TemplateResponseFormatter.php(56): craft\\web\\View->renderPageTemplate()",
    "#16 /path/to/site/vendor/yiisoft/yii2/web/Response.php(1098): craft\\web\\TemplateResponseFormatter->format()",
    "#17 /path/to/site/vendor/craftcms/cms/src/web/Response.php(286): yii\\web\\Response->prepare()",
    "#18 /path/to/site/vendor/yiisoft/yii2/web/Response.php(339): craft\\web\\Response->prepare()",
    "#19 /path/to/site/vendor/yiisoft/yii2/base/Application.php(390): yii\\web\\Response->send()",
    "#20 /path/to/site/web/index.php(11): yii\\base\\Application->run()",
    "#21 {main}"],"memory":5026280,"exception":"[object] (yii\\base\\Exception(code: 0): Could not acquire a mutex lock for the queue (queue). at /path/to/site/vendor/craftcms/cms/src/queue/Queue.php:911)"} 

Craft CMS version

4.4.5

PHP version

8.2

Operating system and version

Ubuntu 20.04.4 LTS

Database type and version

mysql Ver 8.0.32-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))

Image driver and version

No response

Installed plugins and versions

    Name              Handle          Package Name                   Version  Installed  Enabled
    ----------------  --------------  -----------------------------  -------  ---------  -------
    Asset Rev         assetrev        clubstudioltd/craft-asset-rev  7.0.0    Yes        Yes    
    Contact Form      contact-form    craftcms/contact-form          3.0.1    Yes        Yes    
    Navigation        navigation      verbb/navigation               2.0.17   Yes        Yes    
    Phone Number      phone-number    rynpsc/craft-phone-number      2.1.0    Yes        Yes    
    Redactor          redactor        craftcms/redactor              3.0.4    Yes        Yes    
    Smith             smith           verbb/smith                    2.0.0    Yes        Yes    
    Super Table       super-table     verbb/super-table              3.0.8.1  Yes        Yes    
    Typed link field  typedlinkfield  sebastianlenz/linkfield        2.1.5    Yes        Yes    
@brandonkelly
Member

The mutex error means that Craft thinks another request is currently processing the queue, and is avoiding running two conflicting processes at once.

It’s possible that whatever request initially acquired the queue mutex lock encountered a fatal error that prevented the lock from being released. You should see something mentioned in storage/logs/phperrors.log in that case.

By default, mutex locks live in storage/runtime/mutex using file locks. If you clear that directory, the error should go away.
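
To illustrate (a minimal sketch, not Craft’s actual source – the lock name matches the error message, the timeout is hypothetical): the lock is acquired before the queue work and released afterwards, so a fatal error in between can leave a stale lock file behind:

$mutex = Craft::$app->getMutex();

// Wait up to 3 seconds for the lock (timeout value is illustrative)
if (!$mutex->acquire('queue', 3)) {
    throw new \yii\base\Exception('Could not acquire a mutex lock for the queue (queue).');
}

try {
    // ... work on the queue ...
} finally {
    // Never reached if PHP dies fatally above – the lock file stays behind
    $mutex->release('queue');
}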

As for the huge amount of queue jobs: “Generating image transform” jobs will be added each time an image transform (either on the front end or within the control panel for asset thumbnails) is requested and Craft doesn’t have a record of the transform existing yet. The job itself will verify that the transform really doesn’t exist (e.g. if it was just a missing index record, or if the transform has been explicitly requested by the browser via an <img> tag since the job was queued up). If it does exist, the job bails out early, as its work is already done.

You can disable all this functionality entirely and have Craft just generate transforms immediately when it knows it needs them, rather than deferring to an <img> request / queue job, by enabling the generateTransformsBeforePageLoad config setting in config/general.php.
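
For reference, enabling that setting looks like this (merge it into your existing config/general.php):

// config/general.php
return [
    // Generate transforms eagerly during the request instead of queueing them
    'generateTransformsBeforePageLoad' => true,
];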

@MoritzLost
Contributor Author

@brandonkelly Thanks for the explanation. After several restarts, clearing caches and the mutex folder, and manually truncating the queue table, the site and Control Panel work normally again.

By default, mutex locks live in storage/runtime/mutex using file locks. If you clear that directory, the error should go away.

I thought so too, but the Control Panel was still showing the error even after clearing the mutex folder. So something (maybe the queue runner) was repeatedly acquiring mutex locks and blocking the Control Panel.

Why does the CP dashboard need a mutex lock on the queue table, anyway?

As for the huge amount of queue jobs: “Generating image transform” jobs will be added each time an image transform (either on the front end or within the control panel for asset thumbnails) is requested and Craft doesn’t have a record of the transform existing yet. The job itself will verify that the transform really doesn’t exist (e.g. if it was just a missing index record, or if the transform has been explicitly requested by the browser via an <img> tag since the job was queued up). If it does exist, the job bails out early, as its work is already done.

Yeah, we're used to seeing thousands of queue jobs because of this; that's perfectly normal. We generate tons of image variants for responsive images with srcset, usually around 10 variants per image, times two for the WebP variants. So a gallery with 20 images will result in 20 × 10 × 2 = 400 transforms. For what it's worth, the new system with one queue job for every transform works much better for us. The previous system, where each queue job attempted to generate all pending transforms, resulted in a lot of crashes and timeouts if too many transforms were requested.

However, I think there's a bug somewhere that can cause GenerateImageTransform jobs to be enqueued for transforms that already exist. While working on the server issue yesterday, I took the system offline and stopped the queue runner, then truncated the queue table. In this situation, no requests could be hitting the frontend, so there shouldn't be any transforms being requested. But I still saw hundreds of queue jobs being added to the queue table every second. No idea where those came from.

I can't reproduce it now, so I'll close this issue, but I'm still not sure if all those queue jobs I was seeing were normal. If I observe the same issue again, I'll report back!

@brandonkelly
Member

Why does the CP dashboard need a mutex lock on the queue table, anyway?

The queue tries to acquire a mutex lock when it’s making a change to it, to avoid race conditions with other requests that may also be doing something to the table.

That said, in this case the error is occurring during a routine cleanup operation, which isn’t critically important to the main task (getting info about the active queue job). So I’ve just gone through and relaxed the mutex handling a bit, to skip the cleanup operation if a mutex lock can’t be immediately acquired.
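
In code, the relaxed behavior would look roughly like this (a sketch of the described change, not the literal commit):

$mutex = Craft::$app->getMutex();

// With no timeout, acquire() returns false immediately if another
// process already holds the lock
if ($mutex->acquire('queue')) {
    try {
        $this->_moveExpired(); // the routine cleanup operation
    } finally {
        $mutex->release('queue');
    }
}

// Either way, carry on and fetch the active job info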

@brandonkelly
Member

For what it's worth, the new system with one queue job for every transform works much better for us.

Good to hear!

However, I think there's a bug somewhere that can cause GenerateImageTransform jobs to be enqueued for transforms that already exist.

If you clear your asset indexing data (Utilities → Caches → Asset indexing data), then Craft will need to rebuild those over time. Maybe that’s what happened?

@MoritzLost
Contributor Author

That said, in this case the error is occurring during a routine cleanup operation, which isn’t critically important to the main task (getting info about the active queue job). So I’ve just gone through and relaxed the mutex handling a bit, to skip the cleanup operation if a mutex lock can’t be immediately acquired.

@brandonkelly Sounds great, thanks!

If you clear your asset indexing data (Utilities → Caches → Asset indexing data), then Craft will need to rebuild those over time. Maybe that’s what happened?

Ooooh … that could be it. We're running clear-caches/all and cache/flush-all during deployment, so this cache would be cleared. Does that mean that after every deployment, every requested image transform on the entire site will result in a new cron job being created, which will just see that the transform already exists on disk and add the transform index to the database again?

I wasn't aware this was happening, I thought image transformations would just be delegated to cron jobs if the file doesn't exist on disk. Maybe writing the asset index should happen on the fly if the file already exists on disk? Not sure about the implications of that though. Of course we could just skip the asset indexes during deployments, but I really want to clear all caches to avoid potential errors. Shouldn't the asset transform index be more "permanent" than stuff like template caches?

@brandonkelly
Member

brandonkelly commented Apr 6, 2023

Does that mean that after every deployment, every requested image transform on the entire site will result in a new cron job being created, which will just see that the transform already exists on disk and add the transform index to the database again?

Queue job, not cron job, but yes.

I wasn't aware this was happening, I thought image transformations would just be delegated to cron jobs if the file doesn't exist on disk. Maybe writing the asset index should happen on the fly if the file already exists on disk?

I could have sworn we were already doing that, but it turns out we were only considering the opposite scenario – if a transform index already existed with a record of the generated transform, we were double-checking that it really did exist:

// If it's a local filesystem, double-check that the transform exists
if ($fs instanceof LocalFsInterface && $index->fileExists && !$fs->fileExists($uri)) {
    $index->fileExists = false;
    $this->storeTransformIndexData($index);
}

Just updated that logic for the next release, to verify that $index->fileExists is accurate regardless of whether it’s true or false. (394779c)

(Note this feature is only available for local filesystems where the fileExists() check will be very quick, as we don’t want to hold up the initial web requests with a bunch of calls to S3, etc.)
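
Conceptually, the updated check would look something like this (a sketch of the described change, not the literal diff in 394779c):

// If it's a local filesystem, make sure $index->fileExists matches reality,
// regardless of which direction it's wrong in
if ($fs instanceof LocalFsInterface) {
    $reallyExists = $fs->fileExists($uri);
    if ($index->fileExists !== $reallyExists) {
        $index->fileExists = $reallyExists;
        $this->storeTransformIndexData($index);
    }
}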

@MoritzLost
Contributor Author

@brandonkelly

*queue job, right 👀

Just updated that logic for the next release, to verify that $index->fileExists is accurate regardless of whether it’s true or false. (394779c)

That looks perfect, thanks! I was worried we'd have to adjust our workflows for this, so this change is a great improvement. Only creating queue jobs when actual image transforms need to happen is a good solution.

@thupsi

thupsi commented Apr 11, 2023

@brandonkelly Will a setup with the Asset Indexes -> Cache remote images option turned on get the same treatment? Or is that irrelevant?

@brandonkelly
Member

@thupsi As I mentioned, the change only applies to Local filesystems. Verifying files exist on remote filesystems is slower, so it's not something we’d want to do outside of an indexing/transform generation operation.

@brandonkelly
Member

Craft 4.4.7 is out now, with the changes added in a19c00b and 394779c.

@thupsi

thupsi commented Apr 12, 2023

@brandonkelly Right, I overlooked the fact that it's the original file that gets cached locally, not the transformed image. So, to get any benefit from this improvement, if I have my files on a remote volume, I would need to define a local volume for my transforms.

@brandonkelly
Member

@thupsi Correct
