-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bucket successfully deleted from S3 but DB update fails #1524
Fix bucket successfully deleted from S3 but DB update fails #1524
Conversation
…cessfully-deleted-from-S3-but-DB-update-fails
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev #1524 +/- ##
==========================================
- Coverage 92.36% 91.84% -0.52%
==========================================
Files 29 29
Lines 4804 4807 +3
==========================================
- Hits 4437 4415 -22
- Misses 367 392 +25 ☔ View full report in Codecov by Sentry. |
…deleted-from-S3-but-DB-update-fails' of https://github.com/ScilifelabDataCentre/dds_web into DDS-1920-Fix-raising-of-error-when-bucket-successfully-deleted-from-S3-but-DB-update-fails
…cessfully-deleted-from-S3-but-DB-update-fails
@rv0lt : do you have a good way to manually test this? If yes, please, add it to the description |
Manually? The thing I have been doing is to exec into the flask shell and see what was going on. You mean that? |
Yes, but how to reproduce the situation when the error appears |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting this traceback:
[2024-04-10 13:15:58,057] commands [ERROR] DeletionError: Project bucket contents were deleted, but they were not deleted from the database. Please contact SciLifeLab Data Centre.
Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
Traceback (most recent call last):
File "/code/dds_web/commands.py", line 688, in set_expired_to_archived
new_status_row, delete_message = archive.archive_project(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/dds_web/api/project.py", line 492, in archive_project
RemoveContents().delete_project_contents(project=project, delete_bucket=True)
File "/code/dds_web/api/project.py", line 773, in delete_project_contents
raise DeletionError(
^^^^^^^^^^^^^^
File "/code/dds_web/errors.py", line 200, in __init__
super().__init__((alt_message or "Deletion failed.") if not pass_message else message)
File "/code/dds_web/errors.py", line 39, in __init__
resource=flask.request.path or "not applicable",
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/werkzeug/local.py", line 316, in __get__
obj = instance._get_current_object()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/werkzeug/local.py", line 513, in _get_current_object
raise RuntimeError(unbound_message) from None
RuntimeError: Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
Traceback (most recent call last):
File "/usr/local/bin/flask", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/site-packages/flask/cli.py", line 1050, in main
cli.main()
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/flask/cli.py", line 357, in decorator
return __ctx.invoke(f, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/code/dds_web/commands.py", line 713, in set_expired_to_archived
project.project_statuses.append(new_status_row)
^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'new_status_row' where it is not associated with a value
but I still think that this PR is a different taks than the one I (at least) have in mind
What I have been trying to do is to catch the exception for the Deletion. Which is turned into the RuntimeError Then, log it out adding the message of the real error. So it is easy to find (the first line of [ERROR] DeletionError: Project bucket contents were deleted, but they were not deleted from the database. Please contact SciLifeLab Data Centre. So, it ends up logging the error but instead of the generic Runtime, it tries to be a bit more specific Ngl, this whole task is a bit mind scratching honestly
|
I agree it is confusing, and I seem to lack understanding of some of the things involved (on flask and python level). |
Yes, I am trying to write some instructions. Until now, I was just running "flask shell" in the backend container (it just opens a python shell but that can be conected to the apliocation), and running the code lines one by one. I am trying to think of a way to make it more reproducible than just doing that |
and how do you simulate the DB error? |
Ah, yes, same. Running one by one the lines of the function with the db container off.
Veeeeeery messy I know. That is why I am trying to find a way to document/test this better |
Just FYI. Reason why we cannot call just RemoveContents for example with the db off. It is because of decorators. They will detect it before executing the function logic. Probably mocking can be used here, will try something like that in a script |
@rv0lt : here is the traceback I had in mind when we initiated this task: If I understand this correctly, the |
Added an explanation to recreate the situation |
It seems this does the trick with Correct me if I am wrong, but we also want the flask command to register/print the error (what's now alt_message) and to continue running, don't we? |
I see, the alt_message is more difficult, because this issue can be caused by different operations failures. And thus, we go again to the solution of having to inspect the traceback in running time. The continue running, true. This will be caught by the outermost except, it can be solved by moving the archive.archive_project to the try block just below, this one continues the loop afterwards |
hm, why should logging the error be difficult. Maybe I am naive but shouldn't something like this work:
|
Ah yes, that could work. Only thing that I am thinking is that maybe that generates two alerts? This logger and then the one by the catch in the upper function? Worth researching anyway |
yes, I think I saw an error from commands.py too, when I made a quick try. But this shouldn't be a big issue!?!, as long as we get the "Project bucket contents were deleted, but they were not deleted from the ... blah blah" logged and the command continues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the suggested change, I have a question: in commands.py, set_expired_to_archived, inside the main try...except
block (lines 657 to 720) there are two more try...except
blocks, one for calling archive_project
and one for db.session.commit()
Then, in project.py, archive_project
calls delete_project_contents
inside a try...except
block, which in turn has a try...except
Maybe I am just ignorant, but is this really necessary?
Yes, I was still checking the changes. That's why I didn't re-requested the review yet! I added the second try-catch to see if it can be avoided to register two different errors (thus generating two alerts) |
…cessfully-deleted-from-S3-but-DB-update-fails
…cessfully-deleted-from-S3-but-DB-update-fails
Read this before submitting the PR
If there is a field which you are unsure about, enter the edit mode of this description or go to the PR template; There are invisible comments providing descriptions which may be of help.
1. Description / Summary
Fix raising of error when bucket successfully deleted from S3 but DB update fails
--- To reproduce ---
Add a few lines for the application to sleep
Release from the client project_1 and update the database to change the deadline. A testbucket has to exit in Minio
UPDATE projectstatuses SET deadline = CURDATE() - INTERVAL 1 DAY where project_id = 1;
Expire the project from backend container
flask set-available-to-expired
And update the deadline again
1. From the flask command (cronjob)
Add a second waiting time, after the else. In order to be able to turn on the db again and continue for the next projects.
Run
flask set-expired-to-archived
And during the 15 seconds stop. Stop the mariadb container, the error got must be a
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
i.e. The error throw if not from a request
2. From a request
Add another time stop in 774. After the line that catches the sqlerror and before creating the DeletionError
That is, because, if the db is still stopped when trying to throw the exception. There will be a different error thown.
Remember to recreate the testbucket in Minio
From the client, run the command to expire the project, with the db container up.
During the first 15-second wait, stop the container to force the exception.
During the second 15 seconds wait, start the container again to be able to communicate with the client and return the error.
The command will fail in the client, and in the logs it can be seen that the DeletionError was generated.
2. Jira task / GitHub issue
Link to the github issue or add the Jira task ID here.
3. Type of change
What type of change(s) does the PR contain?
Check the relevant boxes below. For an explanation of the different sections, enter edit mode of this PR description template.
4. Additional information
master
branch: _If checked, read the release instructions5. Actions / Scans
Check the boxes when the specified checks have passed.
For information on what the different checks do and how to fix it if they're failing, enter edit mode of this description or go to the PR template.