#213 Postgres: Add support for transaction-scoped advisory locks with external transactions - Continuation #235
Conversation
private static async ValueTask<bool> ShouldDefineSavePoint(DatabaseConnection connection)
{
    // If the connection is internally-owned, we only define a save point if a transaction has been opened.
@madelson I added the functionality for capturing and restoring the timeout settings, please review it.
You should know that I tried to remove the save point functionality entirely, but it turns out that you can't. If you get an error when attempting to acquire the lock, you must roll back the transaction (or the save point) before running any other DB command; otherwise you will get an error about the aborted transaction.
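To illustrate the failure mode described here, a sketch of the Postgres session behavior (the savepoint name, lock key, and timeout value are illustrative):

```sql
BEGIN;
SAVEPOINT lock_attempt;               -- illustrative savepoint name
SET LOCAL lock_timeout = 1000;
-- Suppose another session holds the advisory lock and the wait times out:
SELECT pg_advisory_xact_lock(12345);
-- ERROR: canceling statement due to lock timeout
-- The transaction is now aborted; any further command fails:
SELECT 1;
-- ERROR: current transaction is aborted, commands ignored until end of transaction block
-- Rolling back to the savepoint recovers the transaction:
ROLLBACK TO SAVEPOINT lock_attempt;
SELECT 1;                             -- works again
```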
If you get an error when attempting to acquire the lock, you must roll back the transaction (or the save point) before running any other DB command; otherwise you will get an error about the aborted transaction.
What exactly are the cases impacted by this? Cancellation? Timeout? Something else? I'm wondering because it seems like these would also be issues for the new APIs, e.g. someone passes in a cancellation token or does TryAcquire with a timeout and it silently dooms the transaction; that feels pretty undesirable and unexpected, particularly in the former case. Thoughts?
The transaction will not be doomed with the current code, because I didn't remove the save point functionality. Transactions remain functional even if an exception occurs.
When I did try to remove it, it impacted the handling of the timeout (because it checks whether the lock is held) and the new code that tries to restore the previous timeout settings.
The RollBackTransactionTimeoutVariablesIfNeededAsync function still rolls back the save point for transactional locks on any failure (on any exception, the lock is assumed to be unobtained), and thus the transaction isn't aborted (see the if statement: if (needsSavePoint && !(acquired && UseTransactionScopedLock(connection)))). This was true even before my changes.
There is only one edge case I can think of where the save point rollback works against you for transactional locks, and it probably existed before my changes (for internal connections' transactions): when you hit a timeout but still hold the lock because of the Postgres bug (lines 88-96), I assume the rollback of the save point will release the lock beforehand, so you will always get a timeout even though for a moment you had acquired the lock. This is an edge case that isn't really visible to the library's users, though.
Sorry I wasn't clear. What I'm trying to understand is how this interacts with the new transactional locking APIs that do not get to use the savepoint.
Under what conditions does the transaction get doomed? Can this happen under "normal" operation of the lock?
Said another way, if the savepoint is critical for all the existing lock APIs, how is removing it not a problem for the new APIs?
I am actually slightly more confused by your questions now :).
I think that maybe you didn't notice, but I changed the code so that the new transactional locking APIs will always use save points, in addition to the capture/restore settings logic (see line 227). This is because otherwise the transaction would get doomed on any exception thrown while trying to acquire the lock.
So, bottom line: the transaction will never get doomed by the new APIs, since we keep using save points in this scenario.
Just wanted to let you know that save points can't be removed from the code, because otherwise transactions will indeed reach an aborted state.
the new transactional locking APIs will always use save points
Ok sorry didn't realize this. However, now I'm wondering why it is OK to "leak" a SAVEPOINT on every lock acquisition. I'm not an expert on Postgres transactions, but this seems like a side-effect that could cause issues for people. At minimum, subsequent lock acquisitions on the same connection would try to recreate the same named SAVEPOINT which probably wouldn't work (would be good to add a test case for subsequent lock acquisitions on the same transaction).
I wonder if the real solution here is to restrict the new APIs wrt timeout/cancellation functionality. If we only support TryAcquire with timeout 0 and no cancellation, then we dodge all the issues with setting leaks and savepoint leaks (we can modify the code so that for 0-timeout acquires it doesn't issue any SET LOCAL commands and therefore also doesn't need the SAVEPOINT). Thoughts?
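For reference, Postgres also offers a non-blocking acquisition function that a 0-timeout TryAcquire could map to without issuing any SET LOCAL or SAVEPOINT commands (a sketch; the lock key value is illustrative):

```sql
-- Returns immediately: true if the transaction-scoped advisory lock was
-- acquired, false if it is held elsewhere. No lock_timeout is involved,
-- so a failed attempt does not abort the transaction.
SELECT pg_try_advisory_xact_lock(12345);
```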
Your point regarding a "nested" save point with the same name was really good, so I wrote a test and checked it, and it seems to work as intended. I wondered why, so I read the Postgres documentation (https://www.postgresql.org/docs/current/sql-savepoint.html):
savepoint_name
The name to give to the new savepoint. If savepoints with the same name already exist, they will be inaccessible until newer identically-named savepoints are released.
SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint with RELEASE SAVEPOINT will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
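The documented behavior can be sketched as follows (the savepoint name `sp` is illustrative):

```sql
BEGIN;
SAVEPOINT sp;             -- first lock acquisition
SAVEPOINT sp;             -- second acquisition: allowed; the old sp is shadowed, not destroyed
ROLLBACK TO SAVEPOINT sp; -- targets the most recent sp
RELEASE SAVEPOINT sp;     -- releases the newer sp; the older one becomes accessible again
COMMIT;
```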
If you have any other concern regarding this scenario, please do tell.
As for restricting timeout/cancellation in the new API as a solution for the leaked save point: I am against it. It would make the new API less useful, but more importantly it won't solve the issue. We can't really dodge it; if you get any exception from the DB, not just a timeout or a cancellation, the transaction will be dead without a save point rollback, so the restrictions wouldn't be a safe way to prevent this scenario from happening altogether.
I think the bottom line is that the save point must be leaked in this case: if you roll it back after an acquisition, you no longer hold the lock, and if you don't use a save point at all, the transaction may be aborted (which is the least favorable option IMO). So I'm not seeing a way around it.
We can add a note about it in the new API methods, so users will be able to choose whether they can/should use the API.
Good to know that leaking a savepoint does not cause a conflict with further acquisitions!
I do want to confirm that there isn't any other observable side-effect to leaking savepoints that could cause issues for users. I'm not finding anything from my quick reading, but curious if you know either way. I want to avoid a scenario where calling the API has some hidden side-effect that causes issues for users, leading to bug reports here.
I am not aware of any other issue that may occur because of that. Do you think we should add a note about it in the new API?
private static int? ParsePostgresTimeout(string? timeout)
{
    if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.
If it should never happen, why handle this?
I deleted this line and reverted the whole concept of null timeouts that I had added to the code.
    if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.
    if (timeout == "0") { return 0; } // This will be the case if the timeout is disabled.

    // In any other case we need to extract the timeout from the string, since Postgres returns timeouts with their unit attached, e.g. "5000ms".
Do we actually need to parse the value? Can't we leave it as a string and just use the same string value when we restore the timeout?
Unfortunately we can't just send a string. When you call SET LOCAL for a timeout, Postgres expects a numeric value. If you send a string you will get an error: 42601: trailing junk after numeric literal at or near "{string_value}".
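A minimal sketch of the parsing involved (in Python for illustration; the actual implementation is C#, and the set of unit suffixes handled here is an assumption based on the "5000ms" example above and Postgres' documented time units):

```python
import re

# Unit multipliers to milliseconds; Postgres time units include ms, s, min, h, d.
_UNITS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_postgres_timeout(timeout: str) -> int:
    """Extract milliseconds from a Postgres timeout setting such as "5000ms".

    A bare "0" means the timeout is disabled; any other value carries a
    unit suffix that must be stripped and converted.
    """
    if timeout == "0":
        return 0
    match = re.fullmatch(r"(\d+)([a-z]+)", timeout)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized timeout format: {timeout!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```

The numeric result can then be sent back via SET LOCAL, which accepts a plain number (interpreted as milliseconds for lock_timeout) but not an arbitrary string.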
private static async ValueTask<CapturedTimeoutSettings?> CaptureTimeoutSettingsIfNeededAsync(DatabaseConnection connection, CancellationToken cancellationToken)
{
    var shouldCaptureTimeoutSettings = UseTransactionScopedLock(connection);
We don't need to enable this functionality except for external connections though, right?
That's true. I can change it, but I thought it was logically sound that timeout settings shouldn't leak out of this class for internally-owned transactions either, where possible.
For internally-owned connections we don't care about leakage since the transaction never gets reused. Therefore, I'd like to avoid the overhead (2 additional queries).
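For context, a hedged sketch of what the capture/restore could look like at the SQL level (which settings are captured, and the example values, are assumptions based on this discussion):

```sql
-- Capture (the two additional queries mentioned above):
SHOW lock_timeout;       -- e.g. "5s"
SHOW statement_timeout;  -- e.g. "0" (disabled)
-- The acquisition path then overrides the timeout for this transaction only:
SET LOCAL lock_timeout = 2000;
-- ...attempt the lock acquisition...
-- Restore the captured values so they don't leak into the rest of
-- an externally-owned transaction:
SET LOCAL lock_timeout = 5000;
SET LOCAL statement_timeout = 0;
```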
Understood, fixed it.
{
    if (needsSavePoint
    // For transaction scoped locks, we can't roll back the save point on success because that will roll back our hold on the lock.
This used to also say: "It's ok to 'leak' the savepoint in that case because it's an internally-owned transaction/connection and the savepoint will be cleaned up with the disposal of the transaction."
The previous justification doesn't quite hold, since it is now sometimes an externally-owned transaction.
We should add back a comment justifying the leak.
I will add a comment after we decide what to do regarding the leaked save points.
Added a comment
[Test]
public async Task TestWorksForMultipleLocksUnderTheSameConnectionWithExternalTransaction()
Let's leave a comment here along the lines of "Each acquisition creates the same named savepoint; this seems like it would create a conflict but it actually works fine in Postgres".
Added
Thanks for working on this @Tzachi009!
This PR is a continuation of #222
Related comments from previous PR:
https://github.com/madelson/DistributedLock/pull/222/files#r1903151395
https://github.com/madelson/DistributedLock/pull/222/files#r1903151800
Related Issue: #213