#213 Postgres: Add support for transaction-scoped advisory locks with external transactions - Continuation #235
Conversation
private static async ValueTask<bool> ShouldDefineSavePoint(DatabaseConnection connection)
{
    // If the connection is internally-owned, we only define a save point if a transaction has been opened.
@madelson I added the functionality for capturing and restoring the timeout settings, please review it.
You should know that I tried to remove the save point functionality entirely, but it turns out that you can't. If you get an error when attempting to acquire the lock, you must roll back the transaction (or the save point) before running any other DB command; otherwise you will get an error about the aborted transaction.
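To illustrate the failure mode described here, a sketch of the Postgres session behavior (the savepoint name, lock key, and timeout value are illustrative):

```sql
BEGIN;
SAVEPOINT lock_attempt;               -- illustrative savepoint name
SET LOCAL lock_timeout = 1000;
-- Suppose another session holds the advisory lock and the wait times out:
SELECT pg_advisory_xact_lock(12345);
-- ERROR: canceling statement due to lock timeout
-- The transaction is now aborted; any further command fails:
SELECT 1;
-- ERROR: current transaction is aborted, commands ignored until end of transaction block
-- Rolling back to the savepoint recovers the transaction:
ROLLBACK TO SAVEPOINT lock_attempt;
SELECT 1;                             -- works again
```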
If you get an error when attempting to acquire the lock, you must roll back the transaction (or the save point) before running any other DB command; otherwise you will get an error about the aborted transaction.
What exactly are the cases impacted by this? Cancellation? Timeout? Something else? I'm wondering because it seems like these would also be issues for the new APIs, e.g. someone passes in a cancellation token or does TryAcquire with a timeout and it silently dooms the transaction; that feels pretty undesirable and unexpected, particularly in the former case. Thoughts?
The transaction will not be doomed with the current code, because I didn't remove the save point functionality. Transactions remain functional even if an exception occurs.
When I did try to remove it, it impacted the handling of the timeout (because it checks whether the lock is held) and the new code that tries to restore the previous timeout settings.
The RollBackTransactionTimeoutVariablesIfNeededAsync function still rolls back the save point for transactional locks on any failure (on any exception, the lock is assumed to be unobtained), and thus the transaction isn't aborted (see the if statement: if (needsSavePoint && !(acquired && UseTransactionScopedLock(connection)))). This was true even before my changes.
There is only one edge case I can think of where the save point rollback works against you for transactional locks, and it probably existed before my changes (for internal connections' transactions): when you hit a timeout but still hold the lock because of the Postgres bug (lines 88-96), I assume the rollback of the save point will release the lock beforehand, so you will always get a timeout even though for a moment you had acquired the lock. This is an edge case that isn't really visible to the library's users, though.
Sorry I wasn't clear. What I'm trying to understand is how this interacts with the new transactional locking APIs that do not get to use the savepoint.
Under what conditions does the transaction get doomed? Can this happen under "normal" operation of the lock?
Said another way, if the savepoint is critical for all the existing lock APIs, how is removing it not a problem for the new APIs?
I am actually slightly more confused by your questions now :).
I think that maybe you didn't notice, but I changed the code so that the new transactional locking APIs will always use save points, in addition to the capture/restore settings logic (see line 227). This is because otherwise the transaction would get doomed on any exception thrown while trying to acquire the lock.
So, bottom line: the transaction will never get doomed by the new APIs, since we keep using save points in this scenario.
Just wanted to let you know that save points can't be removed from the code, because otherwise transactions will indeed reach an aborted state.
the new transactional locking APIs will always use save points
Ok sorry didn't realize this. However, now I'm wondering why it is OK to "leak" a SAVEPOINT on every lock acquisition. I'm not an expert on Postgres transactions, but this seems like a side-effect that could cause issues for people. At minimum, subsequent lock acquisitions on the same connection would try to recreate the same named SAVEPOINT which probably wouldn't work (would be good to add a test case for subsequent lock acquisitions on the same transaction).
I wonder if the real solution here is to restrict the new APIs wrt timeout/cancellation functionality. If we only support TryAcquire with timeout 0 and no cancellation, then we dodge all the issues with setting leaks and savepoint leaks (we can modify the code so that for 0-timeout acquires it doesn't issue any SET LOCAL commands and therefore also doesn't need the SAVEPOINT). Thoughts?
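For reference, Postgres also offers a non-blocking acquisition function that a 0-timeout TryAcquire could map to without issuing any SET LOCAL or SAVEPOINT commands (a sketch; the lock key value is illustrative):

```sql
-- Returns immediately: true if the transaction-scoped advisory lock was
-- acquired, false if it is held elsewhere. No lock_timeout is involved,
-- so a failed attempt does not abort the transaction.
SELECT pg_try_advisory_xact_lock(12345);
```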
Your point regarding a "nested" save point with the same name was really good, so I wrote a test and checked it, and it seems to work as intended. I wondered why, so I read the Postgres documentation (https://www.postgresql.org/docs/current/sql-savepoint.html):
savepoint_name
The name to give to the new savepoint. If savepoints with the same name already exist, they will be inaccessible until newer identically-named savepoints are released.
SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint with RELEASE SAVEPOINT will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
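The documented behavior can be sketched as follows (the savepoint name `sp` is illustrative):

```sql
BEGIN;
SAVEPOINT sp;             -- first lock acquisition
SAVEPOINT sp;             -- second acquisition: allowed; the old sp is shadowed, not destroyed
ROLLBACK TO SAVEPOINT sp; -- targets the most recent sp
RELEASE SAVEPOINT sp;     -- releases the newer sp; the older one becomes accessible again
COMMIT;
```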
If you have any other concern regarding this scenario, please do tell.
As for restricting timeout/cancellation in the new API as a solution for the leaked save point: I am against it. It would make the new API less useful, but more importantly it won't solve the issue. We can't really dodge it; if you get any exception from the DB, not just a timeout or a cancellation, the transaction will be dead without a save point rollback, so the restrictions wouldn't be a safe way to prevent this scenario from happening altogether.
I think the bottom line is that the save point must be leaked in this case: if you roll it back after an acquisition, you no longer hold the lock, and if you don't use a save point at all, the transaction may be aborted (which is the least favorable option IMO). So I'm not seeing a way around it.
We can add a note about it in the new API methods, so users will be able to choose whether they can/should use the API.
Good to know that leaking a savepoint does not cause a conflict with further acquisitions!
I do want to confirm that there isn't any other observable side-effect to leaking savepoints that could cause issues for users. I'm not finding anything from my quick reading, but curious if you know either way. I want to avoid a scenario where calling the API has some hidden side-effect that causes issues for users, leading to bug reports here.
I am not aware of any other issue that may occur because of that. Do you think we should add a note about it in the new API?
private static int? ParsePostgresTimeout(string? timeout)
{
    if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.
If it should never happen, why handle this?
I deleted this line and reverted the whole concept of null timeouts that I had added to the code.
    if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.
    if (timeout == "0") { return 0; } // This will be the case if the timeout is disabled.

    // In any other case we need to extract the timeout from the string, since Postgres returns timeouts with their unit attached, e.g. "5000ms".
Do we actually need to parse the value? Can't we leave it as a string and just use the same string value when we restore the timeout?
Unfortunately we can't just send a string. When you call SET LOCAL for a timeout, Postgres expects a numeric value. If you send a string you will get an error: 42601: trailing junk after numeric literal at or near "{string_value}".
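A minimal sketch of the parsing involved (in Python for illustration; the actual implementation is C#, and the set of unit suffixes handled here is an assumption based on the "5000ms" example above and Postgres' documented time units):

```python
import re

# Unit multipliers to milliseconds; Postgres time units include ms, s, min, h, d.
_UNITS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_postgres_timeout(timeout: str) -> int:
    """Extract milliseconds from a Postgres timeout setting such as "5000ms".

    A bare "0" means the timeout is disabled; any other value carries a
    unit suffix that must be stripped and converted.
    """
    if timeout == "0":
        return 0
    match = re.fullmatch(r"(\d+)([a-z]+)", timeout)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized timeout format: {timeout!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```

The numeric result can then be sent back via SET LOCAL, which accepts a plain number (interpreted as milliseconds for lock_timeout) but not an arbitrary string.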
private static async ValueTask<CapturedTimeoutSettings?> CaptureTimeoutSettingsIfNeededAsync(DatabaseConnection connection, CancellationToken cancellationToken)
{
    var shouldCaptureTimeoutSettings = UseTransactionScopedLock(connection);
We don't need to enable this functionality except for external connections though, right?
That's true. I can change it, but I thought it was logically sound that timeout settings shouldn't leak out of this class for internally-owned transactions either, where possible.
For internally-owned connections we don't care about leakage since the transaction never gets reused. Therefore, I'd like to avoid the overhead (2 additional queries).
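For context, a hedged sketch of what the capture/restore could look like at the SQL level (which settings are captured, and the example values, are assumptions based on this discussion):

```sql
-- Capture (the two additional queries mentioned above):
SHOW lock_timeout;       -- e.g. "5s"
SHOW statement_timeout;  -- e.g. "0" (disabled)
-- The acquisition path then overrides the timeout for this transaction only:
SET LOCAL lock_timeout = 2000;
-- ...attempt the lock acquisition...
-- Restore the captured values so they don't leak into the rest of
-- an externally-owned transaction:
SET LOCAL lock_timeout = 5000;
SET LOCAL statement_timeout = 0;
```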
Understood, fixed it.
{
    if (needsSavePoint
    // For transaction scoped locks, we can't roll back the save point on success because that will roll back our hold on the lock.
This used to also say: "It's ok to 'leak' the savepoint in that case because it's an internally-owned transaction/connection and the savepoint will be cleaned up with the disposal of the transaction."
The previous justification doesn't quite hold, since it is now sometimes an externally-owned transaction.
We should add back a comment justifying the leak.
I will add a comment after we decide what to do regarding the leaked save points.
Added a comment
[Test]
public async Task TestWorksForMultipleLocksUnderTheSameConnectionWithExternalTransaction()
Let's leave a comment here along the lines of "Each acquisition creates the same named savepoint; this seems like it would create a conflict but it actually works fine in Postgres".
Added
Thanks for working on this @Tzachi009!
This PR is a continuation of #222
Related comments from previous PR:
https://github.com/madelson/DistributedLock/pull/222/files#r1903151395
https://github.com/madelson/DistributedLock/pull/222/files#r1903151800
Related Issue: #213