
Conversation

Pull request by @Tzachi009.

Review thread on:

private static async ValueTask<bool> ShouldDefineSavePoint(DatabaseConnection connection)
{
// If the connection is internally-owned, we only define a save point if a transaction has been opened.

@Tzachi009 (Author):

@madelson I added the functionality for capturing and restoring the timeout settings; please review it.

You should know that I tried to remove the save point functionality entirely, but it turns out that you can't. If you get an error while attempting to acquire the lock, you must roll back the transaction (or the save point) before you can run any other DB command; otherwise you will get an error about the aborted transaction.
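
To illustrate what I mean, here is a minimal standalone Npgsql sketch (placeholder connection string; not the library's code): once any command fails inside the transaction, Postgres refuses every further command until you roll back, and rolling back to a save point is enough to recover.

```csharp
using Npgsql;

// Minimal sketch (placeholder connection string); not the library's code.
await using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=postgres");
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();

await using (var savePoint = new NpgsqlCommand("SAVEPOINT before_acquire", conn, tx))
{
    await savePoint.ExecuteNonQueryAsync();
}

try
{
    // Stand-in for a failed lock acquisition: any failing command dooms the transaction.
    await using var failing = new NpgsqlCommand("SELECT 1/0", conn, tx);
    await failing.ExecuteNonQueryAsync();
}
catch (PostgresException)
{
    // Without this rollback, every subsequent command fails with
    // "25P02: current transaction is aborted, commands ignored until end of transaction block".
    await using var rollback = new NpgsqlCommand("ROLLBACK TO SAVEPOINT before_acquire", conn, tx);
    await rollback.ExecuteNonQueryAsync();
}

// The transaction is usable again after the savepoint rollback.
await using var next = new NpgsqlCommand("SELECT 1", conn, tx);
await next.ExecuteScalarAsync();
```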

@madelson (Owner):

> If you get an error while attempting to acquire the lock, you must roll back the transaction (or the save point) before you can run any other DB command; otherwise you will get an error about the aborted transaction.

What exactly are the cases impacted by this? Cancellation? Timeout? Something else? I'm wondering because it seems like these would also be issues for the new APIs, e.g. someone passes in a cancellation token or does TryAcquire with a timeout and it silently dooms the transaction; that feels like it would be pretty undesirable and unexpected, particularly in the former case. Thoughts?

@Tzachi009 (Author), Jan 22, 2025:

The transaction will not be doomed with the current code, because I didn't remove the save point functionality. The save points remain functional even if an exception occurs.

When I did try to remove it, it impacted the handling of the timeout (because it checks whether the lock is held) and the new code that tries to restore the previous timeout settings.

The RollBackTransactionTimeoutVariablesIfNeededAsync function still rolls back the save point for transactional locks on any failure (when any exception occurs, the lock is assumed not to have been obtained), and thus the transaction isn't left aborted (see the if statement: if (needsSavePoint && !(acquired && UseTransactionScopedLock(connection)))). This was true even before my changes.

There is only one edge case I can think of where the save point rollback works against you for transactional locks, and it probably existed before my changes (for internally-owned connections' transactions): when you hit a timeout but still hold the lock because of the Postgres bug (lines 88-96), I assume the save point rollback releases the lock beforehand, so you always get a timeout even though you briefly held the lock. But this edge case isn't really visible to the library's users.
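
Roughly, the decision it makes looks like this (a paraphrase of the condition quoted above; RollBackSavePointAsync is a hypothetical helper name, not the library's actual API):

```csharp
// Paraphrased sketch; RollBackSavePointAsync is a hypothetical helper, not the library's API.
if (needsSavePoint && !(acquired && UseTransactionScopedLock(connection)))
{
    // Acquisition failed, or the lock isn't transaction-scoped: rolling back to the save point
    // undoes the SET LOCAL timeout changes and clears any aborted-transaction state.
    await RollBackSavePointAsync(connection, cancellationToken);
}
// Otherwise (a transaction-scoped lock was acquired), the save point is intentionally left in place,
// because rolling it back would also roll back our hold on the lock.
```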

@madelson (Owner), Jan 23, 2025:

Sorry I wasn't clear. What I'm trying to understand is how this interacts with the new transactional locking APIs that do not get to use the savepoint.

Under what conditions does the transaction get doomed? Can this happen under "normal" operation of the lock?

Said another way, if the savepoint is critical for all the existing lock APIs, how is removing it not a problem for the new APIs?

@Tzachi009 (Author):

I am actually slightly more confused by your questions now :).

I think maybe you didn't notice, but I changed the code so the new transactional locking APIs will always use save points, in addition to the capture/restore settings logic (see line 227). Otherwise the transaction would get doomed whenever an exception occurs while trying to acquire the lock.

So, bottom line: the transaction will never get doomed in the new APIs, since we keep using save points in this scenario.
I just wanted to let you know that save points can't be removed from the code entirely, because otherwise transactions would indeed reach an aborted state.

@madelson (Owner):

> the new transactional locking APIs will always use save points

Ok, sorry, I didn't realize this. However, now I'm wondering why it is OK to "leak" a SAVEPOINT on every lock acquisition. I'm not an expert on Postgres transactions, but this seems like a side effect that could cause issues for people. At minimum, subsequent lock acquisitions on the same connection would try to recreate the same named SAVEPOINT, which probably wouldn't work (it would be good to add a test case for subsequent lock acquisitions on the same transaction).

I wonder if the real solution here is to restrict the new APIs with respect to timeout/cancellation functionality. If we only support TryAcquire with timeout 0 and no cancellation, then we dodge all the issues with setting leaks and savepoint leaks (we can modify the code so that 0-timeout acquires don't issue any SET LOCAL commands and therefore also don't need the SAVEPOINT). Thoughts?
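
(For context on why timeout 0 wouldn't need SET LOCAL: a 0-timeout acquire can map directly onto pg_try_advisory_xact_lock, which returns immediately instead of blocking. A rough standalone sketch with a placeholder connection string, not the library's implementation:)

```csharp
using Npgsql;

await using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=postgres");
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();

// pg_try_advisory_xact_lock returns true/false immediately, so a 0-timeout acquire needs no
// SET LOCAL lock_timeout/statement_timeout changes and therefore no SAVEPOINT either.
await using var cmd = new NpgsqlCommand("SELECT pg_try_advisory_xact_lock(@key)", conn, tx);
cmd.Parameters.AddWithValue("key", 12345L);
var acquired = (bool)(await cmd.ExecuteScalarAsync())!;
// acquired == true means the lock is held for the remainder of the transaction.
```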

@Tzachi009 (Author):

Your point regarding a "nested" save point with the same name was really good, so I wrote a test and checked it, and it seems to work as intended. I wondered why that is, so I went and read Postgres' documentation (https://www.postgresql.org/docs/current/sql-savepoint.html):

> savepoint_name
> The name to give to the new savepoint. If savepoints with the same name already exist, they will be inaccessible until newer identically-named savepoints are released.
>
> SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint with RELEASE SAVEPOINT will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.
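
In raw SQL terms, the behavior the test relies on is essentially this (an illustrative Npgsql sketch with a placeholder connection string and savepoint name; the actual test goes through the library's API):

```csharp
using Npgsql;

await using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=postgres");
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();

async Task ExecAsync(string sql)
{
    await using var cmd = new NpgsqlCommand(sql, conn, tx);
    await cmd.ExecuteNonQueryAsync();
}

await ExecAsync("SAVEPOINT medallion_lock");           // first acquisition ("medallion_lock" is a placeholder name)
await ExecAsync("SELECT pg_advisory_xact_lock(1)");
await ExecAsync("SAVEPOINT medallion_lock");           // second acquisition with the same name: allowed
await ExecAsync("SELECT pg_advisory_xact_lock(2)");
// Only the newer savepoint would be targeted by ROLLBACK TO SAVEPOINT / RELEASE SAVEPOINT,
// exactly as the documentation quoted above describes.
```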

If you have any other concern regarding this scenario, please do tell.

As for restricting timeout/cancellation in the new API as a solution for the leaked save point, I am against it. It would make the new API less useful, but more importantly it wouldn't solve the issue. We can't really dodge it: if you get any exception from the DB, not just a timeout or a cancellation, the transaction will be dead without a save point rollback, so the restrictions wouldn't safely prevent this scenario altogether.

I think the bottom line is that the save point must be leaked in this case: if you roll it back after an acquisition you no longer hold the lock, and if you don't use a save point at all then the transaction may be aborted (which is the least favorable option IMO), so I'm not seeing a way around it.
We can add a note about it on the new API methods, so users will be able to decide whether they can/should use the API.

@madelson (Owner):

Good to know that leaking a savepoint does not cause a conflict with further acquisitions!

I do want to confirm that there isn't any other observable side effect of leaking savepoints that could cause issues for users. I'm not finding anything from my quick reading, but I'm curious if you know either way. I want to avoid a scenario where calling the API has some hidden side effect that causes issues for users, leading to bug reports here.

@Tzachi009 (Author):

I am not aware of any other issue that may occur because of that. Do you think we should add a note about it in the new API?


Review thread on:

private static int? ParsePostgresTimeout(string? timeout)
{
if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.

@madelson (Owner):

If it should never happen, why handle this?

@Tzachi009 (Author), Jan 22, 2025:

I deleted this code line and basically reverted the whole concept of null timeouts in the code that I added.

Review thread on:

if (timeout == null) { return null; } // This will be the case if the timeout wasn't found in the DB. Theoretically it should never happen.
if (timeout == "0") { return 0; } // This will be the case if the timeout is disabled.

// In any other case we need to extract the timeout from the string, since Postgres returns timeouts with their unit attached, e.g. "5000ms".

@madelson (Owner):

Do we actually need to parse the value? Can't we leave it as a string and just use the same string value when we restore the timeout?

@Tzachi009 (Author):

Unfortunately we can't just send the string back. When you call SET LOCAL for a timeout, Postgres expects a numeric value; if you send the string you will get an error: 42601: trailing junk after numeric literal at or near "{string_value}".
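
So before restoring we have to turn the captured value back into a plain number of milliseconds. Something along these lines (an illustrative sketch only; the actual parsing in the PR may differ):

```csharp
using System;

// Illustrative only; the actual parsing in the PR may differ.
static int ParseTimeoutToMillis(string timeout) =>
    timeout switch
    {
        "0" => 0,                                                   // timeout disabled
        _ when timeout.EndsWith("ms") => int.Parse(timeout[..^2]),
        _ when timeout.EndsWith("min") => int.Parse(timeout[..^3]) * 60_000,
        _ when timeout.EndsWith("s") => int.Parse(timeout[..^1]) * 1_000,
        _ => int.Parse(timeout)                                     // bare number: already milliseconds
    };

Console.WriteLine(ParseTimeoutToMillis("5000ms")); // 5000
Console.WriteLine(ParseTimeoutToMillis("5s"));     // 5000
```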


Review thread on:

private static async ValueTask<CapturedTimeoutSettings?> CaptureTimeoutSettingsIfNeededAsync(DatabaseConnection connection, CancellationToken cancellationToken)
{
var shouldCaptureTimeoutSettings = UseTransactionScopedLock(connection);

@madelson (Owner):

We don't need to enable this functionality except for external connections though, right?

@Tzachi009 (Author):

That's true. I can change it, but I thought it was logically sound not to leak timeout settings out of this class for internally-owned transactions too, where possible.

@madelson (Owner):

For internally-owned connections we don't care about leakage since the transaction never gets reused. Therefore, I'd like to avoid the overhead (2 additional queries).
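
i.e. roughly this (the ownership check is a stand-in for however the connection exposes that information, not an exact API):

```csharp
// Sketch only; IsExternallyOwned stands in for the library's actual ownership check.
var shouldCaptureTimeoutSettings = UseTransactionScopedLock(connection) && connection.IsExternallyOwned;
```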

@Tzachi009 (Author):

Understood, fixed it.

Review thread on:

{
if (needsSavePoint
// For transaction scoped locks, we can't roll back the save point on success because that will roll back our hold on the lock.

@madelson (Owner):

This used to also say: "It's ok to 'leak' the savepoint in that case because it's an internally-owned transaction/connection and the savepoint will be cleaned up with the disposal of the transaction."

The previous justification doesn't quite hold, since it is now sometimes an externally-owned transaction.

We should add back a comment justifying the leak.

@Tzachi009 (Author):

I will add a comment after we decide what to do regarding the leaked save points.

@Tzachi009 (Author):

Added a comment



Review thread on:

[Test]
public async Task TestWorksForMultipleLocksUnderTheSameConnectionWithExternalTransaction()

@madelson (Owner):

Let's leave a comment here along the lines of "Each acquisition creates the same named savepoint; this seems like it would create a conflict but it actually works fine in Postgres".

@Tzachi009 (Author):

Added

madelson merged commit d5a2439 into madelson:release-2.6 on Jan 28, 2025. 1 check passed.
@madelson (Owner):

Thanks for working on this, @Tzachi009!
