Skip to content

Make rowid optional in DELETE query plan#25284

Merged
ghelmling merged 1 commit intoprestodb:masterfrom
ghelmling:export-D76325048
Jul 2, 2025
Merged

Make rowid optional in DELETE query plan#25284
ghelmling merged 1 commit intoprestodb:masterfrom
ghelmling:export-D76325048

Conversation

@ghelmling
Copy link
Contributor

Summary: Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner. If a connector does not actually use $row_id to implement DELETE, then we should not require it. This makes $row_id optional. If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048

@ghelmling ghelmling requested review from a team, ZacBlanco and hantangwangd as code owners June 11, 2025 04:38
@ghelmling ghelmling requested a review from jaystarshot June 11, 2025 04:38
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jun 11, 2025
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 12, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 12, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 13, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 16, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
* {@link com.facebook.presto.spi.UpdatablePageSource} that created them.
*/
default ColumnHandle getDeleteRowIdColumnHandle(ConnectorSession session, ConnectorTableHandle tableHandle)
default Optional<ColumnHandle> getDeleteRowIdColumnHandle(ConnectorSession session, ConnectorTableHandle tableHandle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One little question: should we consider backward compatibility for potential externally-implemented connectors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a preference for how you would want to support this instead?

I suppose another option would be to add a guard method to the interface that could disable the behavior:
boolean requiresDeleteRowIdColumnHandle()

That seems kind of cumbersome to have 2 methods for both the update and delete rowId handling. I thought that the Optional usage would encapsulate this better.

Or we could add a system session property that QueryPlanner checks before injecting the rowId. That would avoid the ConnectorMetadata API change. But that doesn't seem quite as clean, as whether or not the rowId column is required is more a function/requirement of the connector implementation. So you could theoretically be using different connectors with different requirements.

I'm open to any other suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to ensure backward compatibility for this Connector SPI change, the modification should not affect any potentially existing external connectors that may have already implemented the modified SPI methods getDeleteRowIdColumnHandle or getUpdateRowIdColumnHandle.

IMO, we can refer to a previous SPI method modification that took backward compatibility into account, see here. In this approach, we retain the original SPI methods but mark them as @deprecated, while adding new SPI methods with default implementations that internally call the legacy methods. Meanwhile, the engine's internal implementation logic now exclusively uses new SPI methods. This ensures that the SPI changes remain backward-compatible, leaving all existing connectors unaffected.

One minor issue with this approach is that the methods need to maintain distinct signatures, which may require using different method names for the new ones. The code would roughly look like this:

    @Deprecated
    default ColumnHandle getUpdateRowIdColumnHandle(ConnectorSession session, ConnectorTableHandle tableHandle, List<ColumnHandle> updatedColumns)
    {
        throw new PrestoException(NOT_SUPPORTED, "This connector does not support updates");
    }

    default Optional<ColumnHandle> getUpdateRowIdColumn(ConnectorSession session, ConnectorTableHandle tableHandle, List<ColumnHandle> updatedColumns)
    {
        ColumnHandle columnHandle = getUpdateRowIdColumnHandle(session, tableHandle, updatedColumns);
        return Optional.ofNullable(columnHandle);
    }

In this way, all existing connectors (especially external-implemented connectors which we can not modify directly) would not be affected by this SPI interface change. And we can remove these deprecated SPI methods in some major version upgrade that breaks backward compatibility.

What's your opinion about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reply. Yeah, that approach makes sense. I can make that update.

@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 18, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 20, 2025
Summary:

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 20, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 22, 2025
Summary:

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 23, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 24, 2025
Summary:

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 24, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 25, 2025
Summary:

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 25, 2025
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@ghelmling
Copy link
Contributor Author

@hantangwangd would you mind taking another look? The FB internal failing test is due to a corresponding internal code change needed.

hantangwangd
hantangwangd previously approved these changes Jun 27, 2025
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring, I have verified it on Iceberg V1 tables that only support metadata deletion. Looks good to me, just some little nits.

ghelmling added a commit to ghelmling/presto that referenced this pull request Jun 27, 2025
Summary:

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
Summary:
Pull Request resolved: prestodb#25284

Currently, the $row_id column is automatically inserted into the output variables for DELETE queries by QueryPlanner.  If a connector does not actually use $row_id to implement DELETE, then we should not require it.  This makes $row_id optional.  If the Optional is empty, then we don't need to project the output variable.

Differential Revision: D76325048
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D76325048

@ghelmling ghelmling merged commit b7289b6 into prestodb:master Jul 2, 2025
109 of 110 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants