Contributor

@kekwan kekwan commented Feb 21, 2025

Description

This PR adds support for tracking physical data scan usage in resource groups. The feature is optional and behaves similarly to cpuLimit. It helps admins who need to 'throttle' users who pull too much data from the file system. Quota-based tracking penalizes those users accordingly.

Additional context and related issues

#25003

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Add physical data scan tracking to resource groups. ({issue}`25003`)
## SPI
* Add methods to `io.trino.spi.resourcegroups` for tracking data scan usage. ({issue}`25003`)

@cla-bot cla-bot bot added the cla-signed label Feb 21, 2025
@github-actions github-actions bot added the docs label Feb 21, 2025
@kekwan kekwan force-pushed the data-scan-resource-groups branch 5 times, most recently from a9ca3a5 to 8841fac Compare February 22, 2025 01:13
@kekwan kekwan marked this pull request as ready for review February 22, 2025 01:13
@kekwan kekwan changed the title [WIP] Add physical data scan tracking to resource groups Add physical data scan tracking to resource groups Feb 22, 2025
@kekwan kekwan force-pushed the data-scan-resource-groups branch 2 times, most recently from 6b533d3 to bc39466 Compare February 22, 2025 01:34
@wendigo wendigo requested review from hashhar, lukasz-stec and piotrrzysko and removed request for lukasz-stec February 22, 2025 16:03
Contributor Author

kekwan commented Mar 5, 2025

Hi @hashhar. Just a gentle reminder on this PR in case you forgot. Thank you!

@kekwan kekwan force-pushed the data-scan-resource-groups branch 2 times, most recently from fab19d4 to 2074032 Compare March 14, 2025 16:33
@kekwan kekwan force-pushed the data-scan-resource-groups branch 2 times, most recently from 9c548c3 to 819ddd9 Compare March 24, 2025 21:54
@kekwan kekwan requested a review from wendigo April 12, 2025 00:53
@kekwan kekwan force-pushed the data-scan-resource-groups branch from 819ddd9 to 7948bc4 Compare April 25, 2025 22:30
@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label May 19, 2025

github-actions bot commented Jun 9, 2025

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jun 9, 2025
@kekwan kekwan reopened this Jun 9, 2025
Contributor Author

kekwan commented Jun 9, 2025

@piotrrzysko @hashhar @lukasz-stec @wendigo Could I get a review here?

@github-actions github-actions bot removed the stale label Jun 10, 2025
@kekwan kekwan force-pushed the data-scan-resource-groups branch 3 times, most recently from df31081 to 24e79c7 Compare July 2, 2025 20:45
@kekwan kekwan force-pushed the data-scan-resource-groups branch 5 times, most recently from 448d088 to 27d4fab Compare July 24, 2025 20:55
@kekwan kekwan requested a review from lukasz-stec July 24, 2025 20:55
}

@Test
public void testMigrationWithOldResourceGroupsSchema()
Contributor Author

@kekwan kekwan Jul 24, 2025

I removed this test as it is not compatible with the new constraint. I feel this test is unnecessary.

I'm not sure why only this MySQL test supports old data during the migration. Postgres and Oracle tests do not have this test.

Contributor Author

kekwan commented Jul 24, 2025

@lukasz-stec I have made the changes to support period-based tracking by adding a new property physicalDataScanQuotaPeriod. Please take a look, thanks.
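For readers unfamiliar with period-based quotas, here is a minimal sketch of the idea, assuming the data scan quota regenerates linearly over the quota period in the same spirit as the existing CPU quota. The class `ScanQuotaSketch` and all of its members are hypothetical illustrations and are not the code in this PR.

```java
import java.time.Duration;

// Hypothetical sketch of period-based quota tracking; not the code from this PR.
final class ScanQuotaSketch
{
    private final long hardLimitBytes;   // bytes a group may scan per quota period (assumed semantics)
    private final Duration quotaPeriod;  // time over which a full quota regenerates
    private long usageBytes;             // outstanding scan usage that has not been forgiven yet

    ScanQuotaSketch(long hardLimitBytes, Duration quotaPeriod)
    {
        if (hardLimitBytes <= 0) {
            throw new IllegalArgumentException("hardLimitBytes must be positive");
        }
        if (quotaPeriod.isZero() || quotaPeriod.isNegative()) {
            throw new IllegalArgumentException("quotaPeriod must be positive");
        }
        this.hardLimitBytes = hardLimitBytes;
        this.quotaPeriod = quotaPeriod;
    }

    // Charge the group for physical input bytes read by its queries.
    void recordScan(long bytes)
    {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes is negative");
        }
        usageBytes = Math.addExact(usageBytes, bytes);
    }

    // Forgive usage in proportion to elapsed time: one full period forgives hardLimitBytes.
    void regenerate(Duration elapsed)
    {
        if (elapsed.isNegative()) {
            throw new IllegalArgumentException("elapsed is negative");
        }
        double fraction = (double) elapsed.toMillis() / quotaPeriod.toMillis();
        long forgivenBytes = (long) (hardLimitBytes * fraction);
        usageBytes = Math.max(0, usageBytes - forgivenBytes);
    }

    // New queries may start only while outstanding usage is below the limit.
    boolean canStartQuery()
    {
        return usageBytes < hardLimitBytes;
    }

    long usageBytes()
    {
        return usageBytes;
    }
}
```

With a limit of 1000 bytes and a 10 second period, a group that scans 1500 bytes is blocked until enough time passes to bring its outstanding usage back under 1000 bytes. In this sketch that happens at some point after the first 5 seconds of regeneration.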

@kekwan kekwan requested a review from xkrogen July 24, 2025 23:02
Member

@lukasz-stec lukasz-stec left a comment

Looks good % minor comments

{
private final long cpuUsageMillis;
private final long memoryUsageBytes;
private final long physicalDataScanUsageBytes;
Member

To be consistent with the source of this metric (getPhysicalInputDataSize), please rename it to something that has physicalInput in the name, e.g. physicalInputDataUsageBytes or physicalInputUsageBytes. Here and everywhere else.

Contributor Author

@kekwan kekwan Jul 25, 2025

Sounds good. All the internal references of the data usage now use physicalInputDataUsageBytes.

The user-exposed limits and quota period still have physical data scan in their names, which I think is fine. It is consistent with settings like query.max-scan-physical-bytes.

Member

@xkrogen xkrogen left a comment

Thanks for working on this feature, I think it's a great enhancement!

@kekwan kekwan force-pushed the data-scan-resource-groups branch from 5c811c5 to 1509517 Compare July 25, 2025 19:16
Contributor Author

kekwan commented Jul 25, 2025

@xkrogen @lukasz-stec Thanks for the review, I've addressed all the comments above.

@kekwan kekwan requested review from lukasz-stec and xkrogen July 28, 2025 20:52
Member

@lukasz-stec lukasz-stec left a comment

@kekwan Looks good generally. Please squash the commits as all 3 are tightly related.
Great work!
We also need a maintainer review. @hashhar or @wendigo, Can one of you give it a go?

Add physical data scan tracking to resource groups. This is an optional feature in resource groups that tracks data scan via quota based tracking.
@kekwan kekwan force-pushed the data-scan-resource-groups branch from 1509517 to 2a9d9dd Compare August 6, 2025 01:40
{
this.cpuUsageMillis = cpuUsageMillis;
this.memoryUsageBytes = memoryUsageBytes;
this.physicalInputDataUsageBytes = physicalInputDataUsageBytes;
Contributor

nit: should we validate that these values are non-negative? (can be a follow up)
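A rough sketch of what such validation might look like, assuming negative values are never legitimate for any of the three fields (which may not hold if the real class is also used to carry deltas). `ValidatedResourceUsage` and `requireNonNegative` are hypothetical names used only for illustration.

```java
// Hypothetical sketch; the real ResourceUsage class is not reproduced here.
final class ValidatedResourceUsage
{
    private final long cpuUsageMillis;
    private final long memoryUsageBytes;
    private final long physicalInputDataUsageBytes;

    ValidatedResourceUsage(long cpuUsageMillis, long memoryUsageBytes, long physicalInputDataUsageBytes)
    {
        // Reject negative values up front so that bad accounting fails fast.
        this.cpuUsageMillis = requireNonNegative(cpuUsageMillis, "cpuUsageMillis");
        this.memoryUsageBytes = requireNonNegative(memoryUsageBytes, "memoryUsageBytes");
        this.physicalInputDataUsageBytes = requireNonNegative(physicalInputDataUsageBytes, "physicalInputDataUsageBytes");
    }

    // Returns the value unchanged, or throws if it is negative.
    private static long requireNonNegative(long value, String name)
    {
        if (value < 0) {
            throw new IllegalArgumentException(name + " is negative: " + value);
        }
        return value;
    }

    long getCpuUsageMillis()
    {
        return cpuUsageMillis;
    }

    long getMemoryUsageBytes()
    {
        return memoryUsageBytes;
    }

    long getPhysicalInputDataUsageBytes()
    {
        return physicalInputDataUsageBytes;
    }
}
```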

private final long physicalInputDataUsageBytes;

public ResourceUsage(long cpuUsageMillis, long memoryUsageBytes)
public ResourceUsage(long cpuUsageMillis, long memoryUsageBytes, long physicalInputDataUsageBytes)
Contributor

this could be converted to record class (can be a follow up)
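As a sketch of that suggestion, a record with the same three components could look like the following. `ResourceUsageRecord` is a hypothetical name, and any additional methods the real class has would need to be carried over.

```java
// Hypothetical sketch of the record form; the component names mirror the constructor parameters above.
record ResourceUsageRecord(long cpuUsageMillis, long memoryUsageBytes, long physicalInputDataUsageBytes) {}
```

A record generates the constructor, `equals`, `hashCode`, and `toString` automatically. Its accessors are named after the components (for example `physicalInputDataUsageBytes()`), so any callers that rely on `get`-prefixed getters would need updating.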

@wendigo wendigo merged commit 0a39998 into trinodb:master Aug 6, 2025
98 checks passed
@github-actions github-actions bot added this to the 477 milestone Aug 6, 2025
4 participants