
Add a limit on total number of bytes read from storage in table scan #14739

Merged
tdcmeehan merged 1 commit into prestodb:master from fgwang7w:14701
Jul 31, 2020

Conversation

@fgwang7w
Member

@fgwang7w fgwang7w commented Jun 28, 2020

Fixes #14701

== RELEASE NOTES ==

General Changes
* Add `query.max-scan-physical-bytes` configuration property and `query_max_scan_physical_bytes` session property to limit the total number of bytes read from storage during a table scan. The default limit is 1PB.
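For illustration, the new hard limit could be set cluster-wide in the coordinator config; the values below are hypothetical, only the property name comes from this PR:

```properties
# etc/config.properties - illustrative value; name from this PR's release notes
query.max-scan-physical-bytes=500TB
```

A single query could then override it with the session property, e.g. `SET SESSION query_max_scan_physical_bytes = '1TB';` (again an illustrative value).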

@fgwang7w fgwang7w changed the title from "#14701: Support query level scan bytes limits" to "Support query level scan bytes limits" Jun 28, 2020
@fgwang7w fgwang7w requested a review from mbasmanova June 29, 2020 04:04
Contributor

@mbasmanova mbasmanova left a comment


@fgwang7w Some initial comments. Would it make sense to have both a soft and a hard limit? E.g. the soft limit would generate a warning, while the hard limit would fail the query. See query_max_scan_physical_bytes for an example.

Contributor

getQueryMaxScanPhysicalBytes(session) returns a value that can be used as is without further checking.

This code ensures that this property defaults to what's specified in config:

                dataSizeProperty(
                        QUERY_MAX_SCAN_PHYSICAL_BYTES,
                        "Maximum scan physical bytes of a query",
                        queryManagerConfig.getQueryMaxScanPhysicalBytes(),
                        false),

and this code defines the default value that is used if nothing is specified in config file:

private DataSize queryMaxScanPhysicalBytes = DataSize.succinctDataSize(1, PETABYTE);

Hence, the code can be simplified like this:

        for (QueryExecution query : queryTracker.getAllQueries()) {
            DataSize limit = getQueryMaxScanPhysicalBytes(query.getSession());
            DataSize scan = query.getQueryInfo().getQueryStats().getRawInputDataSize();
            if (scan.compareTo(limit) >= 0) {
                query.fail(new ExceededScanLimitException(limit));
            }
        }
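Boiled down, the check above compares an aggregated byte counter against a per-query limit. A minimal self-contained sketch of that pattern, using plain long byte counts and hypothetical stand-in types (not Presto's actual DataSize/QueryExecution classes):

```java
import java.util.List;

public class ScanLimitEnforcer {
    // Hypothetical stand-in for a running query: bytes read so far and its limit.
    public static class Query {
        public final long rawInputBytes;   // bytes read from storage so far
        public final long limitBytes;      // per-query limit from session/config
        public boolean failed;

        public Query(long rawInputBytes, long limitBytes) {
            this.rawInputBytes = rawInputBytes;
            this.limitBytes = limitBytes;
        }
    }

    // Mirrors the suggested loop: fail any query at or over its limit.
    public static void enforceScanLimits(List<Query> queries) {
        for (Query query : queries) {
            if (query.rawInputBytes >= query.limitBytes) {
                query.failed = true; // Presto would call query.fail(new ExceededScanLimitException(...))
            }
        }
    }

    public static void main(String[] args) {
        Query small = new Query(100, 1_000);
        Query huge = new Query(5_000, 1_000);
        enforceScanLimits(List.of(small, huge));
        System.out.println(small.failed + " " + huge.failed); // false true
    }
}
```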

Member Author

Many thanks for the proposal. I have made another version of this limit based on your suggestion. The new version now includes a method defined in QueryExecution to collect the scanned data size, implemented in SqlQueryExecution to collect the finalQueryInfo's rawInputDataSize. Please help review again.

Contributor

Let's clarify what the limit applies to. E.g., does it apply to the amount of data read from storage before or after compression?

Member Author

This is the amount of input bytes consumed from storage, via QueryInfo's getQueryStats().getRawInputDataSize().


Concurring with @mbasmanova's comment - we should be clear about which scan limit has been exceeded. I recommend EXCEEDED_SCAN_RAW_BYTES_READ_LIMIT. The message can also be made more explicit.

Member Author

The message has been fixed to indicate that the scan bytes limit has been exceeded for this exception.

Contributor

@viczhang861 viczhang861 left a comment


Contributor

Is QueryStats::rawInputDataSize equivalent to scanPhysicalBytes? Is reading from a materialized intermediate table included? cc @arhimondr


Same comment as in SystemSessionProperties - let's use the same name as the actual metric - queryMaxRawInputBytes. In accordance with that, let's change the config name as well - query.max-raw-input-bytes.

@mbasmanova
Contributor

@fgwang7w Would you squash all 5 commits into one?


@mayankgarg1990 mayankgarg1990 left a comment


Overall, let's ensure that the logic works. In addition, let's use the term rawInputBytes instead of scannedBytes so that it is easy for users to understand which metric they exceeded.


Comment on lines 759 to 591


I can see a lot of issues with this logic:

  1. We are reading finalQueryInfo, which is published only when the query is finished - can you test and see whether this number is actually published while the query is running?
  2. For every stage that is a scan stage, we take the whole query's bytes read and add it. So if there are 2 scan stages, you will just return 2 * bytesReadByQuery.
  3. getAllStages does a DFS traversal every time - given that it will be called in a loop, this might be an expensive operation, and we don't really care about the ordering here.

In my opinion, we should do something similar to the existing logic for getTotalCpuTime. That will help ensure that the logics are similar and easy to change together if we ever decide to head down that path.

Member Author

OK, so this is newly implemented. The original proposal was a simple approach: obtain the scanned data size from queryInfo::queryStats, unless the final resolution is to do something similar to the existing logic with getTotalCpuTime.
DataSize scan = query.getQueryInfo().getQueryStats().getRawInputDataSize();
@mbasmanova what's your suggestion on this one?


query.getQueryInfo().getQueryStats() is a somewhat expensive method, since it aggregates all the stats and not just bytes read. We should keep this data collection as light as possible, in my opinion.

Contributor

Agree with @mayankgarg1990 -- this can be made to be less expensive and consistent with how we calculate CPU time.

Member Author

Thank you @mayankgarg1990 @tdcmeehan for the comments; I will revise this code to align with the general data collection logic.

Contributor

Were these comments addressed?

Member Author

Yes, this code block has been removed to keep it simple and similar to how CPU limit enforcement is handled.

@fgwang7w fgwang7w force-pushed the 14701 branch 2 times, most recently from 8f3188d to 0bb4527 on July 2, 2020 19:15
@fgwang7w
Member Author

fgwang7w commented Jul 2, 2020

@fgwang7w Would you squash all 5 commits into one?

done, squashed into 1 commit now

@mbasmanova
Contributor

@fgwang7w I'm in meetings all day today and won't have time to review/answer questions on this PR before the long weekend. I'll take a look early next week.

@fgwang7w
Member Author

fgwang7w commented Jul 2, 2020

@fgwang7w I'm in meetings all day today and won't have time to review/answer questions on this PR before the long weekend. I'll take a look early next week.

Sure, thank you. In the meantime I will do more testing to ensure the quality of the code.

@fgwang7w
Member Author

Hi @mbasmanova, please help review the revised commit, many thanks!

@mbasmanova
Contributor

@fgwang7w Looks good to me, but I'll defer to @mayankgarg1990 and @tdcmeehan to confirm that their comments have been addressed properly. The commit message needs to be updated to match the guidelines at https://chris.beams.io/posts/git-commit/ . For example,

Add a limit on total number of bytes read from storage in table scan

Add query.max-scan-physical-bytes configuration and query_max_scan_physical_bytes
session properties to limit the total number of bytes read from storage during table scan.
The default limit is 1PB.

@mbasmanova mbasmanova requested review from bhhari and yingsu00 July 20, 2020 14:06
@mbasmanova mbasmanova changed the title from "Support query level scan bytes limits" to "Add a limit on total number of bytes read from storage in table scan" Jul 20, 2020
@mbasmanova
Contributor

@mayankgarg1990 @tdcmeehan Mayank, Tim, would you take another look?

@mayankgarg1990

@mbasmanova, this is on my radar, I will get to it by tomorrow (7/22) :)

@mbasmanova
Contributor

Thank you, Mayank.

@fgwang7w fgwang7w force-pushed the 14701 branch 2 times, most recently from 3196c11 to 2f7fffb on July 21, 2020 21:33
@fgwang7w
Member Author

fgwang7w commented Jul 24, 2020

Hi @mayankgarg1990 @tdcmeehan, could you please help review the commit for the upcoming release merge? Many thanks!

@fgwang7w fgwang7w removed the request for review from bhhari July 24, 2020 21:55
@fgwang7w fgwang7w requested review from mayankgarg1990 and mbasmanova and removed request for yingsu00 July 24, 2020 21:55
@mayankgarg1990

@fgwang7w - As @tdcmeehan pointed out, my comments from my previous review are not addressed yet.

@fgwang7w
Member Author

@fgwang7w - As @tdcmeehan pointed out, my comments from my previous review are not addressed yet.

Yes, I resubmitted the squashed commit just now and replied to both @mayankgarg1990 and @tdcmeehan regarding the QueryStats perf issue for the getScannedBytes method and the default scan limit. Basically, I have simplified the code and reduced unnecessary method calls per the suggestions. The default scan limit is also removed for now. Please give it another round of review, many thanks!


Putting a new comment since the old comment was marked as resolved. Let's match this with the actual metric (rawInputBytes) and the exception name - QUERY_MAX_SCAN_RAW_INPUT_BYTES.

Member Author

Sure - actually, I would synchronize all method names and variables to QueryMaxScanRawInputBytes to map to rawInputBytes.


Same comment as in SystemSessionProperties - let's use the same name as the actual metric - queryMaxRawInputBytes. In accordance with that, let's change the config name as well - query.max-raw-input-bytes.

Comment on lines 334 to 335


These 2 are not being used - remove them.

Member Author

sure, it's removed


This is still a bit expensive, since getting QueryStats involves: for all stages, for all tasks, collect all metrics.

The way CPU does it is cheaper: all stages, all tasks, just the CPU - so we can do something similar here and make it even cheaper.

We can follow a similar flow here:

SqlQueryManager -> SqlQuerySchedulerInterface#getTotalCpuTime -> SqlStageExecution#getTotalCpuTime

and just sum up the bytes involved.
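The flow described above - the scheduler asks each stage, and each stage sums its tasks - can be sketched with hypothetical stand-in types (the real Presto classes have different shapes; only the method name getRawInputDataSize mirrors the PR):

```java
import java.util.List;

public class RawInputAggregation {
    // Stand-in for a task's lightweight stats: only the one counter we need.
    static class Task {
        final long rawInputBytes;
        Task(long rawInputBytes) { this.rawInputBytes = rawInputBytes; }
    }

    // Stand-in for SqlStageExecution: sums just raw input bytes over its tasks
    // instead of materializing a full QueryStats object.
    static class Stage {
        final List<Task> tasks;
        Stage(List<Task> tasks) { this.tasks = tasks; }

        long getRawInputDataSize() {
            long sum = 0;
            for (Task task : tasks) {
                sum += task.rawInputBytes;
            }
            return sum;
        }
    }

    // Stand-in for the scheduler: the query total is the sum over all stages.
    static long getRawInputDataSize(List<Stage> stages) {
        long sum = 0;
        for (Stage stage : stages) {
            sum += stage.getRawInputDataSize();
        }
        return sum;
    }

    public static void main(String[] args) {
        Stage scan = new Stage(List.of(new Task(400), new Task(600)));
        Stage intermediate = new Stage(List.of(new Task(0)));
        System.out.println(getRawInputDataSize(List.of(scan, intermediate))); // 1000
    }
}
```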

Member Author

I'm not certain that your approach of summing up bytes from SqlStageExecution#getTotalScanBytes is accurate... it's always best practice to inherit the scanned byte size from the existing QueryStats to ensure the result is correct. I suggest we maintain the current implementation with SqlQueryManager:getBasicQueryInfo -> BasicQueryStats:getQueryStats. How much cheaper would bypassing the existing logic be, given the risk of a false result?
My opinion is that rawInputDataSize should be the only reliable source of truth when checking against the hard scan limit.


I don't agree that BasicQueryStats:getQueryStats is the only trusted source. As you can see, we are already doing this for total CPU ms and memory limits, so I don't see why raw input bytes would be any different here. Again, my only concern is that this is a single thread already doing CPU and memory enforcement, and we should keep it as lightweight as possible; by keeping this trend, we ensure that new entries also follow this lightweight approach.

Member Author

Understood, thank you! I have implemented your approach; I think it is safer and cheaper to acquire the actual number via SqlQuerySchedulerInterface::getRawInputDataSize -> QueryExecution::getRawInputDataSize, and sum up all tasks' stats via SqlStageExecution::getRawInputDataSize. Please review the revised commit again, many thanks!

@fgwang7w fgwang7w force-pushed the 14701 branch 3 times, most recently from 0d8cff0 to 7837b6b on July 29, 2020 07:52

@mayankgarg1990 mayankgarg1990 left a comment


Looks good - just one last comment.


Nit - let's rename this to rawInputSize.

Member Author

fixed, thanks

Comment on lines 338 to 343


Let's add this check at the top to ensure that we are only considering table scan nodes:

if (planFragment.getTableScanSchedulingOrder().isEmpty()) {
    return new DataSize(0, BYTE);
}

Member Author

Agree, we need to bypass when the source is empty. Fixed.
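Folding that guard into the stage-level method looks roughly like this hedged sketch; a hypothetical hasTableScan flag stands in for !planFragment.getTableScanSchedulingOrder().isEmpty(), and Task is a stand-in type:

```java
import java.util.List;

public class StageRawInput {
    // Stand-in for a task's stats.
    static class Task {
        final long rawInputBytes;
        Task(long rawInputBytes) { this.rawInputBytes = rawInputBytes; }
    }

    // hasTableScan stands in for !planFragment.getTableScanSchedulingOrder().isEmpty().
    static long getRawInputDataSize(boolean hasTableScan, List<Task> tasks) {
        if (!hasTableScan) {
            return 0; // intermediate stages read no bytes from storage
        }
        long sum = 0;
        for (Task task : tasks) {
            sum += task.rawInputBytes;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(getRawInputDataSize(true, List.of(new Task(123))));  // 123
        System.out.println(getRawInputDataSize(false, List.of(new Task(123)))); // 0
    }
}
```

The early return keeps exchange-input bytes of intermediate stages from being counted against a limit that is meant to cover only storage reads.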


@mayankgarg1990 mayankgarg1990 left a comment


Looks good - one last comment.


Let's set the default to a higher number to avoid unexpected failures when people deploy this new version. Every sysadmin should be able to set a reasonable value in their configuration. How about an exabyte (1000, PETABYTE)?

Member Author

Sure - this field is adjustable anytime by DBAs. The default is revised to 1000PB.

Add query.max-scan-physical-bytes configuration and query_max_scan_physical_bytes
session properties to limit the total number of bytes read from storage during table scan.
The default limit is 1PB.
object -> object);
}

public static PropertyMetadata<DataSize> dataSizeProperty(String name, String description, DataSize defaultValue, boolean hidden)
Contributor

Nice

@tdcmeehan tdcmeehan merged commit fb8bb9f into prestodb:master Jul 31, 2020
@caithagoras caithagoras mentioned this pull request Aug 14, 2020
@fgwang7w fgwang7w deleted the 14701 branch September 4, 2020 03:18