
[native] Add caching of parsed Types #21325

Merged
xiaoxmeng merged 1 commit into prestodb:master from kevinwilfong:cache_type_conversion
Nov 13, 2023

Conversation

@kevinwilfong (Contributor)

We've seen cases of queries that spend a large amount of time just parsing types when converting the Presto plan to a Velox plan. This appears to be because the same large row types, referenced by many field accesses, are parsed over and over.

Adding caching within a request shows a substantial decrease in the amount of time it takes to do the conversion.

Notably, this helps with the timeouts we have been seeing when the coordinator calls the workers to create tasks.

@kevinwilfong kevinwilfong requested a review from a team as a code owner November 6, 2023 23:21
@kevinwilfong kevinwilfong marked this pull request as draft November 6, 2023 23:21
@kevinwilfong kevinwilfong force-pushed the cache_type_conversion branch 2 times, most recently from 2d8fc42 to 17b39fa, November 7, 2023 21:31
@kevinwilfong kevinwilfong marked this pull request as ready for review November 8, 2023 19:11
@xiaoxmeng (Contributor) left a comment

@kevinwilfong nice catch. Thanks for the optimization!

Review comment (Contributor):

Mark pool_ and typeParser_ as const? Thanks!

Review comment (Contributor):

NYC: drop explicit as the ctor takes more than one input? Thanks!

Review comment (Contributor):

NYC: mark pool_ and queryCtx_ as const?

velox::memory::MemoryPool* const pool_;
velox::core::QueryCtx* const queryCtx_;

@xiaoxmeng xiaoxmeng merged commit d1c5d83 into prestodb:master Nov 13, 2023
@majetideepak (Collaborator) commented Nov 15, 2023

@kevinwilfong I am adding Presto type parser support in Velox using Flex/Bison (facebookincubator/velox#7568).
The end goal is to replace ANTLR with it and remove a dependency.
I will add support for caching.
Is there a benchmark to evaluate the performance?

velox::TypePtr parse(const std::string& text) const;

private:
mutable std::unordered_map<std::string, velox::TypePtr> cache_;
Review comment (Collaborator):

Any reason not to use the SimpleLRUCache from Velox?
We use that to cache file handles
https://github.com/facebookincubator/velox/blob/main/velox/connectors/hive/FileHandle.h#L62

@majetideepak (Collaborator)
I am worried that without a bound, the cache might grow too big in a production system.
