Last N Value Cache #25091
Comments
BTW, another potential implementation would be to use an OptimizerRule to rewrite plans with relevant references to use a new table provider. Here is an example of how to do that: apache/datafusion#11087
I'd like it to be explicit to the user that they're requesting values from the cache. That way they know the semantics behind it (i.e. the cache only has data from when the server was running and accepting writes). We could do the optimizer in addition to that, but ensuring the optimized result matches the non-optimized result will be tricky, since this is just a cache and not the raw underlying data.
Here is a good example of the last cache behaviour and how the key columns are used in the cache: #25109 (review)
Edit: added some additional points to the Other Requirements section in issue description detailing key and value column requirements:
Edit: added cache creation requirements to original issue description:
TTL was suggested here: #25109 (comment)
Keeping track of some questions I have here; I will update this as I think of more.
I think this would be okay. It will just cache all columns (or whatever value columns they specify), but won't be optimized for using any columns as predicates.
My thinking is that it can pull records from all associated caches when a name is not specified. All the caches associated with a table are populated from the same write batches, so their timestamps should be aligned, which makes this possible. In a lot of cases, users may only have one cache associated with a given table, so I think it would be a better UX if they did not have to specify the cache name on every call.
A more common use case than above would be if there was a single cache associated with a table. For example, if there is only one cache, named cache1, the following two queries would be equivalent:
SELECT * FROM last_cache('cpu') -- defaults to using cache1, the only cache for the `cpu` table
SELECT * FROM last_cache('cpu', 'cache1')
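The single-cache default described above could be sketched roughly as follows. This is only an illustration: `resolve_cache` and its error messages are hypothetical names, not part of the proposal, and the multi-cache case (pulling records from all associated caches) is deliberately left out and returns a placeholder error.

```rust
/// Hypothetical helper: resolve the optional cache-name argument of
/// `last_cache('<table>'[, '<cache>'])` against the caches defined for a table.
fn resolve_cache<'a>(caches: &'a [String], requested: Option<&str>) -> Result<&'a str, String> {
    match requested {
        // An explicit name must match one of the table's caches.
        Some(name) => caches
            .iter()
            .find(|c| c.as_str() == name)
            .map(|c| c.as_str())
            .ok_or_else(|| format!("no cache named '{}'", name)),
        // No name given: fall back to the only cache, if there is exactly one.
        None => match caches {
            [only] => Ok(only.as_str()),
            [] => Err("table has no caches".to_string()),
            // Placeholder: the thread suggests reading from all caches here instead.
            _ => Err("multiple caches exist; a cache name is required in this sketch".to_string()),
        },
    }
}
```

With a single cache named cache1 on the cpu table, both query forms above would resolve to the same cache.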
A Last N Value Cache will allow users to access the last value of many series (either by identifier or group) very quickly (<10ms).
Users should be able to specify for a given table and set of columns, the last N values they want to keep cached in RAM. This will be a feature available in both open source and Pro, but there will be limitations in the former.
For a given table, the user would specify the lookup key (i.e. columns to lookup by), the number of values to cache, and the columns (either by name or *) that they want in the cache. The time of the values will always be included.
Cache Creation
To create a cache, users specify:
<table_name>_<key_columns>_last_cache)
We would like the front-end for this to be available via a REST API.
The configuration of each cache will be stored in the catalog.
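As a rough illustration of what such a stored configuration might hold, here is a minimal sketch built from the parameters named in the description (table, key columns, value columns, number of values, name). The type and field names are hypothetical, and joining the key columns with an underscore in the default name is an assumption; the issue only gives the pattern `<table_name>_<key_columns>_last_cache`.

```rust
/// Hypothetical: which columns the cache stores alongside the time.
#[derive(Debug, Clone, PartialEq)]
enum ValueColumns {
    /// Corresponds to `*` in the description.
    All,
    Named(Vec<String>),
}

/// Hypothetical catalog entry for one cache.
#[derive(Debug, Clone, PartialEq)]
struct LastCacheConfig {
    table: String,
    key_columns: Vec<String>,
    value_columns: ValueColumns,
    /// The N in "last N values".
    count: usize,
    name: String,
}

impl LastCacheConfig {
    fn new(
        table: &str,
        key_columns: &[&str],
        value_columns: ValueColumns,
        count: usize,
        name: Option<&str>,
    ) -> Self {
        // Assumption: key columns are joined with '_' when no name is given.
        let default_name = format!("{}_{}_last_cache", table, key_columns.join("_"));
        Self {
            table: table.to_string(),
            key_columns: key_columns.iter().map(|c| c.to_string()).collect(),
            value_columns,
            count,
            name: name.map(str::to_string).unwrap_or(default_name),
        }
    }
}
```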
Populating the Cache
In open source, the cache should be populated as a write through while the server is running. In Pro, this will also be the case, but Pro will also have the ability to fill the cache from historical data on boot-up.
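The write-through behaviour can be sketched with a small in-memory structure: each distinct key-column combination owns a bounded buffer that keeps only its N most recent rows. This is a simplified illustration, not the proposed implementation. `LastNCache`, `Row`, and the single `value` field are hypothetical, and the sketch assumes rows for a given key arrive in time order.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical cached row: the time is always kept, plus one value column.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    time: i64,
    value: f64,
}

/// Hypothetical last-N cache keyed by the values of the key columns.
struct LastNCache {
    n: usize,
    series: HashMap<Vec<String>, VecDeque<Row>>,
}

impl LastNCache {
    fn new(n: usize) -> Self {
        assert!(n > 0, "a cache must hold at least one value");
        Self { n, series: HashMap::new() }
    }

    /// Called on the write path: the row goes to storage as usual and is
    /// also pushed here, evicting the oldest row once N are held.
    fn write(&mut self, key: Vec<String>, row: Row) {
        let buf = self.series.entry(key).or_default();
        if buf.len() == self.n {
            buf.pop_front();
        }
        buf.push_back(row);
    }

    /// Returns the cached rows for a key, newest first.
    fn last(&self, key: &[String]) -> Vec<Row> {
        self.series
            .get(key)
            .map(|buf| buf.iter().rev().cloned().collect())
            .unwrap_or_default()
    }
}
```

Because the structure is filled only by `write`, it holds nothing from before the server started, which matches the open source semantics described above. Filling from historical data on boot-up would be a separate step.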
Cache Queries
Querying the cache will require a specialized query. The query syntax could look like so:
This is a use-case for DataFusion's User-Defined Table Functions (UDTF).
In some cases, query predicates may be handled directly by the cache's TableProvider/TableFunctionImpl, while more complicated predicates could just be passed back up to the query engine, but where we draw that line remains TBD.
Other Requirements
Tasks
TableFunctionImpl to enable queries from the last_cache #25095
last_cache system table #25098
influxdb3_client #25099
influxdb3 CLI #25100
TableProvider #25174