-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(DC): Add Storage selector support for Entity configuration #3545
Conversation
4e82b8e
to
1aa4071
Compare
1aa4071
to
9b7e9d9
Compare
Codecov ReportBase: 92.19% // Head: 92.47% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #3545 +/- ##
==========================================
+ Coverage 92.19% 92.47% +0.27%
==========================================
Files 726 732 +6
Lines 33790 33905 +115
==========================================
+ Hits 31152 31352 +200
+ Misses 2638 2553 -85
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
storage_and_mappers_list: Sequence[StorageAndMappers], | ||
) -> StorageAndMappers: | ||
for sm_tuple in storage_and_mappers_list: | ||
if storage == sm_tuple.storage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check feels a bit brittle. The problem is that we don't have a proper equality check for storages, so what you are actually checking here is that the instance of the storages is the same (Python compares the memory address of the variables and sees if they are the same).
In theory that should be fine, since we only instantiate each storage once, but it would be better to compare the storage key I think. Storages are aware of their own storage key. storage.storage_key == ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, updated to compare using storage key instead.
def get_storage_mapping_pair( | ||
self, | ||
storage: ReadableStorage, | ||
storage_and_mappers_list: Sequence[StorageAndMappers], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage_and_mappers_list: Sequence[StorageAndMappers], | |
storage_and_mappers: Sequence[StorageAndMappers], |
""" | ||
|
||
def __init__(self, storage: str) -> None: | ||
self.storage = get_storage(StorageKey(storage)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could probably just store the StorageKey
here. Then in the select_storage
function you could compare the storage key directly: if sm.storage.storage_key == self.storage_key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
from snuba.query.query_settings import HTTPQuerySettings | ||
from snuba.query.snql.parser import parse_snql_query | ||
|
||
TEST_CASES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a couple test cases for the errors and sessions ones as well.
I have some questions about the design choices made here:
Why not have the default be implicit and have the entity fail to instantiate if the deafult is not sufficient? You can accomplish this with the
Aren't storage selectors part of entities? Wouldn't it make more sense to put it under there? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the UX could be improved here but you've been in the guts of the code more so I look forward to your perspective on it
for sm in storage_and_mappers_list: | ||
if sm.storage == self.storage: | ||
return sm | ||
raise QueryStorageSelectorError( | ||
"Unable to select storage. Storage not defined." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function, the storage_and_mappers_list is only supposed to have one element. I don't think you need to be doing any comparisons here just
assert len(storage_and_mappers_list) == 1
return storage_and_mappers_list[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the __init__
function either in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
def __init__(self) -> None: | ||
self.__errors_table = get_writable_storage(StorageKey.ERRORS) | ||
self.__errors_ro_table = get_storage(StorageKey.ERRORS_RO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is duplicating knowledge that is already in the configuration and then using get_storage_mapping_pair
to do this reverse lookup, it's not very intuitive. I think there are more egonomic ways to do this.
For example, we can provide the guarantee that storage_and_mappers_list
is always going to be in the order that it is defined in configuration, that way you can do something like:
def select_storage(
self,
query: Query,
query_settings: QuerySettings,
storage_and_mappers_list: List[StorageAndMappers],
) -> StorageAndMappers:
use_readonly_storage = (
state.get_config("enable_events_readonly_table", False)
and not query_settings.get_consistent()
)
errors_ro, errors = storage_and_mappers_list
if use_readonly_storage:
return errors_ro
return errors
That way you don't need to be worrying about implementing a good reverse mapping function and the coherence of the configuration is preserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is duplicating knowledge that is already in configuration, but it's also the way we do things today (selectors define storages in its constructor). However, I'm also not a fan of the reverse lookup.
Guaranteeing order is an interesting approach (e.g. sorting storages by some condition). Is what you're suggesting, read-only storages should always be defined first in the array, followed by writable storages? What if there are multiple readable storages? How should they be ordered?
To summarize the problem, it is the job of select_storage()
to determine which storage (defined on entity) to use based on input parameters. If the selection choice is always binary (readable or writable), it could probably just check the instance type. However, what if the choice isn't always between readable and writable? Is this something we even need to design for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to remove the reverse lookup thing and to check instance type instead. This solution would not work for multiple readable storages though. But perhaps we don't need to design for that yet.
…elector_to_config
I'm okay with making this default storage selector implicit too. My initial thought was since storages defined in the entity will eventually be formatted as an array (in a subsequent PR), implicit selection from an array might not be too obvious. I'm fine with either though.
You're right, I will move this directory over. |
@@ -310,7 +265,8 @@ def __init__(self) -> None: | |||
storages=[storage], | |||
query_pipeline_builder=SimplePipelineBuilder( | |||
query_plan_builder=StorageQueryPlanBuilder( | |||
storages=[StorageAndMappers(storage, TranslationMappers())] | |||
storages=[StorageAndMappers(storage, TranslationMappers())], | |||
selector=DefaultQueryStorageSelector(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it possible this is using the DefaultQueryStorageSelector
? Shouldn't it be using the sessions selector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORG_SESSIONS
only uses the org materialized sessions storage so the default is possible. SESSIONS
on the other hand, requires the storage selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. OK cool.
Overview
This PR is responsible for adding capabilities for defining storage selectors in entity configuration. Now that query plan builders have been consolidated, we can now support custom or default storage selectors defined on an entity. For context, right now only Errors and Sessions have custom storage selectors based on a query. All other entities simply default to single storage regardless of the query.
What has changed (before and after)?
DefaultQueryStorageSelector
has been created and added to all entities (python defined ones too). This is used to make storage selection explicit.snuba/datasets/storages/selectors/
directory was created to house all storage selectors. The custom sessions and errors selectors were also moved here.Blast Radius
This change is being applied to all entities. Entities must now specify a storage selector (default is provided).
Testing