TMP development to show how things could work with concurrent cursor #228
base: main
@@ -487,6 +488,7 @@ def __init__(
  self._message_repository = message_repository or InMemoryMessageRepository(
  self._evaluate_log_level(emit_connector_builder_messages)
  )
+ self._state_manager = state_manager
My guess is that this makes sense here, given that we eventually want to instantiate cursors with their state and avoid a separate set_initial_state method call.
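To illustrate the direction I have in mind, here is a minimal sketch contrasting the two styles. Both classes (TwoStepCursor, StatefulCursor) and the lower_bound method are hypothetical names made up for this comment; they are not the CDK's cursor classes.

```python
from typing import Any, Mapping, Optional


class TwoStepCursor:
    """Hypothetical cursor that needs a follow-up call before it is usable."""

    def __init__(self, cursor_field: str) -> None:
        self._cursor_field = cursor_field
        self._state: Optional[Mapping[str, Any]] = None

    def set_initial_state(self, state: Mapping[str, Any]) -> None:
        # Callers must remember to invoke this after construction.
        self._state = dict(state)

    def lower_bound(self) -> Optional[Any]:
        if self._state is None:
            raise RuntimeError("set_initial_state was never called")
        return self._state.get(self._cursor_field)


class StatefulCursor:
    """Hypothetical cursor that receives its state at construction time."""

    def __init__(self, cursor_field: str, state: Mapping[str, Any]) -> None:
        self._cursor_field = cursor_field
        self._state = dict(state)

    def lower_bound(self) -> Optional[Any]:
        # No "forgot to initialize" failure mode: state is always present.
        return self._state.get(self._cursor_field)
```

With the second style, whatever builds the cursor needs access to the state up front, which is why holding a reference to the state manager on the factory seems reasonable.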
@@ -1476,6 +1476,17 @@ def _merge_stream_slicers(
  stream_cursor=cursor_component,
  )
  elif model.incremental_sync:
+ if model.retriever.type == "AsyncRetriever":
+ if model.incremental_sync.type != "DatetimeBasedCursor":
+ # TODO explain why it isn't supported
TODO: I'm not exactly sure why other types of cursors wouldn't be supported, but I was only doing this for source-amazon-ads, so I preferred to be too restrictive rather than too permissive.
Note that the Global/PerPartition cursors were not updated, and we will need them for source-amazon-ads.
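As a rough sketch of the restriction described above: the helper name, the exception type, and the error message are all assumptions for illustration, since the diff excerpt stops at the TODO and does not show what the branch actually does.

```python
# Cursor types this sketch allows in combination with AsyncRetriever.
# Only DatetimeBasedCursor is named in the diff; the set is an assumption.
SUPPORTED_ASYNC_CURSOR_TYPES = {"DatetimeBasedCursor"}


def check_async_incremental_supported(retriever_type: str, cursor_type: str) -> None:
    """Hypothetical guard: reject unsupported cursor types for AsyncRetriever."""
    if retriever_type != "AsyncRetriever":
        return  # other retrievers are not restricted by this check
    if cursor_type not in SUPPORTED_ASYNC_CURSOR_TYPES:
        raise ValueError(
            f"AsyncRetriever with incremental sync only supports "
            f"{sorted(SUPPORTED_ASYNC_CURSOR_TYPES)}, got {cursor_type!r}"
        )
```

Keeping the allowed types in one set would make it easy to loosen the restriction once the Global/PerPartition cursors are updated.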
@@ -151,7 +153,7 @@ def __json_serializable__(self) -> Any:
  return self._stream_slice

  def __hash__(self) -> int:
- return hash(orjson.dumps(self._stream_slice, option=orjson.OPT_SORT_KEYS))
+ return SliceHasher.hash("dummy_name", self._stream_slice)
This had to be updated to support AsyncPartition. SliceHasher considers __json_serializable__ but orjson does not. I figure we should have the same slice-hashing logic everywhere, and if we want to update this logic to use orjson, we should do it once in SliceHasher.
TODO: I'm not sure why we require the name of the stream for the slice hasher.
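To make the "one place for hashing" argument concrete, here is a minimal sketch using only the standard library. It is not the SliceHasher implementation: FakePartition, hash_slice, and the SHA-256 choice are assumptions made for illustration, and stdlib json stands in for whichever serializer ends up being used.

```python
import hashlib
import json
from typing import Any, Mapping


class FakePartition:
    """Hypothetical stand-in for an object such as AsyncPartition."""

    def __init__(self, job_id: str) -> None:
        self._job_id = job_id

    def __json_serializable__(self) -> Any:
        return {"job_id": self._job_id}


def _default(obj: Any) -> Any:
    # Fallback for objects the serializer does not know how to encode:
    # use the __json_serializable__ hook when the object provides one.
    hook = getattr(obj, "__json_serializable__", None)
    if callable(hook):
        return hook()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def hash_slice(stream_slice: Mapping[str, Any]) -> int:
    """Hash a slice independently of key order, honoring the hook above."""
    payload = json.dumps(stream_slice, sort_keys=True, default=_default)
    return int(hashlib.sha256(payload.encode("utf-8")).hexdigest(), 16)
```

If every `__hash__` delegates to a single helper like this, swapping the serializer later is a one-line change in that helper instead of a change in every class.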
@@ -322,6 +322,7 @@
  "http_method": "GET",
  },
  },
+ "incremental_sync": {"$ref": "#/definitions/incremental_cursor"},
This is to ensure that streams using async_retriever with incremental sync are also concurrent.
Note that this change might impact @tolik0's work here.