TMP development to show how things could work with concurrent cursor #228

maxi297 · 2025-01-17T19:28:43Z

Note that this change might impact @tolik0's work here.

maxi297 · 2025-01-17T19:31:14Z

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

@@ -487,6 +488,7 @@ def __init__(
        self._message_repository = message_repository or InMemoryMessageRepository(
            self._evaluate_log_level(emit_connector_builder_messages)
        )
+        self._state_manager = state_manager


My guess is that this makes sense here, given that we eventually want to instantiate cursors with their state and avoid a set_initial_state method call.
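A minimal sketch of that idea, assuming hypothetical `ToyStateManager`, `ToyCursor` and `ToyComponentFactory` classes (none of these are the CDK's actual classes or signatures): the factory keeps a reference to the state manager and hands each cursor its state at construction time, so no separate `set_initial_state` step is needed afterwards.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass
class ToyStateManager:
    """Hypothetical stand-in for a state manager: stream name -> stream state."""

    _states: Dict[str, Mapping[str, Any]] = field(default_factory=dict)

    def get_stream_state(self, stream_name: str) -> Mapping[str, Any]:
        # Unknown streams start from an empty state.
        return self._states.get(stream_name, {})


@dataclass(frozen=True)
class ToyCursor:
    """Hypothetical cursor that receives its state in the constructor."""

    stream_name: str
    state: Mapping[str, Any]


class ToyComponentFactory:
    """Hypothetical factory holding the state manager, as the diff does with self._state_manager."""

    def __init__(self, state_manager: Optional[ToyStateManager] = None) -> None:
        self._state_manager = state_manager or ToyStateManager()

    def create_cursor(self, stream_name: str) -> ToyCursor:
        # The state is injected here, so the cursor is ready to use once built.
        state = self._state_manager.get_stream_state(stream_name)
        return ToyCursor(stream_name=stream_name, state=state)
```

With this shape, a cursor cannot exist in a "state not yet set" condition, which is the main appeal over a two-step construct-then-initialize flow.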

maxi297 · 2025-01-17T19:32:42Z

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

@@ -1476,6 +1476,17 @@ def _merge_stream_slicers(
                    stream_cursor=cursor_component,
                )
        elif model.incremental_sync:
+            if model.retriever.type == "AsyncRetriever":
+                if model.incremental_sync.type != "DatetimeBasedCursor":
+                    # TODO explain why it isn't supported


TODO: I'm not exactly sure why other cursor types wouldn't be supported, but I was only doing this for source-amazon-ads, so I preferred to be too restrictive rather than too permissive.

Note that the Global/PerPartition cursors were not updated, which we will need for source-amazon-ads.
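The restriction in the hunk above can be sketched as a standalone check. This is an illustration only: the function name `check_async_incremental_support`, the allow-list constant, and the error message are assumptions, not code from this PR.

```python
from typing import Optional

# Assumption: only this cursor type is allowed with AsyncRetriever for now,
# mirroring the deliberately restrictive check in the diff.
SUPPORTED_ASYNC_CURSOR_TYPES = frozenset({"DatetimeBasedCursor"})


def check_async_incremental_support(retriever_type: str, cursor_type: Optional[str]) -> None:
    """Raise ValueError when an AsyncRetriever is paired with an unsupported cursor type."""
    if cursor_type is None or retriever_type != "AsyncRetriever":
        # No incremental sync, or not an async retriever: nothing to validate.
        return
    if cursor_type not in SUPPORTED_ASYNC_CURSOR_TYPES:
        raise ValueError(
            f"AsyncRetriever does not support incremental_sync of type {cursor_type!r}; "
            f"supported types: {sorted(SUPPORTED_ASYNC_CURSOR_TYPES)}"
        )
```

Keeping the supported types in one allow-list would make it straightforward to add the Global/PerPartition cursors later.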

maxi297 · 2025-01-17T19:34:35Z

airbyte_cdk/sources/types.py

@@ -151,7 +153,7 @@ def __json_serializable__(self) -> Any:
        return self._stream_slice

    def __hash__(self) -> int:
-        return hash(orjson.dumps(self._stream_slice, option=orjson.OPT_SORT_KEYS))
+        return SliceHasher.hash("dummy_name", self._stream_slice)


This had to be updated to support AsyncPartition. SliceHasher takes __json_serializable__ into account, but orjson does not. I figure we should have the same slice-hashing logic everywhere, and if we want to switch that logic to orjson, we should do it once in SliceHasher.

TODO: I'm not sure why the slice hasher requires the name of the stream.
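To illustrate the difference, here is a sketch using only the standard library. `ToySlice`, `ToyEncoder` and `toy_slice_hash` are hypothetical names, and this is not the CDK's SliceHasher implementation. It shows a JSON encoder that honors `__json_serializable__` and a hash that mixes in a stream name, which would keep identical slices from different streams apart (one possible reason for the name argument, though that is a guess).

```python
import hashlib
import json
from typing import Any, Mapping


class ToySlice:
    """Hypothetical slice wrapper exposing __json_serializable__, as in the diff."""

    def __init__(self, data: Mapping[str, Any]) -> None:
        self._data = data

    def __json_serializable__(self) -> Any:
        return self._data


class ToyEncoder(json.JSONEncoder):
    """Falls back to __json_serializable__ for objects json cannot encode natively."""

    def default(self, o: Any) -> Any:
        if hasattr(o, "__json_serializable__"):
            return o.__json_serializable__()
        return super().default(o)


def toy_slice_hash(stream_name: str, stream_slice: Mapping[str, Any]) -> int:
    # sort_keys makes the hash independent of key order, including in nested dicts.
    serialized = json.dumps(stream_slice, sort_keys=True, cls=ToyEncoder)
    digest = hashlib.sha256(f"{stream_name}:{serialized}".encode("utf-8")).digest()
    # Truncate to 8 bytes to obtain a compact integer.
    return int.from_bytes(digest[:8], "big")
```

A slice that nests a `ToySlice` hashes the same as its plain-dict equivalent, whereas a serializer without the hook raises a `TypeError` on the nested object.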

maxi297 · 2025-01-17T19:34:52Z

unit_tests/sources/declarative/test_concurrent_declarative_source.py

@@ -322,6 +322,7 @@
                    "http_method": "GET",
                },
            },
+            "incremental_sync": {"$ref": "#/definitions/incremental_cursor"},


To ensure that async_retriever streams with incremental sync also run concurrently.

TMP development to show how things could work with concurrent cursor

68cd03b
