feat: add IncrementingCountCursor #346
base: main
…ntCursor implementation
Signed-off-by: Artem Inzhyyants <[email protected]>
…g_count_cursor # Conflicts: # airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📝 Walkthrough
This pull request introduces a new cursor type, IncrementingCountCursor.
Sequence Diagram(s)
sequenceDiagram
participant Caller as Caller
participant Factory as ModelToComponentFactory
participant StateManager as ConnectorStateManager
participant Converter as IncrementingCountStreamStateConverter
participant Cursor as ConcurrentCursor
Caller->>Factory: Call create_declarative_stream()
Factory->>StateManager: Retrieve stream state
StateManager-->>Factory: Return stream state
Factory->>Factory: Check incremental_sync type (IncrementingCountCursorModel)
Factory->>Converter: Initialize IncrementingCountStreamStateConverter
Converter-->>Factory: Provide state conversion details
Factory->>Cursor: Create concurrent cursor with parsed parameters
Cursor-->>Caller: Return constructed ConcurrentCursor
Actionable comments posted: 2
🧹 Nitpick comments (14)
airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (1)
7-7: Remove unused import.
The Callable type is imported but not used in this file. Consider removing it to keep imports clean.
-from typing import TYPE_CHECKING, Any, Callable, List, MutableMapping, Optional, Tuple
+from typing import TYPE_CHECKING, Any, List, MutableMapping, Optional, Tuple
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
420-426: Consider implementing proper type checking.
The comment suggests that IncrementingCountCursorModel is hardcoded to be of type DatetimeBasedCursor. Would you like to implement proper type checking using the commented code to make this more explicit and type-safe? This could help prevent potential issues in the future.
unit_tests/sources/streams/concurrent/test_incrementing_count_state_converter.py (2)
7-18: Consider adding more test cases.
The test covers the happy path, but would you like to add tests for:
- Edge cases (e.g., zero values, negative values)
- Error scenarios (e.g., invalid state format)
- Different initial state values
20-29: Consider testing state type validation.
The test verifies the conversion but doesn't check if the converter properly validates the state type. Would you like to add a test case that ensures invalid state types are rejected? wdyt?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
780-817: IncrementingCountCursor Schema Definition Review
This new block defines the IncrementingCountCursor very clearly with all the essential fields:
- type is correctly constrained to IncrementingCountCursor.
- cursor_field is well documented with guidance on maintaining top-level placement.
- start_value allows both string and integer types, which seems flexible.
- start_value_option reuses the existing RequestOption definition, ensuring consistency.
A couple of friendly questions to consider:
- Since one of the PR objectives mentions that when running a check with IncrementingCountCursor the operation is executed without a ConcurrentCursor, would it be useful to include any additional metadata or hints in this schema (or via documentation) to signal such limitations?
- Would adding an example usage or a note about potential behaviors during check operations benefit future developers who interact with this schema?
Wdyt?
airbyte_cdk/sources/streams/concurrent/state_converters/incrementing_count_stream_state_converter.py (6)
25-63: The state conversion logic looks good, but could use more error handling.
The implementation handles both compatible and incompatible states, but might benefit from additional validation.
Consider adding validation for:
- Type checking of start parameter
- Ensuring cursor_field exists in stream_state
- Validating state structure before conversion
What do you think about adding these validations? wdyt?
77-93: The sync start calculation looks good but could use more documentation.
The implementation handles edge cases well, but the logic could be clearer with additional comments.
Consider adding docstring explaining:
- The purpose of prev_sync_low_water_mark
- Why we compare with sync_start
- The implications of returning different values
Would you like me to suggest a docstring format? wdyt?
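As one possible answer to the docstring question, here is a minimal standalone sketch of the sync-start logic with the three requested points documented. It mirrors the _get_sync_start body quoted elsewhere in this review, but the free function get_sync_start, the ZERO_VALUE constant, and the plain string key are hypothetical stand-ins for the converter method, its zero_value, and CursorField.

```python
from typing import Any, Mapping, Optional

# Assumed stand-in for the converter's zero_value.
ZERO_VALUE = 0


def get_sync_start(
    cursor_field_key: str,
    stream_state: Mapping[str, Any],
    start: Optional[int],
) -> int:
    """Return the cursor value the next sync should start from.

    sync_start is the configured start value, or ZERO_VALUE when none is set.

    prev_sync_low_water_mark is the cursor value stored by a previous sync,
    or None when the state has no entry for the cursor field.

    The stored value is returned only when it is truthy and at or beyond
    sync_start, so a sync resumes from saved progress but never starts
    earlier than the configured start. Otherwise sync_start is returned.
    """
    sync_start = start if start is not None else ZERO_VALUE
    prev_sync_low_water_mark: Optional[int] = stream_state.get(cursor_field_key)
    if prev_sync_low_water_mark and prev_sync_low_water_mark >= sync_start:
        return prev_sync_low_water_mark
    return sync_start
```

Note that the truthiness check means a stored value of 0 is treated the same as no stored value; with a zero default that gives the same result either way.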
15-20: Add return type hints for clarity?
The implementation is correct, but would you consider adding return type hints to make the interface clearer? Something like:
-    def _from_state_message(self, value: Any) -> Any:
+    def _from_state_message(self, value: Any) -> int:
         return value

-    def _to_state_message(self, value: Any) -> Any:
+    def _to_state_message(self, value: Any) -> int:
         return value
77-93: Consider validating for negative values?
The sync start calculation looks good! Would you consider adding validation to ensure we don't get negative values? Something like:
 def _get_sync_start(
     self,
     cursor_field: CursorField,
     stream_state: MutableMapping[str, Any],
     start: Optional[int],
 ) -> int:
     sync_start = start if start is not None else self.zero_value
+    if sync_start < 0:
+        raise ValueError(f"Negative values are not allowed for sync_start: {sync_start}")
     prev_sync_low_water_mark: Optional[int] = (
         stream_state[cursor_field.cursor_field_key]
         if cursor_field.cursor_field_key in stream_state
         else None
     )
     if prev_sync_low_water_mark and prev_sync_low_water_mark >= sync_start:
+        if prev_sync_low_water_mark < 0:
+            raise ValueError(f"Negative values are not allowed for prev_sync_low_water_mark: {prev_sync_low_water_mark}")
         return prev_sync_low_water_mark
     else:
         return sync_start
25-63: The state conversion logic looks solid, but could use more documentation.
The implementation correctly handles both compatible and incompatible states, creating appropriate slices for prior syncs. However, would it be helpful to add more examples in the docstring showing both cases? For instance, showing what an incompatible state looks like before and after conversion, wdyt?
77-93: The sync start calculation looks good but could benefit from validation.
The logic correctly handles previous state and start values. However, should we add validation to ensure the cursor values are non-negative integers? This could prevent potential issues with negative counts, wdyt?
 def _get_sync_start(
     self,
     cursor_field: CursorField,
     stream_state: MutableMapping[str, Any],
     start: Optional[int],
 ) -> int:
     sync_start = start if start is not None else self.zero_value
+    if sync_start < 0:
+        raise ValueError("Cursor values must be non-negative integers")
     prev_sync_low_water_mark: Optional[int] = (
         stream_state[cursor_field.cursor_field_key]
         if cursor_field.cursor_field_key in stream_state
         else None
     )
+    if prev_sync_low_water_mark is not None and prev_sync_low_water_mark < 0:
+        raise ValueError("Previous cursor value must be a non-negative integer")
     if prev_sync_low_water_mark and prev_sync_low_water_mark >= sync_start:
         return prev_sync_low_water_mark
     else:
         return sync_start
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
593-593: Consider raising an explicit error for runtime usage
Currently, create_incrementing_count_cursor is mapped here, but the docstring says there’s no runtime usage. Would you like to raise an exception if it’s unexpectedly called, to avoid confusion? Wdyt?
1696-1721: RequestOptionsProvider usage
The code reuses DatetimeBasedRequestOptionsProvider with an incrementing count model. Would you consider introducing a specialized provider for numeric increments to reduce confusion? Wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1511-1531: LGTM! The IncrementingCountCursor class is well-structured.
The class follows best practices with:
- Clear field definitions with descriptive help text
- Proper type hints and validation
- Flexible configuration options for start values
One suggestion though - would it be helpful to add an example in the docstring showing how to use this cursor with a real-world scenario? For instance, showing how to configure it for paginated APIs that use record counts, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(3 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/state_converters/incrementing_count_stream_state_converter.py
(1 hunks)unit_tests/sources/streams/concurrent/test_incrementing_count_state_converter.py
(1 hunks)
👮 Files not reviewed due to content moderation or server errors (2)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
- airbyte_cdk/sources/streams/concurrent/state_converters/incrementing_count_stream_state_converter.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (12)
airbyte_cdk/sources/streams/concurrent/state_converters/abstract_stream_state_converter.py (1)
14-15: LGTM! Added integer state type.
The addition of the integer state type to ConcurrencyCompatibleStateType is well-aligned with supporting the new IncrementingCountCursor feature.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
1511-1531: LGTM! The IncrementingCountCursor class is well-structured.
The class follows the established patterns and includes comprehensive field descriptions. The cursor field is properly marked as required, and the optional fields have sensible defaults.
A few suggestions to consider:
- Would it make sense to add more examples for the start_value field to demonstrate both integer and string use cases? wdyt?
- Should we add validation to ensure start_value is a non-negative number when provided? wdyt?
1973-1979: LGTM! The incremental_sync field is properly updated.
The integration of IncrementingCountCursor into the DeclarativeStream class is clean and maintains backward compatibility.
1973-1979: LGTM! The integration with DeclarativeStream looks good.
The IncrementingCountCursor is properly added as an option in the incremental_sync field alongside existing cursor types.
airbyte_cdk/sources/streams/concurrent/state_converters/incrementing_count_stream_state_converter.py (5)
14-24: LGTM! The base implementation is clean and follows best practices.
The class properly extends AbstractStreamStateConverter and implements the required methods. The infinity provider is a nice touch for handling unbounded ranges.
64-76: LGTM! The value manipulation methods are simple and effective.
The implementation of parse_value, zero_value, increment, and output_format is straightforward and follows the expected behavior for integer-based cursors.
25-63: LGTM! The state conversion logic is robust.
The implementation correctly handles both compatible and incompatible state messages. The slices creation for representing prior syncs is a nice touch!
14-24: LGTM! The base class implementation looks good.
The class correctly implements the required methods from AbstractStreamStateConverter with proper type hints. The infinity end provider is a nice touch for unbounded streams.
64-76: LGTM! The cursor manipulation methods are clean and straightforward.
Simple and effective implementations for parsing, incrementing, and formatting integer values.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
248-250: Imports look good
These new imports for the IncrementingCountCursorModel appear correct and necessary. All good here!
503-504: Confirmed the converter import
Nice addition of IncrementingCountStreamStateConverter; it cleanly ties into your concurrency approach.
1199-1262: Potential type mismatch for startValue
When you set start=interpolated_start_value, the interpolation could yield a string. Would you like to parse it into an integer before passing it along, to avoid any off-by-one or type errors? Wdyt?
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2210-2225: 🛠️ Refactor suggestion
Consider raising NotImplementedError in create_incrementing_count_cursor.
Since create_incrementing_count_cursor returns a DatetimeBasedCursor and is not intended for runtime use, would you consider raising NotImplementedError to prevent unexpected behavior if the code is ever called at runtime? Wdyt?
 @staticmethod
 def create_incrementing_count_cursor(
     model: IncrementingCountCursorModel, config: Config, **kwargs: Any
 ) -> DatetimeBasedCursor:
-    # This should not actually get used anywhere at runtime, but needed to add this to pass checks since
-    # we still parse models into components. The issue is that there's no runtime implementation of a
-    # IncrementingCountCursor.
-    # A known and expected issue with this stub is running a check with the declared IncrementingCountCursor because it is run without ConcurrentCursor.
-    return DatetimeBasedCursor(
-        cursor_field=model.cursor_field,
-        datetime_format="%Y-%m-%d",
-        start_datetime="2024-12-12",
-        config=config,
-        parameters={},
-    )
+    raise NotImplementedError(
+        "IncrementingCountCursor is only supported with ConcurrentCursor. "
+        "This method exists only for model parsing and should not be called at runtime."
+    )
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
1199-1262: Consider adding error handling for invalid cursor field.
The create_concurrent_cursor_from_incrementing_count_cursor implementation looks good, but would you consider adding validation for the cursor field to ensure it exists in the stream's schema? This could help catch configuration errors early. Wdyt?
 def create_concurrent_cursor_from_incrementing_count_cursor(
     self,
     model_type: Type[BaseModel],
     component_definition: ComponentDefinition,
     stream_name: str,
     stream_namespace: Optional[str],
     config: Config,
     message_repository: Optional[MessageRepository] = None,
     **kwargs: Any,
 ) -> ConcurrentCursor:
     stream_state = (
         self._connector_state_manager.get_stream_state(stream_name, stream_namespace)
         if "stream_state" not in kwargs
         else kwargs["stream_state"]
     )

     component_type = component_definition.get("type")
     if component_definition.get("type") != model_type.__name__:
         raise ValueError(
             f"Expected manifest component of type {model_type.__name__}, but received {component_type} instead"
         )

     incrementing_count_cursor_model = model_type.parse_obj(component_definition)

     if not isinstance(incrementing_count_cursor_model, IncrementingCountCursorModel):
         raise ValueError(
             f"Expected {model_type.__name__} component, but received {incrementing_count_cursor_model.__class__.__name__}"
         )

+    if not incrementing_count_cursor_model.cursor_field:
+        raise ValueError("cursor_field must be specified for IncrementingCountCursor")
+
     interpolated_start_value = (
         InterpolatedString.create(
             incrementing_count_cursor_model.start_value,  # type: ignore
             parameters=incrementing_count_cursor_model.parameters or {},
         )
         if incrementing_count_cursor_model.start_value
         else 0
     )
1696-1721: Consider adding validation for start_value_option.
The create_declarative_stream method's handling of IncrementingCountCursorModel looks good, but would you consider adding validation to ensure start_value_option is valid when provided? This could help catch configuration errors early. Wdyt?
 elif model.incremental_sync and isinstance(
     model.incremental_sync, IncrementingCountCursorModel
 ):
     cursor_model: IncrementingCountCursorModel = model.incremental_sync  # type: ignore

+    if cursor_model.start_value_option and not isinstance(cursor_model.start_value_option, RequestOptionModel):
+        raise ValueError("start_value_option must be a RequestOption when provided")
+
     start_time_option = (
         self._create_component_from_model(
             cursor_model.start_value_option,  # type: ignore # mypy still thinks cursor_model of type DatetimeBasedCursor
             config,
             parameters=cursor_model.parameters or {},
         )
         if cursor_model.start_value_option  # type: ignore # mypy still thinks cursor_model of type DatetimeBasedCursor
         else None
     )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
248-250: LGTM! Import of new cursor model.
The import of IncrementingCountCursor from the declarative component schema is correctly added.
Should we have tests that ensure that a manifest model creates the expected behavior?
# the default DatetimeBasedRequestOptionsProvider() sets them to start_time/end_time
partition_field_start = "start"

request_options_provider = DatetimeBasedRequestOptionsProvider(
The naming here is weird. Should we generify the DatetimeBasedRequestOptionsProvider to be a StartEndRequestOptionsProvider or something like that? Would this allow us to have only one method for building both DatetimeBasedCursor and IncrementingCountCursor? The comments above refer to DatetimeBasedCursor, which makes me think there is duplication.
slices = [
    {
        self.START_KEY: start if start is not None else sync_start,
        self.END_KEY: sync_start,  # this may not be relevant anymore
Why wouldn't it be relevant anymore? Can we just remove it if it's the case?
This method is used only in ConcurrentCursor._get_concurrent_state
airbyte-python-cdk/airbyte_cdk/sources/streams/concurrent/cursor.py
Lines 191 to 201 in 4459243
def _get_concurrent_state(
    self, state: MutableMapping[str, Any]
) -> Tuple[CursorValueType, MutableMapping[str, Any]]:
    if self._connector_state_converter.is_state_message_compatible(state):
        return (
            self._start or self._connector_state_converter.zero_value,
            self._connector_state_converter.deserialize(state),
        )
    return self._connector_state_converter.convert_from_sequential_state(
        self._cursor_field, state, self._start
    )
and, since we don't have any "old-style" states (is_state_message_compatible -> True), we may never use convert_from_sequential_state for this cursor.
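To make the two branches in this reply concrete, here is a small self-contained sketch of the dispatch. StubCountConverter and get_concurrent_state are hypothetical stand-ins, not the CDK classes; in particular, detecting a compatible state by the presence of a "state_type" key and the shape of the converted state are assumptions made only for illustration.

```python
from typing import Any, Dict, MutableMapping, Optional, Tuple


class StubCountConverter:
    """Hypothetical stand-in for a state converter; not the CDK implementation."""

    zero_value = 0

    def is_state_message_compatible(self, state: MutableMapping[str, Any]) -> bool:
        # Assumption: a concurrent-format state carries a "state_type" marker.
        return bool(state) and "state_type" in state

    def deserialize(self, state: MutableMapping[str, Any]) -> Dict[str, Any]:
        return dict(state)

    def convert_from_sequential_state(
        self, cursor_field_key: str, state: MutableMapping[str, Any], start: Optional[int]
    ) -> Tuple[int, Dict[str, Any]]:
        sync_start = start if start is not None else self.zero_value
        previous = state.get(cursor_field_key)
        if previous and previous >= sync_start:
            sync_start = previous
        # Illustrative output shape only.
        return sync_start, {"converted_from": dict(state)}


def get_concurrent_state(
    converter: StubCountConverter,
    cursor_field_key: str,
    state: MutableMapping[str, Any],
    start: Optional[int],
) -> Tuple[int, Dict[str, Any]]:
    # Compatible state: keep it as-is and start from the configured value.
    if converter.is_state_message_compatible(state):
        return (start or converter.zero_value, converter.deserialize(state))
    # Old-style state: the conversion path that may never run for this cursor.
    return converter.convert_from_sequential_state(cursor_field_key, state, start)
```

With a state that already has the marker, only the first branch runs, which matches the point that the conversion path may go unused.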
I think we might use the convert_from_sequential_state if we can find a way to have slice_range. For now, I'm fine with this.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
780-817: The schema definition for IncrementingCountCursor looks well-structured.
This is a good addition that follows the same design pattern as DatetimeBasedCursor. The cursor_field, start_value and other properties are well-defined with appropriate descriptions and examples.
However, I notice that the start_value property accepts both string and integer. Since this is for an incrementing count, should we maybe restrict it to integers only? Or at least document when a string might be appropriate? wdyt?
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
421-425: Informative comments about the current implementation state.
The comments provide helpful context about why there's no isinstance check for IncrementingCountCursor. It clarifies that it's currently hardcoded as DatetimeBasedCursor, which helps future maintainers understand the implementation.
Would it be valuable to add a TODO comment with a ticket reference to track the future work of creating a proper Declarative IncrementingCountCursor? This would make it easier to find and address this technical debt later. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4 hunks)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1359-1359: Properly registered the new cursor in DeclarativeStream's incremental_sync property.
Good job adding the reference to the new IncrementingCountCursor in the list of available incremental_sync options.
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)
34-36: Added import for the new IncrementingCountCursorModel.
Clean import addition that follows the existing pattern of importing model classes.
228-229: Method name change better reflects its expanded purpose.
Good renaming from _is_datetime_incremental_without_partition_routing to _is_concurrent_cursor_incremental_without_partition_routing to accurately reflect that it now handles both datetime-based and incrementing count cursors.
Also applies to: 406-407
414-417: Updated condition now includes both cursor types.
Good update to make the condition more inclusive, accepting both DatetimeBasedCursorModel and IncrementingCountCursorModel.
260-272: ❓ Verification inconclusive
Proper handling of IncrementingCountCursor type.
The new conditional branch appropriately detects and processes the IncrementingCountCursorModel type by calling the appropriate factory method. This looks good, but I noticed the PR objectives mention a known issue where "when running a check with the declared IncrementingCountCursor, it is executed without the ConcurrentCursor."
Does this implementation address that issue? If not, should we add some validation to ensure ConcurrentCursor is always used with IncrementingCountCursor? wdyt?
🏁 Script executed:
#!/bin/bash
# Look for validation related to ConcurrentCursor usage with IncrementingCountCursor
# Check if there's any validation in model_to_component_factory.py
rg -A 5 "create_concurrent_cursor_from_incrementing_count_cursor" --type py
Length of output: 1662
ConcurrentCursor validation suggestion
Hi there, I can confirm that the code now correctly routes the IncrementingCountCursorModel to the dedicated factory method via the call to create_concurrent_cursor_from_incrementing_count_cursor. However, I didn’t find any extra validation ensuring that the returned cursor is indeed a ConcurrentCursor, a point hinted at by the known issue note. Would it make sense to include an explicit check or validation (for example, an assertion inside the factory method or right after its call) to guarantee that a ConcurrentCursor is always used when dealing with an IncrementingCountCursorModel? wdyt?
Can we have tests that ensure that a manifest model creates the expected behavior?
cursor_field = CursorField(interpolated_cursor_field.eval(config=config))

connector_state_converter = IncrementingCountStreamStateConverter(
    is_sequential_state=True,  # ConcurrentPerPartitionCursor only works with sequential state
This is not concurrent per partition though. I think the real reason to keep it as sequential state is that, since the slice_range is never defined for IncrementingCountCursor, there is no value in having partitioned state.
Had one small suggestion on adjusting the description of a field, otherwise the YAML schema changes make sense to me!
This is interesting. I see only one way to get this: if our API can provide the latest record, e.g. by making a request with limit=1 ORDER BY DESC so we can get the latest value, but I have no idea how to implement it.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
780-817: Review on the new IncrementingCountCursor schema
You've introduced the new IncrementingCountCursor definition, which looks well structured and in line with our schema design. The properties are clear and consistent with similar cursor definitions:
- cursor_field: The description is comprehensive, explaining that the field must be at the top level and that records are expected in ascending order. This clarity is great for users.
- start_value: Allowing both string and integer types increases flexibility. Is this dual-type intentionally designed to support cases where the start value might come as a numeric or a templated string? Just checking if that meets the broader use cases.
- start_value_option: The reference to #/definitions/RequestOption follows our established pattern, which is excellent.
One thought: since the PR objectives mention a known issue where the IncrementingCountCursor is used during a check without the ConcurrentCursor, would it be beneficial to include a short note in the description of this schema to warn users of potential unexpected behavior during check operations? wdyt?
Overall, the changes are clear and consistent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
- unit_tests/sources/declarative/test_concurrent_declarative_source.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (7)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (7)
7-7: New imports for the IncrementingCountCursor implementation.
The import of math is likely needed for math.inf used in line 822, and the IncrementingCountStreamStateConverter is required for the new cursor implementation. These imports align well with the functionality being added.
Also applies to: 47-49
237-246: New incremental_counting_cursor definition.
The definition appropriately configures:
- Cursor type as IncrementingCountCursor
- Cursor field set to "id"
- Start value of 0
- Request parameter for "since_id"
Looks good and aligns with incremental ID-based synchronization patterns.
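For orientation, the four settings listed above might look roughly like the following manifest fragment, written here as a Python dict. The values ("id", 0, "since_id") come from the review summary; the exact key layout of start_value_option (field_name, inject_into) is an assumption about the RequestOption shape, and build_request_params is a hypothetical helper added only to show how the option would be applied.

```python
from typing import Any, Dict

# Illustrative fragment; the nested key layout is assumed, not copied from the test file.
incremental_counting_cursor: Dict[str, Any] = {
    "type": "IncrementingCountCursor",
    "cursor_field": "id",
    "start_value": 0,
    "start_value_option": {
        "type": "RequestOption",
        "field_name": "since_id",
        "inject_into": "request_parameter",
    },
}


def build_request_params(cursor_definition: Dict[str, Any], cursor_value: int) -> Dict[str, Any]:
    """Hypothetical helper: place the cursor value where start_value_option points."""
    option = cursor_definition.get("start_value_option")
    if not option or option.get("inject_into") != "request_parameter":
        return {}
    return {option["field_name"]: cursor_value}
```

A first sync under these assumptions would send since_id=0, and later syncs would send the last stored id instead.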
255-261: New base_incremental_counting_stream definition.
This creates a proper base stream that leverages the new counting cursor for incremental sync.
551-579: Implementation of incremental_counting_stream.
The stream properly extends the base stream and defines appropriate schema. One potential concern: this stream uses the same path ("/party_members") as the existing party_members_stream. Is this intentional or could it lead to confusion since they're different streams fetching from the same endpoint? wdyt?
589-589: Added the new stream to the manifest's streams list.
Good - ensures the stream is available for testing.
810-823: Test assertions for the new IncrementingCountCursor.
These assertions verify the core properties of the cursor:
- Proper state converter type (IncrementingCountStreamStateConverter)
- Correct stream name, cursor field, and start value
- End provider correctly set to infinity
While these tests verify the cursor is correctly created, they don't specifically test the functionality during actual read operations. Would it make sense to add a test case that verifies the cursor behavior during a sync operation (similar to the date-based tests)? wdyt?
810-823: Address the known issue mentioned in PR description.
Per the PR objectives, there's a known issue when running a check with the declared IncrementingCountCursor - it's executed without the ConcurrentCursor, leading to unexpected behavior. This code implements the cursor but doesn't address the check-related issue.
Could we add a targeted test for this known issue? Or is there a plan to address it in a follow-up PR?
What
add IncrementingCountCursor
How
Warning
A known and expected issue with this stub is running a check with the declared IncrementingCountCursor because it is run without ConcurrentCursor.
Summary by CodeRabbit
New Features
- IncrementingCountCursor, for improved incremental synchronization.
Tests
- IncrementingCountStreamStateConverter to ensure expected functionality.