Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(low-code cdk): add component resolver and http component resolver #88

Merged
merged 22 commits into from
Dec 5, 2024

Conversation

lazebnyi
Copy link
Contributor

@lazebnyi lazebnyi commented Nov 26, 2024

What

This PR add dynamic stream that can dynamically generate streams based on response.
Fixed: https://github.com/airbytehq/airbyte-internal-issues/issues/10787?issue=airbytehq%7Cairbyte-internal-issues%7C10788

How

Described schema, models and data classes for DynamicDeclarativeStream, HttpComponentsResolver and ComponentMappingDefinition. Defined behavior for components resolving and managing parsing of the stream template. Integrated the dynamic stream handling into the existing framework, ensuring seamless operation alongside static streams.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced dynamic stream capabilities with the addition of dynamic_streams in the schema.
    • Added HttpComponentsResolver and ComponentMappingDefinition for enhanced component mapping and resolution.
    • Implemented new methods for improved dynamic stream handling in the ManifestDeclarativeSource.
  • Improvements

    • Enhanced error handling and robustness in stream configuration processing.
    • Improved handling of empty stream selections to prevent infinite loops.
    • Restructured DeclarativeSource for more flexible configurations.
    • Expanded module capabilities by including PartitionRouter in the public interface.
  • Tests

    • Added unit tests for the HttpComponentsResolver to validate component mapping resolution.
    • Implemented tests for dynamic stream reading and resolution logic.
    • Enhanced test suite for ManifestDeclarativeSource with new fixtures and validation cases.

@github-actions github-actions bot added the enhancement New feature or request label Nov 26, 2024
@lazebnyi lazebnyi marked this pull request as ready for review November 27, 2024 02:02
@lazebnyi lazebnyi requested a review from maxi297 November 27, 2024 02:02
Copy link
Contributor

coderabbitai bot commented Nov 27, 2024

Warning

Rate limit exceeded

@lazebnyi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 22 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 25e6e1f and de7b6bc.

📝 Walkthrough
📝 Walkthrough
📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces several changes across multiple files within the Airbyte CDK. Key modifications include updates to the ConcurrentDeclarativeSource and ManifestDeclarativeSource classes to enhance stream configuration handling, the addition of new components and properties in the declarative_component_schema.yaml, and the introduction of new classes and methods for component resolution. These changes collectively improve the flexibility and robustness of the declarative source schema and its associated processing logic.

Changes

File Change Summary
airbyte_cdk/sources/declarative/concurrent_declarative_source.py Modified _group_streams to use self.resolved_manifest["streams"] and self.resolved_manifest["dynamic_streams"]; updated read method to yield non-empty selected_concurrent_streams.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Added dynamic_streams property; introduced ComponentMappingDefinition, HttpComponentsResolver, and DynamicDeclarativeStream components to enhance schema flexibility.
airbyte_cdk/sources/declarative/manifest_declarative_source.py Updated _stream_configs method to concatenate results from _dynamic_stream_configs; added _dynamic_stream_configs for dynamic stream resolution; enhanced error handling and connection checker logic.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Added ComponentMappingDefinition, HttpComponentsResolver, and DynamicDeclarativeStream classes; restructured DeclarativeSource to include dynamic_streams field.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Added methods for creating HTTP components and mapping definitions; refactored _merge_stream_slicers for improved clarity.
airbyte_cdk/sources/declarative/partition_routers/__init__.py Updated __all__ to include PartitionRouter.
airbyte_cdk/sources/declarative/resolvers/__init__.py Introduced COMPONENTS_RESOLVER_TYPE_MAPPING and imported various components to enhance module exports.
airbyte_cdk/sources/declarative/resolvers/components_resolver.py Added ComponentMappingDefinition, ResolvedComponentMappingDefinition, and ComponentsResolver classes; introduced resolve_components method for mapping resolution.
airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py Introduced HttpComponentsResolver class for resolving components via HTTP; includes methods for parsing mappings and updating configurations.
unit_tests/sources/declarative/resolvers/__init__.py Added copyright notice.
unit_tests/sources/declarative/resolvers/test_http_components_resolver.py Introduced tests for HttpComponentsResolver class to validate component mapping resolution under various scenarios.

Possibly related PRs

Suggested reviewers

  • aaronsteers
  • brianjlai
  • ChristoGrab

Wydt about reaching out to these reviewers for their insights?


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (19)
airbyte_cdk/sources/declarative/partition_routers/__init__.py (1)

9-9: LGTM! Consider grouping related imports?

The changes look good! The new PartitionRouter import and its addition to __all__ are well-structured.

Hey, what do you think about grouping the router imports together? We could move the base PartitionRouter import before its implementations for better readability, wdyt? 😊

from airbyte_cdk.sources.declarative.partition_routers.partition_router import PartitionRouter
from airbyte_cdk.sources.declarative.partition_routers.cartesian_product_stream_slicer import CartesianProductStreamSlicer
from airbyte_cdk.sources.declarative.partition_routers.list_partition_router import ListPartitionRouter
from airbyte_cdk.sources.declarative.partition_routers.single_partition_router import SinglePartitionRouter
from airbyte_cdk.sources.declarative.partition_routers.substream_partition_router import SubstreamPartitionRouter

Also applies to: 11-11

unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (3)

1-11: Consider adding module and class level docstrings?

The file structure looks clean! Would you consider adding a module-level docstring to describe the purpose of these tests? This could help other developers understand the test coverage better. wdyt?

 #
 # Copyright (c) 2024 Airbyte, Inc., all rights reserved.
 #
 
+"""
+Unit tests for the HttpComponentsResolver class.
+
+These tests verify the component resolution logic with different mapping configurations
+and conditions, ensuring proper template population based on HTTP retriever responses.
+"""
+
 import pytest

13-52: Test cases look good! Consider adding more edge cases?

The parametrized test cases effectively cover basic scenarios. Would you consider adding a few more edge cases to strengthen the test coverage? Some suggestions:

  • Empty component values
  • Invalid value types
  • Complex nested conditions
  • Multiple records in retriever_data

What do you think about adding descriptions for each test case using ids in parametrize? This could make test failures more descriptive:

 @pytest.mark.parametrize(
     "components_mapping, retriever_data, stream_template_config, expected_result",
     [
         (
             # ... existing test case 1
         ),
         (
             # ... existing test case 2
         ),
     ],
+    ids=[
+        "multiple_keys_with_conditions",
+        "single_key_update",
+    ]
 )

53-75: Test implementation looks solid! Want to add some additional assertions?

The test implementation is clean and well-structured! A few suggestions to make it even more robust:

  1. Maybe we could verify that the retriever was called with the expected parameters?
  2. How about adding assertions for the number of times the retriever was called?
  3. Would it make sense to verify the type of the resolved values?

Here's what that might look like:

     # Assert the resolved components match the expected result
     assert result == expected_result
+    # Verify retriever interaction
+    mock_retriever.read_records.assert_called_once()
+    # Verify types of resolved values
+    for resolved_item in result:
+        for key, value in resolved_item.items():
+            if value is not None:
+                assert isinstance(value, str)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

184-185: Consider enhancing documentation for stream configuration handling

The code handles stream configurations carefully, but would it be helpful to add some documentation about:

  1. How dynamic streams interact with the concurrent processing model
  2. Any limitations or considerations when using dynamic streams with concurrent processing
  3. Examples of valid stream configurations

This could help future maintainers understand the interaction between dynamic streams and concurrent processing. wdyt? 📝

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

2707-2753: The ComponentMappingDefinition looks good, but would you consider enhancing the documentation?

The component is well-structured, but a few additions could make it even more user-friendly:

  1. Could we add examples of common use cases for the condition field? For instance, filtering based on date ranges or status values.
  2. What happens when type inference fails for value_type? Should we document the fallback behavior?

What do you think about adding these clarifications to help users better understand the component's behavior?


2754-2778: The HttpComponentsResolver looks solid, but could we enhance the documentation for production readiness?

Since this is an experimental component that makes HTTP calls, would you consider adding documentation about:

  1. Error handling: How does it handle network failures or invalid responses?
  2. Retry behavior: Are there built-in retries for failed requests?
  3. Caching: Is there any caching mechanism to prevent redundant requests?

These additions would help users understand the component's behavior in production scenarios. What are your thoughts?


2779-2790: The DynamicDeclarativeStream looks promising, but could we add more implementation guidance?

For this experimental component, would you consider adding:

  1. Documentation about stream naming conventions and potential conflicts
  2. Clarification about state management for dynamic streams
  3. Examples of common use cases (e.g., multi-tenant scenarios, dynamic API endpoints)

These additions would help users understand when and how to use this component effectively. What do you think?


22-25: Consider adding version information for experimental features

The dynamic_streams array is well-integrated into the main schema. Since this is part of experimental features, would you consider:

  1. Adding a comment about the minimum CDK version required
  2. Including a note about potential breaking changes in future versions
  3. Adding a link to migration guides (when available)

This would help users make informed decisions about adopting these experimental features. Thoughts?

airbyte_cdk/sources/declarative/resolvers/components_resolver.py (2)

35-35: Is using '@deprecated' appropriate for marking an experimental class?

The @deprecated decorator is typically used to indicate that a class or method is obsolete and may be removed in future versions. Since ComponentsResolver is experimental rather than deprecated, perhaps using a custom warning or simply adding a note in the docstring might be clearer. Would you consider adjusting this to more accurately reflect its experimental status? WDYT?


21-21: Is an empty string the appropriate default for 'condition'?

The condition field is set to an empty string by default, but its type allows for None (Optional[...]). Would setting the default to None be more appropriate, especially if an empty string isn't a meaningful condition? WDYT?

Also applies to: 32-32

airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1)

21-21: Consider using a custom warning instead of @deprecated

Currently, the @deprecated decorator is used to indicate that HttpComponentsResolver is experimental:

@deprecated("This class is experimental. Use at your own risk.", category=ExperimentalClassWarning)

Since @deprecated typically signals that a feature is outdated and should be avoided because it may be removed in future versions, perhaps it's clearer to use a custom warning or decorator to mark experimental classes. This could prevent confusion for users who might think the class is being phased out rather than being in an experimental phase. Wdyt?

airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

305-307: Appending dynamic stream configs—any concerns about duplicates?

By extending stream_configs with dynamic stream configurations, is there a possibility of duplicate stream names causing conflicts? Should we consider handling or warning about duplicates to prevent potential issues? Wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1056-1064: Consider using None as the default value for condition field

Currently, the condition field in ComponentMappingDefinition has a default value of an empty string "". Since it's declared as Optional[str], would setting the default to None improve consistency with other optional fields? Wdyt?


1784-1784: Add a description to components_mapping field

For clarity and consistency, should we add a Field with a description to the components_mapping attribute in HttpComponentsResolver? Wdyt?


1792-1795: Correct the description of components_resolver field

The current description is "Component resolve and populates stream templates with components values." Perhaps it should be "Component that resolves and populates stream templates with component values." Wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

1295-1319: Should we support other retriever models in _build_stream_slicer_from_partition_router?

Currently, the method checks if the model is an instance of SimpleRetrieverModel. Would it make sense to extend this to support AsyncRetrieverModel or CustomRetrieverModel for greater flexibility? WDYT?


2207-2221: Would adding a docstring to create_components_mapping_definition improve clarity?

Providing a docstring can help future maintainers understand the purpose and usage of this method. What do you think?


2222-2255: Consider adding a docstring to create_http_components_resolver for better maintainability.

Adding a docstring could enhance readability and assist others in understanding the functionality of this method. WDYT?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a8b1b2b and b136524.

📒 Files selected for processing (11)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (7 hunks)
  • airbyte_cdk/sources/declarative/partition_routers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/__init__.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • airbyte_cdk/sources/declarative/resolvers/init.py
  • unit_tests/sources/declarative/resolvers/init.py
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

184-185: Implementation looks solid! 👍

The change to use self._stream_configs() is clean and maintains all the existing thread-safety guarantees while enabling dynamic stream support. Nice work!


184-185: Verify stream configuration handling with dynamic streams

The change to use self._stream_configs() aligns well with the PR's objective of supporting dynamic streams. However, we should ensure this works correctly with concurrent processing.

A few questions to consider:

  1. How does this handle the case when dynamic streams are added/removed during runtime?
  2. Could there be any race conditions between stream configuration resolution and concurrent processing?

wdyt about adding some debug logging here to track stream configuration changes? 🤔

Let's verify the stream configuration handling:

✅ Verification successful

No race conditions found in stream configuration handling 🎯

After analyzing the code, I can confirm that the change to use self._stream_configs() is safe in concurrent processing. The method is called during stream initialization before any concurrent processing begins, and its results are used to create a mapping of stream configurations. The concurrent processing only starts after this initialization is complete.

A few observations that support this:

  • Stream configs are resolved once during initialization
  • The mapping is created before any concurrent stream processing begins
  • No shared state modifications during stream config handling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential race conditions or inconsistencies in stream handling

# Look for any existing debug/error logging related to stream configuration
rg -A 3 "stream.*config" --type py

# Check for any existing race condition handling
rg -A 3 "concurrent|parallel|race|atomic" --type py

# Look for any similar pattern in other declarative sources
ast-grep --pattern 'self._stream_configs($manifest, $config)'

Length of output: 116614


Script:

#!/bin/bash
# Let's check how stream configuration is handled in concurrent processing

# Look for any concurrent stream configuration handling
ast-grep --pattern 'def _stream_configs($manifest, $config)'

# Check for any race condition handling in stream configuration
rg -A 5 'self._stream_configs.*config' --type py

# Look for any concurrent stream processing related to configuration
rg -A 5 'concurrent.*stream.*config' --type py

Length of output: 2395

airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1)

20-20: Is the use of 'InitVar' for 'parameters' intentional?

In both ComponentMappingDefinition and ResolvedComponentMappingDefinition, the parameters field is defined as an InitVar, which means it won't be stored as an instance attribute after initialization. Is this your intention, or should parameters be accessible as an instance variable? WDYT?

Also applies to: 31-31

airbyte_cdk/sources/declarative/manifest_declarative_source.py (4)

33-34: Importing COMPONENTS_RESOLVER_TYPE_MAPPING—looks good!

The import statement correctly adds COMPONENTS_RESOLVER_TYPE_MAPPING, which is needed for component resolution.


124-124: Updated method call with new config parameter—great job!

The call to _stream_configs now includes the config parameter, aligning with the updated method signature.


299-301: Adding config parameter to _stream_configs—are all usages updated?

The _stream_configs method now accepts config as a parameter. Have we ensured that all calls to this method are updated accordingly? Also, are there any subclasses or overrides that might need adjustment? Wdyt?


327-331: ⚠️ Potential issue

Safely accessing 'type' in components_resolver_config—what do you think?

We access components_resolver_config["type"] directly, which might lead to a KeyError if 'type' is missing. Should we use get() to handle this more gracefully? Wdyt?

Suggested change:

-resolver_type = components_resolver_config["type"]
+resolver_type = components_resolver_config.get("type")

 if not resolver_type:
     raise ValueError(
         f"Missing 'type' in components resolver configuration: {components_resolver_config}"
     )

Likely invalid or redundant comment.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1372-1372: Addition of dynamic_streams field to DeclarativeSource looks good

Including dynamic_streams enhances the flexibility of the declarative schema by allowing dynamic stream configurations.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

33-36: LGTM!

The new imports for HttpComponentsResolver and ComponentMappingDefinition are correctly added.


284-289: LGTM!

The model imports for HttpComponentsResolverModel and ComponentMappingDefinitionModel are properly added.


304-304: LGTM!

The import statement for PartitionRouter and related slicers looks good.


475-476: LGTM!

Adding the new models to PYDANTIC_MODEL_TO_CONSTRUCTOR is appropriate.


1320-1329: Is it intentional to create a ResumableFullRefreshCursor only when stream_slicer is None?

In _build_resumable_cursor_from_paginator, the method returns a ResumableFullRefreshCursor only if there's no stream_slicer. Should we consider handling cases where both a paginator and a stream_slicer are present to ensure all scenarios are covered? WDYT?

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have one integration tests where we provide a manifest and run a small read operation on it? I already started on branch maxi297/playing-with-dynamic-streams (see this compare) in order to make sure I understood properly and I got blocked on the check method. I would like to see a test for read as well.

I'm almost wondering if we need a test with a parent stream as well to confirm that this would be working. I know there are some specifics like _initialize_cache_for_parent_streams being called to set up caching and things like that.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (8)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

311-352: Consider adding debug logging and extracting validation logic.

The method implementation looks solid! A couple of suggestions to consider:

  1. Would it be helpful to add debug logging similar to what we have in _stream_configs? This could help with troubleshooting dynamic stream resolution.

  2. The validation logic (lines 317-335) could potentially be extracted into a separate method for better readability. Wdyt?

Here's a possible refactor approach:

    def _dynamic_stream_configs(
        self, manifest: Mapping[str, Any], config: Mapping[str, Any]
    ) -> List[Dict[str, Any]]:
        dynamic_stream_definitions: List[Dict[str, Any]] = manifest.get("dynamic_streams", [])
        dynamic_stream_configs: List[Dict[str, Any]] = []

+        def _validate_resolver_config(resolver_config: Dict[str, Any]) -> None:
+            if not resolver_config:
+                raise ValueError(
+                    f"Missing 'components_resolver' in dynamic definition: {dynamic_definition}"
+                )
+
+            resolver_type = resolver_config.get("type")
+            if not resolver_type:
+                raise ValueError(
+                    f"Missing 'type' in components resolver configuration: {resolver_config}"
+                )
+
+            if resolver_type not in COMPONENTS_RESOLVER_TYPE_MAPPING:
+                raise ValueError(
+                    f"Invalid components resolver type '{resolver_type}'. "
+                    f"Expected one of {list(COMPONENTS_RESOLVER_TYPE_MAPPING.keys())}."
+                )

        for dynamic_definition in dynamic_stream_definitions:
            components_resolver_config = dynamic_definition["components_resolver"]
+            self.logger.debug(f"Processing dynamic stream definition: {dynamic_definition}")
+            _validate_resolver_config(components_resolver_config)
-            if not components_resolver_config:
-                raise ValueError(
-                    f"Missing 'components_resolver' in dynamic definition: {dynamic_definition}"
-                )
-
-            resolver_type = components_resolver_config.get("type")
-            if not resolver_type:
-                raise ValueError(
-                    f"Missing 'type' in components resolver configuration: {components_resolver_config}"
-                )
-
-            if resolver_type not in COMPONENTS_RESOLVER_TYPE_MAPPING:
-                raise ValueError(
-                    f"Invalid components resolver type '{resolver_type}'. "
-                    f"Expected one of {list(COMPONENTS_RESOLVER_TYPE_MAPPING.keys())}."
-                )
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1034-1063: Would adding validation examples help future maintainers? wdyt?

The ComponentMappingDefinition class looks good, but consider adding docstring examples showing validation scenarios, like:

  • Invalid field paths
  • Type mismatches
  • Parameter validation

This would help other developers understand the expected behavior and constraints.


1811-1820: Consider enhancing the field documentation for better clarity?

The HttpComponentsResolver class looks solid, but the components_mapping field could use more detailed documentation explaining:

  • The relationship between retriever and mapping
  • Expected mapping structure
  • Error handling behavior

This would make it easier for developers to understand how to use this class correctly.


Line range hint 1362-1423: Would adding migration guidance help users upgrading their code?

The split of DeclarativeSource into two variants (DeclarativeSource1 and DeclarativeSource2) is a good approach for type safety. Consider adding:

  • Migration guide for users upgrading from the old version
  • Examples showing when to use each variant
  • Documentation explaining the rationale behind the split

This would help users understand which variant to use in their specific case.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

1295-1318: Consider adding type hints for the return value.

The method implementation looks good, but adding a return type hint would improve code clarity and maintainability. What do you think about adding -> Optional[PartitionRouter]?

    def _build_stream_slicer_from_partition_router(
        self,
        model: Union[AsyncRetrieverModel, CustomRetrieverModel, SimpleRetrieverModel],
        config: Config,
-    ):
+    ) -> Optional[PartitionRouter]:

1320-1329: Consider adding docstring for clarity.

The method looks good but could benefit from a docstring explaining its purpose and return value. What do you think about adding one?

    def _build_resumable_cursor_from_paginator(
        self,
        model: Union[AsyncRetrieverModel, CustomRetrieverModel, SimpleRetrieverModel],
        stream_slicer: Optional[StreamSlicer],
    ) -> Optional[StreamSlicer]:
+       """
+       Builds a resumable cursor from a paginator if needed.
+       
+       Args:
+           model: The retriever model
+           stream_slicer: The current stream slicer
+       
+       Returns:
+           A ResumableFullRefreshCursor if the model has a paginator and no stream slicer,
+           otherwise returns None
+       """

2225-2258: Consider adding error handling for invalid configurations.

The implementation looks good but might benefit from some defensive programming. What do you think about adding validation for required components and proper error messages?

    def create_http_components_resolver(
        self, model: HttpComponentsResolverModel, config: Config
    ) -> Any:
+       if not model.retriever:
+           raise ValueError("HTTP components resolver requires a retriever")
+       if not model.components_mapping:
+           raise ValueError("HTTP components resolver requires at least one component mapping")

        stream_slicer = self._build_stream_slicer_from_partition_router(model.retriever, config)

Also, consider being more specific with the return type instead of Any:

-    ) -> Any:
+    ) -> HttpComponentsResolver:

Line range hint 1330-1374: Consider breaking down the complex merge logic.

The _merge_stream_slicers method handles multiple cases and could benefit from being broken down into smaller, more focused methods. What do you think about extracting the cursor creation logic into separate methods?

For example:

def _create_global_substream_cursor(
    self, 
    incremental_sync_model, 
    stream_slicer: StreamSlicer,
    config: Config
) -> GlobalSubstreamCursor:
    cursor_component = self._create_component_from_model(
        model=incremental_sync_model, 
        config=config
    )
    return GlobalSubstreamCursor(
        stream_cursor=cursor_component, 
        partition_router=stream_slicer
    )

def _create_partition_with_global_cursor(
    self,
    incremental_sync_model,
    stream_slicer: StreamSlicer,
    config: Config
) -> PerPartitionWithGlobalCursor:
    cursor_component = self._create_component_from_model(
        model=incremental_sync_model, 
        config=config
    )
    return PerPartitionWithGlobalCursor(
        cursor_factory=CursorFactory(
            lambda: self._create_component_from_model(
                model=incremental_sync_model, 
                config=config
            ),
        ),
        partition_router=stream_slicer,
        stream_cursor=cursor_component,
    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b136524 and 1d83663.

📒 Files selected for processing (8)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (7 hunks)
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
🔇 Additional comments (8)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (3)

33-34: LGTM! Import statement is well-placed.

The import of COMPONENTS_RESOLVER_TYPE_MAPPING is appropriately positioned with related imports.


241-242: LGTM! Validation logic properly handles both stream types.

The validation now correctly checks for either static streams or dynamic streams, which aligns well with the new dynamic stream capability.


124-127: ⚠️ Potential issue

Consider protecting against multiple calls to _stream_configs.

Hey! I noticed we're extending the stream configs directly. Since _stream_configs might be called multiple times, we could end up adding the same dynamic streams repeatedly. What do you think about creating a deep copy of the configs before combining them? Wdyt?

Here's a suggested approach:

-        stream_configs = self._stream_configs(self._source_config) + self._dynamic_stream_configs(
-            self._source_config, config
-        )
+        stream_configs = deepcopy(self._stream_configs(self._source_config)) + deepcopy(self._dynamic_stream_configs(
+            self._source_config, config
+        ))
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

33-36: LGTM! Clean import organization.

The new imports for HttpComponentsResolver and ComponentMappingDefinition are properly grouped with related resolvers.


475-476: LGTM! Factory mapping registration.

The new components are properly registered in the PYDANTIC_MODEL_TO_CONSTRUCTOR mapping.


2208-2223: LGTM! Well-structured component mapping definition.

The implementation properly handles:

  • Interpolated string creation for values
  • Field path interpolation
  • Value type conversion
  • Parameter passing
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

11-15: LGTM! The root schema changes look good.

The addition of anyOf constraint and dynamic_streams property maintains backward compatibility while adding support for dynamic streams. Nice work on ensuring that either streams or dynamic_streams must be present!

Also applies to: 26-29


2711-2756: Consider moving the experimental warning to the beginning of the description?

The experimental nature of this component is quite important for users to know upfront. Would you consider moving "This component is experimental. Use at your own risk." to the start of the description? This way, users will see it immediately before reading about the component's functionality, wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1040-1069: Consider adding value type validation?

The ComponentMappingDefinition looks good! One thought - would it make sense to add a validator to ensure the provided value_type matches the actual type of the value when specified? This could help catch type mismatches early, wdyt?

@validator('value_type')
def validate_value_type(cls, v, values):
    if v and 'value' in values:
        # Add type validation logic here
        pass
    return v

Line range hint 1368-1429: Consider using inheritance to reduce duplication?

The current implementation splits the source definition into two separate models with a lot of duplicate fields. What do you think about using inheritance to share common fields between DeclarativeSource1 and DeclarativeSource2? Something like:

class BaseDeclarativeSource(BaseModel):
    type: Literal["DeclarativeSource"]
    check: CheckStream
    version: str
    schemas: Optional[Schemas] = None
    definitions: Optional[Dict[str, Any]] = None
    spec: Optional[Spec] = None
    # ... other common fields

class DeclarativeSource1(BaseDeclarativeSource):
    streams: List[DeclarativeStream]
    dynamic_streams: Optional[List[DynamicDeclarativeStream]] = None

class DeclarativeSource2(BaseDeclarativeSource):
    streams: Optional[List[DeclarativeStream]] = None
    dynamic_streams: List[DynamicDeclarativeStream]

This could make the code more maintainable and reduce the chance of inconsistencies when updating common fields. What are your thoughts?


1829-1850: Add more detailed documentation for dynamic streams?

The implementation looks solid! Would it be helpful to add more detailed documentation about:

  1. How the stream_template is used
  2. The lifecycle of component resolution
  3. Examples of common use cases

This could help other developers understand and use these features effectively. What do you think?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

2711-2756: Two suggestions for the ComponentMappingDefinition schema.

  1. Would you consider moving the experimental warning to the start of the description for better visibility? This is consistent with the pattern used in other components.

  2. There seems to be an inconsistency in the field_path property - it uses interpolation_content while other properties use interpolation_context. Should we align these? wdyt?


2757-2781: LGTM! One minor suggestion for HttpComponentsResolver.

The component is well-defined with proper required fields and references. Would you consider moving the experimental warning to the start of the description for better visibility? wdyt?


2782-2800: Two suggestions for the DynamicDeclarativeStream schema.

  1. Would you consider moving the experimental warning to the start of the description for better visibility?

  2. The schema is missing the required field. Given that stream_template and components_resolver seem essential for this component to work, should we add:

required:
  - type
  - stream_template
  - components_resolver

What are your thoughts on this?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1d83663 and d0d7107.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (10 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

11-15: LGTM! The root schema changes look well-structured.

The addition of anyOf to enforce either streams or dynamic_streams must be present is a good approach. The dynamic_streams property is properly defined as an array of DynamicDeclarativeStream references.

Also applies to: 26-29

@lazebnyi lazebnyi requested a review from maxi297 November 29, 2024 01:14
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

1295-1319: Consider adding docstring to explain the method's purpose

The new helper method _build_stream_slicer_from_partition_router looks well-structured, but could benefit from documentation explaining its role and return value semantics. What do you think about adding a docstring?

def _build_stream_slicer_from_partition_router(
    self,
    model: Union[AsyncRetrieverModel, CustomRetrieverModel, SimpleRetrieverModel],
    config: Config,
) -> Optional[PartitionRouter]:
+    """
+    Builds a stream slicer from the partition router configuration if one exists.
+    
+    Args:
+        model: The retriever model containing partition router configuration
+        config: The connector configuration
+        
+    Returns:
+        PartitionRouter if the model has a partition_router configured, None otherwise
+    """

1320-1329: Consider adding docstring and improving variable naming

The _build_resumable_cursor_from_paginator method could be clearer with documentation and more descriptive variable names. What do you think about these suggestions?

def _build_resumable_cursor_from_paginator(
    self,
    model: Union[AsyncRetrieverModel, CustomRetrieverModel, SimpleRetrieverModel],
    stream_slicer: Optional[StreamSlicer],
) -> Optional[StreamSlicer]:
+    """
+    Creates a resumable cursor for pagination if needed.
+    
+    Args:
+        model: The retriever model containing paginator configuration
+        stream_slicer: Existing stream slicer if any
+        
+    Returns:
+        ResumableFullRefreshCursor if paginator exists and no stream slicer is present, None otherwise
+    """
-    if hasattr(model, "paginator") and model.paginator and not stream_slicer:
+    has_paginator = hasattr(model, "paginator") and model.paginator
+    if has_paginator and not stream_slicer:

2207-2258: Consider adding error handling for empty components_mapping

The create_http_components_resolver method looks solid, but might benefit from validation of the components_mapping list. What do you think about adding a check?

def create_http_components_resolver(
    self, model: HttpComponentsResolverModel, config: Config
) -> Any:
+    if not model.components_mapping:
+        raise ValueError("components_mapping cannot be empty for HttpComponentsResolver")
+
    stream_slicer = self._build_stream_slicer_from_partition_router(model.retriever, config)

Also, consider being more specific with the return type instead of Any:

-) -> Any:
+) -> HttpComponentsResolver:
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

2713-2713: Consider moving the experimental warning to the start of the description?

The experimental warning might be more visible if it's at the beginning of the description, similar to how it's done in other components. wdyt?


2759-2759: Consider moving the experimental warning to the start of the description?

The experimental warning might be more visible if it's at the beginning of the description, similar to how it's done in other components. wdyt?


2759-2760: Consider enhancing the component description?

The current description is quite brief. Would it be helpful to add more details about:

  • When to use this component
  • How it interacts with other components
  • Example usage scenarios

This could help users better understand its purpose and proper usage. wdyt?


2784-2784: Consider moving the experimental warning to the start of the description?

The experimental warning might be more visible if it's at the beginning of the description, similar to how it's done in other components. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d0d7107 and f6542ec.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (7 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

33-36: LGTM: Clean import organization

The new imports for HttpComponentsResolver and ComponentMappingDefinition are properly grouped with related imports.


Line range hint 1330-1374: LGTM: Well-structured stream slicer merging logic

The refactored _merge_stream_slicers method has good separation of concerns and handles different cursor types appropriately. The early returns and clear conditionals make the flow easy to follow.


2208-2223: LGTM: Well-implemented component mapping definition

The create_components_mapping_definition method properly handles interpolation and type conversion. The use of list comprehension for field_path transformation is clean and efficient.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

11-15: LGTM! Good validation addition

The anyOf constraint ensures either streams or dynamic_streams must be present, which is a good validation to have.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (2)

125-127: Consider defensive copying for stream configurations

Hey! To prevent any potential side effects from list modifications, what do you think about using defensive copying here? Something like:

-        stream_configs = self._stream_configs(self._source_config) + self._dynamic_stream_configs(
-            self._source_config, config
-        )
+        stream_configs = list(self._stream_configs(self._source_config)) + list(self._dynamic_stream_configs(
+            self._source_config, config
+        ))

This ensures we're working with new list instances. Wdyt? 🤔


325-330: Consider enhancing error message clarity

The error message could be more helpful by including the available resolver types in the error message itself. What do you think about something like:

-                raise ValueError(
-                    f"Missing 'type' in components resolver configuration: {components_resolver_config}"
-                )
+                available_types = list(COMPONENTS_RESOLVER_TYPE_MAPPING.keys())
+                raise ValueError(
+                    f"Missing 'type' in components resolver configuration: {components_resolver_config}. "
+                    f"Available types are: {available_types}"
+                )
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2208-2225: Consider refining type annotations to eliminate type: ignore?

In the create_components_mapping_definition method, there's a # type: ignore[arg-type] comment for the field_path parameter. Perhaps adjusting the type annotations can remove the need for this comment, enhancing type safety and code clarity. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f6542ec and 13d0d0f.

📒 Files selected for processing (6)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8 hunks)
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py (1 hunks)
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • airbyte_cdk/sources/declarative/resolvers/components_resolver.py
  • airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (3)

42-42: LGTM! Import follows existing patterns.

The new import for COMPONENTS_RESOLVER_TYPE_MAPPING is well-placed within the existing import structure.


241-242: LGTM! Validation logic properly handles both stream types

The validation correctly checks for either static or dynamic streams, aligning with the new dynamic stream capability.


344-346: Verify template processing security

Could we verify that the template processing is secure against injection attacks?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

2226-2259: Great implementation of the create_http_components_resolver method!

The create_http_components_resolver method is well-crafted and integrates the component creation smoothly. Nice work!


Line range hint 1296-1375: The refactoring of stream slicer construction looks solid

Splitting _merge_stream_slicers into _build_stream_slicer_from_partition_router and _build_resumable_cursor_from_paginator improves modularity and readability. Great job!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

2713-2713: Consider moving the experimental warning to the start

The experimental nature of this component is important information that might be missed in the middle of the description. Would you consider moving it to the start? wdyt?

-    description: (This component is experimental. Use at your own risk.) Specifies a mapping definition to update or add fields in a record or configuration. This allows dynamic mapping of data by interpolating values into the template based on provided contexts.
+    description: |
+      THIS COMPONENT IS EXPERIMENTAL. USE AT YOUR OWN RISK.
+      
+      Specifies a mapping definition to update or add fields in a record or configuration. This allows dynamic mapping of data by interpolating values into the template based on provided contexts.

2759-2759: Consider moving the experimental warning to the start

Similar to ComponentMappingDefinition, would you consider moving the experimental warning to the start of the description for better visibility? wdyt?

-    description: Component resolve and populates stream templates with components fetched via an HTTP retriever. (This component is experimental. Use at your own risk.)
+    description: |
+      THIS COMPONENT IS EXPERIMENTAL. USE AT YOUR OWN RISK.
+      
+      Component resolve and populates stream templates with components fetched via an HTTP retriever.

2784-2784: Consider moving the experimental warning to the start

Similar to other components, would you consider moving the experimental warning to the start of the description for better visibility? wdyt?

-    description: A component that described how will be created declarative streams based on stream template. (This component is experimental. Use at your own risk.)
+    description: |
+      THIS COMPONENT IS EXPERIMENTAL. USE AT YOUR OWN RISK.
+      
+      A component that describes how declarative streams will be created based on stream template.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 13d0d0f and c478df5.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1332-1332: LGTM: Enhanced field_path flexibility with interpolation context

The addition of interpolation context to field_path allows for dynamic configuration, which is a nice improvement.


2726-2729: ⚠️ Potential issue

Consider enhancing field_path validation

The items schema for field_path looks incorrect. Should it be:

      field_path:
        type: array
        items:
-          - type: string
+          type: string

This matches the pattern used in other components. wdyt?

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (3)

241-242: Enhance validation error message clarity

The validation now correctly checks for either static or dynamic streams, but the error message only mentions streams. Should we update it to be more descriptive? Wdyt?

-            raise ValidationError(f"A valid manifest should have at least one stream defined. Got {streams}")
+            raise ValidationError(f"A valid manifest should have at least one stream defined (either in 'streams' or 'dynamic_streams'). Got streams: {streams}, dynamic_streams: {dynamic_streams}")

311-355: Consider breaking down the method for better maintainability

The method handles multiple responsibilities (validation, resolver creation, stream configuration). Would it make sense to extract some of these into helper methods? For example:

def _validate_resolver_config(self, components_resolver_config: Dict[str, Any]) -> str:
    """Validates resolver config and returns resolver type"""
    if not components_resolver_config:
        raise ValueError(...)
    resolver_type = components_resolver_config.get("type")
    if not resolver_type:
        raise ValueError(...)
    if resolver_type not in COMPONENTS_RESOLVER_TYPE_MAPPING:
        raise ValueError(...)
    return resolver_type

This could make the code more maintainable and testable. Wdyt?


337-338: Document caching behavior for dynamic streams

The code enables caching for all retrievers in dynamic streams, which is great for performance. Should we add a comment explaining this behavior and its implications? Something like:

+            # Enable caching for all retrievers in dynamic streams to improve performance
+            # when the same component needs to be resolved multiple times
             if "retriever" in components_resolver_config:
                 components_resolver_config["retriever"]["requester"]["use_cache"] = True
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c478df5 and 97a932a.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (4 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

124-127: Consider thread-safety implications of concatenating stream configs

The concatenation of static and dynamic streams looks clean, but since streams() can be called multiple times, we should ensure thread-safety. Would it make sense to create a deep copy of the configs before concatenation to prevent any potential side effects? Wdyt?

@lazebnyi
Copy link
Contributor Author

lazebnyi commented Nov 29, 2024

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

2711-2756: Consider enhancing the experimental warning visibility

The experimental warning is currently embedded in the description. Would you consider moving it to the start for better visibility? This matches the pattern used in other experimental components. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 97a932a and ce9539c.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

11-15: LGTM! Clean implementation of the dynamic streams feature

The schema changes elegantly handle both traditional and dynamic streams by:

  1. Using anyOf to require either streams or dynamic_streams
  2. Defining dynamic_streams as an array of DynamicDeclarativeStream references

Also applies to: 26-29


1332-1332: LGTM! Enhanced field path interpolation capabilities

The addition of config, components_values, and stream_template_config to the interpolation context enables more flexible path extraction.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

1034-1063: LGTM! The ComponentMappingDefinition model looks well-structured.

The model is well documented with clear examples and type definitions. I particularly like the flexibility provided by the optional value_type field that allows for both explicit and inferred typing.

One thought - would it make sense to add validation to ensure that when value_type is specified, the provided value matches that type? wdyt?

@validator('value')
def validate_value_type(cls, v, values):
    if 'value_type' in values and values['value_type']:
        # Add type validation logic here
        pass
    return v

1811-1820: The HttpComponentsResolver implementation looks clean and focused.

The model effectively combines a retriever with component mappings. The design is simple yet flexible enough to handle various resolution scenarios.

Quick thought - should we add validation to ensure components_mapping is not empty? wdyt?

@validator('components_mapping')
def validate_components_mapping(cls, v):
    if not v:
        raise ValueError('components_mapping cannot be empty')
    return v

1822-1832: The DynamicDeclarativeStream model looks well designed.

Good job on making the stream_template and components_resolver required fields. This ensures that all necessary components are present for dynamic stream generation.

Have you considered adding an optional description field to help users understand the purpose of each dynamic stream? wdyt?

description: Optional[str] = Field(
    None,
    description="A description of what this dynamic stream represents and how it's configured.",
)

Line range hint 1362-1423: Smart approach to handling backward compatibility with DeclarativeSource.

The use of Union type with DeclarativeSource1 and DeclarativeSource2 is a clean way to support both static and dynamic streams while maintaining backward compatibility.

One suggestion - would it be helpful to add a deprecation warning to DeclarativeSource1 to encourage migration to the newer format? wdyt?

class DeclarativeSource1(BaseModel):
    def __init__(self, **data):
        warnings.warn(
            "DeclarativeSource1 is deprecated. Please migrate to the new format supporting dynamic streams.",
            DeprecationWarning,
            stacklevel=2
        )
        super().__init__(**data)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ce9539c and 0160353.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pre-emptively approving but I would like the feedback from @lmossman on this eventually. We have marked the interfaces as experimental and will keep them as such until it is fully integrated with the Connector Builder so let's not stop our progress for that and we can always fix them later if needs be

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
unit_tests/sources/declarative/test_manifest_declarative_source.py (2)

74-82: Consider enhancing the docstring for better clarity, wdyt?

The docstring could provide more context about the fixture's purpose and usage. For example:

-    """Base manifest without streams or dynamic streams."""
+    """
+    Provides a base manifest configuration without streams or dynamic streams.
+    Used as a foundation for testing manifest validation scenarios.
+    """

83-134: Consider extracting nested configurations into separate fixtures for better reusability, wdyt?

The declarative_stream_config function contains deeply nested configurations that could be split into separate fixtures. This would improve maintainability and allow reuse in other test cases. For example:

  • requester_config
  • paginator_config
  • schema_loader_config
airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1)

Line range hint 11-91: Would you consider some readability improvements? 🤔

A couple of suggestions to make the code even more maintainable:

  1. What do you think about adding type hints to the mapping values? Something like:
DEFAULT_MODEL_TYPES: Mapping[str, Literal["DeclarativeStream", "HttpComponentsResolver", ...]] = {
  1. Maybe we could group related mappings together with comments? For example:
DEFAULT_MODEL_TYPES: Mapping[str, str] = {
    # Stream Components
    "DeclarativeStream.retriever": "SimpleRetriever",
    "DynamicDeclarativeStream.stream_template": "DeclarativeStream",
    
    # Resolver Components
    "HttpComponentsResolver.retriever": "SimpleRetriever",
    "HttpComponentsResolver.components_mapping": "ComponentMappingDefinition",
    # ...
}

What are your thoughts on these suggestions? They're not critical but might help with maintainability! 😊

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0160353 and 25e6e1f.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (1 hunks)
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py (1 hunks)
  • unit_tests/sources/declarative/test_manifest_declarative_source.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • unit_tests/sources/declarative/resolvers/test_http_components_resolver.py
🔇 Additional comments (4)
unit_tests/sources/declarative/test_manifest_declarative_source.py (2)

135-183: LGTM! Well-structured fixture reusing _declarative_stream

The fixture effectively demonstrates the composition of a DynamicDeclarativeStream by reusing the _declarative_stream fixture as a template.


629-659: LGTM! Well-structured test cases with clear scenarios

The test method effectively covers all validation scenarios:

  1. Fails when both streams and dynamic streams are missing
  2. Succeeds with regular streams
  3. Succeeds with dynamic streams

Good use of fixtures and clear assertions make the test cases easy to understand and maintain.

airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (2)

Line range hint 11-91: Clean integration with existing transformation logic! 👍

The new mappings integrate seamlessly with the existing DEFAULT_MODEL_TYPES, and the transformation logic will handle them automatically. The placement in DEFAULT_MODEL_TYPES (rather than CUSTOM_COMPONENTS_MAPPING) aligns well with their role as core components.


34-39: LGTM! The new mappings look well-structured.

The mappings for DynamicDeclarativeStream and HttpComponentsResolver follow the established pattern and create a clear hierarchy. Would you like me to verify that all the referenced types exist in the codebase? 🤔

✅ Verification successful

All referenced types exist and are properly defined in the codebase! 🎉

The verification shows that all the types referenced in the new mappings are present:

  • DeclarativeStream is defined in airbyte_cdk/sources/declarative/declarative_stream.py
  • HttpComponentsResolver is defined in airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py
  • SimpleRetriever exists and is widely used throughout the codebase
  • ComponentMappingDefinition is defined in airbyte_cdk/sources/declarative/resolvers/components_resolver.py

The mappings follow the established pattern and create proper type relationships. Looking good! 👍

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all referenced types exist in the codebase
# Search for class definitions of the referenced types

echo "Searching for referenced types..."
for type in "DeclarativeStream" "HttpComponentsResolver" "SimpleRetriever" "ComponentMappingDefinition"; do
    echo "Looking for $type..."
    ast-grep --pattern "class $type"
done

Length of output: 1024


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to search for these types
# This will help us find not just class definitions but also imports and type hints

echo "Searching for DeclarativeStream..."
rg "DeclarativeStream" -A 2

echo "\nSearching for HttpComponentsResolver..."
rg "HttpComponentsResolver" -A 2

echo "\nSearching for SimpleRetriever..."
rg "SimpleRetriever" -A 2

echo "\nSearching for ComponentMappingDefinition..."
rg "ComponentMappingDefinition" -A 2

Length of output: 67933


Script:

#!/bin/bash
# Let's specifically look for the HttpComponentsResolver and ComponentMappingDefinition classes
# since they weren't clearly visible in the previous results

echo "Looking for HttpComponentsResolver class definition..."
fd -e py | xargs grep -l "class HttpComponentsResolver"

echo "\nLooking for ComponentMappingDefinition class definition..."
fd -e py | xargs grep -l "class ComponentMappingDefinition"

echo "\nLooking for imports of ComponentMappingDefinition..."
rg "from.*import.*ComponentMappingDefinition"

Length of output: 1024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants