feat(low-code cdk): add transformation to dynamic schema loader #176

Open. lazebnyi wants to merge 8 commits into base: main

@lazebnyi lazebnyi commented Dec 16, 2024

What

During dynamic schema generation, we need to transform keys or add fields to enhance the schema's compatibility and functionality.

How

A list of schema transformations has been added to the dynamic schema loader. The loader applies each transformation in the list to the generated schema, which lets us modify schema keys or insert additional fields as needed.
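For readers skimming the thread, the idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the CDK implementation: `SchemaTransformation`, `keys_to_lower`, `add_static_field`, and `apply_schema_transformations` are hypothetical names, and the sketch assumes each transformation mutates the `properties` mapping in place.

```python
from typing import Any, Callable, Dict, List

# Hypothetical alias: a schema transformation mutates the properties mapping in place.
SchemaTransformation = Callable[[Dict[str, Any]], None]


def keys_to_lower(properties: Dict[str, Any]) -> None:
    """Lowercase every top-level property name in place."""
    for key in list(properties):
        properties[key.lower()] = properties.pop(key)


def add_static_field(name: str, field_schema: Dict[str, Any]) -> SchemaTransformation:
    """Build a transformation that inserts one extra property."""

    def _apply(properties: Dict[str, Any]) -> None:
        properties[name] = field_schema

    return _apply


def apply_schema_transformations(
    properties: Dict[str, Any], transformations: List[SchemaTransformation]
) -> Dict[str, Any]:
    """Run each transformation in list order and return the mutated mapping."""
    for transformation in transformations:
        transformation(properties)
    return properties


properties = {"FirstName": {"type": "string"}, "Age": {"type": "integer"}}
transformed = apply_schema_transformations(
    properties,
    [keys_to_lower, add_static_field("static_field", {"type": "string"})],
)
```

Keeping the transformations in a plain list means the loader itself needs no knowledge of what each one does; it only iterates.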

github-actions bot added the enhancement (New feature or request) label on Dec 16, 2024


coderabbitai bot commented Dec 16, 2024


Walkthrough

This pull request introduces a new property named schema_transformations to the DynamicSchemaLoader within the declarative_component_schema.yaml file. This property allows for a list of transformations to be applied to the schema, including adding fields, removing fields, and converting keys. The changes enhance the flexibility of the DynamicSchemaLoader class by enabling users to define how the schema should be modified after extraction, integrating seamlessly into the existing schema structure.
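As a rough picture of what such a configuration might look like, the fragment below writes it as a Python dict. The transformation type names and the `fields` / `field_pointers` shapes are taken from elsewhere in this thread; the concrete field names and values, the `ALLOWED_TYPES` set, and the `unknown_transformation_types` helper are assumptions made for illustration, and the loader's other fields are omitted.

```python
from typing import Any, Dict, List

# Assumed manifest fragment, written as a Python dict; other loader fields are omitted.
dynamic_schema_loader: Dict[str, Any] = {
    "type": "DynamicSchemaLoader",
    "schema_transformations": [
        {"type": "KeysToSnakeCase"},
        {
            "type": "AddFields",
            # "static_field" and "static_value" are placeholder names.
            "fields": [{"path": ["static_field"], "value": "static_value"}],
        },
        {"type": "RemoveFields", "field_pointers": [["sensitive_data"]]},
    ],
}

# Transformation types listed in the schema_transformations definition in this PR.
ALLOWED_TYPES = {
    "AddFields",
    "CustomTransformation",
    "RemoveFields",
    "KeysToLower",
    "KeysToSnakeCase",
}


def unknown_transformation_types(loader: Dict[str, Any]) -> List[str]:
    """Return the transformation types that are not in the allowed set."""
    return [
        item["type"]
        for item in loader.get("schema_transformations", [])
        if item["type"] not in ALLOWED_TYPES
    ]
```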

Changes

| File | Change Summary |
|------|----------------|
| airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Added schema_transformations property to DynamicSchemaLoader definition |
| airbyte_cdk/sources/declarative/models/declarative_component_schema.py | Added schema_transformations field to DynamicSchemaLoader class |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Added create_keys_to_snake_transformation method; updated create_dynamic_schema_loader to include schema_transformations |
| airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py | Added schema_transformations field; updated get_json_schema method to apply transformations |
| unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py | Updated tests to include schema_transformations and modified key names in expected schema |

Suggested reviewers

  • maxi297
  • aldogonzalez8

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

2517-2526: LGTM! Consider adding clarifying documentation?

The implementation looks good and aligns well with the PR objectives. The new transformations property follows the same pattern as stream-level transformations.

What do you think about adding a description that clarifies when to use record selector transformations vs stream transformations? This could help users choose the appropriate level for their transformations. Something like:

      transformations:
        title: Transformations
-       description: A list of transformations to be applied to each output record.
+       description: A list of transformations to be applied to each output record during the record selection phase. Use these transformations when you need to modify records before they are processed by stream-level transformations.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 216cd43 and 9260b53.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (9 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (4 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1510-1516: LGTM! Nice addition of the transformations field to RecordSelector.

The optional transformations field allows for flexible record manipulation through various transformation types. The implementation maintains backward compatibility since it's optional.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

1926-1931: LGTM! Clean implementation of transformation handling.

The implementation correctly creates transformation components from the models and appends them to the transformations list. The code is straightforward and follows the factory pattern consistently.

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

182-185: LGTM! Great test coverage for the transformation functionality.

The tests thoroughly verify the transformation handling in both the schema parsing and component creation. The coverage includes both AddFields and RemoveFields transformations, ensuring the feature works as expected.

Also applies to: 302-309, 1315-1318, 1340-1340

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1502-1508: LGTM! Consider adding an example to the field description?

The implementation looks good and aligns well with the PR objectives to allow transformations in the record selector. The type definition and documentation are clear.

Would you consider adding an example to the field description to make it more user-friendly? Something like:

examples=[{
    "transformations": [
        {"type": "AddFields", "fields": [{"path": ["extra_field"], "value": "{{ record['some_field'] }}"}]},
        {"type": "RemoveFields", "field_pointers": [["sensitive_data"]]}
    ]
}]

wdyt?

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 9260b53 and 6976fc7.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

182-185: LGTM! Consider enhancing test coverage?

The addition of transformations to the record selector looks good. Would you consider adding a test case with multiple field pointers to ensure the RemoveFields transformation handles multiple fields correctly? wdyt?
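A test along those lines could look like the sketch below. It does not use the CDK's RemoveFields class; `remove_fields` is a hypothetical stand-in that deletes each pointed-to path and ignores missing ones, so the test shape is the point rather than the implementation.

```python
from typing import Any, Dict, List


def remove_fields(record: Dict[str, Any], field_pointers: List[List[str]]) -> Dict[str, Any]:
    """Delete each pointed-to field in place; missing paths are ignored."""
    for pointer in field_pointers:
        parent: Any = record
        # Walk down to the dict that holds the final key of the pointer.
        for key in pointer[:-1]:
            parent = parent.get(key) if isinstance(parent, dict) else None
            if parent is None:
                break
        if isinstance(parent, dict) and pointer:
            parent.pop(pointer[-1], None)
    return record


def test_remove_fields_with_multiple_pointers() -> None:
    record = {"id": 1, "sensitive_data": "s", "meta": {"token": "t", "page": 2}}
    remove_fields(record, [["sensitive_data"], ["meta", "token"], ["missing"]])
    assert record == {"id": 1, "meta": {"page": 2}}
```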


2250-2268: LGTM! Consider adding edge cases?

The test case for AddFields transformation with string value type is well-structured. Would you consider adding test cases for edge cases like empty strings or special characters? wdyt?
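The edge cases mentioned here might be exercised as in the following sketch. `add_fields` is a hypothetical, simplified stand-in for the AddFields transformation: it writes literal values at the given paths and performs no template interpolation.

```python
from typing import Any, Dict, List


def add_fields(record: Dict[str, Any], fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Set each field's value at its path, creating intermediate dicts as needed."""
    for field in fields:
        *parents, leaf = field["path"]
        target = record
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = field["value"]
    return record


# Literal with quotes, a backslash, and a newline, used as the "special characters" case.
SPECIAL_VALUE = 'a "quoted" value with \\ and \n newline'

record = add_fields(
    {"id": 1},
    [
        {"path": ["empty"], "value": ""},
        {"path": ["special"], "value": SPECIAL_VALUE},
        {"path": ["nested", "note"], "value": "ok"},
    ],
)
```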

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6976fc7 and 62dc10c.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (10 hunks)
🔇 Additional comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

2198-2201: LGTM! Clear and helpful comment

The comment effectively explains why transformations appear twice in the test results. This helps future maintainers understand the expected behavior.


2216-2234: LGTM! Well-structured test case

The test case for AddFields transformation without value type is thorough and clearly structured. The expected results are well-defined and the comment explains the duplication clearly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1650-1656: Consider adding validation for the name field

The name field is marked as optional with a default empty string. Should we add some validation to ensure the name follows a specific pattern when provided? For example, ensuring it's a valid Python identifier? wdyt?

    name: Optional[str] = Field(
-        "", description="The stream name.", example=["Users"], title="Name"
+        "", 
+        description="The stream name.", 
+        example=["Users"], 
+        title="Name",
+        regex="^[a-zA-Z_][a-zA-Z0-9_]*$"
    )

Also applies to: 1657-1658
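To see what the suggested pattern would accept and reject, here is a small standalone check using Python's `re` module. The helper name `is_valid_stream_name` is hypothetical; the pattern is copied from the suggestion above.

```python
import re

# Pattern from the review suggestion: a Python-identifier-like name.
NAME_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_valid_stream_name(name: str) -> bool:
    """Return True when the whole name matches the suggested pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


accepted = [n for n in ["Users", "user_events_2024", "_private"] if is_valid_stream_name(n)]
rejected = [n for n in ["2024_users", "user events", "user-events", ""] if not is_valid_stream_name(n)]
```

Two consequences are worth weighing before adopting it: names containing spaces or hyphens would be rejected, and the field's current default of an empty string does not match the pattern either, so the check would need to apply only to explicitly provided names.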


1925-1929: Consider adding type validation for partition router lists

The partition router can now be a single router or a list of routers. Should we add validation to ensure the list isn't empty when provided? wdyt?

    partition_router: Optional[
        Union[
            CustomPartitionRouter,
            ListPartitionRouter,
            SubstreamPartitionRouter,
            List[
                Union[
                    CustomPartitionRouter, ListPartitionRouter, SubstreamPartitionRouter
                ]
            ],
        ]
    ] = Field(
        [],
        description="PartitionRouter component that describes how to partition the stream, enabling incremental syncs and checkpointing.",
        title="Partition Router",
+       min_items=1
    )

Also applies to: 2003-2007

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 62dc10c and 6aadd76.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (10 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1770-1780: LGTM! Schema definition matches implementation.

The transformations field schema definition in the YAML file correctly matches the Python implementation, allowing the same transformation types and maintaining consistency.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

531-533: OAuth configuration enhancements look good!

The additions to OAuth configuration are well-documented with clear examples. The new fields enable better OAuth flow customization and user input handling.

Also applies to: 827-829, 898-912, 914-919


1848-1862: LGTM: Transformations field added to DynamicSchemaLoader

The transformations field matches the structure used in DeclarativeStream, maintaining consistency across the codebase.


1971-1973: LGTM: Optional download_extractor field

The download_extractor field is properly marked as optional with clear typing.


2071-2076: LGTM: Components resolver field in DynamicDeclarativeStream

The components_resolver field is well-typed with a clear description of its purpose.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (2)

107-107: Could you add a docstring?
This adds the list of transformations with a factory default. Perhaps consider adding a short docstring for clarity, wdyt?


133-134: Check if None is more explicit.
You're calling _transform(properties, {}). If it's not used, passing None might be clearer. Would you consider that, wdyt?

unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)

67-71: Additional test coverage for schema transformations.
You introduced a new "KeysToSnakeCase" transformation. Maybe add another transformation variety to ensure robust coverage, wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

597-601: create_keys_to_snake_transformation method.
This direct return is simple. Would it help if we passed parameters to the transformation, or is this minimal signature enough, wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1770-1780: LGTM! The schema_transformations implementation looks clean and well-structured.

The implementation follows the existing patterns in the codebase and aligns perfectly with the PR objectives. One small suggestion: would you consider adding a default empty array value for consistency with other similar fields in the schema? wdyt?

      schema_transformations:
        title: Schema Transformations
        description: A list of transformations to be applied to the schema.
        type: array
+       default: []
        items:
          anyOf:
            - "$ref": "#/definitions/AddFields"
            - "$ref": "#/definitions/CustomTransformation"
            - "$ref": "#/definitions/RemoveFields"
            - "$ref": "#/definitions/KeysToLower"
            - "$ref": "#/definitions/KeysToSnakeCase"
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6aadd76 and a241356.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6 hunks)
  • airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (4 hunks)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (4 hunks)
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (5)

7-7: Imports look good.
No issues found here, wdyt?


16-16: Direct import of RecordTransformation recognized.
Looks straightforward and helps ensure typed transformations. wdyt?


18-18: Added dependencies for Config, StreamSlice, and StreamState.
Including these type hints keeps the code more expressive. wdyt?


138-138: Schema snippet alignment.
The “properties” key is updated to the transformed schema. This is consistent with standard JSON Schema usage, wdyt?


141-153: Order of transformations is enforced implicitly.
All transformations are applied in a loop, ensuring the final state is cumulative. If the order matters, we may want more explicit documentation. Would that add clarity, wdyt?
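A small example of why the order can matter: the sketch below applies the same two transformations in both orders and gets different schemas. The helpers `keys_to_lower`, `remove_field`, and `apply_in_order` are hypothetical and return new dicts; they only mimic the loop described in this comment.

```python
from typing import Any, Callable, Dict, List

Transformation = Callable[[Dict[str, Any]], Dict[str, Any]]


def keys_to_lower(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with every property name lowercased."""
    return {key.lower(): value for key, value in properties.items()}


def remove_field(name: str) -> Transformation:
    """Build a transformation that drops the property with exactly this name."""

    def _apply(properties: Dict[str, Any]) -> Dict[str, Any]:
        return {key: value for key, value in properties.items() if key != name}

    return _apply


def apply_in_order(
    properties: Dict[str, Any], transformations: List[Transformation]
) -> Dict[str, Any]:
    """Feed the output of each transformation into the next one."""
    for transformation in transformations:
        properties = transformation(properties)
    return properties


schema = {"FirstName": {"type": "string"}, "Age": {"type": "integer"}}

# The removal targets the lowercased name, so it only matches after lowercasing.
lower_then_remove = apply_in_order(schema, [keys_to_lower, remove_field("firstname")])
remove_then_lower = apply_in_order(schema, [remove_field("firstname"), keys_to_lower])
```

In the second ordering the removal runs while the key is still `FirstName`, so nothing is removed and the property survives under its lowercased name.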

unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (3)

238-238: Renamed property to 'first_name'.
This aligns with the transformation logic. Changes look correct, wdyt?


253-254: Sample data matches new property names.
This looks consistent with code changes. wdyt?


265-267: Confirm naming logic for "FirstName" → "first_name".
The test properly verifies the snake-case transformation. Looks good, wdyt?
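For reference, the "FirstName" to "first_name" conversion can be approximated with two regular-expression passes. This is a simplified sketch, not the KeysToSnakeCase implementation from the CDK, which may treat digits, acronyms, or non-ASCII characters differently; `to_snake_case` and `keys_to_snake_case` are hypothetical names.

```python
import re
from typing import Any, Dict


def to_snake_case(name: str) -> str:
    """Convert PascalCase or camelCase to snake_case (simplified)."""
    # Split before a capitalized word, e.g. "FirstName" -> "First_Name".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a following capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def keys_to_snake_case(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the mapping with snake_case property names."""
    return {to_snake_case(key): value for key, value in properties.items()}


converted = keys_to_snake_case({"FirstName": {"type": "string"}})
```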

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1834-1848: New field for schema transformations.
This field makes transformations more flexible. Seems well-defined. wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

239-241: New import for KeysToSnakeCaseModel.
Clear and consistent with the rest of the transformation imports. wdyt?


481-481: Mapping KeysToSnakeCase in the factory.
This integration ensures your transformation can be constructed correctly. wdyt?


1653-1659: schema_transformations list building.
You gather transformations into a list before injecting them. This is clean and readable, wdyt?


1674-1674: Passing the transformations into DynamicSchemaLoader.
This final assignment is straightforward and consistent. wdyt?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)

248-250: How about making the transformation flow more explicit in the test?

The test correctly validates the transformation pipeline, but we could make it more self-documenting. What do you think about:

  1. Adding comments to explain the transformation flow:

    • PascalCase in schema response (FirstName)
    • Gets transformed to snake_case (first_name)
    • Static field gets added (static_field)
  2. Maybe split the test into smaller, focused test cases for each transformation? This would make it easier to maintain and debug, wdyt? 🤔

Also applies to: 264-265, 276-278

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a241356 and 193d59f.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (4 hunks)
🔇 Additional comments (1)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)

67-81: Consider adding more test cases for schema transformations?

The new schema transformations look good! However, we might want to strengthen our test coverage. What do you think about adding test cases for:

  • Failed transformations
  • Empty transformation lists
  • Multiple AddFields transformations
  • Invalid field definitions

This would help ensure robustness, wdyt? 🤔
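Some of these cases could be covered along the lines of the sketch below. It tests a hypothetical `apply_all` helper rather than the real loader, and it assumes a failing transformation should raise and leave the caller's input untouched; the actual expected behavior is for the PR author to decide.

```python
from typing import Any, Callable, Dict, List

Transformation = Callable[[Dict[str, Any]], Dict[str, Any]]


def add_field(name: str, field_schema: Dict[str, Any]) -> Transformation:
    """Build a transformation that returns a copy with one extra property."""

    def _apply(properties: Dict[str, Any]) -> Dict[str, Any]:
        return {**properties, name: field_schema}

    return _apply


def failing_transformation(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a transformation with an invalid field definition."""
    raise ValueError("invalid field definition")


def apply_all(
    properties: Dict[str, Any], transformations: List[Transformation]
) -> Dict[str, Any]:
    """Apply transformations in order on a copy of the input mapping."""
    result = dict(properties)
    for transformation in transformations:
        result = transformation(result)
    return result


def test_empty_list_is_a_no_op() -> None:
    assert apply_all({"id": {"type": "integer"}}, []) == {"id": {"type": "integer"}}


def test_multiple_add_fields_accumulate() -> None:
    result = apply_all(
        {}, [add_field("a", {"type": "string"}), add_field("b", {"type": "number"})]
    )
    assert result == {"a": {"type": "string"}, "b": {"type": "number"}}


def test_failure_propagates_and_leaves_input_untouched() -> None:
    original = {"id": {"type": "integer"}}
    try:
        apply_all(original, [add_field("a", {"type": "string"}), failing_transformation])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    assert original == {"id": {"type": "integer"}}
```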

lazebnyi changed the title from "feat(low-code cdk): add transformation to record selector" to "feat(low-code cdk): add transformation to schema loader" on Dec 18, 2024
lazebnyi changed the title from "feat(low-code cdk): add transformation to schema loader" to "feat(low-code cdk): add transformation to dynamic schema loader" on Dec 18, 2024
lazebnyi requested a review from maxi297 on December 18, 2024 04:01
Labels
enhancement New feature or request
3 participants