fix: (CDK) (HttpRequester) - Make the HttpRequester.path optional #370

Open
wants to merge 2 commits into
base: main

Conversation

@bazarnov bazarnov commented Feb 27, 2025

What

Resolving:

How

  • For the airbyte_cdk/sources/declarative/declarative_component_schema.yaml:

    • Removed the path as a required property.
    • Added new interpolation contexts such as next_page_token, stream_interval, stream_partition, etc.
    • Updated examples to include new interpolation contexts in URL paths.
  • For the airbyte_cdk/sources/declarative/models/declarative_component_schema.py:

    • Made the path property optional in the HttpRequester class.
    • Added new examples for the path property.
  • For the airbyte_cdk/sources/declarative/requesters/http_requester.py:

    • Added EmptyString and made path optional in the HttpRequester class.
    • Implemented a method to handle empty string interpolation.
    • Added detailed docstrings and methods to handle URL base and path interpolation.
  • For the airbyte_cdk/sources/declarative/requesters/requester.py:

    • Updated get_url_base method to accept and use interpolation contexts.
  • For the airbyte_cdk/sources/types.py:

    • Introduced EmptyString as a type.
  • For the unit_tests/sources/declarative/requesters/test_http_requester.py:

    • Adjusted tests to match new URL handling, ensuring no trailing slashes.
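As a rough illustration of the schema change listed above, the sketch below checks a requester definition against a list of required keys before and after `path` is dropped. The two key lists, the `missing_keys` helper, and the example URL are assumptions made for illustration; this is not the CDK's actual validation code.

```python
from typing import Any, List, Mapping, Sequence

# Assumed required keys before and after the change (illustrative only).
REQUIRED_BEFORE = ("type", "url_base", "path")
REQUIRED_AFTER = ("type", "url_base")


def missing_keys(definition: Mapping[str, Any], required: Sequence[str]) -> List[str]:
    """Return the required keys that the definition does not provide."""
    return [key for key in required if key not in definition]


# Hypothetical requester whose full URL lives in url_base, with no path.
requester = {
    "type": "HttpRequester",
    "url_base": "https://api.example.com/v1/items",
}

print(missing_keys(requester, REQUIRED_BEFORE))  # ['path']
print(missing_keys(requester, REQUIRED_AFTER))   # []
```

Under these assumptions, a definition that omits `path` stops being reported as incomplete once `path` leaves the required list.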

User Impact

This update changes the Requester interface: get_url_base() can now use the interpolation context passed to it, such as stream_state, stream_slice / stream_interval, and next_page_token.

Sources that override get_url_base() may need to update the override to match the new signature. Otherwise, we anticipate no breaking changes from this CDK update.
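For sources that override the method, the adjustment might look like the sketch below. The `Requester` class here is a simplified stand-in for the CDK interface, the keyword-only optional parameters are an assumption based on the description above, and `RegionalRequester` and its URL are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, Optional


class Requester(ABC):
    """Simplified stand-in for the CDK Requester interface (assumed shape)."""

    @abstractmethod
    def get_url_base(
        self,
        *,
        stream_state: Optional[Mapping[str, Any]] = None,
        stream_slice: Optional[Mapping[str, Any]] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> str:
        """Return the base URL, optionally using the runtime context."""


class RegionalRequester(Requester):
    """Hypothetical override that accepts the new optional arguments."""

    def get_url_base(
        self,
        *,
        stream_state: Optional[Mapping[str, Any]] = None,
        stream_slice: Optional[Mapping[str, Any]] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> str:
        # Fall back to a default region when no slice is supplied.
        region = (stream_slice or {}).get("region", "us")
        return f"https://{region}.api.example.com"
```

Because every new argument defaults to None, callers that still invoke `get_url_base()` with no arguments keep working in this sketch.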

Summary by CodeRabbit

  • New Features
    • Introduced dynamic API configuration options by supporting additional interpolation values (e.g., pagination tokens, stream intervals, partitions, and slices) for constructing URLs.
  • Enhancements
    • Improved URL formation for better consistency by allowing optional endpoint paths and more reliable handling of trailing slashes.
  • Tests
    • Adjusted test expectations to reflect the refined URL construction and formatting.

@bazarnov bazarnov self-assigned this Feb 27, 2025
@github-actions github-actions bot added the bug (Something isn't working) and security labels Feb 27, 2025

coderabbitai bot commented Feb 27, 2025

Walkthrough

This PR updates the HTTP request construction logic. In the declarative component schema and its corresponding model, the path attribute is transitioned from a required field to an optional one and enhanced with additional interpolation contexts. In the requesters’ implementation, method signatures are updated to accept optional parameters (stream_state, stream_slice, next_page_token), a new _get_interpolation_context method is introduced, and URL joining logic is refined (e.g., handling trailing slashes). Additionally, a new variable (EmptyString) is added and unit tests have been updated to reflect these changes.

Changes

File(s): airbyte_cdk/sources/declarative/declarative_component_schema.yaml, airbyte_cdk/sources/declarative/models/declarative_component_schema.py
Change summary: Removed the required path property (in YAML) and changed it to an optional attribute (in Python) with a default of None; added new interpolation contexts (next_page_token, stream_interval, stream_partition, stream_slice, creation_response, polling_response, download_target) to both url_base and path; updated examples.

File(s): airbyte_cdk/sources/declarative/requesters/http_requester.py, airbyte_cdk/sources/declarative/requesters/requester.py
Change summary: Updated method signatures to include optional parameters (stream_state, stream_slice, next_page_token); introduced _get_interpolation_context for creating interpolation data; improved URL construction logic (including handling of trailing and duplicate slashes).

File(s): airbyte_cdk/sources/types.py
Change summary: Added new variable EmptyString initialized as an empty string.

File(s): unit_tests/sources/declarative/requesters/test_http_requester.py
Change summary: Updated test assertions by removing the trailing slash in the expected URL output.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant HR as HttpRequester
    participant R as Requester
    C->>HR: Call get_url_base(stream_state, stream_slice, next_page_token)
    HR->>HR: Build interpolation context (_get_interpolation_context)
    HR->>HR: Format 'url_base' with context
    HR->>HR: Evaluate optional 'path'
    HR->>HR: Call _join_url to merge base and path
    HR-->>C: Return constructed URL
    C->>R: Call get_url_base with new parameters
    R-->>C: Return URL
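
The steps in the diagram can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: `build_url` is a hypothetical helper, `str.format` stands in for the CDK's Jinja-style interpolation (so templates use `{config[host]}` rather than `{{ config['host'] }}`), and the slash handling is one plausible reading of the described behavior rather than the PR's code.

```python
from typing import Any, Mapping, Optional


def build_url(
    url_base_template: str,
    path_template: Optional[str] = None,
    *,
    config: Optional[Mapping[str, Any]] = None,
    stream_slice: Optional[Mapping[str, Any]] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> str:
    # Step 1: build the interpolation context.
    context = {
        "config": config or {},
        "stream_slice": stream_slice or {},
        "next_page_token": next_page_token or {},
    }
    # Step 2: format url_base with the context.
    url_base = url_base_template.format(**context)
    # Step 3: evaluate the optional path; a missing path becomes "".
    path = (path_template or "").format(**context)
    # Step 4: join, dropping duplicate and trailing slashes.
    base = url_base.rstrip("/")
    tail = path.strip("/")
    return f"{base}/{tail}" if tail else base


print(build_url(
    "https://{config[host]}/api/",
    "/items/{stream_slice[id]}/",
    config={"host": "example.com"},
    stream_slice={"id": "42"},
))  # https://example.com/api/items/42
print(build_url("https://example.com/api/"))  # https://example.com/api
```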

Possibly related PRs

Suggested labels

enhancement, bug

Suggested reviewers

  • lazebnyi – wdyt?
  • maxi297 – wdyt?

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

118-134: Good refactoring with _get_interpolation_context method.

Extracting the interpolation context creation logic into a separate method reduces duplication and centralizes this functionality. I like how you're handling the extra_fields from stream_slice too.

One minor suggestion: would it make sense to also include stream_state in the returned context since it's passed as a parameter? wdyt?
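
To make the suggestion concrete, here is a minimal sketch of a context builder that also returns stream_state. `FakeSlice` and `interpolation_context` are hypothetical names, and the real `_get_interpolation_context` may differ; only the `extra_fields` unpacking follows the expression quoted in the review thread on this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass
class FakeSlice:
    """Hypothetical stand-in for the CDK's stream slice object."""

    partition: Mapping[str, Any] = field(default_factory=dict)
    extra_fields: Mapping[str, Any] = field(default_factory=dict)


def interpolation_context(
    stream_state: Optional[Mapping[str, Any]] = None,
    stream_slice: Optional[Any] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    return {
        "stream_state": stream_state,  # the addition this comment asks about
        "stream_slice": stream_slice,
        "next_page_token": next_page_token,
        # Promote extra_fields to top-level keys when the slice has them.
        **(
            stream_slice.extra_fields
            if stream_slice is not None and hasattr(stream_slice, "extra_fields")
            else {}
        ),
    }
```

A plain dict passed as `stream_slice` has no `extra_fields` attribute, so in this sketch it contributes no extra top-level keys.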

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1769-1775: Enhance URL base interpolation contexts.
I see you've added additional interpolation keys (next_page_token, stream_interval, stream_partition, stream_slice, creation_response, polling_response, and download_target) to the url_base property. This should provide extra flexibility for dynamic URL construction. Have you verified that all downstream components correctly support these additional contexts? wdyt?


1781-1798: Make the path argument optional.
I noticed that the path property now mirrors many aspects of the url_base (including multiple interpolation contexts) and is no longer listed as a required property. This change aligns with the PR objective to make the path argument optional. Would you consider adding a brief note in the description explicitly stating that this field is now optional to help guide implementers? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 406542d and 0235a1c.

📒 Files selected for processing (6)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (6 hunks)
  • airbyte_cdk/sources/declarative/requesters/requester.py (1 hunks)
  • airbyte_cdk/sources/types.py (1 hunks)
  • unit_tests/sources/declarative/requesters/test_http_requester.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/types.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (11)
unit_tests/sources/declarative/requesters/test_http_requester.py (1)

828-828: This change aligns with the URL construction updates.

The test now verifies that trailing slashes from paths are correctly removed in the final URL, which makes the URL construction more consistent with common practices.

airbyte_cdk/sources/declarative/requesters/requester.py (1)

38-44: Good enhancement of the get_url_base method signature.

The method now accepts additional context parameters (stream_state, stream_slice, next_page_token), making it consistent with other methods like get_path. This allows for dynamic URL construction based on runtime context, enabling more flexible API integration patterns. This is the core change that addresses the PR objective.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

2051-2053: Enhanced examples for url_base with interpolation contexts.

These new examples effectively demonstrate how to use the new capabilities for dynamic URL construction, including stream partition IDs and next page tokens. Great additions that will help users understand how to use the feature.


2057-2058: Making path optional is the key change in this PR.

By changing path from required to optional with a default value of None, you've successfully implemented the core requirement of the PR. This provides more flexibility for URL construction while maintaining backward compatibility.

airbyte_cdk/sources/declarative/requesters/http_requester.py (6)

28-28: Nice addition of EmptyString to import statement.

I see you've added EmptyString to the imports. This will be used to handle the new optional path parameter, making the code more robust when no path is provided.


55-55: Good design choice making path optional.

Making the path parameter optional aligns well with the PR objective of allowing more flexibility in URL construction, especially when only the base URL might be sufficient.


70-72: Nice handling of optional path value.

Using EmptyString as a fallback when path is None is a clean solution. This ensures that the interpolation logic works consistently whether a path is provided or not.
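
A minimal sketch of that fallback, assuming `EmptyString` is simply an empty string as the change summary states; `path_template` is a hypothetical helper, not a method from the PR.

```python
from typing import Optional

EmptyString = ""  # assumed to mirror the constant added in types.py


def path_template(path: Optional[str]) -> str:
    """Return the template to interpolate, falling back to EmptyString."""
    return path if path is not None else EmptyString


print(repr(path_template(None)))      # ''
print(repr(path_template("/items")))  # '/items'
```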


135-148: Great enhancement to get_url_base.

The method now accepts the same parameters as get_path, providing consistency in the interface. Using the new interpolation context method ensures that dynamic values from stream_slice and next_page_token can now be used in the URL base construction.

The use of os.path.join with EmptyString is clever to ensure we get a properly formatted URL.
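
For reference, this is how joining with an empty string behaves. The sketch uses `posixpath.join`, which is what `os.path.join` resolves to on POSIX systems, so the output does not depend on the platform running it; on Windows, `os.path.join` would use a backslash separator instead. The base URL is a made-up example.

```python
import posixpath

base = "https://api.example.com/v1"

# Joining with "" appends a single trailing separator.
with_empty = posixpath.join(base, "")
# A base that already ends in "/" is left unchanged.
already_slashed = posixpath.join(base + "/", "")
# Caveat: a second component starting with "/" discards the base entirely.
absolute_path = posixpath.join(base, "/items")

print(with_empty)       # https://api.example.com/v1/
print(already_slashed)  # https://api.example.com/v1/
print(absolute_path)    # /items
```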


149-162: Nicely updated get_path method.

Making the parameters optional with default values of None maintains backward compatibility while adding new functionality. The method now leverages the shared interpolation context logic, making the code more maintainable.


359-372: Excellent docstring addition to _join_url.

The detailed docstring clearly explains the purpose and behavior of this method, particularly regarding trailing slash handling. This kind of documentation is very helpful for both users and maintainers of the code.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1778-1780: Update URL base examples for dynamic interpolation.
The new examples now clearly illustrate the dynamic generation of the URL base (using expressions like {{ stream_partition['id'] }} and {{ next_page_token['id'] }}). Do these updated examples fully capture your use cases for various API configurations? wdyt?

@bazarnov bazarnov changed the title from "fix: (CDK) (HttpRequester) - Make the path arg optional" to "fix: (CDK) (HttpRequester) - Make the HttpRequester.path optional" Feb 27, 2025
@bazarnov bazarnov requested review from bnchrch, maxi297 and lmossman and removed request for bnchrch February 27, 2025 16:09

@lmossman lmossman left a comment


YAML changes look good to me! Thanks Baz


@maxi297 maxi297 left a comment


I shared a concern that I think is blocking for this PR.

            **(
                stream_slice.extra_fields
                if stream_slice is not None and hasattr(stream_slice, "extra_fields")
                else {}
            ),
        }
        path = str(self._path.eval(self.config, **kwargs))

    def get_url_base(

I'm not sure how this can work. It seems that we call get_url_base when creating the paginator, because the paginator needs it to strip the base URL from the next_page_token. This means we need to either:

  • not rely on parameters we don't have at the start of main.py
  • find another way to do that in the paginator

I would prefer the first option, as I don't think we have a case for interpolating on anything other than the config, right?
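
The concern can be illustrated with a small sketch. Everything here is an assumption for illustration: `strip_url_base` is a hypothetical helper rather than the paginator's actual code, the URLs are invented, and `str.format` stands in for the CDK's Jinja-style interpolation, which may fail differently when a variable is missing.

```python
from typing import Any, Mapping


def strip_url_base(next_page_url: str, url_base: str) -> str:
    """Return the part of next_page_url that follows url_base."""
    if next_page_url.startswith(url_base):
        return next_page_url[len(url_base):]
    return next_page_url


config: Mapping[str, Any] = {"subdomain": "acme"}

# Config-only template: resolvable when the paginator is constructed.
static_base = "https://{config[subdomain]}.example.com".format(config=config)
remainder = strip_url_base("https://acme.example.com/items?page=2", static_base)
print(remainder)  # /items?page=2

# Template that needs runtime context: not resolvable at construction time,
# because no stream_partition exists yet.
try:
    "https://{stream_partition[id]}.example.com".format(config=config)
    resolved_without_context = True
except KeyError:
    resolved_without_context = False
print(resolved_without_context)  # False
```

In this sketch, a base URL that depends only on the config can be resolved once, up front, which is what the first option relies on.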


3 participants