Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: (OAuthAuthenticator) - get the access_token, refresh_token, expires_in recursively from response #285

Merged
merged 7 commits into from
Jan 30, 2025

Conversation

bazarnov
Copy link
Contributor

@bazarnov bazarnov commented Jan 28, 2025

What

Resolving:

Currently the OAuthAuthenticator e.g DeclarativeOauth2Authenticator and SingleUseRefreshTokenOauth2Authenticator don't support the case when the access_token or similar field is nested under another entity like data or similar.
We need a way to handle this dynamically for the Customer and never think about what is the PATH for the target keys to set.

How

  • added new private method _find_and_get_value_from_response to pull out the target key recursively, up to 5 levels of nesting (by default)
  • covered and extended the unit_tests to reflect the change
  • refactored the code and corrected unit test accordingly to simplify to logic

User Impact

No impact is expected. This is not a breaking change.

Summary by CodeRabbit

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced OAuth token retrieval with support for nested JSON response structures.
    • Improved token refresh mechanism with more robust value extraction.
  • Bug Fixes

    • Added recursive search method to handle complex authentication response formats.
    • Refined error handling during token refresh operations.
  • Improvements

    • Increased flexibility in parsing authentication responses.
    • Enhanced error detection and handling for OAuth token retrieval.
    • Expanded test coverage for handling nested JSON responses in token retrieval.

@bazarnov bazarnov self-assigned this Jan 28, 2025
@bazarnov bazarnov marked this pull request as ready for review January 28, 2025 18:23
@github-actions github-actions bot added bug Something isn't working security labels Jan 28, 2025
@bazarnov bazarnov requested review from bnchrch and maxi297 January 28, 2025 18:24
Copy link
Contributor

coderabbitai bot commented Jan 28, 2025

📝 Walkthrough

Walkthrough

The pull request introduces a new exception class ResponseKeysMaxRecurtionReached and a method _find_and_get_value_from_response in the AbstractOauth2Authenticator class. This method allows for recursive searching of nested JSON structures for specific keys, enhancing the token retrieval process. The existing methods for refreshing access tokens are updated to utilize this new method, improving error handling related to maximum recursion depth. Additionally, the test suite is expanded to cover various scenarios involving nested JSON responses and recursion limits.

Changes

File Change Summary
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py Added ResponseKeysMaxRecurtionReached exception and _find_and_get_value_from_response method for recursive JSON key searching with depth control. Updated _get_refresh_access_token_response to _make_handled_request and added methods for token extraction and verification.
airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py Updated refresh_access_token method to use new recursive search method for token extraction and modified method signature. Refactored configuration retrieval logic.
unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py Added test cases for nested JSON response handling, recursion depth limits, and updated existing comments for clarity.
unit_tests/connector_builder/test_connector_builder_handler.py Updated tests to patch _make_handled_request instead of _get_refresh_access_token_response, enhancing OAuth token handling tests.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • maxi297
  • natikgadzhi
  • aldogonzalez8

Hey there! 👋 I noticed you've made some really cool improvements to the OAuth token retrieval process. The new recursive search method looks quite elegant! Wdyt about adding some additional comments explaining the max_depth parameter? It might help future developers understand the recursion limit. Cheers! 🚀

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)

145-148: Consider improving the error message for better debugging?

The error message could include the actual path tried during the token extraction for better debugging. wdyt?

-            raise Exception(
-                "Token refresh API response was missing access token {self.get_access_token_name()}"
-            )
+            raise Exception(
+                f"Token refresh API response was missing access token '{self.get_access_token_name()}'. Response: {json.dumps(response_json)}"
+            )

301-352: Consider some optimizations for better performance?

The implementation is solid, but here are some suggestions that might improve performance, wdyt?

  1. Consider using a set for faster key lookups in deeply nested structures
  2. Add early return if the response is None or empty
  3. Consider caching found paths for subsequent lookups
 def _find_and_get_value_from_response(
     self,
     response_data: Mapping[str, Any],
     key_name: str,
     max_depth: int = 5,
     current_depth: int = 0,
 ) -> Any:
+    if not response_data:
+        return None
+
     if current_depth > max_depth:
         message = f"The maximum level of recursion is reached. Couldn't find the speficied `{key_name}` in the response."
         raise AirbyteTracedException(
             internal_message=message, message=message, failure_type=FailureType.config_error
         )

     if isinstance(response_data, dict):
         if key_name in response_data:
             return response_data[key_name]

         for _, value in response_data.items():
             result = self._find_and_get_value_from_response(
                 value, key_name, max_depth, current_depth + 1
             )
             if result is not None:
                 return result

     elif isinstance(response_data, list):
         for item in response_data:
             result = self._find_and_get_value_from_response(
                 item, key_name, max_depth, current_depth + 1
             )
             if result is not None:
                 return result

     return None
unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (1)

319-351: Consider adding more edge case tests?

The max depth test is great! Would you consider adding tests for these scenarios as well?

  1. Empty response
  2. Response with circular references
  3. Response with array of arrays
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9d93ab and 5f5d921.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (4 hunks)
  • airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (1 hunks)
  • unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)

170-171: LGTM! Good error handling.

Preserving the original AirbyteTracedException maintains the error context and failure type.


183-186: LGTM! Consistent implementation.

Using the same extraction method for both fields ensures consistent behavior.

airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (1)

343-345: LGTM! Clean implementation.

The changes maintain consistency with the parent class while supporting nested token structures.

unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (2)

280-289: LGTM! Good test coverage.

The test case properly validates the extraction of nested tokens and their types.


291-318: LGTM! Thorough edge case testing.

Great job testing deeply nested structures with various intermediate fields.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)

27-32: Fix typo in class name

There's a typo in the class name: "Recurtion" should be "Recursion". What do you think about renaming it to ResponseKeysMaxRecursionReached? 🤔

-class ResponseKeysMaxRecurtionReached(AirbyteTracedException):
+class ResponseKeysMaxRecursionReached(AirbyteTracedException):

308-359: Great implementation! A couple of minor suggestions:

  1. There's a typo in the error message: "speficied" should be "specified".
  2. Would you consider adding a return type hint to make the method signature more complete? The docstring mentions it returns Any.
    def _find_and_get_value_from_response(
        self,
        response_data: Mapping[str, Any],
        key_name: str,
        max_depth: int = 5,
        current_depth: int = 0,
-    ) -> Any:
+    ) -> Any:  # Consider being more specific about return type if possible
-            message = f"The maximum level of recursion is reached. Couldn't find the speficied `{key_name}` in the response."
+            message = f"The maximum level of recursion is reached. Couldn't find the specified `{key_name}` in the response."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f5d921 and 8afeb66.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (5 hunks)
  • unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (2)

Line range hint 152-168: LGTM! Good error handling 👍

The integration with _find_and_get_value_from_response is clean, and I appreciate that you're explicitly handling the recursion exception.


190-193: Clean implementation!

The method now elegantly handles nested tokens while maintaining its interface.

unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (4)

Line range hint 263-272: Great test coverage! 🎯

Good job testing different expires_in formats and their type preservation.


283-292: Good test for basic nesting! 👍

Clear test case demonstrating single-level nested token retrieval.


294-320: Excellent edge case coverage! 🌟

Great job testing deep nesting with a mix of different fields. This will help catch any regression issues.


322-354: Don't forget to update the test

After fixing the typo in the exception class name from ResponseKeysMaxRecurtionReached to ResponseKeysMaxRecursionReached, this test will need to be updated too. Otherwise, great job testing the max depth scenario! 🎯

-        with pytest.raises(ResponseKeysMaxRecurtionReached) as exc_info:
+        with pytest.raises(ResponseKeysMaxRecursionReached) as exc_info:

Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question but otherwise pre-approving

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (3)

28-33: Fix typo in class name: "Recurtion" → "Recursion"

The class name contains a typo. What do you think about renaming it to ResponseKeysMaxRecursionReached to match the standard spelling? 🤔

-class ResponseKeysMaxRecurtionReached(AirbyteTracedException):
+class ResponseKeysMaxRecursionReached(AirbyteTracedException):

320-371: Consider adding type hints for better code maintainability

The method could benefit from more specific type hints. What do you think about using Union[dict, list] for response_data to better reflect its supported types? Also, there's a typo in "speficied". 🤔

    def _find_and_get_value_from_response(
        self,
-       response_data: Mapping[str, Any],
+       response_data: Union[Mapping[str, Any], List[Any]],
        key_name: str,
        max_depth: int = 5,
        current_depth: int = 0,
    ) -> Any:
-            message = f"The maximum level of recursion is reached. Couldn't find the speficied `{key_name}` in the response."
+            message = f"The maximum level of recursion is reached. Couldn't find the specified `{key_name}` in the response."

229-256: Fix missing f-string prefix in error message

There's a missing 'f' prefix in the error message string. Would you like to fix it? 🔍

-                    "Token refresh API response was missing access token {self.get_access_token_name()}"
+                    f"Token refresh API response was missing access token {self.get_access_token_name()}"
unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (2)

284-354: Consider using more descriptive test case labels

The nested token test cases are comprehensive, but what do you think about making them more descriptive using pytest's parametrize? This would make the test report more readable. 🤔

    @pytest.mark.parametrize(
+       "test_name, response_data, expected_result",
+       [
+           (
+               "single_level_nesting",
+               {"data": {"access_token": "access_token_nested", "expires_in": "2001"}},
+               ("access_token_nested", "2001"),
+           ),
+           (
+               "deep_nesting_within_limits",
+               {"data": {"data2": {"data3": {"data4": {"data5": {"access_token": "token", "expires_in": "2002"}}}}}},
+               ("token", "2002"),
+           ),
+           # ... more cases
+       ],
+   )
+   def test_nested_token_extraction(self, mocker, test_name, response_data, expected_result):

323-354: Add explicit documentation about max depth limit

The test case for maximum recursion is good, but what do you think about adding a comment explaining why 6 levels of nesting triggers the exception? This would make it clearer that we're testing against the default max_depth of 5. 🤔

-                                    # this is the edge case, but worth testing.
+                                    # Testing recursion limit: 6 levels of nesting exceeds default max_depth of 5
airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (3)

Line range hint 178-199: Consider adding parameter validation in the constructor?

The constructor has great documentation now, but what do you think about adding validation for critical parameters like token_refresh_endpoint and connector_config to fail fast if they're invalid or missing? wdyt?

 def __init__(
     self,
     connector_config: Mapping[str, Any],
     token_refresh_endpoint: str,
     # ... other parameters ...
 ) -> None:
+    if not connector_config:
+        raise ValueError("connector_config is required")
+    if not token_refresh_endpoint:
+        raise ValueError("token_refresh_endpoint is required")
+    if not token_refresh_endpoint.startswith(("http://", "https://")):
+        raise ValueError("token_refresh_endpoint must be a valid HTTP(S) URL")
     
     self._connector_config = connector_config

372-402: Consider caching frequently accessed config values?

For values that are accessed often (like client_id, client_secret), what do you think about caching them after the first retrieval to avoid repeated dpath lookups? This could improve performance, especially with deeply nested configs. wdyt?

 class SingleUseRefreshTokenOauth2Authenticator(Oauth2Authenticator):
+    _config_cache: Dict[str, Any] = {}
     
     def _get_config_value_by_path(
         self, config_path: Union[str, Sequence[str]], default: Optional[str] = None
     ) -> str | Any:
+        cache_key = str(config_path)
+        if cache_key in self._config_cache:
+            return self._config_cache[cache_key]
         
-        return dpath.get(
+        value = dpath.get(
             self._connector_config,
             config_path,
             default=default if default is not None else "",
         )
+        self._config_cache[cache_key] = value
+        return value

405-430: Address the FIXME comment about deprecated function?

There's a FIXME comment about emit_configuration_as_airbyte_control_message being deprecated. Should we create a follow-up issue to remove this technical debt and fully migrate to airbyte_cdk.sources.message? wdyt?

Would you like me to create an issue to track this deprecation removal?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8afeb66 and 43a71f1.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (6 hunks)
  • airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (8 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (1 hunks)
  • unit_tests/sources/streams/http/requests_native_auth/test_requests_native_auth.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py (1)

133-145: LGTM! Clean refactoring of token refresh logic

The separation of token extraction logic into dedicated methods improves code organization and maintainability.

unit_tests/connector_builder/test_connector_builder_handler.py (1)

603-603: LGTM! Good test coverage for OAuth token refresh

The test properly validates the integration between the connector builder and the OAuth token refresh mechanism.

airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py (2)

358-370: LGTM! Clean and well-documented implementation.

The token refresh logic is now more robust with proper error handling and clear separation of concerns.


199-204: Consider improving type safety in config value retrieval?

The return type hint str | Any in _get_config_value_by_path could lead to runtime errors if non-string values are retrieved. What do you think about making it more specific or adding type checking? wdyt?

@bazarnov bazarnov merged commit 65e6a0d into main Jan 30, 2025
22 of 24 checks passed
@bazarnov bazarnov deleted the baz/cdk/aouth-extract-tokens-recursively branch January 30, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants