Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(concurrent): do not raise when record does not have cursor value in concurrent cursor #96

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Dec 2, 2024

Partially resolves:

  • airbytehq/airbyte-internal-issues#10550

What

Following https://github.com/airbytehq/airbyte-internal-issues/issues/10550 change on source-pipedrive, we have seen issues when the cursor field was not provided.

For now, we will assume that we need to sync those records but they won't influence the most_recent_cursor_value

How

By not raising if the cursor value is not provided

Summary by CodeRabbit

  • New Features

    • Improved error handling and logging for cursor value extraction in the ConcurrentCursor class.
  • Bug Fixes

    • Enhanced stability by ensuring that the system does not raise exceptions when no cursor value is present.
  • Tests

    • Added a new test to verify that the observe method behaves correctly when no cursor value is provided, along with updates to existing tests for comprehensive coverage.

Copy link
Contributor

coderabbitai bot commented Dec 2, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes in this pull request focus on enhancing error handling and logging mechanisms within the Cursor and ConcurrentCursor classes in the airbyte_cdk/sources/streams/concurrent/cursor.py file. A new method for logging warnings related to missing cursor values has been introduced, and existing methods have been modified to utilize this new logging approach. Additionally, the test suite for the ConcurrentCursor class has been expanded to include tests that verify the behavior of the observe method when no cursor value is present.

Changes

File Path Change Summary
airbyte_cdk/sources/streams/concurrent/cursor.py - Updated observe method in ConcurrentCursor to handle ValueError with a new logging method.
- Added _log_for_record_without_cursor_value method to centralize logging for missing cursor values.
- Modified should_be_synced to call the new logging method.
unit_tests/sources/streams/concurrent/test_cursor.py - Added test_given_no_cursor_value_when_observe_then_do_not_raise to verify no exceptions on missing cursor value.
- Updated existing tests for cursor state handling and message emission.

Possibly related PRs

Suggested reviewers

  • aaronsteers
  • brianjlai

What do you think about the suggested reviewers? Do they align well with the changes made?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b1fc031 and 4a5c21c.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/concurrent/cursor.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/streams/concurrent/cursor.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/streams/concurrent/cursor.py (2)

219-225: LGTM! Consider being more specific with the exception handling

The error handling looks good and aligns well with the PR objective. Would you consider catching ValueError specifically from extract_value to be more precise? Something like this, wdyt?

 try:
     cursor_value = self._extract_cursor_value(record)
     if most_recent_cursor_value is None or most_recent_cursor_value < cursor_value:
         self._most_recent_cursor_value_per_partition[record.associated_slice] = cursor_value
-except ValueError:
+except ValueError as e:
+    if "Could not find cursor field" not in str(e):
+        raise
     self._log_for_record_without_cursor_value()

469-474: LGTM! Consider log level adjustment

Great implementation of the logging mechanism! The flag to prevent duplicate warnings is a nice touch. Would you consider using INFO level instead of WARNING since this is now an expected behavior rather than a warning condition? Just a thought, wdyt?

-            LOGGER.warning(
+            LOGGER.info(
unit_tests/sources/streams/concurrent/test_cursor.py (1)

104-117: Test looks good! Consider enhancing the coverage

The test effectively verifies that no exception is raised, but we could make it even more robust. Would you consider adding these assertions, wdyt?

     cursor.observe(
         Record(
             data={"record_with_A_CURSOR_FIELD_KEY": "any value"},
             associated_slice=partition.to_slice(),
             stream_name=_A_STREAM_NAME,
         )
     )

-    # did not raise
+    # Verify logger was triggered
+    assert cursor._should_be_synced_logger_triggered is True
+    
+    # Verify no cursor value was stored
+    assert partition.to_slice() not in cursor._most_recent_cursor_value_per_partition
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d242f79 and 88d55fd.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/streams/concurrent/cursor.py (2 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (1 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/concurrent/cursor.py (1)

465-467: LGTM! Consistent error handling approach

The error handling here mirrors the approach in observe(), maintaining consistency in how missing cursor values are handled. The decision to sync records without cursor values makes sense for backward compatibility.

Copy link
Contributor

@darynaishchenko darynaishchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested on source-pipedrive, records without cursor field were synced successfully

@maxi297 maxi297 changed the title [ISSUE #10550] do not raise when record does not have cursor value in… feature(concurrent): [ISSUE #10550] do not raise when record does not have cursor value in… Dec 2, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
airbyte_cdk/sources/streams/concurrent/cursor.py (3)

230-236: Consider adding return type hint to improve type safety.

The error handling looks good! Would you consider adding a return type hint to make the method signature more explicit? wdyt?

-    def observe(self, record: Record) -> None:
+    def observe(self, record: Record) -> None:

476-478: Consider updating the docstring to reflect new behavior.

The method now returns True when a cursor value is missing. Would you consider updating the docstring to document this important behavior? Something like:

     def should_be_synced(self, record: Record) -> bool:
         """
         Determines if a record should be synced based on its cursor value.
         :param record: The record to evaluate
-
-        :return: True if the record's cursor value falls within the sync boundaries
+        :return: True if the record's cursor value falls within the sync boundaries or if the cursor value is missing
         """

480-485: Consider adding a docstring to the new method.

The implementation looks good! Would you consider adding a docstring to explain the purpose of the flag and the logging behavior? Something like:

     def _log_for_record_without_cursor_value(self) -> None:
+        """
+        Logs a warning when a cursor value is missing in a record.
+        The warning is logged only once per stream to prevent log spam.
+        """
         if not self._should_be_synced_logger_triggered:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 88d55fd and b1fc031.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/concurrent/cursor.py (4 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/concurrent/cursor.py (1)

189-189: LGTM! Type annotation enhancement looks good.

The expanded type annotation better reflects the possible key types and aligns with the PR's goal of handling records without cursor values.

@maxi297 maxi297 changed the title feature(concurrent): [ISSUE #10550] do not raise when record does not have cursor value in… feat(concurrent): [ISSUE #10550] do not raise when record does not have cursor value in… Dec 2, 2024
@github-actions github-actions bot added the enhancement New feature or request label Dec 2, 2024
@maxi297 maxi297 merged commit 72202ee into main Dec 3, 2024
23 of 24 checks passed
@maxi297 maxi297 deleted the issue-10550/do-not-break-when-record-does-not-have-cursor-value-on-concurrent branch December 3, 2024 14:04
@aaronsteers aaronsteers changed the title feat(concurrent): [ISSUE #10550] do not raise when record does not have cursor value in… feat(concurrent): do not raise when record does not have cursor value in concurrent cursor Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants