Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notion - updated-page source improvements #14045

Merged
merged 6 commits into from
Sep 23, 2024

Conversation

michelle0927
Copy link
Collaborator

@michelle0927 michelle0927 commented Sep 20, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced the notion-updated-page component to improve page update handling.
    • Introduced hashing for property values using the md5 library.
    • Added a new method for event emission to streamline event handling.
  • Version Updates

    • Updated component version from 0.0.16 to 0.0.17.
    • Updated dependency version in @pipedream/notion from 0.1.21 to 0.1.22.

@michelle0927 michelle0927 self-assigned this Sep 20, 2024
Copy link

vercel bot commented Sep 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
docs-v2 ⬜️ Ignored (Inspect) Visit Preview Sep 23, 2024 6:21pm
pipedream-docs ⬜️ Ignored (Inspect) Sep 23, 2024 6:21pm
pipedream-docs-redirect-do-not-edit ⬜️ Ignored (Inspect) Sep 23, 2024 6:21pm

Copy link
Contributor

coderabbitai bot commented Sep 20, 2024

Walkthrough

The pull request introduces updates to the @pipedream/notion component, including an increment in the version number of the package from "0.1.21" to "0.1.22" and the addition of the md5 dependency. The notion-updated-page source file is modified to incorporate hashing for property values, enhance event emission logic, and refine timestamp handling, with the version number updated from "0.0.16" to "0.0.17".

Changes

File Change Summary
components/notion/package.json - Version updated from "0.1.21" to "0.1.22".
- Dependency added: "md5": "^2.3.0".
components/notion/sources/updated-page.mjs - Version updated from "0.0.16" to "0.0.17".
- Added import for md5 library.
- Modified deploy and run methods to use hashed property values and refined event emission logic.
- New method emitEvent(page) added.

Possibly related PRs

  • Notion - updated page trigger fix #13972: This PR updates the version number and modifies the functionality of the updated-page component in the Notion integration, which is directly related to the changes made in the main PR that also involves updates to the package.json and the updated-page.mjs file.

Suggested reviewers

  • jcortes
  • lcaresia

Poem

In the meadow, changes hop,
A version up, we’ll never stop!
With md5 we hash and play,
Emitting events in a clever way.
Notion's pages, fresh and bright,
Let’s celebrate this coding delight! 🐇✨


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c0657fa and 9e1e3a9.

Files selected for processing (1)
  • components/notion/sources/updated-page/updated-page.mjs (5 hunks)
Files skipped from review as they are similar to previous changes (1)
  • components/notion/sources/updated-page/updated-page.mjs

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Outside diff range and nitpick comments (1)
components/notion/sources/updated-page/updated-page.mjs (1)

60-62: Limit event emission during deployment

Emitting events for the first 25 pages during deployment might be excessive. Consider reducing the number or making it configurable to prevent overwhelming the system with events.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d83f8ee and 81a2f28.

Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
Files selected for processing (2)
  • components/notion/package.json (2 hunks)
  • components/notion/sources/updated-page/updated-page.mjs (5 hunks)
Additional comments not posted (6)
components/notion/package.json (2)

3-3: Version update looks good!

The minor version increment aligns with the PR objective of adding improvements while maintaining backwards compatibility.


20-20: The addition of the "md5" dependency looks good.

The "md5" package is a popular, lightweight, and stable choice for MD5 hashing in JavaScript. The version specification "^2.3.0" is reasonable.

Please verify that this dependency is being used somewhere in the codebase. If it's unused, consider removing it to keep the dependencies lean.

Run the following script to search for the import or require statements of this package:

components/notion/sources/updated-page/updated-page.mjs (4)

5-5: Ensure 'md5' package is added to dependencies

The md5 module is imported, but please verify that it's included in your project's package.json under dependencies to prevent runtime errors.

Run the following script to check if md5 is listed in package.json:

#!/bin/bash
# Description: Check if 'md5' is listed under dependencies in package.json

grep '"md5"' package.json

46-47: Confirm implementation of lastUpdatedSortParam() method

The method lastUpdatedSortParam() is used to obtain sorting parameters for fetching pages. Ensure that this method is correctly implemented and accessible within the scope of this component.

Run the following script to verify the definition of lastUpdatedSortParam():

#!/bin/bash
# Description: Search for the definition of 'lastUpdatedSortParam' method

rg --type js --type mjs --pcre2 'lastUpdatedSortParam\s*\('

Also applies to: 102-102


53-53: Consider potential issues with JSON serialization order

Using JSON.stringify on objects may lead to inconsistent results due to non-deterministic property ordering, which can affect the md5 hash comparison.

To ensure consistent hashing, consider using a library that performs deterministic serialization, such as json-stable-stringify.

Apply this diff to change the implementation:

-import md5 from "md5";
+import md5 from "md5";
+import stringify from "json-stable-stringify";

...

-propertyValues[page.id][property] = md5(JSON.stringify(page.properties[property]));
+propertyValues[page.id][property] = md5(stringify(page.properties[property]));

...

-const currentProperty = md5(JSON.stringify(page.properties[property]));
+const currentProperty = md5(stringify(page.properties[property]));

Don't forget to add json-stable-stringify to your dependencies.

Run the following script to check if json-stable-stringify is listed in package.json:

#!/bin/bash
# Description: Check if 'json-stable-stringify' is listed under dependencies in package.json

grep '"json-stable-stringify"' package.json

Also applies to: 115-115


132-132: Verify that all necessary events are emitted

Ensure that the emitEvent method is called for all pages that meet the criteria, and that no pages are unintentionally skipped due to logic errors.

Run the following script to check where emitEvent is called:

Verification successful

Verification confirms proper event emission

After a thorough analysis of the codebase, particularly the components/notion/sources/updated-page/updated-page.mjs file, I can confirm that the emitEvent method is correctly implemented and called in appropriate places. The logic in the run method appears to cover all necessary cases:

  • It emits events for updated pages
  • It can optionally include new pages based on the includeNewPages prop
  • It uses timestamps and property change detection to avoid unnecessary emissions

The implementation looks thorough and doesn't seem to skip any necessary events. The concern raised in the original review comment appears to be addressed by the current implementation.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to 'emitEvent' in the codebase

rg --type js --type mjs 'this\.emitEvent\('

Length of output: 45


Script:

#!/bin/bash
# Description: Search for variations of 'emit' method calls

# Search for 'emit' with various prefixes
echo "Searching for 'emit' method calls:"
rg --type js --type mjs '\b\w+\.emit\w*\('

# Search for 'emitEvent' without 'this' prefix
echo -e "\nSearching for 'emitEvent' without 'this' prefix:"
rg --type js --type mjs '\bemitEvent\('

# Search for any method containing 'emit' in its name
echo -e "\nSearching for methods containing 'emit':"
rg --type js --type mjs '\b\w*emit\w*\('

Length of output: 526


Script:

#!/bin/bash
# Description: Search for emit-related functions and examine the specific file

# Search for 'emit' related functions in .js files
echo "Searching for 'emit' related functions in .js files:"
rg --type js '\b\w*emit\w*\('

# Display the content of the specific file
echo -e "\nContent of the specific file:"
cat components/notion/sources/updated-page/updated-page.mjs

# Search for 'emit' related functions in the specific file
echo -e "\nSearching for 'emit' related functions in the specific file:"
grep -E '\b\w*emit\w*\(' components/notion/sources/updated-page/updated-page.mjs

Length of output: 6709

this.$emit(page, meta);
this.emitEvent(page);

if (Date.parse(page?.last_edited_time) < lastCheckedTimestamp) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add back the HTTP request param to filter out pages where last_edited_time < lastCheckedTimestamp:

{
  filter: {
    timestamp: "last_edited_time",
    last_edited_time: {
      on_or_after: new Date(lastCheckedTimestamp).toISOString()
    }
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewjschuang Ah, I didn't notice there's an on-or-after filter in addition to after. I added the filter back in, but I'm still hesitant to be using last_edited_time (either for filtering or for exitng the loop) since that was the reason the source was skipping events to begin with. The last_edited_time is not necessarily updated when a page is updated. I guess in this case it's a choice between possibly skipping events and having to loop through a large amount of pages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, based on my tests the Notion API was reliably updating the last_edited_time. The issues the user was seeing were most likely due to performance issues due to a huge number of pages in the database. I tested with a fairly large database, and with the improvements from this PR, I did not have any issues

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7229087 and c0657fa.

Files selected for processing (1)
  • components/notion/sources/updated-page/updated-page.mjs (5 hunks)
Additional comments not posted (6)
components/notion/sources/updated-page/updated-page.mjs (6)

5-5: LGTM: Good addition of md5 for hashing

The addition of the md5 library is a good choice for hashing property values. This will improve the efficiency and reliability of property comparisons.


12-12: LGTM: Version update

The version update from "0.0.16" to "0.0.17" is appropriate given the changes made to the component.


46-62: LGTM: Improved deploy method logic

The changes in the deploy method enhance the handling of page updates and timestamp management. The use of md5 for hashing property values is a good practice for efficient comparison. Limiting event emission to the first 25 pages is likely a performance optimization.


104-109: LGTM: Efficient filtering of pages

The addition of the filter to fetch only pages edited on or after the last checked timestamp is a good optimization. It reduces the number of pages that need to be processed, potentially improving performance.


123-123: LGTM: Consistent use of md5 for property comparison

The use of md5 for hashing property values in the comparison logic is consistent with earlier changes and improves the efficiency and reliability of property comparisons.


140-140: LGTM: Consistent use of emitEvent method

The use of the newly added emitEvent method for event emission is consistent and improves code organization by encapsulating the event emission logic.

andrewjschuang
andrewjschuang previously approved these changes Sep 23, 2024
Copy link
Collaborator

@andrewjschuang andrewjschuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements!

@andrewjschuang
Copy link
Collaborator

Summarizing our conversation at Slack:

  • Issues were caused because the Notion last_updated_time timestamp is truncated to the minute, e.g. 2024-09-23T18:12:00.000Z -- any other updates made within the same minute wouldn't be caught by Pipedream, since we dedupe on the timestamp.
  • Previously, the HTTP request params included an after filter. But with the issue above, if an update was made in the same minute, the page wouldn't be returned because the filter is not inclusive.

Solution:

  • not dedupe by timestamp
  • use on_or_after filter to fetch pages that were updated on the same minute or after.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants