
Crawler transform #797

Merged: touma-I merged 26 commits into dev from crawler-transform on Nov 16, 2024

@touma-I (Collaborator) commented Nov 13, 2024

Why are these changes needed?

Implements crawler transforms using the dpi-connector API. This is based on the work done by the data sift, but a CLI also had to be added in order to integrate with the Python runtime. This implementation uses the new transform layout, with the module name dpk_web2parquet.

Related issue number (if any).

#751

@touma-I requested a review from hmtbr on November 13, 2024 02:23
@hmtbr (Collaborator) left a comment

@touma-I Thank you very much for making this change! This simple implementation looks good to me. I added several comments, but most of them are nitpicking.

Signed-off-by: Maroun Touma <[email protected]>
@touma-I requested review from hmtbr and daw3rd on November 14, 2024 12:40
@touma-I marked this pull request as ready for review on November 14, 2024 12:54
@daw3rd previously requested changes on Nov 14, 2024
Two review comments on transforms/.make.modules (outdated, resolved)
@shahrokhDaijavad (Member) left a comment

Looks good. A few typos here and there.

@shahrokhDaijavad (Member) left a comment

Sorry, I missed a couple of things that I see now.

@shahrokhDaijavad (Member) left a comment

README file looks good to me.

@touma-I dismissed daw3rd’s stale review on November 16, 2024 00:15

Will create a new PR to address additional comments related to processing parquet tables.

@touma-I merged commit 0930a87 into dev on Nov 16, 2024
134 checks passed
@touma-I deleted the crawler-transform branch on November 16, 2024 00:16