Create script to pull all Paratext projects for testing #432

Merged
johnml1135 merged 2 commits into main from usfm_testing
Jul 16, 2024

Conversation

Collaborator

@Enkidu93 Enkidu93 commented Jul 15, 2024

Addresses #429


@codecov-commenter

codecov-commenter commented Jul 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.73%. Comparing base (5702123) to head (d8c5518).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #432   +/-   ##
=======================================
  Coverage   61.73%   61.73%           
=======================================
  Files         232      232           
  Lines       11825    11825           
  Branches     1510     1510           
=======================================
  Hits         7300     7300           
  Misses       3998     3998           
  Partials      527      527           


@johnml1135
Collaborator

scripts/pull_all_usfm.py line 40 at r1 (raw file):

        try:
            file_data = client.data_files_download(file.id)
            with open(output_dir / file.name, "wb") as f:

Files may have the same name, but you don't know whether they are the same file or not. There is probably a clever way to account for this: for example, if the files are the same size, ignore the new file, but if they differ in size, write the new file under a different name.
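
A minimal sketch of the size-based idea suggested above, not the code in scripts/pull_all_usfm.py. The helper name `resolve_collision` and the `_<counter>` suffix are hypothetical, and matching sizes are only a heuristic: two different files can have the same size, so a content hash would be a stronger check.

```python
from pathlib import Path
from typing import Optional


def resolve_collision(output_dir: Path, name: str, data: bytes) -> Optional[Path]:
    """Pick a path for `data`, or return None if it looks like a duplicate.

    Hypothetical helper: a file is treated as a duplicate when a file with
    the same (possibly renamed) name already exists and has the same size.
    """
    original = Path(name)
    candidate = output_dir / name
    counter = 1
    while candidate.exists():
        if candidate.stat().st_size == len(data):
            # Same name and same size: assume it is the same file and skip it.
            return None
        # Same name, different size: try Paratext_1.zip, Paratext_2.zip, ...
        candidate = output_dir / f"{original.stem}_{counter}{original.suffix}"
        counter += 1
    return candidate
```

A caller would write the downloaded bytes only when the helper returns a path, and skip the file when it returns None.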

@johnml1135
Collaborator

scripts/pull_all_usfm.py line 40 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Files may have the same name, but you don't know whether they are the same file or not. There is probably a clever way to account for this: for example, if the files are the same size, ignore the new file, but if they differ in size, write the new file under a different name.

I am afraid of 500 files being named Paratext.zip ...

Collaborator Author

@Enkidu93 Enkidu93 left a comment


Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


scripts/pull_all_usfm.py line 40 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I am afraid of 500 files being named Paratext.zip ...

That's a good catch. I think the SF procedure uses unique ids, so it shouldn't matter, but better to generalize it. Done: I did <name>_<id> in order to preserve alphabetic ordering and readability.
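
For illustration, the <name>_<id> scheme might look roughly like the sketch below. This is an assumption about the shape of the fix, not the merged code: `unique_output_path` and its `keep_extension` flag are hypothetical names, and the id is treated as an opaque string.

```python
from pathlib import Path


def unique_output_path(
    output_dir: Path, name: str, file_id: str, keep_extension: bool = False
) -> Path:
    """Build an output path of the form <name>_<id>.

    Putting the name first keeps files sorted alphabetically by name;
    appending the id guarantees that same-named files do not collide.
    """
    # Drop any directory components so the result stays inside output_dir.
    safe_name = Path(name).name
    if keep_extension:
        # Hypothetical variant: place the id before the suffix (Paratext_<id>.zip).
        p = Path(safe_name)
        return output_dir / f"{p.stem}_{file_id}{p.suffix}"
    return output_dir / f"{safe_name}_{file_id}"
```

With this sketch, two files that are both named Paratext.zip but have different ids are written to different paths that still sort next to each other.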

@johnml1135
Collaborator

scripts/pull_all_usfm.py line 40 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

That's a good catch. I think the SF procedure uses unique ids, so it shouldn't matter, but better to generalize it. Done: I did <name>_<id> in order to preserve alphabetic ordering and readability.

Will the files still be usable by the machine "parse everything" script?

Collaborator

@johnml1135 johnml1135 left a comment


Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)

Collaborator Author

@Enkidu93 Enkidu93 left a comment


Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


scripts/pull_all_usfm.py line 40 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Will the files still be usable by the machine "parse everything" script?

Yep, it doesn't care what they're named.

Collaborator

@johnml1135 johnml1135 left a comment


Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 johnml1135 merged commit 05f074e into main Jul 16, 2024
2 checks passed
@johnml1135 johnml1135 deleted the usfm_testing branch July 16, 2024 19:07