Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: save and check a config lock for everyvoice preprocess #440

Merged
merged 4 commits into from
Jun 6, 2024

Conversation

joanise
Copy link
Member

@joanise joanise commented May 27, 2024

PR Goal?

Add a config lock, and check it

This is a draft PR to seek early feedback. Still to do: add unit testing

Fixes?

Fixes #210

Feedback sought?

Sanity check in this current draft state. Does the config lock contain the right subset of the configuration?

Try the feature out, and have a look at the error messages I added.

At the moment, I check and write the lock when preprocessing starts. Here's a scenario this would fail to catch:

  1. Preprocess
  2. Change the config
  3. preprocess --overwrite but Kill the process partway through, before all previously preprocessed files are overwritten.
  4. preprocess again, without --overwrite: the remaining files preprocessed in 1. but not overwritten in 3. are still around, but we won't catch that we need to redo them

Question: is that caveat acceptable, or should I change the lock mechanism to write the config lock only at the end? Or maybe to update an existing config lock only at the end?

Priority?

normal

Tests added?

none yet, but I plan to add them.

How to test?

Run everyvoice preprocess on some config, change the config for text, audio or source_data, then run again and see it fail asking you to add --overwrite.

Confidence?

medium low

Version change?

no

Related PRs?

none

@joanise joanise requested a review from roedoejet May 27, 2024 21:25
@joanise joanise changed the title feat: save and check a config lock for everyvoice preprocess [draft] feat: save and check a config lock for everyvoice preprocess May 27, 2024
Copy link
Contributor

github-actions bot commented May 27, 2024

CLI load time: 0:00.24
Pull Request HEAD: 9d16ea49c740f31b2d8605f8e47c05d37b5e6f8d
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

Copy link

codecov bot commented May 27, 2024

Codecov Report

Attention: Patch coverage is 94.80519% with 4 lines in your changes missing coverage. Please review.

Project coverage is 73.67%. Comparing base (864fd08) to head (b0dd167).

Current head b0dd167 differs from pull request most recent head 9d16ea4

Please upload reports for the commit 9d16ea4 to get more accurate results.

Files Patch % Lines
everyvoice/preprocessor/preprocessor.py 97.22% 1 Missing and 1 partial ⚠️
everyvoice/text/text_processor.py 60.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #440      +/-   ##
==========================================
+ Coverage   73.08%   73.67%   +0.59%     
==========================================
  Files          44       44              
  Lines        2653     2724      +71     
  Branches      406      422      +16     
==========================================
+ Hits         1939     2007      +68     
  Misses        633      633              
- Partials       81       84       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@SamuelLarkin SamuelLarkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Eric mentioned, to play it safe, may be we should only write the lock once all files have been processed. Otherwise, to be sure that all audio are processed the same, we should reprocess them.
What about writing the preprocessed files under a hierarchy where the root directory's name is some sort of audio preprocessing signature. If the user changes the config's preprocessing then EV would expect the audio under a different branch of the hierarchy. EV could detect and warn user about the fact the config has changed because it wouldn't be able to find the new branch and would ask the user to redo preprocessing.

@roedoejet
Copy link
Member

Question: is that caveat acceptable, or should I change the lock mechanism to write the config lock only at the end? Or maybe to update an existing config lock only at the end?

What about having a "preprocessing_completed" boolean in the lock that writes when the preprocessing finishes. When preprocessing is started, you could write the config as you do, but set "preprocessing_completed" to false until it finishes and is set to true. That way you could check that boolean first and force the user to use the overwrite flag if it is false.

@roedoejet
Copy link
Member

What about writing the preprocessed files under a hierarchy where the root directory's name is some sort of audio preprocessing signature. If the user changes the config's preprocessing then EV would expect the audio under a different branch of the hierarchy. EV could detect and warn user about the fact the config has changed because it wouldn't be able to find the new branch and would ask the user to redo preprocessing.

This would work, but sounds a bit complicated. I would prefer the preprocessing_completed I mention above

Copy link
Member

@roedoejet roedoejet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good so far @joanise ! I do think you have the right things in the lock.

@joanise joanise changed the title [draft] feat: save and check a config lock for everyvoice preprocess feat: save and check a config lock for everyvoice preprocess Jun 3, 2024
@joanise
Copy link
Member Author

joanise commented Jun 3, 2024

Recommended changes done, and unit testing implemented. This is ready to re-review.

Copy link
Collaborator

@SamuelLarkin SamuelLarkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 951 to 967
with open(self.save_dir / ".config-lock", "w", encoding="utf8") as f:
json.dump(
self.get_config_lock(in_progress), f, indent=2, ensure_ascii=False
)
f.write("\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the .config-lock file read-only once written to discourage user to change that file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

with patch_logger(
everyvoice.preprocessor.preprocessor
) as logger:
with self.assertLogs(logger) as logs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quintuple with depth, this is serious ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

@joanise joanise merged commit 4ff2c4d into main Jun 6, 2024
2 checks passed
@joanise joanise deleted the dev.ej/210-config-lock branch June 6, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add certification seal of approval for preprocessing
3 participants