
@GuittenyMartin
Collaborator

Clean the transcript output to hide spurious text produced by whisperX.
The phrase "Vap'n'Roll Thierry" is sometimes predicted at random, for no apparent reason.

@GuittenyMartin GuittenyMartin marked this pull request as ready for review October 22, 2025 14:13
Comment on lines 196 to 198
formatted_output = formatted_output.replace(
"Vap'n'Roll Thierry", "[texte impossible à transcrire]"
)
Collaborator


Nice work! Could you try making it extensible, following the Open–Closed Principle?
I’d suggest adding a settings parameter that takes a list of tuples, so we can easily configure any desired replacements.

Also, I’m curious, could you explain why you’re performing the replacement on formatted_output instead of the original text?
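
To make the suggestion concrete, here is a minimal sketch of what I have in mind. The names `TRANSCRIPT_REPLACEMENTS` and `apply_replacements` are hypothetical and not part of this PR; only the first pair of strings comes from the diff above.

```python
from collections.abc import Iterable

# Hypothetical setting: (text to find, replacement) pairs.
# Only this first pair comes from the PR; further pairs could be
# added through configuration without touching the code.
TRANSCRIPT_REPLACEMENTS: list[tuple[str, str]] = [
    ("Vap'n'Roll Thierry", "[texte impossible à transcrire]"),
]


def apply_replacements(text: str, replacements: Iterable[tuple[str, str]]) -> str:
    """Apply each (old, new) pair to the text, in the order given."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text
```

Pairs are applied in order, so a later pair can also rewrite the output of an earlier one; that is worth keeping in mind when choosing placeholder strings.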

Collaborator Author


I did the replacement in the formatting step because cleaning can be considered a form of formatting, but I can change it and add a cleaning function that runs before the formatting.
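
Roughly, the clean-then-format order could look like the sketch below. It assumes the transcript is a list of segment dicts with a `"text"` key and an optional `"speaker"` key; `clean_segments` and `format_segments` are hypothetical names, and the formatter is a simplified stand-in rather than the real one in this repository.

```python
SPURIOUS_PHRASE = "Vap'n'Roll Thierry"
PLACEHOLDER = "[texte impossible à transcrire]"


def clean_segments(segments: list[dict]) -> list[dict]:
    """Return copies of the segments with the spurious phrase replaced.

    The input list and its dicts are left unmodified.
    """
    return [
        {**segment, "text": segment["text"].replace(SPURIOUS_PHRASE, PLACEHOLDER)}
        for segment in segments
    ]


def format_segments(segments: list[dict]) -> str:
    """Simplified stand-in formatter: one 'speaker: text' line per segment."""
    return "\n".join(
        f"{segment.get('speaker', 'UNKNOWN')}: {segment['text'].strip()}"
        for segment in segments
    )


# Cleaning runs on the original text, before any formatting is applied.
segments = [{"speaker": "SPEAKER_00", "text": " Vap'n'Roll Thierry"}]
formatted_output = format_segments(clean_segments(segments))
```

With this order, the formatter never sees the spurious phrase, and the cleaning step can be tested on its own.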

@GuittenyMartin GuittenyMartin force-pushed the clean-transcript branch 2 times, most recently from 27e2d80 to e3449aa Compare October 23, 2025 13:35
@sonarqubecloud
3 participants