
Add the ability to translate submissions #4134

Merged: 11 commits merged on Nov 4, 2024


wes-otf (Contributor) commented Sep 13, 2024

Follows up on the discussion in #4037. This PR integrates the argostranslate library so that applications can be translated. Currently, staff users can translate the written answers of an application.

The feature uses htmx to translate the application without a page reload, and it updates the URL so users can share a link (via the query parameters fl [from language] and tl [to language]) when collaborating with others on an application.
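For illustration, here is a minimal sketch of how the shareable URL state could be built and checked server-side. It assumes only the `fl`/`tl` query parameters described above; the helper names `build_translation_url` and `parse_translation_params` and the `INSTALLED_PAIRS` set are hypothetical, not the PR's actual code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical set of installed (from, to) pairs; in practice this would come
# from the language packages actually installed on the instance.
INSTALLED_PAIRS = {("ar", "en"), ("fr", "en")}


def build_translation_url(url: str, from_code: str, to_code: str) -> str:
    """Return `url` with the fl/tl query parameters set, keeping other params."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["fl"] = [from_code]
    query["tl"] = [to_code]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def parse_translation_params(url: str):
    """Return (from_code, to_code) if the URL requests an installed pair, else None."""
    query = parse_qs(urlsplit(url).query)
    from_code = query.get("fl", [None])[0]
    to_code = query.get("tl", [None])[0]
    if (from_code, to_code) in INSTALLED_PAIRS:
        return from_code, to_code
    return None


print(build_translation_url("https://example.org/s/42/", "ar", "en"))
# https://example.org/s/42/?fl=ar&tl=en
```

Rejecting unknown pairs on the server means a hand-edited link cannot request a translation for a package that is not installed.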

This feature is disabled by default and can be enabled via the SUBMISSION_TRANSLATIONS_ENABLED setting.
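A minimal sketch of gating the feature on that setting, assuming it is read like any other Django setting with a `False` fallback. `SimpleNamespace` stands in for `django.conf.settings` so the snippet runs without a Django project, and `translations_enabled` is a hypothetical helper name.

```python
from types import SimpleNamespace


def translations_enabled(settings) -> bool:
    """True only if SUBMISSION_TRANSLATIONS_ENABLED is set to a truthy value.

    Falls back to False when the setting is missing, matching the
    "disabled by default" behaviour described above.
    """
    return bool(getattr(settings, "SUBMISSION_TRANSLATIONS_ENABLED", False))


# Stand-ins for django.conf.settings (assumption: a real project passes the real object)
default_settings = SimpleNamespace()
enabled_settings = SimpleNamespace(SUBMISSION_TRANSLATIONS_ENABLED=True)

print(translations_enabled(default_settings))  # False
print(translations_enabled(enabled_settings))  # True
```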

Demos

Translating an application

Screen.Recording.2024-09-13.at.14.33.16.mp4

Clearing a translation via the form

Screen.Recording.2024-09-13.at.14.52.45.mp4

Test Steps

wes-otf added the labels Type: Feature and Type: Minor on Sep 13, 2024
wes-otf (Contributor, Author) commented Sep 13, 2024

A lot of work is still needed before this is ready, including:

  • Better code documentation
  • Cleaning up/organizing the initial PoC code (i.e. the JS form validation/choice population in translate_application_form.html)
  • Server-side form validation for translate_application_form.html
  • Lots and lots of unit tests
  • A management command (or migration?) to install new language packages
  • Polishing the formatting of the language form
  • Better logic for parsing and replacing translations in HTML
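One possible approach to the last item (translating HTML answers without mangling the markup) is to translate only the text nodes and re-emit every tag untouched. The sketch below uses the standard-library `html.parser`; `translate` is any string-to-string callable standing in for the real backend, and `TextNodeTranslator`/`translate_html` are hypothetical names, not the PR's implementation.

```python
import html
from html.parser import HTMLParser
from typing import Callable


class TextNodeTranslator(HTMLParser):
    """Re-emit HTML unchanged except for text nodes, which are translated."""

    def __init__(self, translate: Callable[[str], str]):
        # convert_charrefs=True hands handle_data already-unescaped text
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Copy the original tag text verbatim so attributes are preserved
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not data.strip():
            self.parts.append(data)  # whitespace between tags stays as-is
            return
        # Keep the surrounding whitespace, translate the content, re-escape it
        leading = data[: len(data) - len(data.lstrip())]
        trailing = data[len(data.rstrip()):]
        translated = self.translate(data.strip())
        self.parts.append(leading + html.escape(translated, quote=False) + trailing)

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def translate_html(markup: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeTranslator(translate)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(translate_html("Hello <b>world</b> &amp; more", str.upper))
# HELLO <b>WORLD</b> &amp; MORE
```

The trade-off is that each text node is translated in isolation, so a sentence split by inline tags such as `<b>` loses context, which can lower translation quality.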

Still, I'm happy with the initial pass so far! OTF is already on board, since relying on an external AI-type service for this is something they really dislike.

wes-otf (Contributor, Author) commented Sep 13, 2024

If anyone wants to play around with this, you can install language packages with the following script. It installs the Arabic -> English package, but from_code and to_code can be changed to any other pair:

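```python
import argostranslate.package

from_code = "ar"
to_code = "en"

# Refresh the local copy of the Argos Translate package index
argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()

# Find the package for the requested language pair (None if there isn't one)
package_to_install = next(
    (
        pkg
        for pkg in available_packages
        if pkg.from_code == from_code and pkg.to_code == to_code
    ),
    None,
)
if package_to_install is None:
    raise SystemExit(f"No Argos Translate package found for {from_code} -> {to_code}")

# Download the package and install it
argostranslate.package.install_from_path(package_to_install.download())
```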

frjo (Contributor) commented Sep 14, 2024

I think the UI looks really nice. It's neat that the to/from languages are in the query string, so the view can be shared.

How would downloading the language packages work in production?

wes-otf (Contributor, Author) commented Sep 16, 2024

I was thinking of a management command, so that each instance could add languages as needed. I'm still working out the syntax, but maybe something like:

```shell
python3 manage.py install_language from_to ...
```

For example, installing the Arabic -> English and French -> English packages would look like:

```shell
python3 manage.py install_language ar_en fr_en
```
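The `from_to` argument format above could be parsed along these lines. This is a sketch: `parse_language_pair` and `parse_language_pairs` are hypothetical helpers, and the checks are deliberately minimal (a real command would probably also validate the codes against the package index).

```python
def parse_language_pair(arg: str) -> tuple[str, str]:
    """Split a 'from_to' argument such as 'ar_en' into ('ar', 'en')."""
    parts = arg.strip().lower().split("_")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected a 'from_to' pair like 'ar_en', got {arg!r}")
    from_code, to_code = parts
    if from_code == to_code:
        raise ValueError(f"Source and target language are the same: {arg!r}")
    return from_code, to_code


def parse_language_pairs(args: list[str]) -> list[tuple[str, str]]:
    """Parse every positional argument, keeping order and dropping duplicates."""
    pairs: list[tuple[str, str]] = []
    for arg in args:
        pair = parse_language_pair(arg)
        if pair not in pairs:
            pairs.append(pair)
    return pairs


print(parse_language_pairs(["ar_en", "fr_en"]))  # [('ar', 'en'), ('fr', 'en')]
```

Inside a Django management command, the `ValueError` would typically be re-raised as a `CommandError` so the user gets a clean error message instead of a traceback.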

wes-otf (Contributor, Author) commented Sep 18, 2024

I've added the management commands mentioned above. I had a long layover today, so I went a little overboard with verbosity levels and messages (e.g. package{'s were' if len(existing_packages) > 1 else ' was'}), but overall it seems to work pretty well!

Installing Arabic -> English and French -> English:

```shell
python3 manage.py install_languages ar_en fr_en
```

Uninstalling those languages:

```shell
python3 manage.py uninstall_languages ar_en fr_en
```
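The "already installed" messaging mentioned above comes down to splitting the requested pairs into new and existing ones. A minimal stdlib sketch, where `installed` would in practice be derived from the locally installed Argos Translate packages (an assumption here) and `plan_install`/`existing_message` are hypothetical names:

```python
def plan_install(requested, installed):
    """Split requested (from, to) pairs into (to_install, already_installed)."""
    to_install = [pair for pair in requested if pair not in installed]
    existing = [pair for pair in requested if pair in installed]
    return to_install, existing


def existing_message(existing):
    """Human-readable note about skipped pairs, with correct singular/plural."""
    if not existing:
        return ""
    names = ", ".join(f"{src}->{dst}" for src, dst in existing)
    verb = "packages were" if len(existing) > 1 else "package was"
    return f"The following {verb} already installed: {names}"


installed = {("ar", "en")}
to_install, existing = plan_install([("ar", "en"), ("fr", "en")], installed)
print(to_install)  # [('fr', 'en')]
print(existing_message(existing))  # The following package was already installed: ar->en
```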

wes-otf force-pushed the feature/integrate-translations branch 3 times, most recently from c585d71 to 43d72a3 on October 15, 2024 20:04
wes-otf force-pushed the feature/integrate-translations branch 2 times, most recently from 835bed6 to fc7608a on October 18, 2024 22:03
wes-otf force-pushed the feature/integrate-translations branch from fc7608a to dba2b8b on October 30, 2024 21:18
wes-otf changed the title from "WIP: Add the ability to translate submissions" to "Add the ability to translate submissions" on Oct 30, 2024
wes-otf marked this pull request as ready for review on October 30, 2024 21:19
wes-otf (Contributor, Author) commented Oct 30, 2024

I think this is ready for review! I added tons of unit tests. I might've gone a little overboard with mocking, but I really wanted the logic tests to be isolated. Let me know what y'all think!
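As an illustration of keeping logic tests isolated from the slow, model-backed translation engine, one option is to inject the backend and replace it with a `unittest.mock.Mock` in tests. `translate_answers` and its `backend` parameter are hypothetical, not the PR's actual API.

```python
import unittest
from unittest import mock


def translate_answers(answers, from_code, to_code, backend):
    """Translate each non-blank answer with backend(text, from_code, to_code).

    Blank answers are returned unchanged, so the backend is never asked to
    translate empty strings.
    """
    return {
        key: backend(text, from_code, to_code) if text.strip() else text
        for key, text in answers.items()
    }


class TranslateAnswersTests(unittest.TestCase):
    def test_only_non_blank_answers_hit_the_backend(self):
        # The mock records calls and fakes a translation without loading any model
        backend = mock.Mock(side_effect=lambda text, src, dst: f"[{dst}] {text}")
        result = translate_answers({"q1": "Bonjour", "q2": "  "}, "fr", "en", backend)

        self.assertEqual(result, {"q1": "[en] Bonjour", "q2": "  "})
        backend.assert_called_once_with("Bonjour", "fr", "en")


if __name__ == "__main__":
    unittest.main(argv=["translate-tests"], exit=False)
```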

wes-otf force-pushed the feature/integrate-translations branch from dba2b8b to 25739aa on October 30, 2024 21:23
frjo (Contributor) commented Nov 4, 2024

Works well in my testing and I like the UI.

Two observations:

  1. It is quite CPU-intensive; I run it on an M2 and it is not fast.
  2. I get the warning below. Is it anything we need to worry about?

```
/Users/frjo/Sites/hypha/.venv/lib/python3.12/site-packages/stanza/models/tokenize/trainer.py:85: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(filename, lambda storage, loc: storage)
```

frjo merged commit 48dc03c into main on Nov 4, 2024
7 checks passed
Labels: Type: Feature (This is something new, not an enhancement of an existing thing), Type: Minor (Minor change, used in release drafter)
2 participants