Unable to set langPath to a blob url #965

Open
TheWorldEndsWithUs opened this issue Oct 13, 2024 · 3 comments

Comments

@TheWorldEndsWithUs

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)

Describe the bug
When running Tesseract.js in the browser, I'd like to pass the language data via a blob URL because of the restrictions of the environment the code will be running in. However, when I pass the URL to langPath, it fails to load the file.

To Reproduce
Steps to reproduce the behavior:

  1. Create a worker and set its langPath property to a blob URL.

Please attach any input image required to replicate this behavior.

Expected behavior
A clear and concise description of what you expected to happen.

Device Version:
Chrome Browser

Additional context
Add any other context about the problem here.

@Balearica
Member

Balearica commented Oct 13, 2024

The argument langPath is set to a directory (either local or a CDN) that Tesseract.js should use to automatically download the correct language data from. Blobs are individual files, so it would not make sense for langPath to accept blobs.
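To illustrate the point, here is a minimal sketch of how a directory-style langPath might be turned into a download URL. This is a simplified illustration, not the library's actual source: the helper name `buildTrainedDataUrl`, the example CDN address, and the blob URL value are all hypothetical, and the sketch assumes the file name `<lang>.traineddata(.gz)` is appended to langPath.

```javascript
// Simplified illustration only; NOT the library's actual source.
// Assumption: the download URL is derived by appending
// "<lang>.traineddata(.gz)" to langPath, treating langPath as a directory.
function buildTrainedDataUrl(langPath, lang, gzip = true) {
  const base = langPath.replace(/\/$/, ''); // drop a single trailing slash
  return `${base}/${lang}.traineddata${gzip ? '.gz' : ''}`;
}

// A directory-style langPath (hypothetical CDN address) yields a file URL.
const fromCdn = buildTrainedDataUrl('https://example.com/tessdata/', 'eng');

// A blob URL identifies exactly one in-memory file. Appending a file name
// produces a different URL that no longer matches the registered blob.
const blobUrl = 'blob:https://example.com/3f0c8a1e-0000-4000-8000-000000000000'; // hypothetical
const fromBlob = buildTrainedDataUrl(blobUrl, 'eng');

console.log(fromCdn);
console.log(fromBlob);
```

Under these assumptions, the second result is the blob URL with `/eng.traineddata.gz` appended, which does not point at any registered blob, so a fetch against it would be expected to fail.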

If you do not want Tesseract to automatically download the correct data from a directory, but rather want to manually write language data to the worker, follow the instructions provided in #794.

Edit: It looks like this question was answered in #794; however, that answer was for an older version and may no longer be applicable. I would need to think about whether this is possible with the current interface.

@Balearica Balearica reopened this Oct 13, 2024
@TheWorldEndsWithUs
Author

I wouldn't mind using an older version as long as it supports word-level OCR and is mostly stable. If it is possible with the newest version I would prefer that, but beggars can't be choosers. Thanks for your help. I've run a bunch of experiments trying to hot-replace the code in the minified file with a blob link so that it downloads the data locally, but it didn't work.

@Balearica
Member

The solution linked in #794 works with v4, but it no longer works in v5 because the createWorker, worker.initialize, and worker.loadLanguage functions were consolidated in that release. It should not be hard to add a feature to the current version that supports something similar, but this will require an update.
