
getModelJSON fails with a bundler (see reproduction) #366

Closed
maudnals opened this issue Oct 20, 2023 · 22 comments · Fixed by #545
Labels
bug Something isn't working

Comments

@maudnals

maudnals commented Oct 20, 2023

Describe the bug
Hi there, I'm trying to use remote models in a basic web app that relies on a bundler (Parcel). getModelJSON doesn't seem to load models correctly. I put together a minimal reproduction, see below.
(Note: no issue is observed when not using a bundler, e.g. when loading transformers.js from a CDN instead.)
This may be a duplicate of, or related to, #364.
Thanks!

How to reproduce
Bug repro repo: https://github.com/maudnals/bug-repro-transformersjs/tree/main
Steps to reproduce:

  • Clone this repo and navigate to its root
  • Run yarn install
  • Run yarn parcel index.html
  • Observe the code in script.js. We're simply calling await pipeline('sentiment-analysis')
  • Open http://localhost:1234 in your browser, or whichever URL parcel gives you
  • Open DevTools
  • Observe the error logged in the DevTools console upon model loading:
Uncaught (in promise) SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON
    at JSON.parse (<anonymous>)
    at getModelJSON (hub.js:551:17)
    at async Promise.all (:61216/index 0)
    at async loadTokenizer (tokenizers.js:52:16)
    at async AutoTokenizer.from_pretrained (tokenizers.js:3826:48)
    at async Promise.all (:61216/index 0)
    at async loadItems (pipelines.js:2193:5)
    at async pipeline (pipelines.js:2139:19)
    at async app.js:9:14
[screenshot of the DevTools error]

Expected behavior
The model should load properly.

Logs/screenshots

(Same error log as above.)

Environment

  • Transformers.js version: "@xenova/transformers": "^2.6.2"
  • Browser (if applicable): Chrome 118
  • Operating system (if applicable): MacOS

Additional context
When adding a breakpoint in getModelJSON, the model path seems fine (openai/...).
I wonder if the issue lies in path rewriting somewhere along the bundling steps?

@maudnals maudnals added the bug Something isn't working label Oct 20, 2023
@xenova
Collaborator

xenova commented Oct 22, 2023

Hi there 👋 Thanks for making the repro 👍

I believe the problem is the following:

  1. By default, transformers.js (like the python library) checks your local server for the model files.
  2. If the model files are found locally, we use those.
  3. If not, the model files are loaded from the HF hub.
  4. We detect whether a file exists locally by checking for a 404 status code. Any other status is assumed to mean the file was found locally, and JSON parsing proceeds (in this case, on the config.json, tokenizer.json, or tokenizer_config.json file).
  5. I haven't tested with the bundler you're using (Parcel), but I think it returns an HTML page for its 404 route; hence the error about invalid JSON due to an unexpected character.
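The steps above can be sketched as follows (simplified, not the actual transformers.js source):

```javascript
// Simplified sketch of the detection logic described above — the function
// name and shape are illustrative, not the library's. A file is treated as
// "found locally" unless the server answers with an explicit 404, so a dev
// server that returns index.html with status 200 for missing files sends
// HTML straight into JSON.parse.
function parseModelJSON(status, body) {
  if (status === 404) {
    // Only an explicit 404 triggers the fall-back to the HF Hub.
    return { source: 'hub' };
  }
  // Anything else is assumed to be the local model file, so an HTML
  // 404 page ("<!DOCTYPE html>...") reaches JSON.parse and throws the
  // SyntaxError reported in this issue.
  return { source: 'local', config: JSON.parse(body) };
}
```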

Could you try the following:

  1. Start the server and visit http://localhost:1234/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english. This should fail and result in a 404 page. Could you open DevTools to see what the status code of the page is? It should be 404 too.
  2. Update script.js to include the following:
    import { pipeline, env } from '@xenova/transformers';
    env.allowLocalModels = false;

This will bypass the local file check and download from the HF hub instead.

@pappitti

Hi there, facing this exact issue at the moment. I thought about disallowing local models, and I'm glad to see it suggested here, but it doesn't work for me: I keep getting the error.

@kungfooman
Contributor

I dug a bit into the generated Parcel code. The source map support is somewhat broken (making it hard to inspect values and set breakpoints), but from a bit of testing I found that this works at least:

import { pipeline, env } from '@xenova/transformers';
env.allowLocalModels = false;
env.useBrowserCache = false;

Entire test script:

script.js

import { pipeline, env } from '@xenova/transformers';
env.allowLocalModels = false;
env.useBrowserCache = false;
console.log('✅ Pipeline function is correctly loaded: \n', pipeline);
console.log('❌ But attempting to create a pipeline fails: ');
// Allocate a pipeline for sentiment-analysis
(async function () {
  const pipe = await pipeline('sentiment-analysis');
  console.log("pipe", pipe);
})();

@pappitti

Thanks, I came to the same conclusion: it only works if you force the download from the remote URL. I couldn't get to the bottom of it.

I was hoping another solution existed because, for the project I'm working on, this workaround means users would download the model over and over again.

@cipherphage

(Quoted @kungfooman's workaround above: env.allowLocalModels = false; env.useBrowserCache = false;)

This really helped me! I know it's only tangentially related to the original post, but I ran into the same JSON parsing error when building the client-side example from Hugging Face's Next.js tutorial (https://huggingface.co/docs/transformers.js/tutorials/next#client-side-inference), and adding the line env.useBrowserCache = false; to my worker is what fixed it. After that, I had no issues using the library.

@pappitti

pappitti commented Dec 6, 2023

I don't think it is only tangentially related; I think it is the same bug everywhere. Before downloading the tokenizer from HF, transformers.js checks whether it exists locally or in the browser cache, following a specific path for each option. If the path does not exist, a 404 error is thrown. The option (local or cache) is supposed to be skipped on a 404 error, but it isn't. At some point the problematic path is replaced by the root of your app (a typical fallback), and this is why an HTML page is parsed instead of JSON.
What I need to figure out now is where (at which point) the root is swapped in.

@cipherphage

@pappitti I see what you mean. Here's more context and details I found for my particular case:

Attempting to run in development mode the client-side example provided in Hugging Face's Nextjs tutorial:

  • getModelJSON parsing error is thrown on initial load of the web worker.
  • This is specifically on Brave Version 1.61.101 Chromium: 120.0.6099.71 (Official Build) (x86_64).
  • This does NOT happen on Chrome Version 120.0.6099.62 (Official Build) (x86_64); it runs without errors there (my apologies for not trying Chrome previously).
  • Here is the full screenshot of the error, including the warnings that lead up to it:

[screenshot: TransformersjsBraveError]

@wesbos

wesbos commented Jan 31, 2024

Also hitting this, but it seems V3 will fix it.

Workaround for now:

env.allowLocalModels = false;
env.useBrowserCache = false;

seems to be fixing it

@azer

azer commented Feb 27, 2024

Also hit this issue and spent some time debugging. Several things contributing to this issue:

  • Making the request to the local URL instead of the remote URL
  • Caching the bad response under the local key
  • On subsequent requests, looking up the cache by both local and remote URLs
  • Not invalidating the cache after receiving a bad response
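The failure mode in that list can be sketched like this (a plain Map stands in for the browser Cache API; the names are illustrative, not the library's). The guard shows the missing piece: a dev server may answer 200 with its index.html for a missing file, so the status alone is not enough; HTML-looking bodies must be rejected before caching, otherwise the bad entry is stored under the local key and never invalidated:

```javascript
// A Map standing in for the browser Cache API (illustrative sketch).
const cache = new Map();

async function fetchWithCache(url, fetchImpl) {
  // A cached bad body would be returned here on every later request.
  if (cache.has(url)) return cache.get(url);
  const res = await fetchImpl(url);
  // Reject bodies that look like an HTML 404/index page before caching.
  const looksLikeHTML = res.body.trimStart().startsWith('<');
  if (res.status === 200 && !looksLikeHTML) {
    cache.set(url, res.body); // only cache responses that look like real model files
  }
  return res.body;
}
```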

@m4a1carbin4

I'm also hitting this. In my debugging:

[screenshot]

I think cache.match(name) seems to return data from an abnormal URL.

[screenshot]

This is the data that cache.match(name) returns. I don't know exactly why, but I confirmed that its URL is " ".

In addition, I think this is very likely a dev-environment problem in Next.js; the build result works normally on a Hugging Face Space.

@maudnals
Author

maudnals commented May 6, 2024

Quick follow-up for the record, as it seems others have been encountering the issue:

  • I can confirm that env.allowLocalModels = false; together with env.useBrowserCache = false; fixes it for me, at least with Vite. It does slow down development, since the model has to be downloaded from scratch on every load.
  • Great to hear this may be fixed in V3!

@masylum

masylum commented May 27, 2024

Those using allowLocalModels=false and useBrowserCache=false... are you downloading the model every single time? Is there a way to handle the caching of the model yourself?
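One way to handle the caching yourself is to wrap the download in a cache keyed by URL. This is a hedged sketch, not a transformers.js API: cachedFetch and mapStore are hypothetical helpers. The store is injected, so the same logic works with the browser Cache Storage API (whose match/put shape is similar) or any stand-in:

```javascript
// Hypothetical helper: fetch a model file, reusing a cached copy if present.
async function cachedFetch(url, cacheStore, fetchImpl) {
  const hit = await cacheStore.match(url);
  if (hit !== undefined) return hit; // reuse the stored body, no network
  const res = await fetchImpl(url);
  if (res.ok) await cacheStore.put(url, res.body); // cache only good responses
  return res.body;
}

// A Map-backed store with the same match/put shape, usable anywhere:
function mapStore() {
  const m = new Map();
  return {
    match: async (key) => m.get(key),
    put: async (key, value) => { m.set(key, value); },
  };
}
```

With a persistent store (the real Cache API, or IndexedDB) the model would only be downloaded once per browser instead of on every load.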

@tinrab

tinrab commented Jul 6, 2024

@masylum Yes. I'm using it in a React app, and it re-downloads models on each HMR reload. The production build seems okay from what I can tell.

@maudnals
Author

maudnals commented Jul 9, 2024

Same

@pappitti

pappitti commented Jul 10, 2024

Are you all saying that browser caching actually works in the production build? I obviously did not think to try it, given that it crashed all the time in dev.

@kungfooman
Contributor

Are you all saying that browser caching actually works in the production build?

The problem here is the appalling ignorance of build tools like Vite. I always disliked them, but many people are still caught in their traps and pitfalls... In the best possible scenario there would be no build tool while testing/developing, for the quickest possible iteration, aka scientific exploration. Makes sense? Let that sink in.

It all boils down to the fact that Vite really believes it's a good idea to return an HTTP response code of 200 when a file doesn't exist:

You can test that yourself in this example: https://github.com/xenova/transformers.js/tree/main/examples/text-to-speech-client

[screenshot: dev server returning 200 for a missing file]

Even non-programmers know about the famous 404 when a page doesn't exist. What are they thinking? Only god knows, but what you should know is this: drop tools that at best slow you down and at worst introduce bugs into good code.

Solution?

  • Use a sane web server like lighttpd (as simple as apt install lighttpd) or XAMPP on Windows
  • Use ESM with importmaps
  • Rewrite JSX into pure JS
  • Use a bundler only at the very end for building a minified production artifact

Of course it requires some extra work, but in the long run you can thank yourself for not wasting time on non-issues, and for the freedom of not having to open dozens of terminals and mess around with shell commands just to use a simple HTML file.

Programming should be simple, fast and working, compare that to over-engineered, slow and non-working. Hence this issue exists.

@pappitti

for that one project, it was indeed vite...
Thanks for the solution
(and lol for the rant)

@amk-dev

amk-dev commented Jul 17, 2024

Usually in single-page applications, all 404 requests go to your root index.html so your SPA routes can deal with them. This is how Vite behaves when you're running an SPA. I'm working with Vue, but it should generally be the same with React or any other frontend framework. So when your model files are not found, the request falls through to the default index.html, and the status code will be 200 with the HTML that your SPA returns.

You can add a Vite plugin to override this behaviour for the files concerning transformers.js, using the configureServer plugin option from Vite. Read about the API here.

Here is a simple config that should work. I haven't researched better ways to do this; I'm just setting up a simple prototype to test out transformers.js.

Also remember this is for the dev server; when deploying to production you'll have to translate this concept into whatever setup you use.

import { defineConfig } from "vite";
import vue from "@vitejs/plugin-vue";
import fs from "fs";
import path from "path";

// https://vitejs.dev/config/
export default defineConfig({
  plugins: [
    vue(),
    {
      name: "out-json-404",
      configureServer(server) {
        server.middlewares.use((req, res, next) => {
          // strip any query string before checking the extension
          const urlPath = req.url.split("?")[0];

          // check if the request is for a json file or onnx file
          if (urlPath.endsWith(".json") || urlPath.endsWith(".onnx")) {
            // check if the file exists on disk
            const filePath = path.join(server.config.root, urlPath);

            // if it doesn't exist, return a 404 instead of index.html
            if (!fs.existsSync(filePath)) {
              res.statusCode = 404;
              res.end(JSON.stringify({ error: "JSON file not found" }));
              return;
            }
          }

          // continue as normal otherwise
          next();
        });
      },
    },
  ],
});

@tinrab

tinrab commented Jul 17, 2024

@amk-dev It would be great if this library allowed providing models manually. I would love the ability to load ONNX files with react-query. The 404 issue could be avoided.

@maudnals
Author

maudnals commented Oct 18, 2024

Updated to v3; Now I'm getting the following error:

@parcel/core: Failed to resolve '#onnxruntime-webgpu' from './node_modules/@huggingface/transformers/src/backends/onnx.js'

bug-repro-transformersjs/node_modules/@huggingface/transformers/src/backends/onnx.js:29:27
  28 | // @ts-ignore
> 29 | import * as ONNX_WEB from '#onnxruntime-webgpu';
     |                           ^^^^^^^^^^^^^^^^^^^^^
  30 |
  31 | export { Tensor } from 'onnxruntime-common';

@xenova
Collaborator

xenova commented Oct 20, 2024

@maudnals This should be handled by the "imports" field in the package.json:

"imports": {
  "#onnxruntime-webgpu": {
    "node": "onnxruntime-web",
    "default": "onnxruntime-web/webgpu"
  }
},

Is this perhaps a Parcel issue?

@maudnals
Author

Most likely a Parcel issue! I was able to repro the issue with Parcel, but indeed Vite works fine.

I've filed a bug against the parcel repo:
parcel-bundler/parcel#10012
