I am using Xenova/LaMini-Flan-T5-783M for a simple RAG pipeline in SemanticFinder and it works like a charm on the first call. Whenever I want to run a second query, i.e. run the same model again, I get the following error: "out of bounds/range".
For reproduction, simply click Find in SemanticFinder, then Chat, wait for the output to appear, and then click Chat a second time. The error occurs in both Chrome and Firefox. I'm using transformers.js 2.6.2. The worker.js here is pretty standard:
let tokenizer;
let chatModel = 'Xenova/LaMini-Flan-T5-783M';

async function token_to_text(beams) {
    let chatTokenizer = await AutoTokenizer.from_pretrained(chatModel);
    let decoded_text = chatTokenizer.decode(beams[0].output_token_ids, { skip_special_tokens: true });
    console.log(decoded_text);
    return decoded_text;
}

// other code

case 'chat':
    text = message.text.trim();
    let max_new_tokens = message.max_new_tokens;
    console.log(max_new_tokens, chatModel, text);
    let chatGenerator = await pipeline('text2text-generation', chatModel, {
        progress_callback: data => {
            self.postMessage({
                type: 'chat_download',
                data
            });
        }
    });
    let thisChat = await chatGenerator(text, {
        max_new_tokens: max_new_tokens,
        return_prompt: false,
        callback_function: async function (beams) {
            //console.log(beams);
            const decodedText = token_to_text(beams);
            console.log(decodedText);
        }
    });
    self.postMessage({ type: 'chat', chat: thisChat });
    break;
I was hesitant to create an issue here because I thought it was related to my code, but it's strange that it actually produces the first token of the answer and only then fails. Also, I tested with e.g. Xenova/t5-small for text2text generation and didn't encounter the problem.
Is it possible that there is some kind of memory issue here?
Hi! It seems to be the same case as in #8 (comment); the solution, as Xenova mentions, is to call await pipeline() only once during the execution of the app.
And you probably didn't encounter the problem with Xenova/t5-small because it's small, and you haven't instantiated it enough times to fill up the memory.
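For reference, a minimal sketch of how that could look in the worker, assuming the @xenova/transformers import; the helper name getChatGenerator and the module-level cache variable are illustrative, not taken from SemanticFinder's actual code:

import { pipeline } from '@xenova/transformers';

// Module-level cache: the pipeline is built once and reused by every later message.
let cachedChatGenerator = null;

async function getChatGenerator(chatModel, progress_callback) {
    if (cachedChatGenerator === null) {
        // First call only: downloads/loads the model and constructs the pipeline.
        cachedChatGenerator = await pipeline('text2text-generation', chatModel, { progress_callback });
    }
    return cachedChatGenerator;
}

The 'chat' case above would then call getChatGenerator(chatModel, ...) instead of await pipeline(...), so repeated Chat clicks reuse the same instance instead of re-instantiating the model.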
Argh, that totally makes sense, thank you! I should have searched the closed issues more thoroughly 🤦
So I had the same issue all along in the summary function too; it's just that that model is small enough to be instantiated several times...
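In case it helps, the same idea extends to the summary pipeline (or any number of pipelines) by keying a cache on task and model; this is just a sketch with hypothetical names, not the actual SemanticFinder code:

// One cache for all pipelines, keyed by task and model.
const pipelineCache = {};

function getCachedPipeline(task, model, options = {}) {
    const key = `${task}:${model}`;
    if (!pipelineCache[key]) {
        // Store the promise so concurrent calls also share a single instantiation.
        pipelineCache[key] = pipeline(task, model, options);
    }
    return pipelineCache[key];
}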
Thanks for the nice words! :)