Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v3] NoBadWordsLogitsProcessor breaks when the size of bad_word_ids exceeds the size of the input ids #910

Closed
1 of 5 tasks
tarekziade opened this issue Aug 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@tarekziade
Copy link

tarekziade commented Aug 29, 2024

System Info

v3

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

this block breaks for me with mozilla/distilvit where we have a long list of bad words

                for (let i = 1; i <= bad_word_ids.length - 1 && bad_word_ids.length < input_ids[i].length; ++i) {

                    // NOTE: We use != instead of !== to compare bigint and number
                    // @ts-ignore
                    if (bad_word_ids.at(-i - 1) != input_ids[i].at(-i)) {
                        // We have found a mismatch
                        mark = false;
                        break;
                    }
                }

with:

Uncaught (in promise) TypeError: can't access property "length", input_ids[i] is undefined

my input_ids has just one entry when I use the regular pipeline function with dtype=q4 and devide=wasm
The code is a bit confusing because i is reused in both the inner and outer loop.

I fixed it by changing the inner loop with :

// using `j` to avoid confusion with the outer loop
for (let j = 1; j <= bad_word_ids.length - 1; ++j) {
    // Ensure input_ids[i] is defined and has sufficient length before accessing
    if (input_ids[i] && bad_word_ids.length < input_ids[i].length) {
        // NOTE: We use != instead of !== to compare bigint and number
        if (bad_word_ids.at(-j - 1) != input_ids[i].at(-j)) {
            // We have found a mismatch
            mark = false;
            break;
        }
    } else {
        mark = false;
        break;
    }
}

I am not sure if that is the correct fix. Happy to do a PR

Reproduction

.

@tarekziade tarekziade added the bug Something isn't working label Aug 29, 2024
@huggingface huggingface deleted a comment Aug 29, 2024
xenova added a commit that referenced this issue Aug 30, 2024
@xenova
Copy link
Collaborator

xenova commented Aug 30, 2024

Whoops my bad - a silly issue of reusing variables (input_ids[i].at(-i) should have been a dead giveaway). Probably some bad replace all 🤷

Fixed by d21c87c.

Let me know if you face any other issues 👍

@xenova xenova closed this as completed Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants