Add padding_side and pad_token_id in OrtBackend
#705
Conversation
- The shared functionality for the input preparation has been extracted from both `OrtBackend::embed` and `OrtBackend::predict` into separate functions:
  - `prepare_inputs` to prepare the inputs based on what the ONNX model expects, i.e. `input_ids`, `attention_mask`, etc.
  - `prepare_ort_inputs` to go from those inputs to `ort::inputs!`
- Since the input processing in both `OrtBackend::embed` and `OrtBackend::predict` defaulted to right-padding, as did the pooling and post-processing in `OrtBackend::embed`, the `PaddingSide` is now handled so that the proper methods are used for the configured `padding_side` (see the sketch below)
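To make the padding-side handling concrete, here is a minimal sketch (not the code from this PR) of how padding-side-aware input preparation could look; the `prepare_inputs` signature, the exact `PaddingSide` enum shape, and the `u32` token type are assumptions for illustration only:

```rust
use ndarray::Array2;

#[derive(Clone, Copy, PartialEq)]
enum PaddingSide {
    Left,
    Right,
}

/// Build padded `input_ids` and `attention_mask` matrices for a batch,
/// placing each sequence at the offset dictated by the padding side.
fn prepare_inputs(
    sequences: &[Vec<u32>],
    pad_token_id: u32,
    padding_side: PaddingSide,
) -> (Array2<i64>, Array2<i64>) {
    let batch_size = sequences.len();
    let max_length = sequences.iter().map(|s| s.len()).max().unwrap_or(0);

    // Start fully padded / fully masked, then overwrite the real tokens.
    let mut input_ids = Array2::<i64>::from_elem((batch_size, max_length), pad_token_id as i64);
    let mut attention_mask = Array2::<i64>::zeros((batch_size, max_length));

    for (batch_idx, sequence) in sequences.iter().enumerate() {
        // Left padding shifts the sequence towards the end of the row.
        let offset = match padding_side {
            PaddingSide::Left => max_length - sequence.len(),
            PaddingSide::Right => 0,
        };
        for (i, &token) in sequence.iter().enumerate() {
            input_ids[[batch_idx, offset + i]] = token as i64;
            attention_mask[[batch_idx, offset + i]] = 1;
        }
    }

    (input_ids, attention_mask)
}
```

This is essentially the "allocate once, then write at a padding-side-dependent offset" pattern discussed in the review comments below.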
Narsil
left a comment
Looks alright, but I think we can simplify further.
backends/ort/src/lib.rs (outdated)
```rust
Pool::Cls => match self.padding_side {
    PaddingSide::Left => {
        if masking {
            let mut cls_embeddings = Vec::new();
            for (batch_idx, &seq_length) in
                model_inputs.input_lengths.iter().enumerate()
            {
                let padding = max_length as f32 - seq_length;
                let cls_pos = padding as usize;
                cls_embeddings
                    .push(outputs.slice(s![batch_idx, cls_pos, ..]).to_owned());
            }
            ndarray::stack(
                Axis(0),
                &cls_embeddings.iter().map(|x| x.view()).collect::<Vec<_>>(),
            )
            .unwrap()
            .into_dyn()
        } else {
            outputs.slice(s![.., 0, ..]).into_owned().into_dyn()
        }
    }
    PaddingSide::Right => outputs.slice(s![.., 0, ..]).into_owned().into_dyn(),
},
Pool::LastToken => match self.padding_side {
    // NOTE: when using left-padding, the last-token is always in the last position
```
Feels like there's a lot of switching on padding side. I haven't carefully looked at each line, but it seems to me that the code could be made significantly simpler by allocating the buffers up front and choosing a different insertion (overwriting) offset based on the padding side.
Something like:

```rust
let offset = if padding_side == Side::Left { 0 } else { max_length - length };
for (i, item) in elements.iter().enumerate() {
    input_ids[i + offset] = item;
}
```
Hmm, yes, it could be the case for some. The issue is that, given the `padding_side`, we can apply the pooling in the most performant way: e.g. for last-token pooling with left-padding it's literally the last token in the sequence, but with right-padding we need to iterate over each sequence to find where it ends and then take the last token accordingly. This is why I kept one implementation per `padding_side`, but I'm happy to unify those into a single match, even if either the right or the left case becomes slightly less performant (i.e. requires a few more ops), as that would simplify the code a bit.
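For illustration, a rough sketch of what the two last-token pooling paths described above could look like; the function name, signature, and the (batch, seq_len, hidden) shape assumption are hypothetical, not the PR's actual implementation:

```rust
use ndarray::{s, Array2, Array3, Axis};

#[derive(Clone, Copy)]
enum PaddingSide {
    Left,
    Right,
}

/// Extract the embedding of the last real (non-padding) token of each sequence.
/// `outputs` has shape (batch, seq_len, hidden); `input_lengths` holds the
/// unpadded length of each sequence.
fn last_token_pool(
    outputs: &Array3<f32>,
    input_lengths: &[usize],
    padding_side: PaddingSide,
) -> Array2<f32> {
    let max_length = outputs.shape()[1];
    match padding_side {
        // With left padding, the last real token always sits in the last column,
        // so a single slice covers the whole batch.
        PaddingSide::Left => outputs.slice(s![.., max_length - 1, ..]).to_owned(),
        // With right padding, each sequence ends at its own length, so the
        // position has to be looked up per row before stacking the results.
        PaddingSide::Right => {
            let rows: Vec<_> = input_lengths
                .iter()
                .enumerate()
                .map(|(batch_idx, &len)| outputs.slice(s![batch_idx, len - 1, ..]).to_owned())
                .collect();
            ndarray::stack(Axis(0), &rows.iter().map(|r| r.view()).collect::<Vec<_>>()).unwrap()
        }
    }
}
```

The left-padding arm is a single slice over the whole batch, while the right-padding arm needs a per-row lookup, which is the performance asymmetry mentioned above.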
Fair enough.
Narsil
left a comment
LGTM
What does this PR do?
This PR reads the `padding_side` from the `tokenizer_config.json` if present, otherwise defaulting to `padding_side: "right"`, to handle models where the `padding_side` is other than "right", e.g. https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX, and to ensure parity in both the inputs and the outputs. It also reads the `pad_token_id` from the `config.json` instead of hard-coding it to 0: the `pad_token_id` is used if defined, falling back to `eos_token_id`, and finally to 0 if neither is defined.

This PR also updates the input preparation and pooling strategies accordingly, so that they are applied one way or another based on the padding side: the pooling result should be padding-agnostic, but with the padding-side information we can apply each pooling strategy more efficiently.
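As a rough sketch of the fallback logic described above (hypothetical struct and field names, assuming `eos_token_id` is a single integer; only the relevant fields are shown, not the PR's actual code):

```rust
use serde::Deserialize;

// Hypothetical minimal views of the two config files.
#[derive(Deserialize)]
struct TokenizerConfig {
    padding_side: Option<String>,
}

#[derive(Deserialize)]
struct ModelConfig {
    pad_token_id: Option<u32>,
    eos_token_id: Option<u32>,
}

fn resolve_padding_side(tokenizer_config: Option<&TokenizerConfig>) -> String {
    tokenizer_config
        .and_then(|c| c.padding_side.clone())
        // Default to right padding when the field (or the file) is missing.
        .unwrap_or_else(|| "right".to_string())
}

fn resolve_pad_token_id(config: &ModelConfig) -> u32 {
    // pad_token_id -> eos_token_id -> 0
    config.pad_token_id.or(config.eos_token_id).unwrap_or(0)
}
```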
Additionally, this PR fixes the last-token pooling for the `OrtBackend`, which was leading to issues (unrelated to the `padding_side`), as well as using the correct `pad_token_id`, as reported in e.g. #704.

As some other, smaller but still relevant changes, this PR:
- Adds `OrtBackend::prepare_inputs` to prepare the `ndarray`s for the `input_ids`, `attention_mask`, etc. within a single function reused by both `OrtBackend::embed` and `OrtBackend::predict`, to avoid duplicating the code
- Adds `OrtBackend::prepare_ort_inputs` to go from `ndarray`s to `ort::inputs!`, for the same reason as the function above
- Adds `ModelInputs` to capture all the inputs within the same struct so that they can be easily managed (see the sketch after this list)
- Adds `Config` to read from `config.json`, required for both the `pad_token_id` and for the `past_key_values` required configuration values
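A hypothetical shape for the `ModelInputs` struct mentioned above; the field names and types are guesses based on the diff excerpt earlier in this conversation, not the actual definition:

```rust
use ndarray::Array2;

// Hypothetical layout; the actual struct in the PR may carry more fields
// (e.g. token_type_ids or past_key_values placeholders).
struct ModelInputs {
    input_ids: Array2<i64>,
    attention_mask: Array2<i64>,
    // Real (unpadded) length of each sequence, kept as f32 to match the
    // `input_lengths` usage in the diff excerpt above.
    input_lengths: Vec<f32>,
}
```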
Before submitting
- `insta` snapshots?

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Narsil