Input names are silently modified, leading to inputs mismatch during inference #1153

Pierre-Bartet · 2025-01-13T15:07:52Z

When converting models using convert_sklearn, initial_inputs names are silently modified to comply
with the onnx specification which asks for identifiers to follow the C90 identifier specification.
As a result, the input names will not match when trying to perform inference even when using exactly the same input data.

Furthermore, this behaviour is quite hidden, and if we look at the following basic tutorial, the column home.dest is conveniently dropped.
Trying to use it results in a renaming into home_dest, which makes the onnx models fail at inference because the input data doesn't match anymore.

This is a really treacherous behavior, and in fact it is probably not even necessary as onnx doesn't even enforce their own specification.
It is also the kind of behaviour users might be tempted to overcome by being "smart", but in fact most smart solutions are wrong, especially when taking into account the possibility that a renamed identifier might collide with an existing one.

By the way looking at the code (and testing it) the line that turn input names into C-style identifiers is not correct:
re.sub("[^\\w+]", "_", seed) let some special characters such as "+" unchanged.

The text was updated successfully, but these errors were encountered:

xadupre · 2025-01-13T16:34:46Z

Changing the default behaviour would probably hurt some existing code but we could add an option to change it? Would that work for you or feel free to suggest anything you would like to see implemented.

Pierre-Bartet · 2025-01-13T17:19:49Z

I think it depends on what the Onnx team plans to do. If they just drop the requirement from the specification then it can safely be drop everywhere without breaking anything.

In the meantime an option to just leave the names untouched would indeed solve my case!

Pierre-Bartet · 2025-01-24T07:49:48Z

The onnx specification has just been updated to drop the C90 identifier requirement: onnx/onnx#6652.

xadupre · 2025-01-24T09:36:09Z

I was not working on onnx when this statement was made, maybe it was for possible future developments. In practice, onnx, onnxruntime do not use this constraint.

Pierre-Bartet · 2025-01-24T09:41:52Z

Yes they never actually enforced it, and now it has even been removed from the specification (from MUST to SHOULD).
Does it mean sklearn-onnx can just drop the corresponding check now?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Input names are silently modified, leading to inputs mismatch during inference #1153

Input names are silently modified, leading to inputs mismatch during inference #1153

Pierre-Bartet commented Jan 13, 2025 •

edited

Loading

xadupre commented Jan 13, 2025

Pierre-Bartet commented Jan 13, 2025

Pierre-Bartet commented Jan 24, 2025

xadupre commented Jan 24, 2025

Pierre-Bartet commented Jan 24, 2025

Input names are silently modified, leading to inputs mismatch during inference #1153

Input names are silently modified, leading to inputs mismatch during inference #1153

Comments

Pierre-Bartet commented Jan 13, 2025 • edited Loading

xadupre commented Jan 13, 2025

Pierre-Bartet commented Jan 13, 2025

Pierre-Bartet commented Jan 24, 2025

xadupre commented Jan 24, 2025

Pierre-Bartet commented Jan 24, 2025

Pierre-Bartet commented Jan 13, 2025 •

edited

Loading