Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input names are silently modified, leading to inputs mismatch during inference #1153

Open
Pierre-Bartet opened this issue Jan 13, 2025 · 5 comments

Comments

@Pierre-Bartet
Copy link
Contributor

Pierre-Bartet commented Jan 13, 2025

When converting models using convert_sklearn, initial_inputs names are silently modified to comply
with the onnx specification which asks for identifiers to follow the C90 identifier specification.
As a result, the input names will not match when trying to perform inference even when using exactly the same input data.

Furthermore, this behaviour is quite hidden, and if we look at the following basic tutorial, the column home.dest is conveniently dropped.
Trying to use it results in a renaming into home_dest, which makes the onnx models fail at inference because the input data doesn't match anymore.

This is a really treacherous behavior, and in fact it is probably not even necessary as onnx doesn't even enforce their own specification.
It is also the kind of behaviour users might be tempted to overcome by being "smart", but in fact most smart solutions are wrong, especially when taking into account the possibility that a renamed identifier might collide with an existing one.

By the way looking at the code (and testing it) the line that turn input names into C-style identifiers is not correct:
re.sub("[^\\w+]", "_", seed) let some special characters such as "+" unchanged.

@xadupre
Copy link
Collaborator

xadupre commented Jan 13, 2025

Changing the default behaviour would probably hurt some existing code but we could add an option to change it? Would that work for you or feel free to suggest anything you would like to see implemented.

@Pierre-Bartet
Copy link
Contributor Author

I think it depends on what the Onnx team plans to do. If they just drop the requirement from the specification then it can safely be drop everywhere without breaking anything.

In the meantime an option to just leave the names untouched would indeed solve my case!

@Pierre-Bartet
Copy link
Contributor Author

The onnx specification has just been updated to drop the C90 identifier requirement: onnx/onnx#6652.

@xadupre
Copy link
Collaborator

xadupre commented Jan 24, 2025

I was not working on onnx when this statement was made, maybe it was for possible future developments. In practice, onnx, onnxruntime do not use this constraint.

@Pierre-Bartet
Copy link
Contributor Author

Yes they never actually enforced it, and now it has even been removed from the specification (from MUST to SHOULD).
Does it mean sklearn-onnx can just drop the corresponding check now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants