
Serialization/deserialization of optimized tract models #1313

Open
cospectrum opened this issue Jan 24, 2024 · 6 comments

Comments

@cospectrum

Hi, I intend to use tract for inference with AWS Lambda. I've observed that initializing and optimizing an ONNX model (from &[u8]) can be 2-3 times slower than actually executing the model. Perhaps it would be a good idea to introduce a method for storing your graph IR as &[u8]?
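For reference, a minimal sketch of the workflow described above, assuming tract's usual ONNX loading API (`model_for_read`, `into_optimized`, `into_runnable`); the function name is a placeholder:

```rust
// Sketch: load an ONNX model from bytes, then declutter/optimize it.
// The into_optimized() step is the expensive part measured above.
use tract_onnx::prelude::*;

fn load_model(bytes: &[u8]) -> TractResult<TypedRunnableModel<TypedModel>> {
    let mut reader = std::io::Cursor::new(bytes);
    tract_onnx::onnx()
        .model_for_read(&mut reader)? // parse the ONNX protobuf
        .into_optimized()?            // declutter + optimize the graph
        .into_runnable()              // build an execution plan
}
```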

@kali
Collaborator

kali commented Jan 24, 2024

Hey, thanks for your interest.

You should give the NNEF serialization a try. It is significantly faster to load and optimise than an ONNX model.
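A hedged sketch of that conversion, assuming tract-nnef's `write_to_tar` API; the file names are placeholders, and this is a one-off step whose output archive can then be shipped instead of the ONNX file:

```rust
// Sketch: convert an ONNX model to a decluttered NNEF archive (one-off step).
use tract_nnef::prelude::*;

fn convert_to_nnef() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")? // placeholder path
        .into_typed()?
        .into_decluttered()?;          // the expensive step, paid once here
    let nnef = tract_nnef::nnef().with_tract_core();
    nnef.write_to_tar(&model, std::fs::File::create("model.nnef.tar")?)?;
    Ok(())
}
```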

@cospectrum
Author

Is there a way to load a model, optimize it with tract, and then save it back?

@kali
Collaborator

kali commented Jan 24, 2024

The NNEF serialization is a step in this direction, as it saves the "decluttered" model, and decluttering accounts for the most expensive part of the loading/declutter/optimize workflow (more than the actual optimisation).

There is no way to dump and reload a tract fully optimized model at this stage.
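The startup side of that workflow might look like the sketch below (assuming tract-nnef's loading API and a placeholder archive name); only the comparatively cheap optimisation step remains to be paid at load time:

```rust
// Sketch: load the decluttered NNEF archive at startup; decluttering
// has already been done, so only the optimisation step runs here.
use tract_nnef::prelude::*;

fn load_nnef() -> TractResult<TypedRunnableModel<TypedModel>> {
    let nnef = tract_nnef::nnef().with_tract_core();
    nnef.model_for_path("model.nnef.tar")? // placeholder path
        .into_optimized()?
        .into_runnable()
}
```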

@cospectrum
Author

If there is no such thing yet, it would be good to start by at least making the necessary internals of your IR public, so that I can build my own utility without a fork. Is the IR public?

@cospectrum
Author

Well, I see that TypedModel (created with into_optimized) is an alias of Graph, whose fields are completely public. Perhaps I have everything I need!
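For instance, since Graph's fields are public, walking an optimized model might look like this sketch (assuming the public `nodes` field and the `Op::name` accessor; the path is a placeholder):

```rust
// Sketch: the public `nodes` field of an optimized TypedModel (a Graph alias)
// can be walked directly, e.g. to list the operators that survived optimization.
use tract_onnx::prelude::*;

fn dump_ops() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")? // placeholder path
        .into_optimized()?;
    for node in &model.nodes {
        println!("#{} {}", node.id, node.op.name());
    }
    Ok(())
}
```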

@kali
Collaborator

kali commented Jan 24, 2024

Yeah, the "IR" is just tract-core's TypedModel with some optimized operators. Most operators retain their "decluttered" form, because there is not much to gain from optimizing them, but the most important ones (MatMul & co) are heavily modified.

There is no stability commitment on operators (decluttered or optimised). Additionally, optimized operators are not portable from one architecture to another.


2 participants