Model Manager v2, feature complete #5694
Conversation
- Cache stat collection enabled.
- Implemented ONNX loading.
- Add ability to specify the repo version variant in the installer CLI.
- If the caller asks for a repo version that doesn't exist, fall back to the empty version rather than raising an error.

- Implement new model loader and modify invocations and embeddings.
- Finish implementing loaders for all models currently supported by InvokeAI.
- Move lora, textual_inversion, and model patching support into backend/embeddings.
- Restore support for model cache statistics collection (a little ugly, needs work).
- Fixed up invocations that load and patch models.
- Move seamless and silencewarnings utils into a better location.

- Replace legacy model manager service with the v2 manager.
- Update invocations to use the new load interface.
- Fixed many but not all type checking errors in the invocations; most were unrelated to the model manager.
- Updated routes. All the new routes live under the route tag `model_manager_v2`. To avoid confusion with the old routes, they have the URL prefix `/api/v2/models`. The old routes have been de-registered. (A hedged sketch of querying the new prefix follows this commit list.)
- Added a pytest for the loader.
- Updated documentation in contributing/MODEL_MANAGER.md.

- Begin to add SwaggerUI documentation for AnyModelConfig and other discriminated Unions.
…on ModelInfo to submodel_type to support new params in model manager
… accept keys, remove invalid assertion
…/InvokeAI into refactor/model-manager2/loader
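As a rough illustration of the route relocation described in the commits above, here is a hedged sketch of querying the new prefix. The host/port and the exact listing path are assumptions (a local instance on InvokeAI's default port, with a record-listing endpoint at the root of `/api/v2/models`), not confirmed by this page:

```python
import requests

# Hypothetical query against the relocated model_manager_v2 routes.
# BASE is an assumption: local instance, default port, list endpoint
# at the root of the new /api/v2/models prefix.
BASE = "http://127.0.0.1:9090/api/v2/models/"

resp = requests.get(BASE)
resp.raise_for_status()

# The v2 records are discriminated-union configs; print a few fields
# that the PR description mentions, falling back gracefully if the
# response shape differs.
for record in resp.json().get("models", []):
    print(record.get("key"), record.get("type"), record.get("name"))
```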
RyanJDick left a comment:
It's a huge diff, so I didn't try to review it with much rigour. I just left comments on a few things I noticed as I skimmed through.
Comment thread on `invokeai/app/services/invocation_stats/invocation_stats_default.py` (outdated, resolved).
| """Block until the indicated job has reached terminal state, or when timeout limit reached.""" | ||
| start = time.time() | ||
| while not job.in_terminal_state: | ||
| if self._install_completed_event.wait(timeout=5): # in case we miss an event |
Same comment as earlier regarding the hard-coded timeout. Also applies to wait_for_installs(...), below.
Reply: All modified to use an inner timeout of 0.25. Will this be OK?
Reply: Good enough 🙂 We could cap it based on the remaining time, but this seems fine for now.
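For reference, a minimal sketch of the capped-wait approach suggested in this thread. The `job.in_terminal_state` check, the completion event, and the 0.25s inner interval come from the diff and comments above; the function name and remaining details are illustrative assumptions:

```python
import threading
import time

def wait_for_job(job, completed_event: threading.Event, timeout: float = 0.0) -> None:
    """Block until `job` reaches a terminal state, or raise after `timeout` seconds.

    A timeout of 0 means "wait indefinitely", polling at the 0.25s inner interval.
    """
    start = time.time()
    while not job.in_terminal_state:
        elapsed = time.time() - start
        if timeout > 0 and elapsed >= timeout:
            raise TimeoutError(f"Job not in terminal state after {timeout}s")
        # Cap the inner wait by the time remaining, so the loop never
        # sleeps past the caller's deadline, yet still wakes periodically
        # in case a completion event was missed.
        inner = 0.25 if timeout <= 0 else min(0.25, timeout - elapsed)
        completed_event.wait(timeout=inner)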
- ModelMetadataStoreService is now injected into ModelRecordStoreService (these two services are really joined at the hip and should someday be merged).
- ModelRecordStoreService is now injected into ModelManagerService.
- Reduced timeout value for the various installer and download wait*() methods.
- Introduced a mock ModelManager for testing.
- Replaced a bare print() statement with _logger in the install helper backend.
- Removed unused code from the model loader init file.
- Made `locker` a private variable in the `LoadedModel` object.
- Fixed up model merge frontend (will be deprecated anyway!).
- Rename old "model_management" directory to "model_management_OLD" in order to catch dangling references to the original model manager.
- Caught and fixed most dangling references (still checking).
- Rename lora, textual_inversion and model_patcher modules.
- Introduce a RawModel base class to simplify the Union returned by the model loaders.
- Tidy up the model manager 2-related tests. Add useful fixtures, and a finalizer to the queue and installer fixtures that will stop the services and release threads.
Force-pushed from 0bd2900 to 640afa0.
- Replace AnyModelLoader with ModelLoaderRegistry.
- Fix type check errors in multiple files.
- Remove apparently unneeded `get_model_config_enum()` method from model manager.
- Remove last vestiges of old model manager.
- Updated tests and documentation.
- Resolve conflict with seamless.py.
Force-pushed from 640afa0 to 4ffe672.
Superseded by
What type of PR is this? (check all applicable)
Have you discussed this change with the InvokeAI team?
Have you updated all relevant documentation?
Description
This PR adds model loading and caching functionality to the model manager refactor. Full details can be found in docs/contrib/MODEL_MANAGER. Compared to the original MM version, there are a few API changes.
The initialized model manager service can now be found in the invocation context's `context.services.model_manager` attribute. This is a container with three parts, each corresponding to a different feature set of the model manager:

- `context.services.model_manager.store`: the ModelRecordService for retrieving model configurations.
- `context.services.model_manager.install`: the ModelInstallService for model installation, deletion and manipulation.
- `context.services.model_manager.load`: the ModelLoadService for loading models into memory, adjudicating VRAM usage, and managing the conversion cache.

There are now three methods for loading models into memory:
`load_model_by_key()`, `load_model_by_attr()` and `load_model_by_config()`. The first uses a key returned by the ModelRecordService to load the model uniquely identified by that key. The second uses the familiar `base/type/name` trio to fetch and load a model. The third loads a model from the model configuration record provided by the ModelRecordService. Invocations have been modified to use `load_model_by_key()` in almost all cases.

The model loading methods return a `LoadedModel` object, which has the attributes `config` and `model`. The `config` attribute contains a copy of the model's configuration record, from which you can get the name, type, description, base type, etc. The `model` attribute retrieves the loaded model itself. As previously, you create a context with the `LoadedModel` to load the model into the execution device (e.g. VRAM for a CUDA system) and lock it there while it is in use:
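Something like the following, where `key` and `some_input` are illustrative placeholders and the exact method signatures are assumptions based on the description above rather than the final API:

```python
def invoke(self, context):
    mm = context.services.model_manager

    # Fetch the configuration record for a model. `key` would normally
    # come from an invocation input field; it is assumed here.
    config = mm.store.get_model(key)
    print(config.name, config.base, config.type)

    # Load the model identified by that key. The returned LoadedModel
    # carries both the config record and the model itself.
    loaded_model = mm.load.load_model_by_key(key)

    # Entering the context moves the model onto the execution device
    # (e.g. VRAM on a CUDA system) and locks it there while in use.
    with loaded_model as model:
        result = model(some_input)
```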
Set `convert_cache` to the maximum size (in gigabytes) that the convert disk cache can grow to. Set it to zero to avoid caching any models that are not actively in use.

Important Caveats
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
This will require front-end work to re-enable the model popups and to manage the new multi-threaded background model downloading and installation features. I don't expect things to work fully the first time. Once enough front-end changes have been made to reveal the backend failures, I'll be happy to track down and fix those bugs.
Merge Plan
Currently it won't merge due to conflicts, which I'll address. There will also be conflicts with #5491, which affects all the `load_model()` calls. I'm not sure of the optimal merge strategy for these two PRs.
Added/updated tests?
[optional] Are there any post deployment tasks we need to perform?