
Conversation


@lstein (Collaborator) commented Feb 10, 2024

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because:

Have you updated all relevant documentation?

  • Yes
  • No

Description

This PR adds model loading and caching functionality to the model manager refactor. Full details can be found in docs/contrib/MODEL_MANAGER. Compared to the original MM version, there are a few API changes.

  1. The initialized model manager service can now be found in the invocation context's context.services.model_manager attribute. This is a container with three parts, each corresponding to a different feature set of the model manager:
    • context.services.model_manager.store - the ModelRecordService for retrieving model configurations.
    • context.services.model_manager.install - the ModelInstallService for model installation, deletion and manipulation.
    • context.services.model_manager.load - the ModelLoadService for loading models into memory, adjudicating VRAM usage, and managing the conversion cache.

  2. There are now three methods for loading models into memory: load_model_by_key(), load_model_by_attr() and load_model_by_config(). The first uses a key returned by the ModelRecordService to load the model uniquely identified by that key. The second uses the familiar base/type/name trio to fetch and load a model. The third loads a model from the model configuration record provided by ModelRecordService. Invocations have been modified to use load_model_by_key() in almost all cases. (A sketch of all three appears after this list.)

  3. The model loading methods return a LoadedModel object, which has the attributes config and model. The config attribute contains a copy of the model's configuration record, from which you can get the name, type, description, base type, etc. The model attribute retrieves the loaded model itself. As before, you use the LoadedModel as a context manager to load the model into the execution device (e.g. VRAM on a CUDA system) and lock it there while it is in use:

# Load a model by its unique record key, then move it onto the execution
# device for the duration of the `with` block.
loaded_model = context.services.model_manager.load_model_by_key('83191a81bbf9118945')
with loaded_model as vae:
    vae.decode(...)
  4. By popular request, you can now adjust the size of the conversion cache by setting the InvokeAI configuration option convert_cache to the maximum size (in gigabytes) that the on-disk conversion cache may grow to. Set it to zero to disable caching of models that are not actively in use. (A configuration sketch also follows below.)
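Here is a rough sketch of the three loading entry points from item 2, as used inside an invocation. The key, the model attributes, and the keyword-parameter names are illustrative assumptions, not verified signatures:

from invokeai.backend.model_manager import BaseModelType, ModelType

mm = context.services.model_manager

# 1. Load by the unique key assigned when the model was registered.
loaded = mm.load_model_by_key('83191a81bbf9118945')

# 2. Load by the familiar base/type/name trio (keyword names assumed).
loaded = mm.load_model_by_attr(
    model_name='stable-diffusion-v1-5',
    base_model=BaseModelType.StableDiffusion1,
    model_type=ModelType.Main,
)

# 3. Fetch the configuration record from the store, then load from it.
config = mm.store.get_model('83191a81bbf9118945')
loaded = mm.load_model_by_config(config)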
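And a minimal sketch of capping the conversion cache from item 4. Only the convert_cache option name comes from this PR; the import path and programmatic use are assumptions, and the option can equally be set in the InvokeAI configuration file:

from invokeai.app.services.config import InvokeAIAppConfig

# Let the on-disk conversion cache grow to at most 10 GB; a value of
# zero disables caching of models that are not actively in use.
config = InvokeAIAppConfig(convert_cache=10)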

Important Caveats

  • I have not made any changes to the front end at all, so the model selection popups are non-functional and generations don't work.
  • I don't have a good way of running invocations from the command line, so I haven't tested that they work properly. There were also a number of changes needed for LoRAs and textual inversions, and these haven't been tested either. I fully expect there to be breakage.
  • On the other hand, I did spend quite a lot of time tracking down type checking errors that were pre-existing in the generation and model patching code.
  • I've fixed a couple of places where there were TODOs in the model generation/patching code. This includes automatic detection and registration of the correct image encoder for IP-Adapters, which I found requested in a comment in Ryan's code.

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

This will require front-end work to re-enable the model popups and to manage the new multi-threaded background model downloading and installation features. I don't expect things to work fully the first time. When enough changes have been made to the front end to reveal the backend failures, I would be pleased to track down and fix those bugs.

Merge Plan

Currently it won't merge due to conflicts, which I'll address. There will also be conflicts with #5491 , which affects all the load_model() calls. I'm not sure of the optimal merge strategy for these two PRs.

Added/updated tests?

  • Yes -- but limited to one small embedding

[optional] Are there any post deployment tasks we need to perform?

Lincoln Stein added 13 commits January 22, 2024 14:37
- Cache stat collection enabled.
- Implemented ONNX loading.
- Add ability to specify the repo version variant in installer CLI.
- If caller asks for a repo version that doesn't exist, will fall back
  to empty version rather than raising an error.
- Implement new model loader and modify invocations and embeddings

- Finish implementing loaders for all models currently supported by
  InvokeAI.

- Move lora, textual_inversion, and model patching support into
  backend/embeddings.

- Restore support for model cache statistics collection (a little ugly,
  needs work).

- Fixed up invocations that load and patch models.

- Move seamless and silencewarnings utils into a better location
- Replace legacy model manager service with the v2 manager.

- Update invocations to use new load interface.

- Fixed many but not all type checking errors in the invocations. Most
  were unrelated to the model manager.

- Updated routes. All the new routes live under the route tag
  `model_manager_v2`. To avoid confusion with the old routes,
  they have the URL prefix `/api/v2/models`. The old routes
  have been de-registered.

- Added a pytest for the loader.

- Updated documentation in contributing/MODEL_MANAGER.md
@github-actions bot added the documentation, api, python, Root, PythonDeps, invocations, backend, and services labels Feb 10, 2024
Lincoln Stein added 2 commits February 12, 2024 23:31
- Begin to add SwaggerUI documentation for AnyModelConfig and other
  discriminated Unions.
@Millu linked an issue Feb 14, 2024 that may be closed by this pull request
@psychedelicious mentioned this pull request Feb 15, 2024
@RyanJDick (Contributor) left a comment

It's a huge diff, so I didn't try to review with much rigour. I just left a few comments on a few things that I noticed as I skimmed through.

"""Block until the indicated job has reached terminal state, or when timeout limit reached."""
start = time.time()
while not job.in_terminal_state:
if self._install_completed_event.wait(timeout=5): # in case we miss an event
@RyanJDick (Contributor):
Same comment as earlier regarding the hard-coded timeout. Also applies to wait_for_installs(...), below.

@lstein (Collaborator, Author):

All modified to use an inner timeout of 0.25 seconds. Will this be OK?

@RyanJDick (Contributor):
Good enough 🙂 We could cap it based on the remaining time, but this seems good enough for now.
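For reference, the pattern settled on above looks roughly like this (a sketch only; the job interface follows the quoted fragment, and the outer-timeout bookkeeping is an assumption):

import threading
import time

class InstallerSketch:
    """Illustrative stand-in for the installer service discussed above."""

    def __init__(self) -> None:
        self._install_completed_event = threading.Event()

    def wait_for_job(self, job, timeout: float = 0):
        """Block until the job reaches a terminal state, or raise on timeout."""
        start = time.time()
        while not job.in_terminal_state:
            # A short inner wait (0.25 s) keeps the loop responsive to the
            # outer timeout while still waking promptly on install events.
            self._install_completed_event.wait(timeout=0.25)
            if timeout > 0 and (time.time() - start) > timeout:
                raise TimeoutError('timed out waiting for job completion')
        return job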

Lincoln Stein added 2 commits February 15, 2024 23:25
- ModelMetadataStoreService is now injected into ModelRecordStoreService
  (these two services are really joined at the hip, and should someday be merged)
- ModelRecordStoreService is now injected into ModelManagerService
- Reduced timeout value for the various installer and download wait*() methods
- Introduced a Mock modelmanager for testing
- Replaced a bare print() statement with _logger in the install helper backend.
- Removed unused code from model loader init file
- Made `locker` a private variable in the `LoadedModel` object.
- Fixed up model merge frontend (will be deprecated anyway!)
- Rename old "model_management" directory to "model_management_OLD" in order to catch
  dangling references to original model manager.
- Caught and fixed most dangling references (still checking)
- Rename lora, textual_inversion and model_patcher modules
- Introduce a RawModel base class to simplify the Union returned by the
  model loaders.
- Tidy up the model manager 2-related tests. Add useful fixtures, and
  a finalizer to the queue and installer fixtures that will stop the
  services and release threads.
@lstein marked this pull request as ready for review February 18, 2024 03:39
@lstein force-pushed the refactor/model-manager2/loader branch from 0bd2900 to 640afa0 Compare February 18, 2024 03:55
- Replace AnyModelLoader with ModelLoaderRegistry
- Fix type check errors in multiple files
- Remove apparently unneeded `get_model_config_enum()` method from model manager
- Remove last vestiges of old model manager
- Updated tests and documentation

resolve conflict with seamless.py
@lstein force-pushed the refactor/model-manager2/loader branch from 640afa0 to 4ffe672 Compare February 18, 2024 04:04
@lstein (Collaborator, Author) commented Feb 18, 2024

This comment records commits that will need to be cherry-picked into next:
09e7d35
ed2d9ae
4ffe672

@psychedelicious (Contributor):

Superseded by next branch


Development

Successfully merging this pull request may close these issues.

[bug]: Files being stored in unexpected places
