This pull request changes how model constants (weights) are packaged and
loaded in the CUDA backend: they now ship as a separate binary blob
instead of being embedded in the `.so`. It also updates the runner APIs
to accept a load mode, and bumps the PyTorch commit pin and nightly
version references.
**Packaging and Loading Model Constants:**
* Model constants are now separated from the `.so` file and stored as a
binary blob on disk, rather than being packaged directly into the shared
object. The preprocessing logic in `cuda_backend.py` is updated to
handle the new file outputs and manage cleanup.
[[1]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aL149-R153)
[[2]](diffhunk://#diff-5b5ea2257772b3aba04b2534f5ea1429a0c631bfd25a7ef531f526e76c471d7aL165-R211)
* The C++ CUDA backend (`cuda_backend.cpp`) now loads the new weights
blob from the named data map and feeds it into the model container using
a newly added API function. The buffer is freed immediately after use
for better resource management.
[[1]](diffhunk://#diff-a4b17eccf1aa933837671c5184e02bc815d934a362344bb2b17b789cdfaa5375R153-R154)
[[2]](diffhunk://#diff-a4b17eccf1aa933837671c5184e02bc815d934a362344bb2b17b789cdfaa5375R183-R195)
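The load-then-free flow described above can be sketched as follows. This is a minimal illustration rather than the code in `cuda_backend.cpp`: `MockContainer`, `NamedDataMap`, `update_constants_from_blob`, `load_weights`, and the integer error codes are all hypothetical stand-ins, and the real backend works with its own named data map and container types.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the model container. It copies the blob into
// storage it owns, so the caller's buffer is no longer needed afterwards.
struct MockContainer {
  std::vector<std::uint8_t> constants;
};

// Hypothetical stand-in for the update entry point exported by the .so.
// Returns 0 on success and a non-zero code on error.
inline int update_constants_from_blob(
    MockContainer& container, const std::uint8_t* blob, std::size_t size) {
  if (blob == nullptr) {
    return 1;
  }
  container.constants.assign(blob, blob + size);
  return 0;
}

// Hypothetical named data map: blob key -> raw bytes.
using NamedDataMap = std::map<std::string, std::vector<std::uint8_t>>;

// Fetch the weights blob by key, feed it to the container, and release the
// staging buffer right after the update call.
inline int load_weights(
    const NamedDataMap& map, const std::string& key, MockContainer& container) {
  auto it = map.find(key);
  if (it == map.end()) {
    return 2;  // blob not present in the map
  }
  const std::size_t size = it->second.size();
  auto buffer = std::make_unique<std::uint8_t[]>(size);
  std::copy(it->second.begin(), it->second.end(), buffer.get());

  const int err = update_constants_from_blob(container, buffer.get(), size);
  buffer.reset();  // free the staging buffer as soon as it has been consumed
  return err;
}
```

The explicit `reset()` only makes the early release visible; the `unique_ptr` would free the buffer at scope exit either way.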
**API and Infrastructure Updates:**
* A new function pointer type,
`AOTInductorModelUpdateConstantsFromBlobFunc`, is added to the delegate
handle structure in `aoti_delegate_handle.h` to support updating model
constants from a binary blob.
[[1]](diffhunk://#diff-0598c198d53bf756f6013186ea3155f15ddef247aa863e83ef30f27991b3a0a7R74-R78)
[[2]](diffhunk://#diff-0598c198d53bf756f6013186ea3155f15ddef247aa863e83ef30f27991b3a0a7R95)
* The CUDA backend now loads this new symbol from the shared object at
runtime.
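As a rough illustration of the pattern in this section, the sketch below adds a function pointer member to a handle struct and binds it by symbol name. The signature of `UpdateConstantsFromBlobFn`, the `DelegateHandle` layout, and the symbol name used here are assumptions made for the example; they are not copied from `aoti_delegate_handle.h`. The lookup is passed in as a callback with the same shape as POSIX `dlsym`, so `&dlsym` could be supplied in a real build while a stub keeps the example self-contained.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical error code and opaque container handle types.
using ErrorCode = int;
struct ContainerOpaque;
using ContainerHandle = ContainerOpaque*;

// Assumed signature for the "update constants from blob" entry point.
using UpdateConstantsFromBlobFn =
    ErrorCode (*)(ContainerHandle, const std::uint8_t*);

// Hypothetical delegate handle holding the library handle and bound symbols.
struct DelegateHandle {
  void* so_handle = nullptr;
  ContainerHandle container = nullptr;
  UpdateConstantsFromBlobFn update_constants_from_blob = nullptr;
};

// Same shape as POSIX dlsym(void*, const char*).
using SymbolLookup = void* (*)(void* so_handle, const char* name);

// Resolve the symbol by name and store it in the handle.
// Returns false and leaves the pointer null when the symbol is missing.
inline bool bind_update_symbol(
    DelegateHandle& handle, SymbolLookup lookup, const char* name) {
  void* sym = lookup(handle.so_handle, name);
  if (sym == nullptr) {
    return false;
  }
  handle.update_constants_from_blob =
      reinterpret_cast<UpdateConstantsFromBlobFn>(sym);
  return true;
}

// Test double standing in for the function exported by the shared object.
inline ErrorCode stub_update(ContainerHandle, const std::uint8_t* blob) {
  return blob != nullptr ? 0 : 1;
}

// Test double standing in for dlsym: it knows exactly one symbol name.
inline void* stub_lookup(void*, const char* name) {
  if (std::strcmp(name, "update_constants_from_blob") == 0) {
    return reinterpret_cast<void*>(&stub_update);
  }
  return nullptr;
}
```

Checking the lookup result before storing it lets the caller decide whether a missing symbol is a hard error or a reason to fall back to an older loading path.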
**Runner API Improvements:**
* The multimodal runner API is updated to accept a `Module::LoadMode`
parameter, allowing for more flexible loading options such as memory
mapping. This change is propagated through helper functions and their
headers.
[[1]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843L322-R322)
[[2]](diffhunk://#diff-005ac94c6b217e02d652aafc206d36b2ec1190af36aa0a632fd406975dfc2600L271-R272)
[[3]](diffhunk://#diff-005ac94c6b217e02d652aafc206d36b2ec1190af36aa0a632fd406975dfc2600L281-R284)
[[4]](diffhunk://#diff-ac7a381a7828a6f1a543d2beab4cf503c2d3547ab86821c8e1777df9305108aaL143-R144)
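The sketch below shows one way a load-mode parameter can be threaded through a helper. The `Module` class and `LoadMode` enum here are simplified stand-ins defined only for this example, `make_module` is a hypothetical helper name, and the `File` default is an assumption rather than something taken from the diff.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for Module::LoadMode; the real enum may have more values.
enum class LoadMode { File, Mmap };

// Simplified stand-in for the module class: it only records how it was opened.
class Module {
 public:
  Module(std::string path, LoadMode mode)
      : path_(std::move(path)), mode_(mode) {}

  const std::string& path() const { return path_; }
  LoadMode load_mode() const { return mode_; }

 private:
  std::string path_;
  LoadMode mode_;
};

// Hypothetical helper: forwards the caller's load mode instead of hard-coding
// one, defaulting to File so existing call sites keep their behavior.
inline Module make_module(
    const std::string& model_path, LoadMode load_mode = LoadMode::File) {
  return Module(model_path, load_mode);
}
```

A caller that wants memory mapping would write `make_module("model.pte", LoadMode::Mmap)`, while callers that omit the argument are unaffected.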
**Dependency Updates:**
* The PyTorch commit pin is updated in
`.ci/docker/ci_commit_pins/pytorch.txt` and the nightly version is
bumped in `torch_pin.py` for compatibility with the new packaging logic.
[[1]](diffhunk://#diff-e873e85ae7aa52ebeadb13a27cf83eff1891b1011e27f94ec040eb8407893c5eL1-R1)
[[2]](diffhunk://#diff-9665391232bd21d4ee0a293cbc7f76d99db902ab1e6e045a59f9a132325babc9L2-R2)