How to speed up development workflow with local SageMaker? #4797
richardkmichael asked this question in Help · Unanswered
Replies: 0
I'm working with a large model framework (Nvidia NeMo), and a relatively complex inference script. There is a lot of experimentation, so I thought it would be faster to use SageMaker in "local" mode, and I have it working.
However:

1. The model data bundle has a `requirements.txt` which installs ~1GB of dependencies (NeMo and its deps), downloaded each time. SageMaker's `docker compose` rebuilds the container every time I re-run my local `model.deploy(instance_type='local', ...)`, which takes at least 5 minutes (installing dependencies). So it is a very slow development process -- change code, re-deploy, find a bug, etc.
2. Since the inference code is not "live" within the running torch server, I need to save, recreate the model tar.gz bundle, and re-deploy for any change to the inference code. This is also painful, even if the docker container were re-used.
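For reference, the repackaging step described in point 2 can be scripted with the standard library so it is at least one command rather than a manual tar invocation. This is a sketch under assumptions: the function name `rebuild_bundle` is mine, and I'm assuming the bundle is a flat directory containing the inference code and `requirements.txt`:

```python
import tarfile
from pathlib import Path

def rebuild_bundle(src_dir: str, out_path: str) -> str:
    """Re-create the model tar.gz bundle from a source directory.

    src_dir  -- directory holding the inference code, requirements.txt, etc.
    out_path -- path of the model.tar.gz to (re)write
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for item in sorted(Path(src_dir).iterdir()):
            # arcname keeps entries relative to the bundle root, which is the
            # layout SageMaker expects inside model.tar.gz
            tar.add(item, arcname=item.name)
    return out_path
```

A file watcher (Re 2 below) could simply call this function whenever a source file changes.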
Any suggestions to speed up my local workflow?
I've considered --
Re 1: Is it possible to specify a custom image with `instance_type='local'`? The `model.deploy()` function doesn't seem to accept an image name argument. But if so, I could build a custom Docker image with the framework already installed, so that pip would quickly find `Requirement already satisfied` when it processes my `requirements.txt` file.

Re 2: I could automate the model bundle rebuild with a watcher on the inference code.
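If a custom image does turn out to be usable in local mode (possibly via the `image_uri` argument on the `Model` constructor rather than on `deploy()` -- I haven't verified this), the image itself might be a sketch like this. The base image placeholder and the `requirements.txt` path are assumptions:

```dockerfile
# Hypothetical base: whichever SageMaker PyTorch inference image the local
# container is currently built from.
FROM <your-sagemaker-pytorch-inference-image>

# Pre-install the heavy dependencies once at build time, so that when the
# container later processes the bundle's requirements.txt, pip reports
# "Requirement already satisfied" instead of downloading ~1GB again.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```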
Thanks!