
Commit 1ea9944

mohamedzeidan2021, pintaoz-aws, pintaoz, sage-maker, and timkuo-amazon authored
kandinsky, nova, zimmer to Keynote 3 v2 (#1924)
* Add framework_version to all TensorFlowModel examples (#5038)
* Add framework_version to all TensorFlowModel examples
* update framework_version to x.x.x
---------
Co-authored-by: pintaoz <[email protected]>
* Fix hyperparameter strategy docs (#5045)
* fix: pass in inference_ami_version to model_based endpoint type (#5043)
* fix: pass in inference_ami_version to model_based endpoint type
* documentation: update contributing.md w/ venv instructions and pip install fixes
---------
Co-authored-by: Zhaoqi <[email protected]>
* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)
* Add warning about not supporting
* update wording
---------
Co-authored-by: pintaoz <[email protected]>
* prepare release v2.239.2
* update development version to v2.239.3.dev0
* change: update image_uri_configs 02-19-2025 06:18:15 PST
* change: added ap-southeast-7 and mx-central-1 for Jumpstart (#5049)
* added ap-southeast-7 and mx-central-1 for Jumpstart
* added BKK dlc to djl-neuronx
---------
Co-authored-by: Isha Chidrawar <[email protected]>
* prepare release v2.239.3
* update development version to v2.239.4.dev0
* change: update image_uri_configs 02-20-2025 06:18:08 PST
* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)
Co-authored-by: malavhs <[email protected]>
* Add backward compatibility for RecordSerializer and RecordDeserializer (#5052)
* Add backward compatibility for RecordSerializer and RecordDeserializer
* fix circular import
* fix test
---------
Co-authored-by: pintaoz <[email protected]>
* py_version doc fixes (#5048)
* change: update image_uri_configs 02-21-2025 06:18:10 PST
* fix: altconfig hubcontent and reenable integ test (#5051)
* fix altconfig hubcontent and reenable integ test
* linting
* update exception thrown
* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)
Co-authored-by: malavhs <[email protected]>
* add test
* update predictor spec accessor
* lint
* set custom field from HCD config to model spec data class
* lint
* remove logs
* last update
---------
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>
* fix: forbid extras in Configs (#5042)
* fix: make configs safer
* fix: safer destructor in ModelTrainer
* format
* Update error message
* pylint
* Create BaseConfig
* Remove main function entrypoint in ModelBuilder dependency manager. (#5058)
* Remove main function entrypoint in ModelBuilder dependency manager.
* Remove main function entrypoint in ModelBuilder dependency manager.
---------
Co-authored-by: Joseph Zhang <[email protected]>
* documentation: Removed a line about python version requirements of training script which can misguide users. (#5057)
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* prepare release v2.240.0
* update development version to v2.240.1.dev0
* Fix key error in _send_metrics() (#5068)
Co-authored-by: pintaoz <[email protected]>
* fix: Added check for the presence of model package group before creating one (#5063)
Co-authored-by: Keshav Chandak <[email protected]>
* Use sagemaker session's s3_resource in download_folder (#5064)
Co-authored-by: pintaoz <[email protected]>
* Fix error when there is no session to call _create_model_request() (#5062)
* Fix error when there is no session to call _create_model_request()
* Fix codestyle
---------
Co-authored-by: pintaoz <[email protected]>
* Ensure Model.is_repack() returns a boolean (#5060)
* Ensure Model.is_repack() returns a boolean
* update test
---------
Co-authored-by: pintaoz <[email protected]>
* feat: Allow ModelTrainer to accept hyperparameters file (#5059)
* Allow ModelTrainer to accept hyperparameter file and create Hyperparameter class
* pylint
* Detect hyperparameters from contents rather than file extension
* pylint
* change: add integs
* change: add integs
* change: remove custom hyperparameter tooling
* Add tests for hp contracts
* change: add unit tests and remove unreachable condition
* fix integs
* doc check fix
* fix tests
* fix tox.ini
* add unit test
* feature: support training for JumpStart model references as part of Curated Hub Phase 2 (#5070)
* change: update image_uri_configs 01-27-2025 06:18:13 PST
* fix: skip TF tests for unsupported versions (#5007)
* fix: skip TF tests for unsupported versions
* flake8
* change: update image_uri_configs 01-29-2025 06:18:08 PST
* chore: add new images for HF TGI (#5005)
* feat: add pytorch-tgi-inference 2.4.0
* add tgi 3.0.1 image
* skip faulty test
* formatting
* formatting
* add hf pytorch training 4.46
* update version alias
* add py311 to training version
* update tests with pyversion 311
* formatting
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* feat: use jumpstart deployment config image as default optimization image (#4992)
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* prepare release v2.238.0
* update development version to v2.238.1.dev0
* Fix ssh host policy (#4966)
* Fix ssh host policy
* Filter policy by algo-
* Add docstring
* Fix pylint
* Fix docstyle summary
* Unit test
* Fix unit test
* Change to unit test
* Fix unit tests
* Test comment out flaky tests
* Readd the flaky tests
* Remove flaky asserts
* Remove flaky asserts
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* change: Allow telemetry only in supported regions (#5009)
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* mpirun protocol - distributed training with @remote decorator (#4998)
* implemented multi-node distribution with @remote function
* completed unit tests
* added distributed training with CPU and torchrun
* backwards compatibility nproc_per_node
* fixing code: permissions for non-root users, integration tests
* fixed docstyle
* refactor nproc_per_node for backwards compatibility
* refactor nproc_per_node for backwards compatibility
* pylint fix, newlines
* added unit tests for bootstrap_environment remote
* added mpirun protocol for distributed training with @remote decorator
* aligned mpi_utils_remote.py to mpi_utils.py for estimator
* updated docstring for sagemaker sdk doc
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* feat: Add support for deepseek recipes (#5011)
* feat: Add support for deepseek recipes
* pylint
* add unit test
* feat: [JumpStart] Add access configs and training instance type variants artifact uri handling for Curated Hub Phase 2 training integration (#1653)
* Add access config to training input for Curated Hub Training Integration
* Add support to retrieve instance specific training artifact keys
* Fix some typos and naming issues
* Fix more typos
* fix formatting issues with black
* modify access config logic so accept_eula is passed into fit
* update black formatting
* Add more unit tests for passing access configs
* fix style errors
* fix for failing integ test
* fix styles and integ test error
* skip blocking integ test
* fix formatting
* remove env vars when access configs are being used
* fix docstyle issue
* update usage of access configs, remove conversion of training artifact key to uri
* fix styling issues
* fix styling issues
* fix unit tests
* fix adding hubaccessconfig only if hubcontentarn exists
* move logic to JumpStartEstimator from Job
* Fix styling issues
* Remove unused code
* fix styling issues
* fix unit test failure
* fix some formatting, add comments
* remove typing for estimator in get_access_configs function
* fix circular import dependency
* fix styling issues
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* Always add code channel, regardless of network isolation (#1657)
* fix formatting issue
* fix formatting issue
* fix formatting issue
* fix tensorflow file
---------
Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>
* feat: Make DistributedConfig Extensible (#5039)
* feat: Make DistributedConfig Extensible
* pylint
* Include none types when creating config jsons for safer reference
* fix: update test to account for changes
* format
* Add integ test
* pylint
* prepare release v2.240.0
* update development version to v2.240.1.dev0
* Fix key error in _send_metrics() (#5068)
Co-authored-by: pintaoz <[email protected]>
* fix: Added check for the presence of model package group before creating one (#5063)
Co-authored-by: Keshav Chandak <[email protected]>
* Use sagemaker session's s3_resource in download_folder (#5064)
Co-authored-by: pintaoz <[email protected]>
* remove union
* fix merge artifact
* Change dir path to distributed_drivers
* update paths
---------
Co-authored-by: ci <ci>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
* Skip tests with deprecated instance type (#5077)
Co-authored-by: pintaoz <[email protected]>
* prepare release v2.241.0
* update development version to v2.241.1.dev0
* pipeline definition function doc update (#5074)
Co-authored-by: Rohan Gujarathi <[email protected]>
* feat: add integ tests for training JumpStart models in private hub (#5076)
* feat: add integ tests for training JumpStart models in private hub
* fixed formatting
* remove unused imports
* fix unused imports
* fix unit test failure and fix bug around versioning
* fix formatting
* fix unit tests
* fix model_uri usage issue
* fix some formatting
* separate private hub setup code
* add try catch block
* fix flake8 issue so except clause is not bare
* black formatting
* fix: resolve infinite loop in _find_config on Windows systems (#4970)
* fix: resolve Windows path handling in _find_config
* Replace Path.match("/") with Path.anchor comparison
* Fix infinite loop in _studio.py path traversal
* test: Add tests for the new root path exploration
* Fix formatting style
* Fixed line too long
* Fix docstyle by running black manually
* Fix testcase with \\ when running on non-windows machines
* Fix formatting style
* cleanup unused import
* change: update image_uri_configs 03-11-2025 07:18:09 PST
* Fixing Pytorch training python version in tests (#5084)
* Fixing Pytorch training python version in tests
* Updating Inference test handling
* remove s3 output location requirement from hub class init (#5081)
* remove s3 output location requirement from hub class init
* fix integ test hub
* lint
* fix test
---------
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
* fix: Prevent RunContext overlap between test_run tests (#5083)
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
* Torch upgrade (#5086)
* Fix Flake8 Violations
* UPDATE PYTORCH VERSION TO ADDRESS SECURITY RISK
**Description**
Currently used Pytorch version has a possible vulnerability. Internal - https://tiny.amazon.com/p5i4jla1
**Testing Done**
Unit and Integration tests in the CodeBuild
* Revert CPU Versions
* Test Fix
* Codestyle fixes
* debug attempt
* Fixes
* Fix
* Fix
* prepare release v2.242.0
* update development version to v2.242.1.dev0
* add new regions to JUMPSTART_LAUNCHED_REGIONS (#5089)
Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
* ADD Documentation to ReadtheDocs for Upgrading torch versions (#5090)
* ADD Documentation to ReadtheDocs for Upgrading torch versions
**Description**
**Testing Done**
Only documentation updates
* Fix for Codestyle
* Remove unused import
* Flake8 Fix
* CodeStyle Fixes
* feature: Enabled update_endpoint through model_builder (#5085)
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: factor in set instance type when building JumpStart models in ModelBuilder. (#5093)
* Remove main function entrypoint in ModelBuilder dependency manager.
* Remove main function entrypoint in ModelBuilder dependency manager.
* fix: factor in set instance type when building JumpStart models in ModelBuilder.
* Remove default instance type from ModelBuilder.
* Restore default instance type. Tweak integ test.
---------
Co-authored-by: Joseph Zhang <[email protected]>
* change: update image_uri_configs 03-21-2025 07:17:55 PST
* Skip tests failed due to deprecated instance type (#5097)
Co-authored-by: pintaoz <[email protected]>
* Feat: Added support for returning most recently created approved model package in a group (#5092)
Co-authored-by: Keshav Chandak <[email protected]>
* change: update image_uri_configs 03-25-2025 07:18:13 PST
* chore: fix integ tests to use latest version of model (#5104)
* change: update image_uri_configs 03-26-2025 07:18:16 PST
* Update Jinja version (#5101)
* Aligned disable_output_compression for @remote with Estimator (#5094)
* Update transformers version (#5102)
* fix: use temp file in unit tests (#5106)
* fix: fix flaky spark processor integ (#5109)
* fix: fix flaky spark processor integ
* format
* fix: fix flaky clarify model monitor test (#5107)
* chore: move jumpstart region definitions to json file (#5095)
* chore: move jumpstart region definitions to json file
* chore: address formatting issues
* fix: neo regions not ga in 5 regions
* chore: make variable private
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* change: Update for PT 2.5.1, SMP 2.8.0 (#5071)
* prepare release v2.243.0
* update development version to v2.243.1.dev0
* fix: flaky test (#5111)
* chore: fix semantic versioning for wildcard identifier (#5105)
* Add mlflow tracking arn telemetry (#5113)
Integ test failure is aligned with CI health
* Master (#5112)
* fix integ test hub
* lint
* fix jumpstart curated hub bugs
* lint
* fix tests
* linting
* lint
* rm test file
* fix test
* fix
* lint
* remove test
* update for test
* documentation: update ModelStep data dependency info (#5120)
Co-authored-by: Namrata Madan <[email protected]>
* Update instance gpu info (#5119)
* fix: remove historical job_name caching which causes long job name (#5118)
* Fix issue #4856 by copying environment variables (#5115)
* Fix issue #4856 by copying environment variables
* Added handler for pipeline variable while creating process job (#5122)
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
* fix: Added handler for pipeline variable while creating process job
* fix: Added handler for pipeline variable while creating process job
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* documentation: update pipelines step caching examples to include more steps (#5121)
Co-authored-by: Brock Wade <[email protected]>
* prepare release v2.243.1
* update development version to v2.243.2.dev0
* Fix deepdiff dependencies (#5128)
* Fix deepdiff dependencies
* trigger tests
* Fix: fix the issue due to PR changes, 5122 (#5124)
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
* fix: Added handler for pipeline variable while creating process job
* fix: Added handler for pipeline variable while creating process job
* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview
* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: tgi image uri unit tests (#5127)
* fix: tgi image uri unit tests
* fix: black-format and flake8 failures
* fix: parse
* fix: print statement
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* prepare release v2.243.2
* update development version to v2.243.3.dev0
* change: update image_uri_configs 04-11-2025 07:18:19 PST
* change: update image_uri_configs 04-15-2025 07:18:10 PST
* change: update image_uri_configs 04-16-2025 07:18:18 PST
* update pr test to deprecate py38 and add py312 (#5133)
* Py312 upgrade step 2: Update dependencies, integ tests and unit tests (#5123)
* clean up
* bump maxdepth for doc/api/training to fix readthedocs
* change maxdepth for readthedocs rendering doc/api/training page
* change maxdepth for readthedocs rendering doc/api/training page
* change maxdepth for readthedocs rendering doc/api/training page
* Revert the PR changes 5122 (#5134)
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
* fix: Added handler for pipeline variable while creating process job
* fix: Added handler for pipeline variable while creating process job
* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview
* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication
* Revert PR 5122 changes, due to issues with other processor codeflows
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
* update readme to reflect py312 upgrade
* prepare release v2.243.3
* update development version to v2.243.4.dev0
* chore: add huggingface images (#5142)
* Update ModelTrainer to support s3 uri and tar.gz file as source_dir (#5144)
* add s3 uri check to modeltrainer data source
* update ModelTrainer to support s3 uri and tar.gz file as source_dir
* black-format
* add unit and integ tests
* update logic and unit test to raise value error if the file is not .tar.gz
* feature: support custom workflow deployment in ModelBuilder using SMD image. (#5143)
* feature: support custom workflow deployment in ModelBuilder using SMD image. (#1661)
* feature: support custom workflow deployment in ModelBuilder using SMD inference image.
* Rename test case and pass session.
* Address PR comments.
* Tweak resource cleanup logic in integ test.
* Fixing CodeBuild integ test failures.
* Renamed integ test.
* Remove unused integ test, restore once GA.
---------
Co-authored-by: Joseph Zhang <[email protected]>
* Cache client as instance attribute in @property decorator. (#1668)
* Remove @property decorator from ABC definition.
* Cache client as instance attribute in @property.
* Fix flake8 issue.
---------
Co-authored-by: Joseph Zhang <[email protected]>
* Bugfixes from e2e testing. (#1670)
* Fix Alabtross Inference component tests
* trigger integ tests
---------
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
* fix: pin mamba version to 24.11.3-2 to avoid inconsistent test runs (#5149)
Co-authored-by: Namrata Madan <[email protected]>
* Add model server timeout (#5151)
Co-authored-by: adishaa <[email protected]>
* Add Owner ID check for bucket with path when prefix is provided (#5146)
* Fix Flake8 Violations
* Add Owner ID check for bucket with path when prefix is provided
**Description**
Previously we used the head_bucket call to ensure the owner ID check, but this doesn't take into consideration cases where the s3 path is provided through the prefix. This change makes sure that directory-level permissions are supported.
**Testing Done**
Tested through unit tests, integ tests and manual testing through the installation file. Yes
* Address PR comment
* Codestyle fixes
* Minor fix
* Codestyle fixes
* Fix Unit tests
* prepare release v2.244.0
* update development version to v2.244.1.dev0
* chore: Add tei 1.6.0 image (#5145)
* chore: add huggingface images
* chore: add tei 1.6 image
* chore: add tei 1.6.0 to tei mapping in tests
* build(deps): bump mlflow in /tests/data/serve_resources/mlflow/pytorch (#5098)
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)
---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump mlflow (#5155)
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)
---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 2.20.3
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump scikit-learn (#5156)
Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.3.2 to 1.5.1.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](https://github.com/scikit-learn/scikit-learn/compare/1.3.2...1.5.1)
---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-version: 1.5.1
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve error logging and documentation for issue 4007 (#5153)
* Improve error logging and documentation for issue 4007
* Add hyperlink to RTDs
* fix: fix bad initialization script error message (#5152)
Co-authored-by: Namrata Madan <[email protected]>
* fix: pin test dependency (#5165)
* fix: Map llama models to correct script (#5159)
* fix: honor json serialization of HPs (#5164)
* fix: honor json serialization of HPs
* test
* fix
* chore: Allow omegaconf >=2.2,<3 (#5168)
* Fix type annotations (#5166)
* remove --strip-component for untar source tar.gz (#5163)
* remove --strip-component for untar source tar.gz
* update code.tar.gz in test
---------
Co-authored-by: Erick Benitez-Ramos <[email protected]>
* fix: parameter mismatch in update_endpoint (#5135)
* add AG v1.3 (#5171)
Co-authored-by: Ubuntu <[email protected]>
* Fix test_deploy_with_update_endpoint() (#5177)
Co-authored-by: pintaoz <[email protected]>
* huggingface-tei dlc image_uri (#5174)
Co-authored-by: pintaoz-aws <[email protected]>
* huggingface-neuronx dlc image_uri (#5172)
* huggingface-neuronx dlc image_uri
* huggingface-neuronx inference dlc
---------
Co-authored-by: pintaoz-aws <[email protected]>
* huggingface-llm-neuronx dlc (#5173)
Co-authored-by: pintaoz-aws <[email protected]>
* Fix test_huggingface_tei_uris() (#5178)
* Fix test_huggingface_tei_uris()
* Fix json
---------
Co-authored-by: pintaoz <[email protected]>
* Fix Flask-Limiter version (#5180)
* prepare release v2.244.1
* update development version to v2.244.2.dev0
* change: Improve defaults handling in ModelTrainer (#5170)
* Improve default handling
* format
* add tests & update docs
* fix docstyle
* fix input_data_config
* fix use input_data_config parameter in train as authoritative source
* fix tests
* format
* update checkpoint config
* docstyle
* make config creation backwards compatible
* format
* fix condition
* fix Compute and Networking config when attributes are None
* format
* fix
* format
* change: Add image configs and region config for TPE (ap-east-2) (#5167)
* add image configs and region config for TPE (ap-east-2)
* remove TPE from djl-neuronx
---------
Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
* change: update image_uri_configs 05-14-2025 07:18:16 PST
* change: update jumpstart region_config 05-15-2025 07:18:15 PST
* fix: clarify model monitor one time schedule bug (#5169)
* fix: include model channel for gated uncompressed models (#5181)
* prepare release v2.244.2
* update development version to v2.244.3.dev0
* change: update image_uri_configs 05-20-2025 07:18:17 PST
* feat: Correct mypy type checking through PEP 561 (#5027)
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Molly He <[email protected]>
* change: merge method inputs with class inputs (#5183)
* fix: addWaiterTimeoutHandling (#4951)
* addWaiterTimeoutHandling
* codeStyleUpdate
* updateCodeStyle
* updateCodeStyle
* updateCodeStyle
* updateCodeStyle
* updateCodeStyle
* updateCodeStyle
---------
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
* MLflow update for dependabot (#5187)
* MLflow update for dependabot
* Update lower bound
* Unit test fixes
* prepare release v2.245.0
* update development version to v2.245.1.dev0
* feature: Triton v25.04 DLC (#5188)
Co-authored-by: Mohan Kishore <[email protected]>
* update estimator documentation regarding hyperparameters for source_dir (#5190)
* Update Attrs version to widen support (#5185)
* Update Attrs version to widen support
**Description**
https://github.com/aws/sagemaker-python-sdk/issues/5075
**Testing Done**
Running unit and integ tests
Unit and integ tests passing indicate that this version upgrade does not break anything
* Update version in conda_in_process.yml
* Update test requirements
* MLflow update version
---
Tested by: Running unit and integ tests
* prepare release v2.246.0
* update development version to v2.246.1.dev0
* fix: Allow import failure for internal _hashlib module (#5192)
* fix: Allow import failure for _hashlib module
* Fix formatting
* Appease flake8
* Add ignore_patterns in ModelTrainer to ignore specific files/folders (#5194)
* Add ignore_patterns in ModelTrainer to ignore specific files/folders
* fix black format
* add unit test
* add default ignore_patterns, fix minor path issue when uploaded to s3
* minor change to fix unit test failure
* add new variables in default ignore_patterns
* fix indentation error in docstring for readthedocs
* Fix: Object of type ModelLifeCycle is not JSON serializable (#5197)
* Fix: Object of type ModelLifeCycle is not JSON serializable
* Fix unit test
* Fix integ tests
* Revert "Fix integ tests"
This reverts commit f6513fe430d7f7f13486239aaaf6983efde2e00f.
* Fix integration tests
---------
Co-authored-by: adishaa <[email protected]>
* change: update jumpstart region_config, update image_uri_configs 06-12-2025 07:18:12 PST
* feat: Add support for MetricDefinitions in ModelTrainer (#5202)
* feat: Add support for MetricDefinitions in ModelTrainer
* style fix
* Update model_trainer.py to generate the doc
* resolve failed unit test
* solve another unit test error
---------
Co-authored-by: Chad Chiang <[email protected]>
* prepare release v2.247.0
* update development version to v2.247.1.dev0
* change: update image_uri_configs 06-19-2025 07:18:34 PST
* prepare release v2.247.1
* update development version to v2.247.2.dev0
* change: relax protobuf to <6.32 (#5211)
* change: update image_uri_configs 06-26-2025 07:18:35 PST
* feature: integrate amtviz for visualization of tuning jobs (#5044)
* feature: integrate amtviz for visualization of tuning jobs
* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers (#5037)
* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers
* fix codestyle
* fix test
---------
Co-authored-by: pintaoz <[email protected]>
* Add framework_version to all TensorFlowModel examples (#5038)
* Add framework_version to all TensorFlowModel examples
* update framework_version to x.x.x
---------
Co-authored-by: pintaoz <[email protected]>
* Fix hyperparameter strategy docs (#5045)
* fix: pass in inference_ami_version to model_based endpoint type (#5043)
* fix: pass in inference_ami_version to model_based endpoint type
* documentation: update contributing.md w/ venv instructions and pip install fixes
---------
Co-authored-by: Zhaoqi <[email protected]>
* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)
* Add warning about not supporting
* update wording
---------
Co-authored-by: pintaoz <[email protected]>
* prepare release v2.239.2
* update development version to v2.239.3.dev0
* change: update image_uri_configs 02-19-2025 06:18:15 PST
* fix: codestyle, type hints, license, and docstrings
* documentation: add docstring for amtviz module
* fix: fix docstyle and flake8 errors
* fix: code reformat using black
---------
Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>
* change: update image_uri_configs 07-04-2025 07:18:27 PST
* Update TF DLC python version to py312 (#5231)
* Update TF DLC python version to py312
* catch integ version
* Bump SMD version to enable custom workflow deployment. (#5230)
* Bump SMD version to enable custom workflow deployment.
* Update SMD image uri UT.
---------
Co-authored-by: Joseph Zhang <[email protected]>
* Adding Hyperpod feature to enable hyperpod telemetry
* Adding Hyperpod feature to enable hyperpod telemetry (#5235)
* Adding Hyperpod feature to enable hyperpod telemetry
* Adding Hyperpod feature to enable hyperpod telemetry
---------
Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: sanitize git clone repo input url (#5234)
* build(deps): bump torch in /tests/data/modules/script_mode (#5189)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.0.1+cpu to 2.7.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.7.0)
---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: parknate@ <[email protected]> * build(deps): bump mlflow in /tests/data/serve_resources/mlflow/xgboost (#5218) Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 3.1.0. - [Release notes](https://github.com/mlflow/mlflow/releases) - [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md) - [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v3.1.0) --- updated-dependencies: - dependency-name: mlflow dependency-version: 3.1.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: parknate@ <[email protected]> * build(deps): bump protobuf from 4.25.5 to 4.25.8 in /requirements/extras (#5209) Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 4.25.5 to 4.25.8. - [Release notes](https://github.com/protocolbuffers/protobuf/releases) - [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl) - [Commits](https://github.com/protocolbuffers/protobuf/compare/v4.25.5...v4.25.8) --- updated-dependencies: - dependency-name: protobuf dependency-version: 4.25.8 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: parknate@ <[email protected]> * build(deps): bump requests in /tests/data/serve_resources/mlflow/pytorch (#5200) Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4. 
- [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4) --- updated-dependencies: - dependency-name: requests dependency-version: 2.32.4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: parknate@ <[email protected]> * prepare release v2.248.0 * update development version to v2.248.1.dev0 * Nova training support (#5238) * feature: Added Amazon Nova training support for ModelTrainer and Estimator Co-authored-by: Erick Benitez-Ramos <[email protected]> * prepare release v2.248.1 * update development version to v2.248.2.dev0 * change: When rootlessDocker is enabled, return a fixed SageMaker IP (#5236) * change: When rootlessDocker is enabled, return a fixed SageMaker IP * Add logging for docker info command failure --------- Co-authored-by: Jiali Xing <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * fix: add hard dependency on sagemaker-core pypi lib (#5241) * change: update image_uri_configs 07-18-2025 07:18:28 PST * change: update image_uri_configs 07-22-2025 07:18:25 PST * Relax boto3 version requirement (#5245) * prepare release v2.248.2 * update development version to v2.248.3.dev0 * change: update image_uri_configs 07-23-2025 07:18:25 PST * Directly use customer-provided endpoint name for ModelBuilder deployment. (#5246) * Directly use customer-provided endpoint name for deployment in ModelBuilder. * Fix ModelBuilder UTs after removing unique_name_from_base import. 
--------- Co-authored-by: Joseph Zhang <[email protected]> * feature: AWS Batch for SageMaker Training jobs (#5249) --------- Co-authored-by: Greg Katkov <[email protected]> Co-authored-by: haoxinwa <[email protected]> Co-authored-by: JennaZhao <[email protected]> Co-authored-by: Jessica Zhu <[email protected]> Co-authored-by: David Lindskog <[email protected]> * prepare release v2.249.0 * update development version to v2.249.1.dev0 * Add more constraints to test requirements (#5254) * Add constraint file to test requirements * Add constraints --------- Co-authored-by: pintaoz <[email protected]> * feature: Add support for InstancePlacementConfig in Estimator for training jobs running on ultraserver capacity (#5259) --------- Co-authored-by: Greg Katkov <[email protected]> * prepare release v2.250.0 * update development version to v2.250.1.dev0 * feat: support pipeline versioning (#5248) Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * add sleep for model deployment (#5260) * fix: dockerfile stuck on interactive shell (#5261) * GPT OSS Hotfix (#5263) * changes for gpt_oss jobs support * added unit tests * fixing unit test * prepare release v2.251.0 * update development version to v2.251.1.dev0 * chore: onboard tei 1.8.0 (#5265) * chore: onboard tei 1.8.0 * chore: fix tei tests * feature: Greenland support for SagemakerTraining jobs (#1737) * feature: Greenland support for SagemakerTraining jobs * Added docstrings, tests to show sample filters for list_jobs * Fixed linting error * Addressed comments on rev1 * Removed unused attributes: retry_config and role_arn * Fixes identified during end to end test * Removed unused import * prepare release v2.251.1 * update development version to v2.251.2.dev0 * latest tgi (#5255) * latest tgi * add optimum-neuron tgi --------- Co-authored-by: sage-maker <[email protected]> * Feature/js mlops telemetry (#5268) * removed log statement * added telemetry for js and 
mlops * added for js estimator * fixed unit tests --------- Co-authored-by: Mohamed Zeidan <[email protected]> * feature: add eval custom lambda arn to hyperparameters (#5272) * fix: add retryable option to emr step in SageMaker Pipelines (#5281) * Add nova custom lambda in hyperparameter from estimator (#5282) * Add nova custom lambda in hyperparameter from estimator * Add nova custom lambda in hyperparameter from estimator * feat: change S3 endpoint env name (#5264) * fix: handle trial component status message longer than API supports (#5276) * merge rba without the iso region changes (#5290) * change: update image_uri_configs 08-28-2025 07:18:37 PST * change: update image_uri_configs 09-03-2025 07:18:37 PST * change: update image_uri_configs 09-05-2025 07:18:30 PST * change: update jumpstart region_config 09-17-2025 07:18:39 PST * Revert "change: update image_uri_configs 08-28-2025 07:18:37 PST" This reverts commit 96ea39db00c36050cc5478bd13f14e8c5f9347db. --------- Co-authored-by: sagemaker-bot <[email protected]> Co-authored-by: Eli Davidson <[email protected]> * Remove tags field from greenland job submitter (#1738) * Remove tags field from greenland job submitter * Update tests * allow is_production to be passed in * Add response in message --------- Co-authored-by: JieShen Ong <[email protected]> * prepare release v2.252.0 * update development version to v2.252.1.dev0 * feature: add model_type hyperparameter support for Nova recipes (#5291) Co-authored-by: xibei chen <[email protected]> * Fix flaky integ test (#5294) Co-authored-by: pintaoz <[email protected]> * fix: djl regions fixes #5273 (#5277) * test: adds unit test for djl lmi regions * test: adds regions in which djl images do not exist * fix: adds djl missing regions * fix: linting * docs: update contributing to add linting section --------- Co-authored-by: pintaoz-aws <[email protected]> * Adding default identity implementations to InferenceSpec (#5278) Co-authored-by: pintaoz-aws <[email 
protected]> * feature: Added condition to allow eval recipe. (#5298) * feature: Added condition to allow eval recipe. * change: renamed is_nova_recipe to is_nova_or_eval_recipe * chore: domain support for eu-isoe-west-1 (#5292) * Add numpy 2.0 support (#5199) * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * update tensorflow artifacts * update tensorflow artifacts * update tensorflow artifacts * testfile codestyle fixes * testfile codestyle fixes * update SKLearn image URI config * update SKLearn image URI config * docstyle fixes * docstyle fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes --------- Co-authored-by: Roja Reddy Sareddy <[email protected]> Co-authored-by: parknate@ <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * Fix for a failed slow test: numpy fix (#5304) * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * update tensorflow artifacts * update tensorflow artifacts * update tensorflow artifacts * testfile codestyle fixes * testfile codestyle fixes * update SKLearn image URI config * update SKLearn image URI config * docstyle fixes * docstyle fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test --------- Co-authored-by: Roja Reddy Sareddy <[email protected]> Co-authored-by: parknate@ <[email 
protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * Revert "Merge branch 'master-greenland' into master" This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38. * prepare release v2.253.0 * update development version to v2.253.1.dev0 * add TEI 1.8.2 (#5305) * add TEI 1.8.2 * add test * [hf-tei] add image uri to utils (#5287) * tei * tests --------- Co-authored-by: pintaoz-aws <[email protected]> Co-authored-by: Molly He <[email protected]> * Revert the change "Add Numpy 2.0 support" (#5307) * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * update tensorflow artifacts * update tensorflow artifacts * update tensorflow artifacts * testfile codestyle fixes * testfile codestyle fixes * update SKLearn image URI config * update SKLearn image URI config * docstyle fixes * docstyle fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test * Revert 'Add numpy 2.0 support' * Revert 'Add numpy 2.0 support' * Revert 'Add numpy 2.0 support' --------- Co-authored-by: Roja Reddy Sareddy <[email protected]> Co-authored-by: parknate@ <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * Update instance type regex to also include hyphens (#5308) * Revert "Merge branch 'master-greenland' into master" (#1747) This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38. 
* prepare release v2.253.1 * update development version to v2.253.2.dev0 * [hf] HF Inference TGI (#5302) * image * tests --------- Co-authored-by: Gokul Anantha Narayanan <[email protected]> * [Hugging Face][Pytorch] Inference DLC 4.51.3 (#5271) * new image * Update src/sagemaker/image_uri_config/huggingface.json removed missing CPU image * add cpu back --------- Co-authored-by: Molly He <[email protected]> * add HF Optimum Neuron DLCs (#5309) * add image * inf on dlc * neuron tgi dlcs * fix test --------- Co-authored-by: Zhaoqi <[email protected]> * feat: Triton v25.09 DLC (#5314) * Add Numpy 2.0 support (#5311) * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * Fix incompatible_dependecies test * update tensorflow artifacts * update tensorflow artifacts * update tensorflow artifacts * testfile codestyle fixes * testfile codestyle fixes * update SKLearn image URI config * update SKLearn image URI config * docstyle fixes * docstyle fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fixes * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test * numpy fix for slow test * Revert 'Add numpy 2.0 support' * Revert 'Add numpy 2.0 support' * Revert 'Add numpy 2.0 support' * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support * Add numpy 2.0 support --------- Co-authored-by: Roja Reddy Sareddy <[email protected]> Co-authored-by: parknate@ <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> * prepare release v2.254.0 * update development version to v2.254.1.dev0 * [hf] HF PT Training DLCs (#5301) * image * add py312 * 
fix * test fix * typo --------- Co-authored-by: Molly He <[email protected]> * fix: update get_execution_role to directly return the ExecutionRoleArn if it presents in the resource metadata file (#5315) Co-authored-by: Jun Lyu <[email protected]> * Updating RegisterModel step with new params (#1766) * Adding Model package registration field for registermodel step * Adding base model for register model step * Fixes for baseModel * Fix unit tests * Adding tests for model, fixing checkstyle * Modifying BaseModel to ContainerBaseModel * Fixes * Fixing checkstyle * fixing imports * fix for unit test * Fix unit tests * Keynote3 kandinsky (#1833) * feature: Add support for SFT recipes - Add unit tests * Rename model_type and recipe * Feat: Add support for LLMFT in ModelTrainer and add unit tests * Upload verl recipe to S3 along with llmft * Update llmft check to reflect the new recipe structure --------- Co-authored-by: Ankita Agarwal <[email protected]> Co-authored-by: Ankita Agarwal <[email protected]> Co-authored-by: appari <[email protected]> * zimmer merged to master-v2 * numpy fix * fixed numpy version * more merge conflict * more merge conflicts * estimator fix * requirements conflict * pyproj fix * smcore<2.0.0 * Restrict sagemaker-core version to less than 2.0.0 (#1917) * malav changes w smcore<2.0.0 --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: pintaoz-aws <[email protected]> Co-authored-by: pintaoz <[email protected]> Co-authored-by: parknate@ <[email protected]> Co-authored-by: timkuo-amazon <[email protected]> Co-authored-by: Zhaoqi <[email protected]> Co-authored-by: ci <ci> Co-authored-by: sagemaker-bot <[email protected]> Co-authored-by: IshaChid76 <[email protected]> Co-authored-by: Isha Chidrawar <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: malavhs <[email protected]> Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> 
Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: rsareddy0329 <[email protected]> Co-authored-by: Roja Reddy Sareddy <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Rohan Narayan <[email protected]> Co-authored-by: varunmoris <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Bruno Pistone <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: Julian Grimm <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> Co-authored-by: rrrkharse <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: ruiliann666 <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: jkasiraj <[email protected]> Co-authored-by: Brock Wade <[email protected]> Co-authored-by: Brock Wade <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]> Co-authored-by: Molly He <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]> Co-authored-by: Aditi Sharma <[email protected]> Co-authored-by: adishaa <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Roman A <[email protected]> Co-authored-by: David Tippett <[email protected]> Co-authored-by: Prateek M Desai <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: pagezyhf <[email protected]> Co-authored-by: zicanl-amazon <[email protected]> Co-authored-by: DemyCode <[email protected]> Co-authored-by: haozhx23 <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Mohan Kishore <[email protected]> Co-authored-by: Mohan 
Kishore <[email protected]> Co-authored-by: Will Childs-Klein <[email protected]> Co-authored-by: Chad Chiang <[email protected]> Co-authored-by: Chad Chiang <[email protected]> Co-authored-by: uyoldas <[email protected]> Co-authored-by: Uemit Yoldas <[email protected]> Co-authored-by: Sirut Buasai <[email protected]> Co-authored-by: Tritin Truong <[email protected]> Co-authored-by: Jiali Xing <[email protected]> Co-authored-by: Jiali Xing <[email protected]> Co-authored-by: papriwal <[email protected]> Co-authored-by: Greg Katkov <[email protected]> Co-authored-by: haoxinwa <[email protected]> Co-authored-by: JennaZhao <[email protected]> Co-authored-by: Jessica Zhu <[email protected]> Co-authored-by: David Lindskog <[email protected]> Co-authored-by: Greg Katkov <[email protected]> Co-authored-by: adtian2 <[email protected]> Co-authored-by: Kamalakannan Hari Krishna Moorthy <[email protected]> Co-authored-by: Mohamed Zeidan <[email protected]> Co-authored-by: Tim Tang <[email protected]> Co-authored-by: Timothy Wu <[email protected]> Co-authored-by: Cuong Vu <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Eli Davidson <[email protected]> Co-authored-by: Eli Davidson <[email protected]> Co-authored-by: Jie Shen Ong <[email protected]> Co-authored-by: JieShen Ong <[email protected]> Co-authored-by: sylvie7788 <[email protected]> Co-authored-by: xibei chen <[email protected]> Co-authored-by: Malte Reimann <[email protected]> Co-authored-by: aviruthen <[email protected]> Co-authored-by: chiragvp-aws <[email protected]> Co-authored-by: Gokul A <[email protected]> Co-authored-by: Zhaoqi <[email protected]> Co-authored-by: Andrew Song <[email protected]> Co-authored-by: JunLyu <[email protected]> Co-authored-by: Jun Lyu <[email protected]> Co-authored-by: Madhubalasri-B <[email protected]> Co-authored-by: CHANG-NING TSAI <[email protected]> Co-authored-by: Ankita Agarwal <[email protected]> Co-authored-by: Ankita Agarwal <[email 
protected]> Co-authored-by: appari <[email protected]>
1 parent 5d3f175 commit 1ea9944

35 files changed (+1543, -38 lines changed)

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.254.2.dev0
+2.254.2.dev0

requirements/extras/test_requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -62,4 +62,4 @@ mypy-boto3-s3==1.35.76
 mypy-extensions==1.0.0
 mypy==1.9.0
 # apache-airflow transitive dependancy
-google-re2<1.1.20250805; python_version < "3.10"
+google-re2<1.1.20250805; python_version < "3.10"

src/sagemaker/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@
 from sagemaker.analytics import TrainingJobAnalytics, HyperparameterTuningJobAnalytics # noqa: F401
 from sagemaker.local.local_session import LocalSession # noqa: F401
 
+from sagemaker.container_base_model import ContainerBaseModel # noqa: F401
 from sagemaker.model import Model, ModelPackage # noqa: F401
 from sagemaker.model_metrics import ModelMetrics, MetricsSource, FileSource # noqa: F401
 from sagemaker.pipeline import PipelineModel # noqa: F401

src/sagemaker/chainer/model.py

Lines changed: 8 additions & 1 deletion
@@ -17,7 +17,7 @@
 from typing import Callable, Optional, Union, List, Dict
 
 import sagemaker
-from sagemaker import image_uris, ModelMetrics
+from sagemaker import image_uris, ModelMetrics, ContainerBaseModel
 from sagemaker.drift_check_baselines import DriftCheckBaselines
 from sagemaker.fw_utils import (
     model_code_key_prefix,
@@ -182,6 +182,8 @@ def register(
         source_uri: Optional[Union[str, PipelineVariable]] = None,
         model_card: Optional[Union[ModelPackageModelCard, ModelCard]] = None,
         model_life_cycle: Optional[ModelLifeCycle] = None,
+        model_package_registration_type: Optional[Union[str, PipelineVariable]] = None,
+        base_model: Optional[ContainerBaseModel] = None,
     ):
         """Creates a model package for creating SageMaker models or listing on Marketplace.
 
@@ -236,6 +238,9 @@ def register(
             model_card (ModeCard or ModelPackageModelCard): document contains qualitative and
                 quantitative information about a model (default: None).
             model_life_cycle (ModelLifeCycle): ModelLifeCycle object (default: None).
+            model_package_registration_type (str or PipelineVariable): Model Package Registration
+                Type (default: None).
+            base_model (ContainerBaseModel): ContainerBaseModel object (default: None).
 
         Returns:
             str: A string of SageMaker Model Package ARN.
@@ -278,6 +283,8 @@ def register(
             source_uri=source_uri,
             model_card=model_card,
             model_life_cycle=model_life_cycle,
+            model_package_registration_type=model_package_registration_type,
+            base_model=base_model,
         )
 
     def prepare_container_def(

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+"""This file contains code related to base model for containers."""
+from __future__ import absolute_import
+
+from typing import Optional, Union
+
+from sagemaker.workflow.entities import PipelineVariable
+
+
+class ContainerBaseModel(object):
+    """Accepts Base Model parameters for conversion to request dict."""
+
+    def __init__(
+        self,
+        hub_content_name: Union[str, PipelineVariable] = None,
+        hub_content_version: Optional[Union[str, PipelineVariable]] = None,
+        recipe_name: Optional[Union[str, PipelineVariable]] = None,
+    ):
+        """Initialize a ``ContainerBaseModel`` instance and turn parameters into dict.
+
+        Args:
+            hub_content_name (str or PipelineVariable): The hub content name
+            hub_content_version (str or PipelineVariable): The hub content version
+                (default: None)
+            recipe_name (str or PipelineVariable): The Recipe name
+                (default: None)
+        """
+        self.hub_content_name = hub_content_name
+        self.hub_content_version = hub_content_version
+        self.recipe_name = recipe_name
+
+    def _to_request_dict(self):
+        """Generates a request dictionary using the parameters provided to the class."""
+        base_model_request = {}
+        if self.hub_content_name is not None:
+            base_model_request["HubContentName"] = self.hub_content_name
+        if self.hub_content_version is not None:
+            base_model_request["HubContentVersion"] = self.hub_content_version
+        if self.recipe_name is not None:
+            base_model_request["RecipeName"] = self.recipe_name
+        return base_model_request
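The new class only stores three optional fields and turns the ones that are not `None` into request keys. The following standalone sketch mirrors that logic so it can run without the SDK installed; it drops the `PipelineVariable` type hints, and the class name `ContainerBaseModelSketch` and the sample values are hypothetical.

```python
from typing import Dict, Optional


class ContainerBaseModelSketch:
    """Standalone mirror of ContainerBaseModel (hypothetical; no SDK import)."""

    def __init__(
        self,
        hub_content_name: Optional[str] = None,
        hub_content_version: Optional[str] = None,
        recipe_name: Optional[str] = None,
    ):
        self.hub_content_name = hub_content_name
        self.hub_content_version = hub_content_version
        self.recipe_name = recipe_name

    def _to_request_dict(self) -> Dict[str, str]:
        # Same attribute-to-key mapping as the diff; None values are skipped,
        # so the request only carries what the caller actually set.
        request: Dict[str, str] = {}
        if self.hub_content_name is not None:
            request["HubContentName"] = self.hub_content_name
        if self.hub_content_version is not None:
            request["HubContentVersion"] = self.hub_content_version
        if self.recipe_name is not None:
            request["RecipeName"] = self.recipe_name
        return request


# Hypothetical values: no version is given, so no HubContentVersion key appears.
base = ContainerBaseModelSketch(hub_content_name="my-base-model", recipe_name="my-recipe")
print(base._to_request_dict())
# {'HubContentName': 'my-base-model', 'RecipeName': 'my-recipe'}
```

An instance with every field left at `None` yields `{}`. As far as the `model.py` hunk shows, that empty dict would still be attached as `BaseModel`, because the guard there only checks that `base_model` is not `None` and has the method.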

src/sagemaker/estimator.py

Lines changed: 7 additions & 0 deletions
@@ -1817,6 +1817,8 @@ def register(
         source_uri=None,
         model_life_cycle=None,
         model_card=None,
+        model_package_registration_type=None,
+        base_model=None,
         **kwargs,
     ):
         """Creates a model package for creating SageMaker models or listing on Marketplace.
@@ -1868,6 +1870,9 @@ def register(
             model_card (ModeCard or ModelPackageModelCard): document contains qualitative and
                 quantitative information about a model (default: None).
             model_life_cycle (ModelLifeCycle): ModelLifeCycle object (default: None).
+            model_package_registration_type (str): Model Package Registration
+                Type (default: None).
+            base_model (ContainerBaseModel): ContainerBaseModel object (default: None).
             **kwargs: Passed to invocation of ``create_model()``. Implementations may customize
                 ``create_model()`` to accept ``**kwargs`` to customize model creation during
                 deploy. For more, see the implementation docs.
@@ -1924,6 +1929,8 @@ def register(
             source_uri=source_uri,
             model_card=model_card,
             model_life_cycle=model_life_cycle,
+            model_package_registration_type=model_package_registration_type,
+            base_model=base_model,
         )
 
     @property
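Across the framework models and `Estimator`, the change follows one pattern: two new keyword arguments that default to `None` and are forwarded unchanged to the next `register` call. A minimal sketch of that pass-through, assuming hypothetical `SketchModel` and `SketchEstimator` classes in place of the real SDK types (the `create_model(**kwargs)` step follows the estimator's docstring; the registration-type string used below is made up):

```python
from typing import Any, Dict, Optional


class SketchModel:
    """Hypothetical stand-in for a model whose register() just records its inputs."""

    def register(
        self,
        model_package_registration_type: Optional[str] = None,
        base_model: Optional[Any] = None,
    ) -> Dict[str, Any]:
        return {
            "model_package_registration_type": model_package_registration_type,
            "base_model": base_model,
        }


class SketchEstimator:
    """Hypothetical estimator: builds a model, then forwards the register arguments."""

    def create_model(self, **kwargs: Any) -> SketchModel:
        # In the SDK, kwargs customize model creation; this sketch ignores them.
        return SketchModel()

    def register(
        self,
        model_package_registration_type: Optional[str] = None,
        base_model: Optional[Any] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        model = self.create_model(**kwargs)
        # The new arguments default to None and are passed through unchanged.
        return model.register(
            model_package_registration_type=model_package_registration_type,
            base_model=base_model,
        )


defaults = SketchEstimator().register()
explicit = SketchEstimator().register(model_package_registration_type="example-type")
```

Defaulting both arguments to `None` keeps existing call sites working; the new values only take effect when a caller sets them.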

src/sagemaker/huggingface/model.py

Lines changed: 8 additions & 1 deletion
@@ -17,7 +17,7 @@
 from typing import Callable, Optional, Union, List, Dict
 
 import sagemaker
-from sagemaker import image_uris, ModelMetrics
+from sagemaker import image_uris, ModelMetrics, ContainerBaseModel
 from sagemaker.deserializers import JSONDeserializer
 from sagemaker.drift_check_baselines import DriftCheckBaselines
 from sagemaker.fw_utils import (
@@ -372,6 +372,8 @@ def register(
         source_uri: Optional[Union[str, PipelineVariable]] = None,
         model_life_cycle: Optional[ModelLifeCycle] = None,
         model_card: Optional[Union[ModelPackageModelCard, ModelCard]] = None,
+        model_package_registration_type: Optional[Union[str, PipelineVariable]] = None,
+        base_model: Optional[ContainerBaseModel] = None,
     ):
         """Creates a model package for creating SageMaker models or listing on Marketplace.
 
@@ -427,6 +429,9 @@ def register(
             model_card (ModeCard or ModelPackageModelCard): document contains qualitative and
                 quantitative information about a model (default: None).
             model_life_cycle (ModelLifeCycle): ModelLifeCycle object (default: None).
+            model_package_registration_type (str or PipelineVariable): Model Package Registration
+                Type (default: None).
+            base_model (ContainerBaseModel): ContainerBaseModel object (default: None).
 
         Returns:
             A `sagemaker.model.ModelPackage` instance.
@@ -477,6 +482,8 @@ def register(
             source_uri=source_uri,
             model_life_cycle=model_life_cycle,
             model_card=model_card,
+            model_package_registration_type=model_package_registration_type,
+            base_model=base_model,
         )
 
     def prepare_container_def(

src/sagemaker/model.py

Lines changed: 12 additions & 0 deletions
@@ -52,6 +52,7 @@
 from sagemaker.model_card.helpers import _hash_content_str
 from sagemaker.model_card.schema_constraints import ModelApprovalStatusEnum
 from sagemaker.session import Session
+from sagemaker.container_base_model import ContainerBaseModel
 from sagemaker.model_metrics import ModelMetrics
 from sagemaker.drift_check_baselines import DriftCheckBaselines
 from sagemaker.explainer import ExplainerConfig
@@ -477,6 +478,8 @@ def register(
         model_life_cycle: Optional[ModelLifeCycle] = None,
         accept_eula: Optional[bool] = None,
         model_type: Optional[JumpStartModelType] = None,
+        model_package_registration_type: Optional[Union[str, PipelineVariable]] = None,
+        base_model: Optional[ContainerBaseModel] = None,
     ):
         """Creates a model package for creating SageMaker models or listing on Marketplace.
 
@@ -531,6 +534,9 @@ def register(
             model_card (ModeCard or ModelPackageModelCard): document contains qualitative and
                 quantitative information about a model (default: None).
             model_life_cycle (ModelLifeCycle): ModelLifeCycle object (default: None).
+            model_package_registration_type (str or PipelineVariable): Model Package Registration
+                Type (default: None).
+            base_model (ContainerBaseModel): ContainerBaseModel object (default: None).
 
         Returns:
             A `sagemaker.model.ModelPackage` instance or pipeline step arguments
@@ -578,6 +584,9 @@ def register(
         if self.model_data is not None:
             container_def["ModelDataUrl"] = self.model_data
 
+        if base_model is not None and hasattr(base_model, "_to_request_dict"):
+            container_def["BaseModel"] = base_model._to_request_dict()
+
         model_pkg_args = sagemaker.get_model_package_args(
             self.content_types,
             self.response_types,
@@ -601,6 +610,8 @@ def register(
             source_uri=source_uri,
             model_card=model_card,
             model_life_cycle=model_life_cycle,
+            model_package_registration_type=model_package_registration_type,
+            base_model=base_model,
         )
         model_package = self.sagemaker_session.create_model_package_from_containers(
             **model_pkg_args
@@ -2150,6 +2161,7 @@ def __init__(
             You can find additional parameters for initializing this class at
             :class:`~sagemaker.model.Model`.
         """
+
         super(FrameworkModel, self).__init__(
             image_uri,
             model_data,
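The hunk at line 578 of `model.py` attaches the base model to the container definition only when the object passed in is not `None` and has a `_to_request_dict` method, so anything else is silently skipped rather than rejected. A standalone sketch of that duck-typed guard; `attach_base_model`, `FakeBaseModel`, and the sample S3 URI are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional


def attach_base_model(container_def: Dict[str, Any], base_model: Optional[Any]) -> Dict[str, Any]:
    """Mirror of the check added to Model.register (hypothetical helper)."""
    if base_model is not None and hasattr(base_model, "_to_request_dict"):
        container_def["BaseModel"] = base_model._to_request_dict()
    return container_def


class FakeBaseModel:
    """Hypothetical object exposing the method the guard looks for."""

    def _to_request_dict(self) -> Dict[str, str]:
        return {"HubContentName": "my-base-model"}


with_model = attach_base_model({"ModelDataUrl": "s3://bucket/model.tar.gz"}, FakeBaseModel())
without_model = attach_base_model({"ModelDataUrl": "s3://bucket/model.tar.gz"}, None)
# A plain dict has no _to_request_dict method, so it is ignored, not sent.
ignored = attach_base_model({}, {"HubContentName": "x"})
```

The last case shows the trade-off of the `hasattr` check: passing a raw dict instead of a `ContainerBaseModel` does not raise an error; `BaseModel` is simply left out of the container definition.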

src/sagemaker/modules/train/model_trainer.py

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,7 @@
108108
_get_args_from_recipe,
109109
_determine_device_type,
110110
_is_nova_recipe,
111+
_is_llmft_recipe,
111112
_load_base_recipe,
112113
)
113114

@@ -252,6 +253,7 @@ class ModelTrainer(BaseModel):
252253
_metric_definitions: Optional[List[MetricDefinition]] = PrivateAttr(default=None)
253254

254255
_is_nova_recipe: Optional[bool] = PrivateAttr(default=None)
256+
_is_llmft_recipe: Optional[bool] = PrivateAttr(default=None)
255257
_temp_recipe_train_dir: Optional[TemporaryDirectory] = PrivateAttr(default=None)
256258
_temp_code_dir: Optional[TemporaryDirectory] = PrivateAttr(default=None)
257259

@@ -632,12 +634,13 @@ def _create_training_job_args(
632634

633635
final_input_data_config = list(existing_channels.values()) + new_channels
634636

635-
if self._is_nova_recipe:
637+
if self._is_nova_recipe or self._is_llmft_recipe:
638+
636639
for input_data in final_input_data_config:
637640
if input_data.channel_name == SM_RECIPE:
638641
raise ValueError(
639642
"Cannot use reserved channel name 'recipe' as an input channel name "
640-
" for Nova Recipe"
643+
" for Nova or LLMFT Recipe"
641644
)
642645
recipe_file_path = os.path.join(self._temp_recipe_train_dir.name, SM_RECIPE_YAML)
643646
recipe_channel = self.create_input_data_channel(
@@ -646,7 +649,10 @@ def _create_training_job_args(
  key_prefix=input_data_key_prefix,
  )
  final_input_data_config.append(recipe_channel)
- self.hyperparameters.update({"sagemaker_recipe_local_path": SM_RECIPE_CONTAINER_PATH})
+ if self._is_nova_recipe:
+ self.hyperparameters.update(
+ {"sagemaker_recipe_local_path": SM_RECIPE_CONTAINER_PATH}
+ )

  if final_input_data_config:
  final_input_data_config = self._get_input_data_config(
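The two `_create_training_job_args` hunks above can be approximated by the minimal Python sketch below. It assumes channels can be modeled as plain name strings. `plan_recipe_channels` is a hypothetical helper and not an SDK function. The value of `SM_RECIPE_CONTAINER_PATH` is also an assumption, because the diff references the constant without showing its definition.

```python
from typing import Dict, List

# Assumed constant values: the diff uses these names but does not define them.
SM_RECIPE = "recipe"  # matches the reserved name quoted in the error message
SM_RECIPE_CONTAINER_PATH = "/opt/ml/input/data/recipe/recipe.yaml"  # assumption


def plan_recipe_channels(
    channel_names: List[str],
    hyperparameters: Dict[str, str],
    is_nova: bool,
    is_llmft: bool,
) -> List[str]:
    """Hypothetical sketch of the recipe-channel rules after this commit.

    Returns the final channel names and updates ``hyperparameters`` in place.
    """
    final_channels = list(channel_names)
    if is_nova or is_llmft:
        # Both recipe families upload the recipe YAML on a reserved channel,
        # so a user-supplied channel with that name is rejected.
        if SM_RECIPE in final_channels:
            raise ValueError(
                "Cannot use reserved channel name 'recipe' as an input channel name "
                "for Nova or LLMFT Recipe"
            )
        final_channels.append(SM_RECIPE)
        # Only Nova recipes are told where the recipe lands in the container.
        if is_nova:
            hyperparameters.update({"sagemaker_recipe_local_path": SM_RECIPE_CONTAINER_PATH})
    return final_channels
```

The sketch isolates the split this commit introduces. LLMFT recipes now share Nova's reserved `recipe` channel, but only Nova recipes receive the `sagemaker_recipe_local_path` hyperparameter.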
@@ -1201,14 +1207,15 @@ def from_recipe(
  training_recipe=training_recipe, recipe_overrides=recipe_overrides
  )
  is_nova = _is_nova_recipe(recipe=recipe)
+ is_llmft = _is_llmft_recipe(recipe=recipe)

- if device_type == "cpu" and not is_nova:
+ if device_type == "cpu" and not (is_nova or is_llmft):
  raise ValueError(
  "Training recipe is not supported for CPU instances. "
  + "Please provide a GPU or Tranium instance type."
  )
- if training_image is None and is_nova:
- raise ValueError("training_image must be provided when using recipe for Nova.")
+ if training_image is None and (is_nova or is_llmft):
+ raise ValueError("training_image must be provided when using recipe for Nova or LLMFT")

  if training_image_config and training_image is None:
  raise ValueError("training_image must be provided when using training_image_config.")
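The validation order in `from_recipe` after this change can be sketched as a standalone function. `validate_recipe_inputs` is hypothetical. The sketch takes `is_nova` and `is_llmft` as arguments rather than deriving them with `_is_nova_recipe` and `_is_llmft_recipe`, because the diff does not show how those helpers detect a recipe.

```python
from typing import Optional


def validate_recipe_inputs(
    device_type: str,
    is_nova: bool,
    is_llmft: bool,
    training_image: Optional[str],
    training_image_config: Optional[dict] = None,
) -> None:
    """Hypothetical mirror of the from_recipe argument checks in this diff."""
    uses_nova_or_llmft = is_nova or is_llmft
    # The CPU rejection is skipped for Nova and LLMFT recipes.
    if device_type == "cpu" and not uses_nova_or_llmft:
        raise ValueError(
            "Training recipe is not supported for CPU instances. "
            "Please provide a GPU or Trainium instance type."
        )
    # Nova and LLMFT recipes require an explicit training image.
    if training_image is None and uses_nova_or_llmft:
        raise ValueError("training_image must be provided when using recipe for Nova or LLMFT")
    # Independent of recipe type: an image config needs an image to apply to.
    if training_image_config and training_image is None:
        raise ValueError("training_image must be provided when using training_image_config.")
```

For example, `validate_recipe_inputs("cpu", False, True, "my-image")` passes, with `"my-image"` as a placeholder URI. The same call with `training_image=None` raises.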
@@ -1238,7 +1245,7 @@ def from_recipe(
  model_trainer_args["training_image"] = training_image
  if hyperparameters and not is_nova:
  logger.warning(
- "Hyperparameters are not supported for general training recipes. "
+ "Hyperparameters are not supported for general and LLMFT training recipes. "
  + "Ignoring hyperparameters input."
  )
  if is_nova:
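The hyperparameter condition above still tests only `is_nova`. LLMFT recipes therefore already fell into the ignore branch, and this hunk only updates the warning text to say so. The sketch below shows that behavior. `resolve_recipe_hyperparameters` is hypothetical. The diff is truncated at the `if is_nova:` branch, so returning the values unchanged for Nova is an assumption.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def resolve_recipe_hyperparameters(
    hyperparameters: Optional[Dict[str, str]], is_nova: bool
) -> Optional[Dict[str, str]]:
    """Hypothetical sketch of how from_recipe treats user hyperparameters."""
    if hyperparameters and not is_nova:
        # General and LLMFT recipes: warn and drop the user's values.
        logger.warning(
            "Hyperparameters are not supported for general and LLMFT training recipes. "
            "Ignoring hyperparameters input."
        )
        return None
    # Nova branch: not shown in the diff; passing values through is an assumption.
    return hyperparameters
```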
@@ -1264,6 +1271,7 @@ def from_recipe(
  **model_trainer_args,
  )
  model_trainer._is_nova_recipe = is_nova
+ model_trainer._is_llmft_recipe = is_llmft
  model_trainer._temp_recipe_train_dir = tmp_dir
  return model_trainer
