v1.7.0 release
Breaking Changes
- Make scheduler-plugins the default gang scheduler. #1747 (Syulin7)
- Upgrade the Kubernetes dependencies to v1.27 #1834 (tenzen-y)
New features
- Make scheduler-plugins the default gang scheduler. #1747 (Syulin7)
- Merge kubeflow/common to training-operator #1813 (johnugeorge)
- Auto-generate RBAC manifests with controller-gen #1815 (Syulin7)
- Implement suspend semantics #1859 (tenzen-y)
- Set up controllers using goroutines to start the manager quickly #1869 (tenzen-y)
- Set the correct environment variables for PyTorchJob to support torchrun #1840 (kuizhiqing)
Bug fixes
- Fix a bug that XGBoostJob's running condition isn't updated when the job is resumed #1866 (tenzen-y)
- Set a Running condition when the XGBoostJob is completed and doesn't have a Running condition #1789 (tenzen-y)
- Avoid depending on the local environment when installing the code generators #1810 (tenzen-y)
Misc
- Remove reconciler code #1879 (johnugeorge)
- Make Condition and ReplicaStatus optional #1862 (tenzen-y)
- Use the same reasons for Condition and Event #1854 (tenzen-y)
- Fully consolidate tfjob-operator to training-operator #1850 (tenzen-y)
- Clean up /pkg/common/util/v1 #1845 (tenzen-y)
- Refactor tests in common/controller.v1 #1843 (tenzen-y)
- Remove duplicate code for adding the task spec annotation #1839 (lowang-bh)
- Fetch Volcano logs when e2e tests fail #1837 (lowang-bh)
- Add a check that pods are not scheduled when testing gang-scheduler integrations in e2e #1835 (tenzen-y)
- Replace dummy client with fake client #1818 (tenzen-y)
- Add default Intel MPI env variables to MPIJob #1804 (tkatila)
- Improve e2e tests for gang scheduling #1801 (tenzen-y)
- Make the XGBoost YAML container name consistent with the XGBoostJob default container name #1794 (Crisescode)
- Make the timeout configurable from e2e tests #1787 (nagar-ajay)