v0.6.0
Released on 18 December, 2023
Description
- Conflicting directives in the SmartSim packaging instructions were fixed
- sacct and sstat errors are now fatal for Slurm-based workflow executions
- Added documentation section about ML features and TorchScript
- Added TorchScript functions to Online Analysis tutorial
- Added multi-DB example to documentation
- Improved test stability on HPC systems
- Added support for producing & consuming telemetry outputs
- Split tests into groups for parallel execution in CI/CD pipeline
- Change signature of Experiment.summary()
- Expose first_device parameter for scripts, functions, models
- Added support for MINBATCHTIMEOUT in model execution
- Remove support for RedisAI 1.2.5, use RedisAI 1.2.7 commit
- Add support for multiple databases
Detailed Notes
- Several conflicting directives between setup.py and setup.cfg were
fixed to mitigate warnings issued when building the pip wheel.
(SmartSim-PR435)
- When the Slurm functions sacct and sstat returned an error, it was
ignored and SmartSim's state could become inconsistent. To prevent
this, errors raised by sacct or sstat now result in an exception.
(SmartSim-PR392)
- A section named ML Features was added to the documentation. It
contains multiple examples of how ML models and functions can be
added to and executed on the DB. TorchScript-based post-processing
was added to the Online Analysis tutorial.
(SmartSim-PR411)
- An example of how to use multiple Orchestrators concurrently was
added to the documentation.
(SmartSim-PR409)
- The test infrastructure was improved. Tests on HPC systems are now
stable, and issues such as Orchestrators that were not stopped or
experiments created in the wrong paths have been fixed.
(SmartSim-PR381)
- A telemetry monitor was added to check updates and produce events
for SmartDashboard.
(SmartSim-PR426)
- Split tests into group_a, group_b, and slow_tests for parallel
execution in the CI/CD pipeline.
(SmartSim-PR417, SmartSim-PR424)
- Changed the format argument to style in Experiment.summary(). This
is an API break.
(SmartSim-PR391)
- Added support for the first_device parameter for scripts, functions,
and models. This causes them to be loaded onto num_devices devices,
beginning with first_device.
(SmartSim-PR394)
- Added support for MINBATCHTIMEOUT in model execution, which caps the
delay waiting for a minimum number of model execution operations to
accumulate before executing them as a batch.
(SmartSim-PR387)
- RedisAI 1.2.5 is no longer supported. The only supported RedisAI
version is now 1.2.7. Because the officially released RedisAI 1.2.7
has a bug that breaks the build process on Mac OSX, it was decided to
use commit 634916c from RedisAI's GitHub repository, where the bug has
been fixed. This applies to all operating systems.
(SmartSim-PR383)
- Added support for creating multiple databases with unique
identifiers.
(SmartSim-PR342)
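
The sacct and sstat change noted above turns a silently ignored failure into an exception. The following is only a sketch of that pattern, not SmartSim's code: the names LauncherCommandError and run_checked are hypothetical, and the Python interpreter stands in for sacct so the example runs without Slurm.

```python
import subprocess
import sys
from typing import List


class LauncherCommandError(RuntimeError):
    """Hypothetical exception type, used only for this illustration."""


def run_checked(cmd: List[str]) -> str:
    """Run a command and raise instead of ignoring a non-zero exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise LauncherCommandError(
            f"{cmd[0]} exited with code {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout


# Stand-in for a failing sacct/sstat call (assumption: no Slurm available here).
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    run_checked(failing_cmd)
except LauncherCommandError as err:
    print(f"raised: {err}")
```

Raising at the call site lets the caller decide how to recover, rather than continuing with a job status that may be stale.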
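
The Experiment.summary() change noted above affects callers that passed the old argument by keyword. The sketch below uses a stand-in function rather than SmartSim itself. The default value "github" and the returned string are assumptions made for illustration.

```python
def summary(style: str = "github") -> str:
    """Stand-in for the renamed signature; not the SmartSim implementation."""
    return f"summary rendered with style={style!r}"


# The new keyword name works.
print(summary(style="github"))

# The old keyword name is no longer accepted and raises TypeError.
try:
    summary(format="github")  # pre-0.6.0 spelling
except TypeError:
    print("TypeError: 'format' must be renamed to 'style'")
```

Callers that passed the value positionally are unaffected by a rename like this. Only keyword callers need updating.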
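
The first_device item above describes a contiguous range of devices. A minimal sketch of that selection follows. The helper select_devices and the "GPU:n" naming are assumptions made for illustration and are not part of SmartSim.

```python
from typing import List


def select_devices(device: str, num_devices: int, first_device: int = 0) -> List[str]:
    """Return the device names covered by num_devices starting at first_device.

    Hypothetical helper, written only to illustrate the release note.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if first_device < 0:
        raise ValueError("first_device must be non-negative")
    return [f"{device}:{i}" for i in range(first_device, first_device + num_devices)]


print(select_devices("GPU", num_devices=2, first_device=1))  # prints ['GPU:1', 'GPU:2']
```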
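
The MINBATCHTIMEOUT item above can be read as a decision rule: run once enough requests have accumulated, or once the timeout has elapsed. The sketch below models that rule only and is not RedisAI's or SmartSim's code. The function name, the millisecond units, and the treatment of a timeout of 0 as "no cap" are assumptions.

```python
def should_run_batch(
    queued: int,
    min_batch_size: int,
    waited_ms: float,
    min_batch_timeout_ms: float,
) -> bool:
    """Decide whether the queued model executions should run now."""
    if queued == 0:
        return False  # nothing to execute
    if queued >= min_batch_size:
        return True  # enough requests accumulated for a batch
    # Assumption: a timeout of 0 means the wait is not capped.
    return min_batch_timeout_ms > 0 and waited_ms >= min_batch_timeout_ms


print(should_run_batch(4, 4, 0, 10))   # prints True: minimum batch size reached
print(should_run_batch(2, 4, 5, 10))   # prints False: still waiting
print(should_run_batch(2, 4, 12, 10))  # prints True: timeout caps the delay
```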