
v0.6.0

Released by @al-rigazzi on 18 Dec at 21:07
9d97397

Released on 18 December, 2023

Description

  • Conflicting directives in the SmartSim packaging instructions were
    fixed
  • sacct and sstat errors are now fatal for Slurm-based workflow
    executions
  • Added documentation section about ML features and TorchScript
  • Added TorchScript functions to Online Analysis tutorial
  • Added multi-DB example to documentation
  • Improved test stability on HPC systems
  • Added support for producing & consuming telemetry outputs
  • Split tests into groups for parallel execution in CI/CD pipeline
  • Changed the signature of Experiment.summary()
  • Exposed the first_device parameter for scripts, functions, and models
  • Added support for MINBATCHTIMEOUT in model execution
  • Removed support for RedisAI 1.2.5; RedisAI 1.2.7 is now used at
    commit 634916c
  • Added support for multiple databases

Detailed Notes

  • Several conflicting directives between setup.py and setup.cfg were
    fixed to reduce the warnings issued when building the pip wheel.
    (SmartSim-PR435)
  • Previously, when the Slurm commands sacct or sstat returned an
    error, the error was ignored and SmartSim's state could become
    inconsistent. To prevent this, errors raised by sacct or sstat now
    result in an exception.
    (SmartSim-PR392)
  • A section named ML Features was added to the documentation. It
    contains multiple examples of how ML models and functions can be
    added to and executed on the DB. TorchScript-based post-processing
    was also added to the Online Analysis tutorial.
    (SmartSim-PR411)
  • An example of how to use multiple Orchestrators concurrently was
    added to the documentation.
    (SmartSim-PR409)
  • The test infrastructure was improved. Tests on HPC systems are now
    stable, and issues such as Orchestrators that were not stopped or
    experiments created in the wrong paths have been fixed.
    (SmartSim-PR381)
  • A telemetry monitor was added to check for updates and produce
    events for SmartDashboard.
    (SmartSim-PR426)
  • Tests were split into the groups group_a, group_b, and slow_tests
    for parallel execution in the CI/CD pipeline.
    (SmartSim-PR417, SmartSim-PR424)
  • The format argument of Experiment.summary() was renamed to style.
    This is an API break.
    (SmartSim-PR391)
  • Added support for the first_device parameter for scripts,
    functions, and models. It causes them to be loaded onto num_devices
    devices, starting with the device at index first_device.
    (SmartSim-PR394)
  • Added support for MINBATCHTIMEOUT in model execution. It caps how
    long execution waits for a minimum number of model execution
    requests to accumulate before they are executed together as a batch.
    (SmartSim-PR387)
  • RedisAI 1.2.5 is no longer supported; the only supported RedisAI
    version is now 1.2.7. Because the official RedisAI 1.2.7 release has
    a bug that breaks the build process on macOS, SmartSim instead uses
    commit 634916c from RedisAI's GitHub repository, in which this bug
    is fixed. This applies to all operating systems.
    (SmartSim-PR383)
  • Added support for creating multiple databases with unique
    identifiers.
    (SmartSim-PR342)
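The sacct/sstat change (SmartSim-PR392) amounts to treating a failed Slurm query as an error rather than as an empty or stale result. The following is a minimal Python sketch of that idea, not SmartSim's code: SlurmQueryError and run_slurm_query are hypothetical names, and the sketch only assumes that sacct and sstat signal failure through a non-zero exit status.

```python
import subprocess
from typing import Sequence


class SlurmQueryError(RuntimeError):
    """Hypothetical exception raised when a Slurm query command fails."""


def run_slurm_query(cmd: Sequence[str]) -> str:
    """Run a query command such as sacct or sstat and return its stdout.

    A missing executable or a non-zero exit status raises SlurmQueryError
    instead of being silently ignored.
    """
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    except OSError as exc:  # e.g. the command is not installed on this node
        raise SlurmQueryError(f"could not run {cmd[0]!r}: {exc}") from exc
    if proc.returncode != 0:
        raise SlurmQueryError(
            f"{cmd[0]!r} exited with status {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout
```

A caller would then write, for example, run_slurm_query(["sacct", "--jobs", job_id]) and let the exception propagate, so that a failed query stops the workflow instead of leaving its recorded state out of date.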
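The telemetry note (SmartSim-PR426) describes a monitor that checks for updates and produces events. One common way to do that is to compare successive status snapshots and emit an event for every change; the sketch below assumes that approach purely for illustration. diff_statuses and the status strings are hypothetical and do not reflect SmartSim's actual telemetry format.

```python
from typing import Dict, List, Tuple


def diff_statuses(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Compare two entity -> status snapshots and return change events.

    Each event is (entity, old_status, new_status). An entity seen for the
    first time has old_status "", and one that disappeared has new_status "".
    Events are sorted by entity name so the output is deterministic.
    """
    events = []
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name, ""), current.get(name, "")
        if old != new:
            events.append((name, old, new))
    return events
```

A monitor loop would call this on each polling interval and forward the resulting events to a consumer such as a dashboard.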
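The first_device note (SmartSim-PR394) can be read as selecting a contiguous range of device indices. This sketch computes that range. gpu_devices is a hypothetical helper, the "GPU:<index>" naming is an assumption modelled on RedisAI device strings, and SmartSim's own keyword for the device count may differ from num_devices.

```python
from typing import List


def gpu_devices(first_device: int, num_devices: int) -> List[str]:
    """Hypothetical helper: the GPUs an entity would be loaded on.

    With first_device=f and num_devices=n the result covers the n
    consecutive indices f, f+1, ..., f+n-1.
    """
    if first_device < 0:
        raise ValueError("first_device must be non-negative")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    # "GPU:<index>" naming is an assumption modelled on RedisAI device strings.
    return [f"GPU:{i}" for i in range(first_device, first_device + num_devices)]


print(gpu_devices(first_device=2, num_devices=2))  # ['GPU:2', 'GPU:3']
```

Starting at a non-zero first_device leaves the lower-indexed GPUs free, for example for a simulation running on the same node.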
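The MINBATCHTIMEOUT note (SmartSim-PR387) describes a two-condition flush rule: execute once enough requests have queued up, or once the oldest one has waited long enough. MinBatchQueue below is a hypothetical, single-threaded Python sketch of that rule, not RedisAI's implementation. The injected clock (in milliseconds) keeps it deterministic, and treating a timeout of 0 as "no timeout" is an assumption of this sketch.

```python
from typing import Callable, List, Optional


class MinBatchQueue:
    """Hypothetical sketch of min-batch-size / min-batch-timeout batching.

    Pending requests are released as one batch once min_batch_size of them
    have accumulated, or once the oldest has waited min_batch_timeout_ms,
    whichever happens first. In this sketch a timeout of 0 means no timeout.
    """

    def __init__(self, min_batch_size: int, min_batch_timeout_ms: float,
                 clock: Callable[[], float]) -> None:
        self.min_batch_size = min_batch_size
        self.min_batch_timeout_ms = min_batch_timeout_ms
        self.clock = clock  # returns the current time in milliseconds
        self.pending: List[str] = []
        self.oldest: Optional[float] = None  # arrival time of the oldest request

    def submit(self, request: str) -> None:
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(request)

    def poll(self) -> List[str]:
        """Return the batch to execute now, or [] to keep waiting."""
        if not self.pending or self.oldest is None:
            return []
        size_reached = len(self.pending) >= self.min_batch_size
        waited = self.clock() - self.oldest
        timed_out = 0 < self.min_batch_timeout_ms <= waited
        if size_reached or timed_out:
            batch, self.pending, self.oldest = self.pending, [], None
            return batch
        return []
```

In real use the clock could be lambda: time.monotonic() * 1000. Without the timeout, a lone request could wait indefinitely for companions that never arrive; the timeout bounds that latency at the cost of sometimes running smaller batches.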
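Supporting multiple databases (SmartSim-PR342) depends on each one having a unique identifier. The hypothetical DatabaseRegistry below shows the bookkeeping this implies. It is not SmartSim's API, the parameter name db_identifier is used here for illustration, and rejecting a reused port assumes that all databases run on the same host.

```python
from typing import Dict


class DatabaseRegistry:
    """Hypothetical bookkeeping for several databases in one experiment."""

    def __init__(self) -> None:
        self._ports: Dict[str, int] = {}  # db_identifier -> port

    def add(self, db_identifier: str, port: int) -> None:
        if not db_identifier:
            raise ValueError("db_identifier must be a non-empty string")
        if db_identifier in self._ports:
            raise ValueError(f"database {db_identifier!r} already exists")
        # Assumes every database runs on the same host, so ports must differ too.
        if port in self._ports.values():
            raise ValueError(f"port {port} is already in use")
        self._ports[db_identifier] = port

    def port_of(self, db_identifier: str) -> int:
        return self._ports[db_identifier]
```

Keying everything by identifier lets a client choose which database to talk to by name rather than by address.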