Releases: DataBiosphere/toil
8.0.0
Highlighted Features Added
- `toil debug-job` now has `--retrieveTaskDirectory <dir>`, which will set up a job's downloaded files under `<dir>` and try to stop the job after doing the downloads. Jobs can call `self.files_downloaded_hook()` to provide a stopping point for this mode. (#4815)
- `toil debug-job` can now reconstruct the inside-the-container environment for CWL and WDL tasks. (#4815)
- Added support for caching on Slurm and other HPC schedulers (#4775)
- Replace all instances of boto2 with boto3 for all Toil AWS code (#4718)
- Add support for Python 3.12 (#4718)
- Add support for Python 3.13 (#5145)
- Ceph input/output errors from file locking functions are now tolerated. (#4874)
- Toil now uses `flock` to enable directory locks to work properly (#4924)
- Added support to get Slurm partitions and automatically send jobs to GPUs on Slurm (#4833) (supports both CWL and WDL)
- New `--symlinkJobStoreReads=False` option lets you force local-node copies (possibly in the cache) even when reading directly from a FileJobStore is possible, potentially reducing shared filesystem IO. (#4673)
- Toil now supports reading and writing MiniWDL's call cache. (#4797)
- Toil now supports running CWL and WDL workflows from Dockstore, by using either a Dockstore page URL or TRS ID as the URL/filename of the workflow to run. Since these often contain `?` or `#`, remember to quote them on the command line! (#5049)
- Add support for parallel file imports (#5114)
  - New argument `--importWorkersThreshold`. This specifies the threshold where files will begin to be imported on individual jobs. Small files will be batched into the same import job up to this threshold.
  - New argument `--importWorkersDisk`. Defaults to 1 MiB; should be increased when download streaming is not possible on a worker.
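The batching behaviour described for `--importWorkersThreshold` can be pictured with a short sketch. This is only an illustration under stated assumptions, not Toil's implementation: the helper name `batch_imports`, the greedy in-order packing, and the rule that a file at or above the threshold gets a batch to itself are all hypothetical.

```python
from typing import Dict, List


def batch_imports(file_sizes: Dict[str, int], threshold: int) -> List[List[str]]:
    """Greedily group files into import batches (hypothetical helper).

    Files whose size reaches the threshold get a batch of their own;
    smaller files share a batch until adding one more would exceed it.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in file_sizes.items():
        if size >= threshold:
            # Large file: import it on an individual job.
            batches.append([name])
            continue
        if current and current_size + size > threshold:
            # The open batch is full; close it and start another.
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Sizes are in arbitrary units for the sake of the example.
sizes = {"a.txt": 10, "b.txt": 20, "big.bam": 500, "c.txt": 80}
print(batch_imports(sizes, threshold=100))
```

With these made-up sizes, `big.bam` is imported alone, `a.txt` and `b.txt` share a batch, and `c.txt` starts a new one because adding it would push the shared batch past the threshold.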
Breaking Changes
CWL
- Prevent simultaneous Singularity container pulls in `toil-cwl-runner` (#4990)
- Added support to import files on workers for toil-cwl-runner (#5025)
  - `--runImportsOnWorkers` to enable importing files on workers
  - `--importWorkersDisk` to control how much disk space the import worker will use
- Don't error when passing through input as the output (#5138)
- CWL jobs with dynamic requirements now have input type checking properly protected by their conditionals. (#4930)
- Fixed a LoadListing bug with CWL workflows (#5149)
- Fix CWL Workflow Slurm memory test (#5151)
- workDir and jobStore now default to tmp-outdir-prefix (#5154)
- CWL container prepull: no reason to if extensions are enabled, they are now supported by cwl-utils 0.36+ (#5188)
- CWL container prepull: skip if --no-containers is specified (#5188)
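Preventing simultaneous container pulls is a mutual-exclusion problem, and this release also mentions `flock`-based directory locks. The sketch below shows one way such serialization could look using the standard library's `fcntl.flock`; it is not Toil's code. The lock file name `pull.lock` and both helper functions are hypothetical, and the example assumes a Unix-like system where `fcntl` is available.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive flock() on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # Blocks until no one else holds the lock.
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def try_lock(lock_path: str) -> bool:
    """Return True if the lock could be taken right now, without waiting."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


cache_dir = tempfile.mkdtemp()
lock_file = os.path.join(cache_dir, "pull.lock")  # hypothetical lock file name

with exclusive_lock(lock_file):
    # A real runner would pull the container image here.
    held_elsewhere = not try_lock(lock_file)

free_again = try_lock(lock_file)
print(held_elsewhere, free_again)
```

While the `with` block is active, a second attempt to take the lock fails immediately; once the block exits, the lock is free again, so a second puller would simply wait its turn rather than race.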
WDL
- Update WDL conformance tests on CI (#4875)
- Added support to run task only WDL files (#4960)
- Added support for the gpu field in WDL (#4949)
- Support passing inputs into `toil-wdl-runner` for task only WDLs (#4977)
- `toil-wdl-runner` will now carry through task exit codes (#4978)
- `toil-wdl-runner` will respect explicit null values for optional inputs (#4981)
- `toil-wdl-runner` will not immediately error on nonexistent coerced files until outputted (#4994)
- File? type for string to file coercion is now supported (will be nullified)
- WDL output files will now live in directories named after their tasks instead of UUID directories (#5008)
- Fixed a bug with conditional statements inside a WDL scatter (#5055)
- `toil-wdl-runner` now correctly finds and returns outputs from tasks in scatters and conditionals when a WDL workflow lacks an `output` section. (#5094)
- `toil-wdl-runner` has a new `--allCallOutputs` option to allow including all calls' outputs in a workflow's output. (#5093)
- `toil-wdl-runner` can now detect and try not to delete the outputs of a workflow that is meant to use the Cromwell Output Organizer (`croo`). Note that `croo` still can't actually work on the output of `toil-wdl-runner`. (#5093)
- `--allCallOutputs` no longer discards the outputs from a WDL workflow's `output` section. (#5106)
- File virtualization in toil-wdl-runner now only happens at task boundaries (#5028)
- File to String coercion should be supported
- Added support to import files on workers for toil-wdl-runner (#5103)
- Support WDL 1.1 disk specification as per spec (#5001)
- Fixed a bug with WDL file imports (#5121)
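As a small illustration of what an "explicit null value for an optional input" looks like in a JSON inputs file, the sketch below builds one with the standard `json` module. The workflow name `align` and both input names are hypothetical.

```python
import json

# Hypothetical workflow "align" with one required and one optional input.
inputs = {
    "align.reads": "reads.fastq",
    "align.known_sites": None,  # Serialized as an explicit JSON null.
}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)

# Round-tripping keeps the key present with a None value, which is different
# from leaving the key out altogether.
loaded = json.loads(inputs_json)
print("align.known_sites" in loaded, loaded["align.known_sites"])
```

The point of the distinction is that a key set to `null` says "this optional input has no value", whereas a missing key says nothing about it at all.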
Kubernetes
Dependencies
- Toil can now use connexion 4 (#5196)
- Toil now uses htcondor 23.6 or 24, which are still on PyPI
Misc
- Makefile: use isolated builds, add dist target (sdist+wheel) and deprecate the sdist target. (#4820) (#4826)
- Toil will now wait `--jobStoreTimeout` seconds (default: 30) to see an update to/removal of a job that was run, and will not let the job succeed unless it is seen to make progress. (#3814)
- Toil job descriptions no longer have a `command` field, and we track the link to the job body and the command to invoke the Toil worker separately. (#4811)
- Several typos in the docs were fixed (#4889)
- Add a test to ensure batchsystem plugins are installable (#4879)
- Fix Toil utils to work without the AWS extra (#4953)
- Print commit hash with `toil --version` when installed from source. Before: `7.1.0a1`. After: `7.1.0a1-ccf57e6071e32675daabdcbacb91988e871745a9` (#4954)
- Fixed a broken URL and an omitted variable in CI tests (#4974)
- Generate default config correctly (#5014)
- Use the latest setuptools when running cactus. (#5017)
- Toil will refuse to proceed if it detects that its coordination directory or a Singularity cache directory it needs to lock is on Ceph, to prevent hanging the Ceph MDS (#4972)
- Fix a NotImplementedError in the Grid Engine batchsystem (#5061)
- Added basic Grid Engine CI tests
- Update Cactus on CI to 2.9.0 (#5062)
- Separate out create/delete iam role functions into lib.
- Remove deprecated pipes module (#5122)
- New `--slurmTime`/`TOIL_SLURM_TIME` setting to set the time limit on Slurm jobs in a way Toil itself understands. (#5010)
- New `--slurmPE` argument to allow setting a parallel-job Slurm partition without using `TOIL_SLURM_PE` (#5010)
- New `--slurmArgs` argument to allow specifying extra Slurm submission arguments without using `TOIL_SLURM_ARGS` (#5010)
- For non-GPU jobs on Slurm, Toil will submit the job to a partition with a time limit long enough to accommodate the configured runtime (from `--slurmTime`). (For GPU jobs, the lowest-priority GPU partition is still always used.) (#5010)
- Toil now has a `--slurmDefaultAllMem` option to run jobs lacking their own memory requirements with Slurm's `--mem=0`, so they get a whole node's memory. (#4971)
- `toil-cwl-runner` now has `--no-cwl-default-ram` (and `--cwl-default-ram`) to control whether the CWL spec's default `ramMin` is applied, or Toil's own default memory logic is used. (#4971)
- The `--dont_allocate_mem` and `--allocate_mem` options have been deprecated and replaced with `--slurmAllocateMem`, which can be `True` or `False`. (#4971)
- Added WDL unit tests to CI (#5110)
- Mesos build updated. (#5049)
- CWL and WDL argument parsing revised for Python 3.12. (#5049)
- Organize stats and logging files into `stats/inbox` and `stats/archive` and avoid a circular rename. (#1727)
- Added proper FTP support for jobstores (#5134)
- URL existence and size gets/checks are now done with HEAD requests (#5134)
- Dependabot configuration should now pass schema validation and is itself under CI (#5175)
- Toil now tests a version of Cactus that ought to run on Python 3.13. (#5184)
- WDL conformance tests on Kubernetes may now run for 30 minutes. (#5185)
- When importing files on workers, fall back to importing on the leader when file sizes are not obtainable (#5135)
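The note above about sending non-GPU Slurm jobs to a partition whose time limit accommodates the configured runtime can be sketched as a small selection function. The release notes only say the partition must be "long enough"; preferring the shortest adequate partition is an assumption of this sketch, as are the helper name `pick_partition` and the partition names and limits.

```python
from typing import Dict, Optional


def pick_partition(time_limits: Dict[str, int], wanted_seconds: int) -> Optional[str]:
    """Pick the partition with the shortest time limit that still fits.

    time_limits maps a partition name to its time limit in seconds.
    Returns None when no partition allows a job that long.
    """
    adequate = {
        name: limit for name, limit in time_limits.items() if limit >= wanted_seconds
    }
    if not adequate:
        return None
    return min(adequate, key=lambda name: adequate[name])


# Hypothetical cluster layout; real limits would come from Slurm itself.
partitions = {"short": 3600, "medium": 86400, "long": 604800}
print(pick_partition(partitions, wanted_seconds=7200))
```

Here a two-hour job skips `short` (one hour) and lands on `medium`, leaving `long` for jobs that need it.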
Thank you to our contributors: @stxue1, @DailyDreaming, @adamnovak, @mr-c, @gmloose, @davidjsherman!
7.0.0
What's Changed
- Respect job local-ness when chaining by @adamnovak in #4809
- Fix Python 3.8 support by @adamnovak in #4823
- Fix missing description on PyPI by @mr-c in #4820
- Install build module for CI by @stxue1 in #4826
- Use a sentinel location instead of an unmodified location to mark missing files by @adamnovak in #4818
- Bump mypy from 1.8.0 to 1.9.0 by @dependabot in #4830
- Make sure output directory exists before using it by @adamnovak in #4832
- Pass through debugged job status code to prevent infinite loop by @stxue1 in #4829
- Add tests for environment pickling by @adamnovak in #4837
- Add colored logging by @stxue1 in #4828
- Remove unused CI test by @stxue1 in #4843
- Measure CPU and memory usage in WDL Docker containers by @adamnovak in #4819
- Allow debugging jobs by name (and status improvements) by @adamnovak in #4840
- Improve exception handling to not output tracebacks by @stxue1 in #4839
- Update pytest-cov requirement from <5,>=2.12.1 to >=2.12.1,<6 by @dependabot in #4851
- Update docutils requirement from <0.21,>=0.16 to >=0.16,<0.22 by @dependabot in #4866
- Update galaxy-util requirement from <23 to <25 by @dependabot in #4862
- Update galaxy-tool-util requirement from <23 to <25 by @dependabot in #4861
- Bump cwltool from 3.1.20240112164112 to 3.1.20240404144621 by @dependabot in #4870
- Bump gunicorn from 21.2.0 to 22.0.0 by @dependabot in #4871
- Retry Slurm interactions more by @adamnovak in #4869
- Replace use of boto with boto3 for `awsProvisioner.py` by @stxue1 in #4859
- Allow fetching job inputs for debugging by @adamnovak in #4848
- Make leader wait for expected updates to be visible in the job store, or fail the job by @adamnovak in #4811
- Enable FUSE for privileged Toil clusters by @stxue1 in #4824
- Detect if the GridEngine worker thread has crashed to prevent hanging the workflow by @stxue1 in #4873
- Bump mypy from 1.9.0 to 1.10.0 by @dependabot in #4878
- Support caching on SLURM by @stxue1 in #4884
- Add debug logging for single machine batchsystem to signal worker issue and startup by @stxue1 in #4881
- Update WDL conformance tests on CI by @stxue1 in #4876
- Replace all usage of boto2 with boto3 by @stxue1 in #4868
- Revert ensurepip to get-pip to fix Python 3.10 ARM CI appliance builds by @stxue1 in #4900
- docs cleanup by @mr-c in #4889
- Bump to a new major version by @adamnovak in #4885
- Warn user about wait times for stats gathering with a large quantity of jobs. by @DailyDreaming in #4893
- Allow symlinks to inputs as WDL outputs by @adamnovak in #4883
- bye pytz by @mr-c in #4890
- Stop suggesting infinity when validating half-open intervals by @adamnovak in #4887
- Fix WDL option spelling and tolerate Cromwell-isms by @adamnovak in #4906
- Remove wrapped CWL doc example. by @DailyDreaming in #4892
- Add retries to DockerCheckTest.testBadGoogleRepo by @stxue1 in #4909
- Fix 3.8 backport.timezone import by @stxue1 in #4908
- Update to Python 3.12 by @stxue1 in #4901
- Bump flask-cors from 4.0.0 to 4.0.1 by @dependabot in #4916
- Try /tmp before the workdir for the Toil coordination directory by @stxue1 in #4914
- CWL biocontainer tests: use version corresponding to v2 Docker Image Format by @mr-c in #4912
- Revert "Update to Python 3.12" by @DailyDreaming in #4917
- Bump miniwdl from 1.11.1 to 1.12.0 by @dependabot in #4920
- Support Python 3.12 by @stxue1 in #4919
- Add documentation for installing batch system plugins by @stxue1 in #4926
- Update Werkzeug to appease the Github security police by @adamnovak in #4925
- Revert "Update Werkzeug to appease the Github security police" by @DailyDreaming in #4928
- Bump cwltool from 3.1.20240404144621 to 3.1.20240508115724 by @dependabot in #4936
- Add batchsystem plugin test by @stxue1 in #4933
- Fix bad test paths. by @DailyDreaming in #4938
- Add better logic in finding a temp directory for the Toil coordination directory by @stxue1 in #4918
- Add supported workflow language versions to README by @adamnovak in #4923
6.1.0
Highlighted Features Added
- WDL and CWL task standard output and standard error logs that are not captured by the workflow will now be logged at INFO level and stored in the `--writeLogs`/`--writeLogsGzip` directory. (#4657)
- Use a default log limit of 100MiB (#4788)
Breaking Changes
- Stats and logging system again uses job display name (#4755)
- `--disableProgress` is once again a flag that doesn't take an argument (#4758)
CWL
- Don't clear out user-provided values for the --default-container option (#4730)
WDL
- WDL job names now include numbers for scatters (#4755)
- Multi-line WDL placeholder substitutions no longer interfere with de-indenting WDL command blocks (chanzuckerberg/miniwdl#665)
- Standard error for failed tasks is now always logged to the worker log somewhere (#4781)
Kubernetes
Dependencies
- Deps: removed the ruamel.yaml.string plugin dependency for a simpler solution (#4760)
Misc
- Toil will no longer warn about a missing XDG_RUNTIME_DIR (#4769)
- Read the Docs and CI docs builds should have Graphviz installed (pending CI image rebuild) (#4734)
- Add more Python3.12 compatibility by replacing the one function from distutils that we use, `strtobool()`. (#4765)
- Set default cache folders to be accessible between toil-wdl-runner workflows (Same as MiniWDL/Singularity defaults) (#4761)
- Set toil-wdl-runner cache folders on Toil managed clusters to be at `/var/lib/toil` (#4761)
- Fall back to assuming machine has 1 core when CPU count is unavailable. (#4545)
- `FileJobStore` now supports filenames that get modified when percent-encoded (#4779)
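For readers wondering what replacing `strtobool()` involves, here is a minimal stand-in written without `distutils`. It is a sketch, not the function Toil ships: it accepts the same strings that `distutils.util.strtobool` recognised, but as deliberate choices of this example it returns a real `bool` rather than `1`/`0` and strips surrounding whitespace.

```python
def strtobool(value: str) -> bool:
    """Interpret common truthy/falsy strings, using distutils' accepted values."""
    normalized = value.strip().lower()
    if normalized in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if normalized in {"n", "no", "f", "false", "off", "0"}:
        return False
    raise ValueError(f"invalid truth value {value!r}")


print(strtobool("Yes"), strtobool("off"))

try:
    strtobool("maybe")
except ValueError as error:
    print(error)
```

Anything outside the two recognised sets raises `ValueError`, which keeps misspelled option values from being silently treated as false.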
Thank you to our contributors: @DailyDreaming @mr-c @stxue1 @adamnovak @app/dependabot
Full Changelog: releases/6.0.0...releases/6.1.0
6.0.0
NOTE!
We now have a config file! https://toil.readthedocs.io/en/latest/running/cliOptions.html#the-config-file
Breaking Changes
- Removed the parasol batch system
- Removed the TES batch system (this is now a plugin)
- Removed our WDL compiler in favor of an interpreter (we still support WDL, we just do it differently now)
- We no longer support python3.7
CWL
- Support CWL 1.2.1 (#4682)
- CWL Pipefish compatibility (#4636)
- Support per-task preemptibility in CWL (#4551)
- Fix configargparse in CWL (#4618)
- cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)
- Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)
- Implement a better config file system for CWL/WDL options (#4666)
- Allow working with remote files in CWL and WDL workflows (#4690)
- Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)
- Log more usefully for CWL workflows (#4736)
WDL
- Simplify WDL Toil job graphs (#4524)
- More WDL and Slurm documentation (#4558)
- Improve WDL documentation (#4732)
- Add String to File functionality into toil-wdl-runner (#4589)
- Run WDL output through Toil export system to support URIs (#4579)
- Allow the WDL output section to reference itself (#4592)
- Ensure sibling files in toil-wdl-runner (#4610)
- Make WDLOutputJob collect all task outputs (#4602)
- Report errors in WDL using MiniWDL's error location printer (#4637)
- Remove the WDL compiler. (#4679)
- Implement a better config file system for CWL/WDL options (#4666)
- Allow working with remote files in CWL and WDL workflows (#4690)
- Strip leading whitespace from WDL commands (#4720)
Misc
- Add config file support (#4569)
- Support Python3.11 and drop Python 3.7 (#4646)
- Move TES batch system to a plugin (#4650)
- Turn batch system tests back on (#4649)
- Separate out integration tests to run on a schedule (#4612)
- Avoid concurrent modification in cluster scaler tests (#4600)
- Remove old buckets from AWS (#4588)
- Tests: only request a single core (#4572)
- Reduce the number of assert statements (#4590)
- take any nvidia-smi exception as not having gpu (#4611)
- More resiliency (#4395)
- Remove usage of the deprecated pkg_resources (#4701)
- Make sure cwltool always knows we have an outdir to fix #4698 (#4699)
- AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)
- Only count output file usage when using the file store (#4692)
- Remove the parasol batch system. (#4678)
- Move around reqs and move aws dev libraries to aws (#4664)
- Make sure the `--batchLogsDir` exists if it is set (#4635)
- Update EC2 instances and EC2 update script. (#4745)
- remove extraneous dependency on old 'mock' (#4739)
- Point CI at the new public URLs for stuff we host
- Add `__init__.py` to options folder (#4723)
Bug Fixes
- Lower redirect log level to fix #4526 (#4578)
- Fix mypy from being broken by new boto types (#4577)
- Fix CI on local Gitlab runners (#4571)
- Banish ghost jobs (#4563)
- Stop deleting chained-to jobs which fail as orphaned jobs (#4557)
- Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)
- Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)
- Fix CI Appliance Builds (#4655)
- Tolerate a failed AMI polling attempt (#4727)
- Add pure Python fallback for getDirSizeRecursively() (#4753)
- Don't mark inputs (or outputs) executable for no reason (#4728)
- Fix scheduled CI tests (#4742)
- Fix --printJobInfo (#4709)
Thank you to our contributors: @stxue1 , @w-gao, @DailyDreaming , @mr-c , @adamnovak , @glennhickey, @misterbrandonwalker, and @a-detiste !
5.12.0
WDL
- Virtualize filenames as in-container paths from point of view of WDL command (#4527)
- Add WDL conformance tests to CI (#4530)
- Use less memory in the Giraffe WDL test (#4541)
Version Upgrades
- Upgrade to cwltool 3.1.20230601100705 (#4500)
- Update mock requirement from <5,>=4.0.3 to >=4.0.3,<6 (#4366)
Misc
- Anonymous access to Google Storage (#4518)
- Reorder config so that default settings are applied first (#4528)
- Add a way to forward accelerators to Docker containers (#4492)
Bug Fixes
- Fix test failures without docker installed (#4544)
- Prevent certain tests from being run twice in CI (#4529)
- Drop external Docker builder (#4523)
- Fix CI lint test (#4533)
- Grab AWS group policies on top of user (#4505)
- Grab accelerator set off the end of the list instead of by index (#4506)
- Fix RtD build (#4491)
- Include tests (#4499)
Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , and @tjni !
5.11.0
Breaking Changes
- Imported files will be symlinked by default, unless the user sets `--noLinkImports` or the workflow imports with `symlink=False`. (#3949)
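The practical difference between a symlinked import and a copied one can be seen with plain files. The sketch below does not call Toil at all; `import_local_file` is a hypothetical helper that only mimics the choice that `symlink=False` controls.

```python
import os
import shutil
import tempfile


def import_local_file(src: str, dest: str, symlink: bool = True) -> None:
    """Make src available at dest, by symlink or by full copy (hypothetical helper)."""
    if symlink:
        os.symlink(os.path.abspath(src), dest)
    else:
        shutil.copyfile(src, dest)


work = tempfile.mkdtemp()
source = os.path.join(work, "input.txt")
with open(source, "w") as handle:
    handle.write("data\n")

linked = os.path.join(work, "linked.txt")
copied = os.path.join(work, "copied.txt")
import_local_file(source, linked)                 # default: symlink
import_local_file(source, copied, symlink=False)  # opt out: real copy

print(os.path.islink(linked), os.path.islink(copied))
```

A symlink is cheap but stays tied to the original path, so it breaks if the source is moved or deleted; a copy costs disk space and IO but is independent of the source afterwards.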
WDL
- Toil will now stop if it encounters an error polling a possible import URL for a WDL workflow input file. (#4479)
- WDL workflows will be protected against imported files with no basenames. (#4477)
Misc
- Toil batch system ID numbers for issued jobs now start at 1. (#4482)
- Attempts to import files from URLs when the implementing job store is missing an extra are now better reported. (#4479)
- Include tests in the source distribution that gets published to PyPI (#4499)
Bug Fixes
- Toil should no longer crash when a delete wins a race against a load in `FileJobStore` (#4484)
- Prevent local root jobs (such as WDLRootJob) from being run twice. (#4482)
- Slurm and other grid batch system jobs will now have more informative names (#4472)
- WDL workflows can no longer import `""` as a File. (#4477)
Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak
5.10.0
Changelog
Highlighted Features Added
- Add a `--caching` option which explicitly states whether to use caching with a workflow. Uses a default value depending on whether or not we are using the file job store if not specified. (#4218)
- New prototype WDL runner `python -m toil.wdl.wdltoil` using MiniWDL (#3468)
- MiniWDL-based WDL implementation can now run the vg Giraffe WDL workflow (#4353)
- Toil now tests against our own tiny set of WDL conformance tests (#4351)
- Toil can run the HPRC assembly WDL workflows (#4435)
- Toil can now use Mesos roles (#4455)
Breaking Changes
- Replace "preemptable" with "preemptible", add example of using --defaultPreemptible flag to Preemptibility documentation (#1951)
CWL
- CWL: run all ExpressionTools on the Leader node, instead of submitting separate jobs (#4157)
Kubernetes
- Kubernetes batch system: Delete jobs individually when batch delete fails (#3403)
- Documentation for running a Toil leader for a Kubernetes workflow outside Kubernetes now covers examples and common problems for running CWL workflows (document toil-cwl-runner + "Running the Leader Outside Kubernetes" #3422)
- Kubernetes batch system: support `--maxCores`, `--maxDisk`, and `--maxMemory` (#2864)
- Add tutorial for Kubernetes launch cluster (#3743)
Dependencies
- Require htcondor 10 exactly (#4315)
- Toil jobs now have a `local` parameter which determines if they should run on the leader. (#4388)
Misc
- The offline tests can now be run in parallel (#3493)
- Code updated to be more idiomatic for Python3.7 (#4295)
- Support for a `--network` for `toil launch-cluster` for Google cloud (#4196)
- Support for a `--use_private_ip` for `toil launch-cluster` to dial nodes by private IP instead of public IP (#4196)
- GPU scheduling should now be supported on Slurm (#4308)
- Toil now supports a `--batchLogsDir` option and `TOIL_BATCH_LOGS_DIR` environment variable, to provide a directory other than the work dir where Toil will instruct HPC batch systems to save their captured job logs.
- `htcondor` batch system should now work again, and will retry connections
- Updated the `--coalesceStatusCalls` help documentation to reflect the current state of #4431 (#4437)
- Toil no longer trusts XDG_RUNTIME_DIR under Slurm (fixes some of the issues behind #4395 when Slurm is configured not to follow the XDG spec) (#4435)
- Toil now puts its lock files for Singularity cache directories for WDL in those directories (#4435)
- Toil's WDL interpreter can now use local-to-the-leader jobs for evaluating WDL code that doesn't need appreciable resources (#4388)
- Toil now tolerates more possible exceptions related to the panasas network file system (#4440)
- Type hinting to functions in resource.py (#938)
- Added return type to inVirtualEnv() in `__init__.py` (#938)
- Added None checks to some function bodies (#938)
Bug Fixes
- Stop crashing when predefined batch job exit reasons are used and need to go into the message bus log file (#4321)
- Added `import subprocess` to restore the behavior of #588. (#4429)
- Toil will no longer use the stored message bus path from an old execution of a workflow when deciding where to save the message bus log when restarting a workflow (#4438)
- Fix --custom-net mutual exclusivity bug. (#4458)
Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , @jfennick , @misterbrandonwalker , @w-gao , @stephanaime , @glennhickey , @Hexotical , @manabuishii , @gmloose , @boukn , and @thiagogenez !
5.9.2
Changelog
Bug Fixes
- Change build tag import (#4329)
Thank you to our contributors: @adamnovak , @Hexotical !
5.9.0
Changelog
Bug Fixes
- Fix --provisioner and --metrics together (#4328)
- Ignore incorrect type hint from boto3, remove json.loads (#4330)
- Warn about missing --bypass-file-store with in-place update (#4337)
- Replace prepareHTSubmission with prepareSubmission in HTCondor (#4319)
- Merge "Google fixes" (#4293)
- Support (only) current htcondor (#4320)
- Delete k8s jobs individually when batch delete fails (#4306)
Misc
- Update aws spot documentation (#4310)
- Enable parallel testing (#3493)
- Add documentation for running CWL workflows on non-Toil-managed Kubernetes clusters (#4332)
- Export all slurm args by default (#4237)
- Allow for subclasses of base types in messages (#4322)
- Non cache default (#4299)
Dependencies
- Bump mypy from 0.982 to 0.991 (#4345)
- Bump schema-salad>=8.4.20230128170514,<9 to schema-salad>=8.3.20220913105718,<8.4 (#4342) (#4341)
- Bump cwltool from 3.1.20221008225030 to 3.1.20221201130942 (#4338)
- Bump pyupgrade to 3.7 (#4295)
Thank you to our contributors: @adamnovak , @Hexotical , @w-gao, @mr-c , @gmloose , @boukn , and @thiagogenez !
5.8.0
Changelog
Highlighted Features Added
- Toil server now exposes workflow tasks via WES (#4046).
- Toil server now has a `--wes_dialect agc` option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
- Toil jobs now accept an `accelerators` requirement, like `accelerators=1` or `accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2}` (#4163)
- Include total requested cores for each job type in `toil stats` (#4173)
- Toil jobs now expose `job.accelerators` to workflow
- Add prefix suffix params to `AbstractFileStore.getLocalTempFile` and `AbstractFileStore.getLocalTempFileName` (#4273)
- CWL: `--no-compute-checksum`, `--strict-cpu-limit`, `--disable-validate`, and `--fast-parser` are now available
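The `accelerators` requirement above is shown in two spellings, a bare count and a dictionary. One way to think about them is as two inputs to a common normalized form. The sketch below is illustrative only and does not use Toil's own parsing; the helper `normalize_accelerators` is hypothetical, and treating a bare integer as "that many GPUs" is an assumption of this example.

```python
from typing import Any, Dict, Union

AcceleratorSpec = Union[int, Dict[str, Any]]


def normalize_accelerators(spec: AcceleratorSpec) -> Dict[str, Any]:
    """Turn either spelling into a dict that has 'kind' and 'count' keys.

    Assumption for this sketch: a bare integer means that many GPUs.
    """
    if isinstance(spec, int):
        return {"kind": "gpu", "count": spec}
    normalized = dict(spec)  # Copy so the caller's dict is left untouched.
    normalized.setdefault("kind", "gpu")
    normalized.setdefault("count", 1)
    return normalized


print(normalize_accelerators(1))
print(normalize_accelerators({"kind": "gpu", "brand": "nvidia", "count": 2}))
```

Normalizing early means later code, such as a batch system deciding what to request, only has to handle the dictionary form.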
Breaking Changes
- Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass `--assumeZeroOverhead` to revert to the old behavior (#2103)
CWL
- CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a `CWLNamedJob`. (#4046/#4047)
- CWL `CUDARequirement` is parsed by `cwltool` and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
- fix false warning when outputSource contains only one None value (#4300)
Kubernetes
- `KubernetesBatchSystem` can add `nvidia.com/gpu` and `amd.com/gpu` resource requests for jobs that request those accelerators (#4163)
- `KubernetesBatchSystem` can request GPUs by `model` key, if nodes are labeled appropriately (#4163)
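To make the mapping from an accelerator requirement to a Kubernetes resource name concrete, here is a small sketch. The two resource names come from the notes above; everything else is an assumption of the example, including the helper name, the use of a `brand` key to choose between them, the default of `nvidia` when no brand is given, and the string-valued quantity.

```python
from typing import Any, Dict

# Resource names taken from the release note; the mapping keys are the
# accelerator "brand" values assumed for this sketch.
GPU_RESOURCE_BY_BRAND = {"nvidia": "nvidia.com/gpu", "amd": "amd.com/gpu"}


def gpu_resource_request(accelerator: Dict[str, Any]) -> Dict[str, str]:
    """Build a Kubernetes-style resource mapping for one accelerator requirement."""
    if accelerator.get("kind") != "gpu":
        return {}  # Not a GPU: nothing to request in this sketch.
    resource = GPU_RESOURCE_BY_BRAND.get(accelerator.get("brand", "nvidia"))
    if resource is None:
        raise ValueError(f"no known GPU resource for {accelerator!r}")
    return {resource: str(accelerator.get("count", 1))}


print(gpu_resource_request({"kind": "gpu", "brand": "amd", "count": 2}))
```

The resulting mapping is the kind of fragment that would sit under a container's resource limits in a pod specification.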
Dependencies
Misc
- Toil WES server now accepts requests that leave out workflow_params. (#4037)
- The `MessageBus` has been expanded to use `pypubsub`, and now has `MessageInbox` and `MessageOutbox` objects to represent connections to it. (#4046/#4047)
- `ToilMetrics` now rides on the `MessageBus` rails. (#4046/#4047)
- Toil workflows now have a `--writeMessages` option, which takes a file to which a line-oriented stream of `MessageBus` messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
- Add code for warning check to be used when launching cluster with AWS. (#3514)
- Use a CI prebake image for gitlab testing. (#4185)
- Toil clusters now have `/var/tmp` as the default temporary directory, since they often make large temporary files (#4148)
- Adds basic testing for slurm using a slurm docker cluster by running sample workflows. (#3856)
- Add message bus documentation (#4239)
- `SingleMachineBatchSystem` can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting `CUDA_VISIBLE_DEVICES` in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
- `AWSBatchBatchSystem` can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
- Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
- Message bus is now thread safe (#4276)
- Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
- `docker` binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
- Singularity in the appliance has been updated to 3.10 which is >=3.9, for cgroups v2 support.
- Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
- Safer type usage checking for systems without boto3 installed
- Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
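The `CUDA_VISIBLE_DEVICES` mechanism mentioned in this list works by handing each task an environment that names only the GPUs it may use. The sketch below shows the general pattern with the standard library; it is not the batch system's code, and `run_with_gpus` is a hypothetical helper. No GPU is needed to run it, because the child process only echoes the variable back.

```python
import os
import subprocess
import sys
from typing import List


def run_with_gpus(command: List[str], gpu_ids: List[int]) -> str:
    """Run command with CUDA_VISIBLE_DEVICES restricted to gpu_ids; return stdout."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu) for gpu in gpu_ids)
    result = subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# The child only reports what it was told it may use.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
print(run_with_gpus(child, [0, 2]))
```

A scheduler built on this idea would also track which device numbers are currently handed out, so that concurrent tasks never receive overlapping sets.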
Bug Fixes
- Only use `/var/run/user` if XDG tells us we have it in our session. Otherwise we will try other places, including `/run/lock/toil`. (#4170)
- `toil destroy-cluster`: terminate stopped instances when destroying the cluster (#4271)
- fileJobStore: handle arbitrary `os.link` errors to work on some filesystems (#2232)
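Handling arbitrary `os.link` errors usually means treating a hard link as an optimization and falling back to a copy when it fails. The sketch below shows that general pattern; it is not the `FileJobStore` code, and `link_or_copy` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dest: str) -> str:
    """Hard-link src to dest, copying instead if the link cannot be made."""
    try:
        os.link(src, dest)
        return "linked"
    except OSError:
        # Cross-device links, filesystems without hard links, permission
        # quirks and similar failures all end up here.
        shutil.copyfile(src, dest)
        return "copied"


work = tempfile.mkdtemp()
src = os.path.join(work, "a.txt")
with open(src, "w") as handle:
    handle.write("payload")

dest = os.path.join(work, "b.txt")
how = link_or_copy(src, dest)
with open(dest) as handle:
    content = handle.read()
print(how, content)
```

Which branch runs depends on the filesystem, but either way the destination ends up with the same content, which is the property the caller cares about.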
Thank you to our contributors!