Releases: DataBiosphere/toil

8.0.0

12 Feb 18:34

Highlighted Features Added

  • toil debug-job now has --retrieveTaskDirectory <dir> which will set up a job's downloaded files under <dir> and try to stop the job after doing the downloads. Jobs can call self.files_downloaded_hook() to provide a stopping point for this mode. (#4815)
  • toil debug-job can now reconstruct the inside-the-container environment for CWL and WDL tasks. (#4815)
  • Added support for caching on Slurm and other HPC schedulers (#4775)
  • Replace all instances of boto2 with boto3 for all Toil AWS code (#4718)
  • Add support for Python 3.12 (#4718)
  • Add support for Python 3.13 (#5145)
  • Ceph input/output errors from file locking functions are now tolerated. (#4874)
  • Toil now uses flock to enable directory locks to work properly (#4924)
  • Added support to get Slurm partitions and automatically send jobs to GPUs on Slurm (#4833) (supports both CWL and WDL)
  • New --symlinkJobStoreReads=False option lets you force local-node copies (possibly in the cache) even when reading directly from a FileJobStore is possible, potentially reducing shared filesystem IO. (#4673)
  • Toil now supports reading and writing MiniWDL's call cache. (#4797)
  • Toil now supports running CWL and WDL workflows from Dockstore, by using either a Dockstore page URL or TRS ID as the URL/filename of the workflow to run. Since these often contain ? or #, remember to quote them on the command line! (#5049)
  • Add support for parallel file imports (#5114)
    • New argument --importWorkersThreshold. This specifies the threshold where files will begin to be imported on individual jobs. Small files will be batched into the same import job up to this threshold.
    • --importWorkersDisk defaults to 1 MiB. Should be increased when download streaming is not possible on a worker.
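
The quoting advice for Dockstore URLs above can be illustrated with Python's standard shlex module. This is a minimal sketch: the URL is hypothetical, and passing the workflow and then an inputs file as positional arguments is an assumption about how the runner is invoked.

```python
import shlex

# Hypothetical Dockstore page URL; the '?' is what makes quoting necessary,
# because an unquoted '?' is a glob character in POSIX shells.
workflow = "https://dockstore.org/workflows/github.com/example-org/example-repo:main?tab=info"


def build_command(runner: str, workflow_ref: str, inputs_path: str) -> str:
    """Join the arguments into one shell-safe command line."""
    parts = [runner, workflow_ref, inputs_path]
    # shlex.quote wraps any argument containing shell-special characters
    # in single quotes and leaves plain arguments untouched.
    return " ".join(shlex.quote(part) for part in parts)


command = build_command("toil-wdl-runner", workflow, "inputs.json")
print(command)
```

Only the URL ends up wrapped in single quotes; the runner name and the inputs path contain no special characters and are left alone.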

Breaking Changes

CWL

  • Prevent simultaneous Singularity container pulls in toil-cwl-runner (#4990)
  • Added support to import files on workers for toil-cwl-runner (#5025)
    • --runImportsOnWorkers to enable importing files on workers
    • --importWorkersDisk to control how much disk space the import worker will use
  • Don't error when passing through input as the output (#5138)
  • CWL jobs with dynamic requirements now have input type checking properly protected by their conditionals. (#4930)
  • Fixed a LoadListing bug with CWL workflows (#5149)
  • Fix CWL Workflow Slurm memory test (#5151)
  • workDir and jobStore default to tmp-outdir-prefix (#5154)
  • CWL container prepull: no reason to if extensions are enabled, they are now supported by cwl-utils 0.36+ (#5188)
  • CWL container prepull: skip if --no-containers is specified (#5188)

WDL

  • Update WDL conformance tests on CI (#4875)
  • Added support to run task only WDL files (#4960)
  • Added support for the gpu field in WDL (#4949)
  • Support passing inputs into toil-wdl-runner for task only WDLs (#4977)
  • toil-wdl-runner will now carry through task exit codes (#4978)
  • toil-wdl-runner will respect explicit null values for optional inputs (#4981)
  • toil-wdl-runner no longer errors immediately on nonexistent coerced files; the error is deferred until the file is output (#4994)
    • File? type for string to file coercion is now supported (will be nullified)
  • WDL output files will now live in directories named after their tasks instead of UUID directories (#5008)
  • Fixed a bug with conditional statements inside a WDL scatter (#5055)
  • toil-wdl-runner now correctly finds and returns outputs from tasks in scatters and conditionals when a WDL workflow lacks an output section. (#5094)
  • toil-wdl-runner has a new --allCallOutputs option to allow including all calls' outputs in a workflow's output. (#5093)
  • toil-wdl-runner can now detect and try not to delete the outputs of a workflow that is meant to use the Cromwell Output Organizer (croo). Note that croo still can't actually work on the output of toil-wdl-runner. (#5093)
  • --allCallOutputs no longer discards WDL workflow outputs section outputs. (#5106)
  • File virtualization in toil-wdl-runner now only happens at task boundaries (#5028)
    • File to String coercion should be supported
  • Added support to import files on workers for toil-wdl-runner (#5103)
  • Support WDL 1.1 disk specification as per spec (#5001)
  • Fixed a bug with WDL file imports (#5121)

Kubernetes

Dependencies

  • Toil can now use connexion 4 (#5196)
  • Toil now uses htcondor 23.6 or 24, which are still on PyPI

Misc

  • Makefile: use isolated builds, add dist target (sdist+wheel) and deprecate the sdist target. (#4820) (#4826)
  • Toil will now wait --jobStoreTimeout seconds (default: 30) to see an update to/removal of a job that was run, and will not let the job succeed unless it is seen to make progress. (#3814)
  • Toil job descriptions no longer have a command field, and we track the link to the job body and the command to invoke the Toil worker separately. (#4811)
  • Several typos in the docs were fixed (#4889)
  • Add a test to ensure batchsystem plugins are installable (#4879)
  • Fix Toil utils to work without the AWS extra (#4953)
  • Print commit hash with toil --version when installed from source. Before: 7.1.0a1. After: 7.1.0a1-ccf57e6071e32675daabdcbacb91988e871745a9 (#4954)
  • Fixed a broken URL and an omitted variable in CI tests (#4974)
  • Generate default config correctly (#5014)
  • Use the latest setuptools when running cactus. (#5017)
  • Toil will refuse to proceed if it detects that its coordination directory or a Singularity cache directory it needs to lock is on Ceph, to prevent hanging the Ceph MDS (#4972)
  • Fix a NotImplementedError in the Grid Engine batchsystem (#5061)
    • Added basic Grid Engine CI tests
  • Update Cactus on CI to 2.9.0 (#5062)
  • Separate out create/delete IAM role functions into lib.
  • Remove deprecated pipes module (#5122)
  • New --slurmTime/TOIL_SLURM_TIME setting to set the time limit on Slurm jobs in a way Toil itself understands. (#5010)
  • New --slurmPE argument to allow setting a parallel-job Slurm partition without using TOIL_SLURM_PE (#5010)
  • New --slurmArgs argument to allow specifying extra Slurm submission arguments without using TOIL_SLURM_ARGS (#5010)
  • For non-GPU jobs on Slurm, Toil will submit the job to a partition with a time limit long enough to accommodate the configured runtime (from --slurmTime). (For GPU jobs, the lowest-priority GPU partition is still always used.) (#5010)
  • Toil now has a --slurmDefaultAllMem option to run jobs lacking their own memory requirements with Slurm's --mem=0, so they get a whole node's memory. (#4971)
  • toil-cwl-runner now has --no-cwl-default-ram (and --cwl-default-ram) to control whether the CWL spec's default ramMin is applied, or Toil's own default memory logic is used. (#4971)
  • The --dont_allocate_mem and --allocate_mem options have been deprecated and replaced with --slurmAllocateMem, which can be True or False. (#4971)
  • Added WDL unit tests to CI (#5110)
  • Mesos build updated. (#5049)
  • CWL and WDL argument parsing revised for Python 3.12. (#5049)
  • Organize stats and logging files into stats/inbox and stats/archive and avoid a circular rename. (#1727)
  • Added proper FTP support for jobstores (#5134)
  • URL existence and size gets/checks are now done with HEAD requests (#5134)
  • Dependabot configuration should now pass schema validation and is itself under CI (#5175)
  • Toil now tests a version of Cactus that ought to run on Python 3.13. (#5184)
  • WDL conformance tests on Kubernetes may now run for 30 minutes. (#5185)
  • When importing files on workers, fall back to importing on the leader when file sizes are not obtainable (#5135)
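
The #5010 note above says only that non-GPU Slurm jobs go to a partition whose time limit is long enough for the configured runtime. It does not say how Toil chooses among several adequate partitions. A minimal sketch, assuming the shortest adequate limit wins (an assumption, not Toil's documented rule), with hypothetical partition names and limits:

```python
from typing import Dict, Optional


def pick_partition(partitions: Dict[str, int], required_seconds: int) -> Optional[str]:
    """Pick the partition with the shortest time limit that still fits the job.

    `partitions` maps a (hypothetical) partition name to its time limit in
    seconds. Returns None when no partition's limit is long enough.
    """
    adequate = {
        name: limit
        for name, limit in partitions.items()
        if limit >= required_seconds
    }
    if not adequate:
        return None
    # Assumption: prefer the tightest limit that still accommodates the job.
    return min(adequate, key=lambda name: adequate[name])


partitions = {"short": 3600, "medium": 86400, "long": 604800}
choice = pick_partition(partitions, 2 * 3600)  # a two-hour job
print(choice)
```

With these example limits, a two-hour job does not fit in the one-hour partition, so the sketch selects the next one up.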

Thank you to our contributors: @stxue1, @DailyDreaming, @adamnovak, @mr-c, @gmloose, @davidjsherman!

7.0.0

21 May 22:25

What's Changed

6.1.0

08 May 18:55
3f9cba3

Highlighted Features Added

  • WDL and CWL task standard output and standard error logs that are not captured by the workflow will now be logged at INFO level and stored in the --writeLogs/--writeLogsGzip directory. (#4657)
  • Use a default log limit of 100MiB (#4788)

Breaking Changes

  • Stats and logging system again uses job display name (#4755)
  • --disableProgress is once again a flag that doesn't take an argument (#4758)

CWL

  • Don't clear out user-provided values for the --default-container option (#4730)

WDL

  • WDL job names now include numbers for scatters (#4755)
  • Multi-line WDL placeholder substitutions no longer interfere with de-indenting WDL command blocks (chanzuckerberg/miniwdl#665)
  • Standard error for failed tasks is now always logged to the worker log somewhere (#4781)

Kubernetes

Dependencies

  • Deps: removed the ruamel.yaml.string plugin dependency for a simpler solution (#4760)

Misc

  • Toil will no longer warn about a missing XDG_RUNTIME_DIR (#4769)
  • Read the Docs and CI docs builds should have Graphviz installed (pending CI image rebuild) (#4734)
  • Add more Python 3.12 compatibility by replacing the one function from distutils that we use, strtobool(). (#4765)
  • Set default cache folders to be accessible between toil-wdl-runner workflows (Same as MiniWDL/Singularity defaults) (#4761)
  • Set toil-wdl-runner cache folders on Toil managed clusters to be at /var/lib/toil (#4761)
  • Fall back to assuming machine has 1 core when CPU count is unavailable. (#4545)
  • FileJobStore now supports filenames that get modified when percent-encoded (#4779)
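
To make the last item concrete, here is a small illustration of a filename that changes when percent-encoded, using only the standard library. The filename is made up, and the release note does not say how FileJobStore encodes names internally; this only shows the kind of name involved.

```python
from urllib.parse import quote, unquote

# Hypothetical filename containing characters that percent-encoding rewrites.
name = "sample #1 (50%).txt"

# safe="" encodes every reserved character, including '/'.
encoded = quote(name, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded == name)
```

The encoded form differs from the original name, and decoding restores it exactly.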

Thank you to our contributors:

@DailyDreaming @mr-c @stxue1 @adamnovak @app/dependabot

Full Changelog: releases/6.0.0...releases/6.1.0

6.0.0

16 Jan 19:40

NOTE!

We now have a config file! https://toil.readthedocs.io/en/latest/running/cliOptions.html#the-config-file

Breaking Changes

  • Removed the parasol batch system
  • Removed the TES batch system (this is now a plugin)
  • Removed our WDL compiler in favor of an interpreter (we still support WDL, we just do it differently now)
  • We no longer support Python 3.7

CWL

  • Support CWL 1.2.1 (#4682)
  • CWL Pipefish compatibility (#4636)
  • Support per-task preemptibility in CWL (#4551)
  • Fix configargparse in CWL (#4618)
  • cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)
  • Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)
  • Log more usefully for CWL workflows (#4736)

WDL

  • Simplify WDL Toil job graphs (#4524)
  • More WDL and Slurm documentation (#4558)
  • Improve WDL documentation (#4732)
  • Add String to File functionality into toil-wdl-runner (#4589)
  • Run WDL output through Toil export system to support URIs (#4579)
  • Allow the WDL output section to reference itself (#4592)
  • Ensure sibling files in toil-wdl-runner (#4610)
  • Make WDLOutputJob collect all task outputs (#4602)
  • Report errors in WDL using MiniWDL's error location printer (#4637)
  • Remove the WDL compiler. (#4679)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Strip leading whitespace from WDL commands (#4720)

Misc

  • Add config file support (#4569)
  • Support Python 3.11 and drop Python 3.7 (#4646)
  • Move TES batch system to a plugin (#4650)
  • Turn batch system tests back on (#4649)
  • Separate out integration tests to run on a schedule (#4612)
  • Avoid concurrent modification in cluster scaler tests (#4600)
  • Remove old buckets from AWS (#4588)
  • Tests: only request a single core (#4572)
  • Reduce the number of assert statements (#4590)
  • Treat any nvidia-smi exception as not having a GPU (#4611)
  • More resiliency (#4395)
  • Remove usage of the deprecated pkg_resources (#4701)
  • Make sure cwltool always knows we have an outdir to fix #4698 (#4699)
  • AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)
  • Only count output file usage when using the file store (#4692)
  • Remove the parasol batch system. (#4678)
  • Move around reqs and move aws dev libraries to aws (#4664)
  • Make sure the --batchLogsDir exists if it is set (#4635)
  • Update EC2 instances and EC2 update script. (#4745)
  • Remove extraneous dependency on old 'mock' (#4739)
  • Point CI at the new public URLs for stuff we host
  • Add __init__.py to options folder (#4723)

Bug Fixes

  • Lower redirect log level to fix #4526 (#4578)
  • Fix mypy from being broken by new boto types (#4577)
  • Fix CI on local Gitlab runners (#4571)
  • Banish ghost jobs (#4563)
  • Stop deleting chained-to jobs which fail as orphaned jobs (#4557)
  • Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)
  • Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)
  • Fix CI Appliance Builds (#4655)
  • Tolerate a failed AMI polling attempt (#4727)
  • Add pure Python fallback for getDirSizeRecursively() (#4753)
  • Don't mark inputs (or outputs) executable for no reason (#4728)
  • Fix scheduled CI tests (#4742)
  • Fix --printJobInfo (#4709)

Thank you to our contributors: @stxue1, @w-gao, @DailyDreaming, @mr-c, @adamnovak, @glennhickey, @misterbrandonwalker, and @a-detiste!

5.12.0

27 Jul 03:19
6d5a5b8

WDL

  • Virtualize filenames as in-container paths from point of view of WDL command (#4527)
  • Add WDL conformance tests to CI (#4530)
  • Use less memory in the Giraffe WDL test (#4541)

Version Upgrades

  • Upgrade to cwltool 3.1.20230601100705 (#4500)
  • Update mock requirement from <5,>=4.0.3 to >=4.0.3,<6 (#4366)

Misc

  • Anonymous access to Google Storage (#4518)
  • Reorder config so that default settings are applied first (#4528)
  • Add a way to forward accelerators to Docker containers (#4492)

Bug Fixes

  • Fix test failures without docker installed (#4544)
  • Prevent certain tests from being run twice in CI (#4529)
  • Drop external Docker builder (#4523)
  • Fix CI lint test (#4533)
  • Grab AWS group policies on top of user (#4505)
  • Grab accelerator set off the end of the list instead of by index (#4506)
  • Fix RtD build (#4491)
  • Include tests (#4499)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak, and @tjni!

5.11.0

15 Jun 15:17

Breaking Changes

  • Imported files will be symlinked by default, unless the user sets --noLinkImports or the workflow imports with symlink=False. (#3949)

WDL

  • Toil will now stop if it encounters an error polling a possible import URL for a WDL workflow input file. (#4479)
  • WDL workflows will be protected against imported files with no basenames. (#4477)

Misc

  • Toil batch system ID numbers for issued jobs now start at 1. (#4482)
  • Attempts to import files from URLs when the implementing job store is missing an extra are now better reported. (#4479)
  • Include tests in the source distribution that gets published to PyPI (#4499)

Bug Fixes

  • Toil should no longer crash when a delete wins a race against a load in FileJobStore (#4484)
  • Prevent local root jobs (such as WDLRootJob) from being run twice. (#4482)
  • Slurm and other grid batch system jobs will now have more informative names (#4472)
  • WDL workflows can no longer import "" as a File. (#4477)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak

5.10.0

18 May 09:03
21422a3

Changelog

Highlighted Features Added

  • Add a --caching option which explicitly states whether to use caching with a workflow. Uses a default value depending on whether or not we are using the file job store if not specified. (#4218)
  • New prototype WDL runner python -m toil.wdl.wdltoil using MiniWDL (#3468)
  • MiniWDL-based WDL implementation can now run the vg Giraffe WDL workflow (#4353)
  • Toil now tests against our own tiny set of WDL conformance tests (#4351)
  • Toil can run the HPRC assembly WDL workflows (#4435)
  • Toil can now use Mesos roles (#4455)

Breaking Changes

  • Replace "preemptable" with "preemptible", add example of using --defaultPreemptible flag to Preemptibility documentation (#1951)

CWL

  • CWL: run all ExpressionTools on the Leader node, instead of submitting separate jobs (#4157)

Kubernetes

  • Kubernetes batch system: Delete jobs individually when batch delete fails (#3403)
  • Documentation for running a Toil leader for a Kubernetes workflow outside Kubernetes now covers examples and common problems for running CWL workflows (#3422)
  • Kubernetes batch system: support --maxCores, --maxDisk, and --maxMemory (#2864)
  • Add tutorial for Kubernetes launch cluster (#3743)

Dependencies

  • Require htcondor 10 exactly (#4315)
  • Toil jobs now have a local parameter which determines if they should run on the leader. (#4388)

Misc

  • The offline tests can now be run in parallel (#3493)
  • Code updated to be more idiomatic for Python 3.7 (#4295)
  • Support for a --network for toil launch-cluster for Google cloud (#4196)
  • Support for a --use_private_ip for toil launch-cluster to dial nodes by private IP instead of public IP (#4196)
  • GPU scheduling should now be supported on Slurm (#4308)
  • Toil now supports a --batchLogsDir option and TOIL_BATCH_LOGS_DIR environment variable, to provide a directory other than the work dir where Toil will instruct HPC batch systems to save their captured job logs.
  • htcondor batch system should now work again, and will retry connections
  • Updated the --coalesceStatusCalls help documentation to reflect the current state of #4431 (#4437)
  • Toil no longer trusts XDG_RUNTIME_DIR under Slurm (fixes some of the issues behind #4395 when Slurm is configured not to follow the XDG spec) (#4435)
  • Toil now puts its lock files for Singularity cache directories for WDL in those directories (#4435)
  • Toil's WDL interpreter can now use local-to-the-leader jobs for evaluating WDL code that doesn't need appreciable resources (#4388)
  • Toil now tolerates more possible exceptions related to the Panasas network file system (#4440)
  • Type hinting to functions in resource.py (#938)
  • Added return type to inVirtualEnv() in __init__.py (#938)
  • Added None checks to some function bodies (#938)

Bug Fixes

  • Stop crashing when predefined batch job exit reasons are used and need to go into the message bus log file (#4321)
  • Added import subprocess to restore the behavior of #588. (#4429)
  • Toil will no longer use the stored message bus path from an old execution of a workflow when deciding where to save the message bus log when restarting a workflow (#4438)
  • Fix --custom-net mutual exclusivity bug. (#4458)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak, @jfennick, @misterbrandonwalker, @w-gao, @stephanaime, @glennhickey, @Hexotical, @manabuishii, @gmloose, @boukn, and @thiagogenez!

5.9.2

04 Feb 05:38

Changelog

Bug Fixes

  • Change build tag import (#4329)

Thank you to our contributors: @adamnovak, @Hexotical!

5.9.0

03 Feb 06:04
8155e0a

Changelog

Bug Fixes

  • Fix --provisioner and --metrics together (#4328)
  • Ignore incorrect type hint from boto3, remove json.loads (#4330)
  • Warn about missing --bypass-file-store with in-place update (#4337)
  • Replace prepareHTSubmission with prepareSubmission in HTCondor (#4319)
  • Merge "Google fixes" (#4293)
  • Support (only) current htcondor (#4320)
  • Delete k8s jobs individually when batch delete fails (#4306)

Misc

  • Update aws spot documentation (#4310)
  • Enable parallel testing (#3493)
  • Add documentation for running CWL workflows on non-Toil-managed Kubernetes clusters (#4332)
  • Export all slurm args by default (#4237)
  • Allow for subclasses of base types in messages (#4322)
  • Non cache default (#4299)

Dependencies

  • Bump mypy from 0.982 to 0.991 (#4345)
  • Bump schema-salad>=8.4.20230128170514,<9 to schema-salad>=8.3.20220913105718,<8.4 (#4342) (#4341)
  • Bump cwltool from 3.1.20221008225030 to 3.1.20221201130942 (#4338)
  • Bump pyupgrade to 3.7 (#4295)

Thank you to our contributors: @adamnovak, @Hexotical, @w-gao, @mr-c, @gmloose, @boukn, and @thiagogenez!

5.8.0

04 Jan 23:01
79792b7

Changelog

Highlighted Features Added

  • Toil server now exposes workflow tasks via WES (#4046).
  • Toil server now has a --wes_dialect agc option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
  • Toil jobs now accept an accelerators requirement, like accelerators=1 or accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2} (#4163)
  • Include total requested cores for each job type in toil stats (#4173)
  • Toil jobs now expose job.accelerators to workflows
  • Add prefix and suffix parameters to AbstractFileStore.getLocalTempFile and AbstractFileStore.getLocalTempFileName (#4273)
  • CWL: --no-compute-checksum, --strict-cpu-limit, --disable-validate, and --fast-parser are now available
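
The two accelerator requirement forms mentioned above (a bare count, or a dictionary with kind, brand, and count) can be handled uniformly by normalizing them first. The helper below is hypothetical and is not part of Toil's API; it is a sketch of one way workflow code might treat both forms alike.

```python
from typing import Dict, Union

AcceleratorSpec = Union[int, Dict[str, Union[str, int]]]


def normalize_accelerators(spec: AcceleratorSpec) -> Dict[str, Union[str, int]]:
    """Turn either requirement form into a dictionary with a 'count' key.

    Hypothetical helper for illustration only; it does not fill in a kind
    or brand for a bare count, since the note does not define a default.
    """
    if isinstance(spec, int):
        if spec < 0:
            raise ValueError("accelerator count cannot be negative")
        return {"count": spec}
    normalized = dict(spec)
    # A dictionary without an explicit count is read as asking for one device
    # (an assumption made for this sketch).
    normalized.setdefault("count", 1)
    return normalized


simple = normalize_accelerators(1)
detailed = normalize_accelerators({"kind": "gpu", "brand": "nvidia", "count": 2})
print(simple, detailed)
```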

Breaking Changes

  • Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass --assumeZeroOverhead to revert to the old behavior (#2103)

CWL

  • CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a CWLNamedJob. (#4046/#4047)
  • CWL CUDARequirement is parsed by cwltool and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
  • Fix false warning when outputSource contains only one None value (#4300)

Kubernetes

  • KubernetesBatchSystem can add nvidia.com/gpu and amd.com/gpu resource requests for jobs that request those accelerators (#4163)
  • KubernetesBatchSystem can request GPUs by model key, if nodes are labeled appropriately (#4163)

Dependencies

Misc

  • Toil WES server now accepts requests that leave out workflow_params. (#4037)
  • The MessageBus has been expanded to use pypubsub, and now has MessageInbox and MessageOutbox objects to represent connections to it. (#4046/#4047)
  • ToilMetrics now rides on the MessageBus rails. (#4046/#4047)
  • Toil workflows now have a --writeMessages option, which takes a file to which a line-oriented stream of MessageBus messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
  • Add code for warning check to be used when launching cluster with AWS. (#3514)
  • Use a CI prebake image for GitLab testing. (#4185)
  • Toil clusters now have /var/tmp as the default temporary directory, since they often make large temporary files (#4148)
  • Add basic testing for Slurm using a Slurm Docker cluster by running sample workflows. (#3856)
  • Add message bus documentation (#4239)
  • SingleMachineBatchSystem can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting CUDA_VISIBLE_DEVICES in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
  • AWSBatchBatchSystem can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
  • Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
  • Message bus is now thread safe (#4276)
  • Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
  • docker binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
  • Singularity in the appliance has been updated to 3.10 which is >=3.9, for cgroups v2 support.
  • Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
  • Safer type usage checking for systems without boto3 installed
  • Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
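
The SingleMachineBatchSystem item above describes two behaviors: limiting concurrent jobs to the accelerators available, and setting CUDA_VISIBLE_DEVICES per task. A minimal sketch of that idea follows. The class and function names are hypothetical and this is not Toil's implementation.

```python
from typing import Dict, List, Optional


class GpuPool:
    """Tracks which GPU indices are free (hypothetical illustration)."""

    def __init__(self, device_count: int) -> None:
        self._free: List[int] = list(range(device_count))

    def acquire(self, count: int) -> Optional[List[int]]:
        """Reserve `count` devices, or return None if too few are free."""
        if count > len(self._free):
            return None
        taken, self._free = self._free[:count], self._free[count:]
        return taken

    def release(self, devices: List[int]) -> None:
        """Return devices to the pool once a job finishes."""
        self._free = sorted(self._free + devices)


def job_environment(devices: List[int]) -> Dict[str, str]:
    """Build the environment entry that tells a task which GPUs to use."""
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(d) for d in devices)}


pool = GpuPool(2)
first = pool.acquire(1)    # gets device 0
second = pool.acquire(2)   # only one device left, so this job must wait
env = job_environment(first)
print(first, second, env)
```

A job that cannot acquire enough devices simply does not start yet, which is what keeps concurrency within the number of accelerators.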

Bug Fixes

  • Only use /var/run/user if XDG tells us we have it in our session. Otherwise we will try other places, including /run/lock/toil. (#4170)
  • toil destroy-cluster: terminate stopped instances when destroying the cluster (#4271)
  • fileJobStore: handle arbitrary os.link errors to work on some filesystems (#2232)

Thank you to our contributors!