Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Optionally check configureExecutable config value path actually exists #348

Closed
wants to merge 3 commits into from

Conversation

clintval
Copy link
Member

@clintval clintval commented May 17, 2019

I have a use case which is not supported by Dagr's current methods for executable configuration.

I want to configure a default location where executables can be placed such that Dagr always uses them if they can be found. If the executables do not exist on disk at the path specified in configuration, then I want Dagr to fallback onto the system path.

With this PR, two alternate behaviors can be allowed without changing Dagr configuration. This will make sharing Dagr pipelines easier!

Dependency Strategies I want to Advertise

  1. To get up and running very quickly, install dependencies with this command:

    ❯ conda install vcfanno

    Then execute the pipeline:

    ❯ java -jar pipeline.jar RunMe
  2. To use a specific version of the pipeline dependencies or to use a different version than provided on your system path, simply place the executables at the following location:

    ❯ ls /pipeline/packages/
    /pipeline/packages/vcfanno 

    Then execute the pipeline:

    ❯ java -jar pipeline.jar RunMe

I'm not thrilled about providing the argument mustExist, would you rather this logic be allowed through a new function? Open to suggestions!

@codecov-io
Copy link

codecov-io commented May 17, 2019

Codecov Report

Merging #348 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #348      +/-   ##
==========================================
+ Coverage   91.53%   91.55%   +0.01%     
==========================================
  Files          31       31              
  Lines        1146     1148       +2     
  Branches      108      101       -7     
==========================================
+ Hits         1049     1051       +2     
  Misses         97       97
Impacted Files Coverage Δ
...rc/main/scala/dagr/core/config/Configuration.scala 96.87% <100%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2189cf...e49e227. Read the comment docs.

@clintval clintval force-pushed the cv_configure_backup branch 2 times, most recently from a1b82e2 to 45608b1 Compare May 17, 2019 16:54
@clintval clintval force-pushed the cv_configure_backup branch from 45608b1 to 27bfd86 Compare May 17, 2019 18:24
@nh13 nh13 self-assigned this May 21, 2019
@nh13 nh13 self-requested a review May 21, 2019 23:29
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes, but I could also use @tfenne's input. I think it may be time to move dagr out from using it's own Configuration class and use the commons Configuration class. If we do the latter, we could either add some of the missing methods directly (ex. configureExecutable) to the commons Configuration class, or extend it to a new trait/class (for things that want to configure executables) and put that there.

Regardless, I think this PR is fine (once comments have been addressed) and we can choose to do the migration later. @tfenne thoughts?

core/src/main/scala/dagr/core/config/Configuration.scala Outdated Show resolved Hide resolved
core/src/main/scala/dagr/core/config/Configuration.scala Outdated Show resolved Hide resolved
core/src/main/scala/dagr/core/config/Configuration.scala Outdated Show resolved Hide resolved
core/src/main/scala/dagr/core/config/Configuration.scala Outdated Show resolved Hide resolved
@nh13 nh13 requested a review from tfenne May 21, 2019 23:58
@clintval
Copy link
Member Author

clintval commented May 22, 2019

I did my best wordsmithing although I feel like improvements could still be made. A re-review would be great! I agree allowMissingConfigReference is clearer than mustExist but I'm not sure missing is the right verb and that ConfigReference is the right noun. For lack of a better idea, it rests.

The parameter rename raises a few API design questions for me that might be useful context for when we eventually move to commons.util.Configuration.

  • For a function which is destined to raise an exception when the executable can not be configured, why would we allow a state in which, under default use, a missing executable can be successfully returned?
  • In the default case, do we expect the user to assert the executable exists after looking it up in the config since they will have no context as to if the executable path was found either in the config (uncertain executable existence) or on the system path (certain executable existence)?

@tfenne
Copy link
Member

tfenne commented May 28, 2019

I'm not entirely sure how I feel about this PR to be completely honest. My main concern is that a mistake in config could lead to totally unexpected consequences. E.g. I frequently have multiple versions of bwa on my system, and one of them lives on my default PATH. If I add to config something like bwa.executable=/packages/bwa-0.7.13/bni/bwa, and expect pipeline will run bwa 0.7.13, I'd be unpleasantly surprised to find a) my pipelines didn't fail and b) they ran whatever version of bwa is on my system path. That fear could be at least partly allayed by making the default behaviour to throw an error if a configured executable doesn't exist, but that means that any usages of configureExecutable inside of dagr wouldn't function how you'd want them.

Re: the name of allowMissingConfigReference, I think this is really hard to do without making it very wordy - since there are two things that could be missing; the config key could be undefined, or it's value could point to a nonexistent executable/file. Here are a few options for (IMO) better names (with defaults that match the existing dagr behaviour):

  • configuredExecutablesMustExist: Boolean = true
  • alwaysFallBackToSystemPath: Boolean = false
  • allowConfiguredExecutablesToNotExist: Boolean = false

@clintval
Copy link
Member Author

@tfenne, I understand your concern. Activating the feature in this PR would open you up to system path fallbacks if you included typos in your Dagr configuration.

Ultimately, I'm looking for a way to configure a Dagr pipeline using hard-coded paths as default, but permitting a system path search as a fallback if the executable is missing so I can leverage alternate dependencies that are installed using a package/environment manager like conda. This would allow me greater ease in end-to-end testing over a variety of tool dependency versions using the same Dagr pipelines (same release, same JAR).

I usually run Dagr pipelines on blank cloud instances so multiple system binaries risk is small for me but I also understand these pipelines run just as well on personal environments.

Is this PR blocked from your perspective?

Happy to keep chipping away at this feature if it's a desired addition!

@nh13
Copy link
Member

nh13 commented May 28, 2019

@clintval why not have an empty dagr config, then use the system path to express priority order searching for binaries?

@nh13
Copy link
Member

nh13 commented May 28, 2019

@tfenne @clintval we could also implement the fallback as a global variable within Configuration that one could toggle. Be default, it fails if the executable provided by the config path does not exist. See #350

@clintval
Copy link
Member Author

Good suggestions all around. I'm 👍 on #350.

@clintval clintval closed this Jan 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants