Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a wrapper for quarto markdown rendering #3003

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

votti
Copy link
Contributor

@votti votti commented Jun 20, 2024

Quarto has become a frequently used markdown notebook authoring system.

This wrapper uses quarto to render R Quarto .qmd notebooks using the the Quarto params mechanism to pass parameters from snakemake.

R dependency handling with renv is supported.

The meta.yml contains a more detailed description of the wrapper.
Test cases for all the major functionality and demonstrating the renv integration are included.

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • the environment.yaml pinning has been updated by running snakedeploy pin-conda-envs environment.yaml on a linux machine,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
  • Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

This wrapper uses quarto to render R quarto markdown
.qmd notebooks.
R dependency handling with conda or renv is supported.
@votti votti changed the title Feature: Add a wrapper for quarto markdown rendering feat: Add a wrapper for quarto markdown rendering Jun 20, 2024
@fgvieira
Copy link
Collaborator

fgvieira commented Jun 21, 2024

Do you need to specify all those dependencies?
Can you try to specify just the absolutely needed packages (no dependencies)? For example, - r-quarto =1.4.

And why do you need a second environment?

Vito Zanotelli added 2 commits June 21, 2024 09:22
This simplifies the dependencies in the main
conda environment.

This required then to rewrite the tests to use
a separate conda environment with renv installed
in order to build and restore the renv.
@votti
Copy link
Contributor Author

votti commented Jun 21, 2024

Hi @fgvieira ,
Thanks a lot for your comments!
Maybe all this build dependencies for r-packages were indeed not needed. I simplified this to just depend on r-quarto.
Looking at the other packages, it seems like pining the version in the environment.yml is not needed as the pin.txt files are used. Thus I did not pin the r-quarto version.

@multiple environments:
The other environments are simply for testings and to demonstrate how:

  1. r dependencies required to render the markdown can be installed by overwriting the conda: directive of the wrapper. This is tested by having additionally data.table installed in the extra.yml environment.
  2. demonstrate and test how renv may be used together with this quarto wrapper. This is achieved by installing data.table with renv. This requires a conda env with r-renv installed.

@fgvieira
Copy link
Collaborator

fgvieira commented Jun 21, 2024

@votti we need a version in the environment.yaml for the update github action. Can you add one?

Don't think we need the test/.gitignore file, right?

If not needed, best to have a single env. You can always add data.table to the main environment (if you think it is needed).

utils/quarto/test/Snakefile Outdated Show resolved Hide resolved
# resources CLI flag: --resources renv_lock=1 --set-resource-scopes renv_lock=global


rule renv_example_init:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar the renv testcase requires renv as an additional dependency which I would track in the separate renv.yaml.

utils/quarto/test/.gitignore Outdated Show resolved Hide resolved
Generally these should not need to be updated.
If updating renv.yaml, the renv_dt.lock file needs to be
adapted as well.
test.py Outdated Show resolved Hide resolved
Vito Zanotelli added 3 commits June 21, 2024 11:47
As discussed in the PR this does not need to be
explicitly shown. Instead I have added this more expliiclty to the
documentation.
@votti votti force-pushed the feature/quarto branch 2 times, most recently from f337a87 to b17f428 Compare June 22, 2024 04:37
votti and others added 2 commits June 22, 2024 06:48
Before this would have raised an error `log` not defined
Comment on lines 44 to 45
threads: lambda wc, threads: threads
mem: lambda wc, resources: resources.mem
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense for the wrapper to use threads and mem specified under resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think it would make sense and attempted to implement this, but technically I did not find a way to achieve it:
We currently pass parameters for the markdown rendering using the --P, --execute-param KEY:VALUE syntax of the quarto render command. This only accepts parameters that are actually defined in the markdown document to be rendered and fails otherwise. This behaviour is coded in the knit_params_get used under the hood: https://github.com/rstudio/rmarkdown/blob/4b2f3426c1f46a8d207b0d661fc51c2508ecdad7/R/params.R#L28-L34
I dont see any possiblity to easily change this behaviour.

Thus if we would pass threads and mem as a parameter automatically, these would require the explicit declaration of these parameters in the markdown rendered.
If the markdown rendered does not explicitly supports these parameters, rendering would fail.

Thus to me requiring all markdowns to be rendered using this wrapper to support these two resource parameters would not be a good solution, as it would limit the applicability of the wrapper.
Also it would not be a complete solution as supporting any other resources would still require a workaround as the one documented here as well.
Automatically passing any resource parameter would be even more brittle as additional resources may be passed depending on the snakemake configuration used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, params/threads/mem need to be specified inside the .qmd file AND passed as command-line arguments (--P, --execute-param KEY:VALUE)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is how it works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bit unpractical since the params of the rule might not match the ones in the .qmd file! On the other hand, it is also not possible to add more parameters.

How are parameters included in the .qmd file? Is it an upstream rule?
Would it make sense for the wrapper to overwrite (or check) the params in the .qmd file to ensure they match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think of the qmd file like an extension to Rmarkdown or akin to a Jupyter notebook.
I imagine the typical use of the wrapper would be users to hand-write the .qmd notebook, like they would do in a Rmarkdown or Jupyter notebook.
The parameter are included when authoring the file. The defined parameters are then used in the report to eg customize titles or as parameters in functions.

Would it make sense for the wrapper to overwrite (or check) the params in the .qmd file to ensure they match?

@overwriting: I am not sure if this would be sensible: adding additional parameters would not typically not really having an effect. I would say is is really rare for people to write code that handles arbitrary additional parameters and thus typically these would simply be ignored.

@cheking to ensure they match: this is what quarto render does at runtime and the error message raised is quite understandable, so I would say this is already done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if the wrapper just read the parameters in the .qmd file and add those with --execute-param KEY:VALUE? Would that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could make sense and it should be 'easy' to implement using knitr:knit_params: https://rdrr.io/cran/knitr/man/knit_params.html
or by writing a python function that parses the 'front matter' of such documents based some regexp.

This definitely would be quite convenient to implicitly pass these parameters if appropriately named parameters exist in the document.
Personally I am not so sure if I like such an implicit parameter passing as I tend to prefer explicit over implicit for such cases. Implicitly passing these may make it difficult to detect errors. Eg if in the parameter in the quarto document is called 'thread' instead of 'threads' this would lead to a silent ignoring of the parameter.
Also it is not possible to see at a glance from the rule if the document is respecting the resources.

What would you think as an alternative of a parameter: 'params: resource_parameters' that allow the user to specify a list of strings with resources to be passed as parameters:
Eg:

params:
    resource_parameters: ['threads', 'mem_mb']

What would you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now implemented that if a quarto params named resources_<resource_name> is defined
the corresponding snakemake resource parameter is automatically passed.
eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be
passed.
threads will also be passed if a quarto parameter resrouces_threads is present.

Would this work for you?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how would those quarto params be defined?

I am not very familiar with quarto but my concern is that, if the rule resources change then the same input won't work (since the resources are hard-coded inside the file).
As I see it, the only way to ensure it, is to always overwrite (or add) the resources in the .qmd file with those in snakemake.resources. If you think the current wrapper can do it, then I am happy. 😄

Co-authored-by: Filipe G. Vieira <[email protected]>
votti and others added 6 commits June 27, 2024 08:41
As suggested resources that match the name should be automatically
passed.

Tests added:
- test if a custom resource added will be passed
- test if the special resource 'threads' will be passed
If a quarto params named resources_<resource_name> is defined
the corresponding snakemake resource parameter is automatically passed.
eg quarto param resources_mem_mb -> snakemake.resources.mem_mb will be
passed.
Accidentally the header of the notes section
in meta.yaml was removed
@fgvieira
Copy link
Collaborator

fgvieira commented Aug 8, 2024

The idea behind renv is to allow to install other R packages not included in the original wrapper environment, no?
Would it be possible to just use conda for that? I mean, use the wrapper but with a custom conda environment (conda:).

@johanneskoester what do you think of using renv inside the wrapper? Or should it be done at the snakemake level (like conda or singularity)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants