offer a HTML5 compatible format as html_document() is HTML4 only #1776

Open
3 tasks done
UlvHare opened this issue Feb 23, 2020 · 6 comments
Labels
feature (a feature request or enhancement), theme: HTML5 (related to HTML5 formats)

Comments

@UlvHare

UlvHare commented Feb 23, 2020

A fairly old commit forces rmarkdown to call pandoc with the html4 writer for all HTML documents. I mostly use my own HTML5 templates, but even with YAML like this:

---
title: "My title"
date: "`r Sys.time()`"
output:
  html_document:
    fig_width: 8
    fig_height: 6
    toc: true
    theme: null
    highlight: "pygments"
    md_extensions: -autolink_bare_uris
    self_contained: FALSE
    template: journal.html5
    pandoc_args: ["--to=html5"]
---

the render output still shows --to html4, meaning that pandoc uses the html4 writer. The only working solution I have found is to manually revert those strings in a local copy of the rmarkdown sources and rebuild it at every update. And when I need to use e.g. bookdown, which relies on (deprecated?) html4, I install the CRAN version again. This is a bit uncomfortable.

Could the choice of HTML version be left to the user? For convenience the default could stay "html4", but the user should be able to simply put "html5" somewhere in the YAML block to get an HTML5 file as output.


> xfun::session_info('rmarkdown')
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux

Locale:
  LC_CTYPE=ru_RU.utf8          LC_NUMERIC=C                 LC_TIME=ru_RU.utf8           LC_COLLATE=C                
  LC_MONETARY=ru_RU.utf8       LC_MESSAGES=ru_RU.utf8       LC_PAPER=ru_RU.utf8          LC_NAME=ru_RU.utf8          
  LC_ADDRESS=ru_RU.utf8        LC_TELEPHONE=ru_RU.utf8      LC_MEASUREMENT=ru_RU.utf8    LC_IDENTIFICATION=ru_RU.utf8

Package version:
  Rcpp_1.0.3      base64enc_0.1.3 digest_0.6.25   evaluate_0.14   glue_1.3.1      grDevices_3.6.2 graphics_3.6.2 
  highr_0.8       htmltools_0.4.0 jsonlite_1.6.1  knitr_1.28      magrittr_1.5    markdown_1.1    methods_3.6.2  
  mime_0.9        rlang_0.4.4     rmarkdown_2.1   stats_3.6.2     stringi_1.4.6   stringr_1.4.0   tinytex_0.19   
  tools_3.6.2     utils_3.6.2     xfun_0.12       yaml_2.2.1     

Pandoc version: 2.7.3

By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.org/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('rmarkdown'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('rstudio/rmarkdown').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@cderv
Collaborator

cderv commented Feb 23, 2020

I think the support for HTML5 can indeed be improved here.

To add to that, pandoc_args: ["--to=html5"] does not work correctly, I think, because --to=html4 is also passed, and pandoc_args does not override the value coming from the html_document format.

I don't know yet how html_document is tied to html4 rather than html5, but it seems reasonable that html4 should not be enforced when a custom template is provided.

I think that, without a big impact, we could either ensure that a --to pandoc argument, if provided, overrides the previous one, or, if a custom template is provided, set to from the extension of the template (like .html5 as you did). Just throwing out some ideas here to improve this...
This needs to be thought through.
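
To make the two ideas above concrete, here is a minimal sketch in base R. The helper name resolve_pandoc_to() and its arguments are hypothetical and are not part of rmarkdown; the sketch only illustrates one possible precedence (an explicit --to wins, then the template extension, then the current html4 default) and only recognises the --to=<writer>, --to <writer> and -t <writer> forms.

```r
# Hypothetical helper, NOT an rmarkdown function: pick the pandoc writer.
# Assumed precedence: explicit --to in pandoc_args > template extension > default.
resolve_pandoc_to <- function(pandoc_args = character(), template = NULL,
                              default_to = "html4") {
  # Idea 1: an explicit writer in pandoc_args overrides the format default.
  explicit <- NULL
  i <- 1L
  while (i <= length(pandoc_args)) {
    arg <- pandoc_args[[i]]
    if (startsWith(arg, "--to=")) {
      explicit <- substring(arg, nchar("--to=") + 1L)
    } else if (arg %in% c("--to", "-t") && i < length(pandoc_args)) {
      # The writer is given as the next element; consume it as well.
      explicit <- pandoc_args[[i + 1L]]
      i <- i + 1L
    }
    i <- i + 1L
  }
  if (!is.null(explicit)) return(explicit)

  # Idea 2: otherwise infer the writer from the custom template's extension.
  if (!is.null(template)) {
    ext <- tolower(tools::file_ext(template))
    if (ext %in% c("html4", "html5")) return(ext)
  }

  # Fall back to the current behaviour.
  default_to
}

resolve_pandoc_to(pandoc_args = "--to=html5")   # "html5"
resolve_pandoc_to(template = "journal.html5")   # "html5"
resolve_pandoc_to()                             # "html4"
```

Whether an explicit argument should also win over a template extension is a design choice that this sketch simply assumes.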

Note that you could also create your own output format based on html_document, to modify the pandoc option and set it to html5, and then pass it to the output_format argument of the render function.

Here is an example where you can see that --to html5 is present, but not --to html4 as before.

html5_document <- rmarkdown::output_format(
  # this argument is required, but it will be overridden by the base_format
  knitr = NULL,
  # this will be merged with the base_format, replacing the default (html -> html4)
  pandoc = list(to = "html5"),
  # use your html_document with the correct options as the base format to modify
  base_format = rmarkdown::html_document(
    fig_width = 8,
    fig_height = 6,
    toc = TRUE,
    theme = NULL,
    highlight = "pygments",
    md_extensions = "-autolink_bare_uris",
    # I don't have this template file
    # template = "journal.html5",
    self_contained = FALSE
  )
)
# working in a temp folder for the example
dir.create(tmp_dir <- tempfile())
old <- setwd(tmp_dir)
xfun::write_utf8(c(
  "---",
  "title: test",
  "---",
  "",
  "# title test"
), "test.Rmd")
# enforcing the custom output_format
rmarkdown::render("test.Rmd", output_format = html5_document)
#> processing file: test.Rmd
#> output file: test.knit.md
#> "C:/PROGRA~3/CHOCOL~1/bin/pandoc" +RTS -K512m -RTS test.utf8.md --to html5 --from markdown+tex_math_single_backslash-autolink_bare_uris+smart --output test.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --template "C:\Users\chris\Documents\R\win-library\3.6\rmarkdown\rmd\h\default.html" --highlight-style pygments --include-in-header "C:\Users\chris\AppData\Local\Temp\RtmpGyb1a2\rmarkdown-str5bcc7ee71a7c.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --lua-filter "C:/Users/chris/Documents/R/win-library/3.6/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/chris/Documents/R/win-library/3.6/rmarkdown/rmd/lua/latex-div.lua"
#> 
#> Output created: test.html
setwd(old)

Created on 2020-02-23 by the reprex package (v0.3.0.9001)

I think this is currently the way to go to customize an output format and change some default behavior. This output_format + base_format is a good customisation mechanism.
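
If the same customisation is needed in several documents, the format object above can be wrapped in a function. This is only a sketch: the name html5_format() is made up for illustration and is not exported by rmarkdown, and it assumes that the pandoc = list(to = "html5") merge behaves as in the example above.

```r
# Hypothetical wrapper, NOT part of rmarkdown: an html_document-based
# format that asks pandoc for the html5 writer instead of html4.
html5_format <- function(...) {
  rmarkdown::output_format(
    # required argument; the knitr options come from the base format
    knitr = NULL,
    # merged over the base format's pandoc options, as in the example above
    pandoc = list(to = "html5"),
    # all arguments are forwarded unchanged to html_document()
    base_format = rmarkdown::html_document(...)
  )
}

# Usage (requires rmarkdown and pandoc to be installed):
# rmarkdown::render("test.Rmd", output_format = html5_format(toc = TRUE, theme = NULL))
```

If such a function were exported from a package (say a hypothetical mypkg), it should also be usable from the YAML header as output: mypkg::html5_format, which would avoid passing output_format to render() by hand.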

Hope it helps.

@UlvHare
Author

UlvHare commented Feb 26, 2020

Thank you very much, cderv!
It's a very good idea to make a custom html5 output format. I tried to do it some years ago but got stuck on the overly small example in the help pages. Yours is much clearer.
For me this is enough, but other people may need HTML5 output "out of the box" in rmarkdown, so I am not closing this issue.

cderv added the "feature" label (a feature request or enhancement) on Jan 10, 2022
cderv changed the title from "If possible, please NOT force rmarkdown to render html as html4" to "offer a HTML5 compatible format as html_document() is HTML4 only" on Jan 10, 2022
cderv moved this to Backlog in R Markdown Team Projects on Jan 10, 2022
cderv added the "theme: HTML5" label (related to HTML5 formats) on Jan 10, 2022
@octomike

octomike commented Oct 30, 2024

This just became very relevant for me. It turns out that pandoc started escaping href URIs in version 3.2.1 because of jgm/pandoc#9905.

This specifically broke all TOC entries in my document that do not consist solely of ASCII characters:

Uncaught Error: Syntax error, unrecognized expression: #bundesl%C3%A4nder
    jQuery 7
    refresh file:///XX/report_files/bootstrap-3.3.7/js/bootstrap.min.js:6
    jQuery 3
    refresh file:///XX/report_files/bootstrap-3.3.7/js/bootstrap.min.js:6
    b file:///XX/report_files/bootstrap-3.3.7/js/bootstrap.min.js:6
    c file:///XX/report_files/bootstrap-3.3.7/js/bootstrap.min.js:6
    jQuery 2
    c file:///XX/report_files/bootstrap-3.3.7/js/bootstrap.min.js:6
    <anonymous> file:///XX/report_files/downcute-0.1/downcute.js:44
    EventListener.handleEvent* file:///XX/report_files/downcute-0.1/downcute.js:11

A way to tell pandoc to use --to html5 would solve this.

@cderv
Collaborator

cderv commented Oct 31, 2024

Thanks for the feedback. This is not a trivial addition to R Markdown, and nowadays the effort for new features is focused on Quarto.

If you don't know Quarto, its approach is really similar to R Markdown's, and it is based on years of experience. See https://quarto.org and https://quarto.org/docs/faq/rmarkdown.html

The HTML format offered by Quarto is using html5: https://quarto.org/docs/output-formats/html-basics.html

So if you are working on a new project, Quarto is a good tool to start with. You would get HTML5 right away, with a bunch of new features, while keeping all the benefits of R Markdown and knitr that you already leverage.

> This just became very relevant for me. It turns out that pandoc started escaping href URIs in version 3.2.1 because of jgm/pandoc#9905.
>
> This specifically broke all TOC entries in my document that do not consist solely of ASCII characters:

HTML5 or HTML4 aside, this looks like an issue with newer pandoc that I need to look at 🤔

@atusy
Collaborator

atusy commented Nov 1, 2024

@octomike

If you want to use R Markdown (not Quarto), then my minidown package would help you.
It employs HTML5 and implements some major features of html_document.

https://github.com/atusy/minidown

@cderv
Collaborator

cderv commented Nov 4, 2024

Thanks @atusy, I did not realize it was using HTML5! Really cool! Thank you!
