Special Characters in References not rendering correctly #158

fmsabatini · 2021-02-23T13:05:02Z

I'm having trouble with my awesomeCV.
After updating to R 4.0.2, it no longer renders my reference list correctly. The special characters (umlauts, apostrophes and so on) are not picked up as UTF-8 symbols.
I spent all morning updating all packages, as well as pandoc. Yet, the references are not displaying as expected:

sessionInfo()

R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_2.7    dplyr_1.0.0      vitae_0.4.2.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      highr_0.8         pillar_1.4.4      compiler_4.0.1    prettyunits_1.1.1 remotes_2.2.0     testthat_2.3.2    digest_0.6.25     pkgbuild_1.0.8    pkgload_1.1.0    
[11] jsonlite_1.6.1    tibble_3.0.1      memoise_1.1.0     evaluate_0.14     lifecycle_0.2.0   pkgconfig_2.0.3   rlang_0.4.10      cli_2.0.2         rstudioapi_0.11   curl_4.3         
[21] yaml_2.2.1        xfun_0.21         stringr_1.4.0     withr_2.2.0       knitr_1.29        hms_0.5.3         desc_1.2.0        generics_0.0.2    fs_1.4.1          vctrs_0.3.6      
[31] devtools_2.3.0    tidyselect_1.1.0  rprojroot_2.0.2   glue_1.4.1        R6_2.4.1          processx_3.4.2    fansi_0.4.1       sessioninfo_1.1.1 readr_1.3.1       purrr_0.3.4      
[41] callr_3.4.3       magrittr_1.5      ps_1.3.3          ellipsis_0.3.1    htmltools_0.5.0   usethis_1.6.1     assertthat_0.2.1  utf8_1.1.4        tinytex_0.24      stringi_1.4.6    
[51] crayon_1.3.4

Pandoc Version

rmarkdown::pandoc_version()
[1] ‘2.11.2’

I appreciate this is a problem linked to the locale of my Windows 10 installation. I tried to change my locale to UTF-8, but without success:

> Sys.getlocale() 
[1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"

> Sys.setlocale("LC_CTYPE", "UTF-8")
[1] ""
Warning message:
In Sys.setlocale("LC_CTYPE", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored

Suggestions?
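For anyone checking the same thing: the failing `Sys.setlocale()` call above is expected on older Windows builds of R, since (as far as is known) UTF-8 only became the native encoding on Windows from R 4.2. A minimal sketch, using base R only, of how to inspect what the session uses and confirm that an individual string is still stored as UTF-8 (the example string is made up):

```r
# Inspect how the current R session encodes text (base R only).
info <- l10n_info()
info[["UTF-8"]]     # TRUE when the session's native encoding is UTF-8
info[["codepage"]]  # Windows only (e.g. 1252); NULL on other platforms

# A string can be stored as UTF-8 regardless of the session locale.
x <- enc2utf8("M\u00fcller")  # illustrative value, not from the .bib file
Encoding(x)                   # "UTF-8"
nchar(x, type = "bytes")      # 7: the umlaut takes two bytes
```

This only diagnoses the session; it does not change how external tools such as pandoc read or write files.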


andreamrau · 2021-02-24T15:40:13Z

Thanks so much for the fantastic work on the vitae package -- I've really been enjoying using it to semi-automate the construction of my CV.

I'm bumping up this issue because I'm having the exact same problem as @fmsabatini myself. I'm on Windows 10, R 4.0.0, and the latest version of pandoc (2.11.4), and the accents in the references of the awesomecv template are not rendering correctly (exactly as in the above screenshot).

The encoding error clearly seems to be due to the locale of Windows 10, and it specifically occurs when pandoc is called via rmarkdown::pandoc_citeproc_convert in vitae::bibliography_entries, i.e. the special characters appear incorrectly when calling pandoc packages.bib -s -t csljson.
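The garbled output is consistent with UTF-8 bytes being interpreted as Windows-1252 somewhere along the way. The following is only a sketch of that assumed mechanism in base R, not the actual code path inside rmarkdown or vitae, and the example name is made up:

```r
# UTF-8 text whose bytes are re-read as Windows-1252 turns into mojibake.
original <- "M\u00fcller"     # "Müller", stored as UTF-8
bytes <- charToRaw(original)  # 4d c3 bc 6c 6c 65 72

# Pretend the bytes came from a CP1252 source and convert them to UTF-8.
garbled <- iconv(rawToChar(bytes), from = "CP1252", to = "UTF-8")
garbled                       # the two bytes of "ü" now show as "Ã¼"
```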

I've noted that if I manually change the accents to HTML entities (e.g. ü -> &uuml;) in the .bib file, they do render correctly. This workaround works for me since I'm not generating my .bib file within my .Rmd file.

montesmariana · 2021-03-26T16:04:17Z

I'm having the same issue, and it's very upsetting because many of my publications are in Spanish.
I'm also using Windows 10, R 4.0.4, pandoc version 2.11.2. I tried changing the accents to HTML in the original file and that didn't work. Saving the bib file with the local encoding doesn't work either.

So I decided to do the cleaning after loading the file, directly on the tibble. It is not super elegant, particularly because I have to bypass the nested lists with S3 classes, but for now it works!

win2utf <- function(string) {
  if (is.null(string) || inherits(string, "csl_dates")) return(string)
  if (is.character(string)) {
    mapping <- list(
      c("Ã©", "é"),
      c("Ã³", "ó"),
      c("Ã¡", "á"),
      c("Ã¼", "ü"),
      c("Ã", "í"),
      c("Ã�", "Á"),
      c("Ãº", "ú"),
      c("Ã±", "ñ"),
      c("â€™", "'"),
      c("â€“", '"'),
      c("í¶", "ö")
    )
    for (mapper in mapping){
      string <- gsub(mapper[[1]], mapper[[2]], string)
    }
  } else if (is.list(string)) {
    iscsllist <- inherits(string, "list_of_csl_names")
    baselist <- purrr::map(string, function(string_element) {
      if (inherits(string_element, "csl_name")) {
        for (cname in names(string_element)) {
          string_element[[cname]] <- win2utf(string_element[[cname]])
        }
        return(string_element)
      } else {
        return(win2utf(string_element))
      }
    })
    if (iscsllist) {
      string <- list_of_csl_names(baselist)
    } else {
      string <- baselist
    }
  }
  return(string)
}

bibliography_entries("publications_recoded.bib") %>%
  mutate(across(everything(), win2utf)) %>% 
  arrange(author$family, desc(issued))

I manually copy-pasted the wrongly parsed characters in the mapping, but I don't know if they are properly pasted here. Those are just the characters I had in my current list, so it might need expansion... Any idea if there is such a mapping somewhere?
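A possible general alternative to a hand-written mapping, assuming the garbling really is UTF-8 bytes misread as Windows-1252: convert the text back to CP1252 bytes and re-declare those bytes as UTF-8. The helper name `fix_mojibake` is hypothetical (it is not part of vitae), and this is a sketch rather than a tested replacement for `win2utf()`:

```r
# Hypothetical helper: undo "UTF-8 bytes read as Windows-1252".
fix_mojibake <- function(x) {
  # Turn each character back into the single CP1252 byte it was read from.
  bytes <- iconv(enc2utf8(x), from = "UTF-8", to = "CP1252")
  # Declare that those bytes are UTF-8, which is what they were originally.
  Encoding(bytes) <- "UTF-8"
  # Keep the input where the round trip failed or produced invalid UTF-8.
  ok <- !is.na(bytes) & validUTF8(bytes)
  ifelse(ok, bytes, x)
}

fix_mojibake("M\u00c3\u00bcller")  # garbled "Müller" is repaired
fix_mojibake("caf\u00e9")          # already correct text is left alone
```

Characters whose UTF-8 encoding contains a byte that CP1252 leaves undefined (for example the second byte of "Á") may not survive the round trip, so a few manual replacements could still be needed.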

sbacelar · 2021-04-14T10:28:57Z

I have the same problem with Portuguese characters in Windows 10 but not with MacOS or Linux with the same files.

sebdunnett · 2021-07-26T13:22:35Z

Bumping with same problem on Windows 10 outputting to PDF through RMarkdown/pandoc/xelatex. Absolutely love the vitae package, thank you so much for creating it. PDF outputs with accented letters in the bib were definitely generating fine Nov of last year (when I got my current job).

Mendeley (my ref software of choice) escapes the characters fine in the bib file, e.g. á is {\'a} etc. Pulling the bib file in with vitae::bibliography_entries() generates the Ã© characters @montesmariana talks about. Using their brilliant custom function replaces them fine within RStudio, but then they reappear in the PDF knit.

mitchelloharawild · 2021-07-27T00:01:52Z

Could you provide a sample of a bib file which produces this issue?

montesmariana · 2021-07-27T07:50:15Z

> Could you provide a sample of a bib file which produces this issue?

Here is the link to my .bib file.

If I run it with the win2utf function above, this is what it looks like:

If instead I run it without:

I think that I also tried to open the file and save it again with UTF-8 encoding or something like that, but the result is the same as with this file.

sebdunnett · 2021-07-28T07:56:31Z

> Could you provide a sample of a bib file which produces this issue?

Ran @montesmariana's bib file through my code and same result

mitchelloharawild · 2021-07-28T08:10:49Z

Thanks for the reprex and confirmation. I'm currently hoping for some input on the {rmarkdown} side: rstudio/rmarkdown#2195

It might be an issue with how rmarkdown::pandoc_citeproc_convert() passes the encoded bib file to pandoc.

montesmariana · 2021-07-28T08:57:00Z

Curiously, that is not a problem in other rmarkdown documents: I can use this same bib file and the references are rendered nicely.
I also tried again with a .bib file generated with Zotero and BetterBibtex, which escapes the characters like @sebdunnett mentioned, but the result is the same: it works with win2utf(), it doesn't otherwise.

mitchelloharawild · 2021-07-28T11:41:02Z

This should be fixed now. Please try installing the development version of the package and see if it works for you.
Thanks for your bug reports!

montesmariana · 2021-07-28T11:44:50Z

Yes, it works perfectly for me, thank you for your hard work!

sebdunnett · 2021-07-29T09:47:05Z

Ditto, thanks so much for the quick fix @mitchelloharawild

This was referenced Jul 27, 2021

Error with vitae when using non-English characters on Windows #167

Closed

Encoding problems in pandoc_citeproc_convert() with Windows rstudio/rmarkdown#2195

Closed

mitchelloharawild added the bug Something isn't working label Jul 28, 2021

mitchelloharawild self-assigned this Jul 28, 2021

mitchelloharawild closed this as completed in 01d8591 Jul 28, 2021

CarlosPoses mentioned this issue Sep 6, 2021

Special characters read from .xlsx or .csv files not rendered correctly #180

Closed
