Special Characters in References not rendering correctly #158

Closed
fmsabatini opened this issue Feb 23, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@fmsabatini

fmsabatini commented Feb 23, 2021

I'm having trouble with my awesomeCV.
After updating to R 4.0.2 it's not rendering my reference list correctly anymore. The special characters (umlauts, apostrophes and so on) are not picked up as UTF-8 symbols.
I spent all morning updating all packages, as well as pandoc. Yet, the references are not displaying as expected:

[Screenshot: reference list with garbled special characters]

sessionInfo()

R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_2.7    dplyr_1.0.0      vitae_0.4.2.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      highr_0.8         pillar_1.4.4      compiler_4.0.1    prettyunits_1.1.1 remotes_2.2.0     testthat_2.3.2    digest_0.6.25     pkgbuild_1.0.8    pkgload_1.1.0    
[11] jsonlite_1.6.1    tibble_3.0.1      memoise_1.1.0     evaluate_0.14     lifecycle_0.2.0   pkgconfig_2.0.3   rlang_0.4.10      cli_2.0.2         rstudioapi_0.11   curl_4.3         
[21] yaml_2.2.1        xfun_0.21         stringr_1.4.0     withr_2.2.0       knitr_1.29        hms_0.5.3         desc_1.2.0        generics_0.0.2    fs_1.4.1          vctrs_0.3.6      
[31] devtools_2.3.0    tidyselect_1.1.0  rprojroot_2.0.2   glue_1.4.1        R6_2.4.1          processx_3.4.2    fansi_0.4.1       sessioninfo_1.1.1 readr_1.3.1       purrr_0.3.4      
[41] callr_3.4.3       magrittr_1.5      ps_1.3.3          ellipsis_0.3.1    htmltools_0.5.0   usethis_1.6.1     assertthat_0.2.1  utf8_1.1.4        tinytex_0.24      stringi_1.4.6    
[51] crayon_1.3.4 

Pandoc Version

rmarkdown::pandoc_version()
[1] ‘2.11.2’

I appreciate that this problem is linked to the locale of my Windows 10 system. I tried to change my locale to UTF-8, but without success:

> Sys.getlocale()
[1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
> Sys.setlocale("LC_CTYPE", "UTF-8")
[1] ""
Warning message:
In Sys.setlocale("LC_CTYPE", "UTF-8") :
  OS reports request to set locale to "UTF-8" cannot be honored
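Since Sys.setlocale() cannot switch this session to UTF-8 (R releases before 4.2 do not support UTF-8 as the native encoding on Windows, which matches the warning above), a minimal sketch of an alternative is to check the session encoding and declare the file's encoding explicitly when R reads it. The helper names session_is_utf8() and read_bib_utf8() are hypothetical, and this only affects what R itself reads; it may not help if the mis-decoding happens between pandoc and R.

```r
# Hypothetical helpers (not part of vitae or rmarkdown).

# TRUE if the current R session uses UTF-8 as its native encoding.
session_is_utf8 <- function() isTRUE(l10n_info()[["UTF-8"]])

# Read a .bib file as UTF-8 regardless of the locale. encoding = "UTF-8"
# only marks the strings as UTF-8; it does not re-encode the bytes.
read_bib_utf8 <- function(path) {
  readLines(path, encoding = "UTF-8", warn = FALSE)
}

# Example: write a small UTF-8 .bib file byte-for-byte, then read it back.
tmp <- tempfile(fileext = ".bib")
writeBin(charToRaw(enc2utf8("@book{x, author = {M\u00fcller}}\n")), tmp)
bib_lines <- read_bib_utf8(tmp)
print(session_is_utf8())
print(bib_lines)
```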

Suggestions?

@andreamrau

andreamrau commented Feb 24, 2021

Thanks so much for the fantastic work on the vitae package -- I've really been enjoying using it to semi-automate the construction of my CV.

I'm bumping up this issue because I'm having exactly the same problem as @fmsabatini. I'm on Windows 10, R 4.0.0, and the latest version of pandoc (2.11.4), and the accents in the references of the awesomecv template are not rendering correctly (exactly as in the above screenshot).

The encoding error clearly seems to be due to the locale of Windows 10, and it specifically occurs when pandoc is called via rmarkdown::pandoc_citeproc_convert() in vitae::bibliography_entries(), i.e. the special characters appear incorrectly when running pandoc packages.bib -s -t csljson.

I've noted that if I manually change the accents to HTML entities in the .bib file (e.g. replacing ü with its HTML entity), they do render correctly. This workaround works for me since I'm not generating my .bib file within my .Rmd file.
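The manual workaround above (replacing accented characters with HTML entities in the .bib file) can be scripted. A minimal sketch, assuming numeric entities such as &#252; are acceptable to the pipeline; to_html_entities() is a hypothetical helper, and a later comment in this thread reports that the HTML-entity approach did not work for another user's file.

```r
# Hypothetical helper: replace every non-ASCII character with a numeric
# HTML entity (e.g. "ü" -> "&#252;"), leaving ASCII text unchanged.
to_html_entities <- function(x) {
  vapply(enc2utf8(x), function(s) {
    if (is.na(s)) return(NA_character_)
    code_points <- utf8ToInt(s)
    pieces <- vapply(code_points, function(cp) {
      if (cp < 128) intToUtf8(cp) else sprintf("&#%d;", cp)
    }, character(1))
    paste(pieces, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

print(to_html_entities("M\u00fcller and Garc\u00eda"))
```

Applied to a whole file this could look like writeLines(to_html_entities(readLines("packages.bib", encoding = "UTF-8")), "packages_entities.bib"), where the output file name is only an example.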

@montesmariana

I'm having the same issue, and it's very upsetting because many of my publications are in Spanish.
I'm also using Windows 10, R 4.0.4, pandoc version 2.11.2. I tried changing the accents to HTML in the original file and that didn't work. Saving the bib file with the local encoding doesn't work either.

So I decided to do the cleaning after loading the file, directly on the tibble. It is not super elegant, particularly because I have to bypass the nested lists with S3 classes, but for now it works!

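library(dplyr)  # mutate(), across(), arrange(), desc(), %>%
library(purrr)  # map()
library(vitae)  # bibliography_entries()

# Recursively repair mis-decoded characters in a bibliography_entries()
# column; date columns (csl_dates) are returned unchanged.
win2utf <- function(string) {
  if (is.null(string) || inherits(string, "csl_dates")) return(string)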
  if (is.character(string)) {
    mapping <- list(
      c("é", "é"),
      c("ó", "ó"),
      c("á", "á"),
      c("ü", "ü"),
      c("í", "í"),
      c("Ã�", "Á"),
      c("ú", "ú"),
      c("ñ", "ñ"),
      c("’", "'"),
      c("–", '"'),
      c("í¶", "ö")
    )
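    # Replace each garbled sequence with the intended character
    # (fixed = TRUE: match the characters literally, not as a regex)
    for (mapper in mapping) {
      string <- gsub(mapper[[1]], mapper[[2]], string, fixed = TRUE)
    }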
  } else if (is.list(string)) {
    # List columns (e.g. author) contain csl_name objects: recurse into each field
    iscsllist <- inherits(string, "list_of_csl_names")
    baselist <- map(string, function(string_element) {
      if (inherits(string_element, "csl_name")) {
        for (cname in names(string_element)) {
          string_element[[cname]] <- win2utf(string_element[[cname]])
        }
        return(string_element)
      } else {
        return(win2utf(string_element))
      }
    })
    if (iscsllist) {
      string <- list_of_csl_names(baselist)
    } else {
      string <- baselist
    }
  }
  return(string)
}

# Load the entries, repair every column, then sort by author and date
bibliography_entries("publications_recoded.bib") %>%
  mutate(across(everything(), win2utf)) %>%
  arrange(author$family, desc(issued))

I manually copy-pasted the wrongly parsed characters into the mapping, but I don't know if they are properly pasted here. Those are just the characters I had in my current list, so it might need expanding... Any idea if there is such a mapping somewhere?
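Rather than a hand-maintained table, the garbling in the mapping above looks like UTF-8 bytes decoded as Windows-1252 (for example, the two UTF-8 bytes of é read as Ã and ©), which can be reversed generically by re-encoding to CP1252 and reading the resulting bytes back as UTF-8. A minimal sketch under that assumption: fix_mojibake() is hypothetical, it would stand in only for the gsub() loop inside win2utf(), and characters whose second byte is undefined in CP1252 (such as Á) may still need special handling.

```r
# Hypothetical helper: undo "UTF-8 bytes decoded as Windows-1252" mojibake,
# e.g. "Ã©" -> "é" or "â€™" -> "’", without a hand-written lookup table.
fix_mojibake <- function(x) {
  x <- enc2utf8(x)
  # Turn each garbled character back into the single CP1252 byte it came
  # from. Elements that cannot be represented in CP1252 come back as NULL.
  raw_bytes <- iconv(x, from = "UTF-8", to = "CP1252", toRaw = TRUE)
  vapply(seq_along(x), function(i) {
    b <- raw_bytes[[i]]
    if (is.na(x[i]) || is.null(b)) return(x[i])
    candidate <- rawToChar(b)
    # Keep the repair only if those bytes form valid UTF-8; otherwise the
    # input was probably not mojibake (e.g. an already-correct "é").
    if (!validUTF8(candidate)) return(x[i])
    Encoding(candidate) <- "UTF-8"
    candidate
  }, character(1))
}

print(fix_mojibake(c("Caf\u00c3\u00a9", "M\u00c3\u00bcller", "Caf\u00e9")))
```

The validity check is the design choice that makes this safe to run over text that is already correct; it can still misfire on the rare string that is legitimately meant to contain sequences like "Ã©".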

@sbacelar

sbacelar commented Apr 14, 2021

I have the same problem with Portuguese characters in Windows 10 but not with MacOS or Linux with the same files.

@sebdunnett

Bumping this with the same problem on Windows 10, outputting to PDF through RMarkdown/pandoc/xelatex. Absolutely love the vitae package, thank you so much for creating it. PDF outputs with accented letters in the bib were definitely generating fine in November of last year (when I got my current job).

Mendeley (my reference software of choice) escapes the characters fine in the bib file, e.g. á is {\'a} etc. Pulling the bib file in with vitae::bibliography_entries() generates the garbled characters @montesmariana talks about. Using their brilliant custom function replaces them fine within RStudio, but then they reappear in the knitted PDF.

@mitchelloharawild
Owner

Could you provide a sample of a bib file which produces this issue?

@montesmariana

> Could you provide a sample of a bib file which produces this issue?

Here is the link to my .bib file.

If I run it with the win2utf function above, this is what it looks like:
[Screenshot: win2utf_original]

If instead I run it without:
[Screenshot: raw_original]

I think that I also tried to open the file and save it again with UTF-8 encoding or something like that, but the result is the same as with this file.

@sebdunnett

> Could you provide a sample of a bib file which produces this issue?

Ran @montesmariana's bib file through my code and got the same result.

@mitchelloharawild added the bug label on Jul 28, 2021
@mitchelloharawild self-assigned this on Jul 28, 2021
@mitchelloharawild
Owner

Thanks for the reprex and confirmation. I'm currently hoping for some input on the {rmarkdown} side: rstudio/rmarkdown#2195

It might be an issue with how rmarkdown::pandoc_citeproc_convert() passes the encoded bib file to pandoc.
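To illustrate the suspected failure mode (this is not what rmarkdown does internally, which is not shown in this thread): pandoc writes its output as UTF-8, and decoding those same bytes as Windows-1252 reproduces the kind of garbling in the win2utf() mapping earlier in the thread (ü appearing as Ã¼), while decoding them as UTF-8 gives back the original text. decode_bytes() is a hypothetical helper for this sketch.

```r
# Hypothetical sketch of the suspected bug: the same UTF-8 bytes decoded
# with the wrong (Windows-1252) and the right (UTF-8) encoding.
decode_bytes <- function(bytes, assumed_encoding) {
  # rawToChar() gives a string with undeclared encoding; iconv() then
  # interprets its bytes as `assumed_encoding` and converts them to UTF-8.
  iconv(rawToChar(bytes), from = assumed_encoding, to = "UTF-8")
}

# Bytes as pandoc would emit them (pandoc writes UTF-8).
bytes <- charToRaw(enc2utf8("Sabatini, M\u00fcller"))

wrong <- decode_bytes(bytes, "CP1252")  # what a CP1252 session might produce
right <- decode_bytes(bytes, "UTF-8")   # correct decoding
print(c(wrong = wrong, right = right))
```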

@montesmariana

Curiously, that is not a problem in other rmarkdown documents: I can use this same bib file and the references are rendered nicely.
I also tried again with a .bib file generated with Zotero and Better BibTeX, which escapes the characters as @sebdunnett mentioned, but the result is the same: it works with win2utf(), it doesn't otherwise.

@mitchelloharawild
Copy link
Owner

This should be fixed now. Please try installing the development version of the package and see if it works for you.
Thanks for your bug reports!

@montesmariana

Yes, it works perfectly for me, thank you for your hard work!

@sebdunnett

Ditto, thanks so much for the quick fix @mitchelloharawild
