Error detecting locale - incomplete final line #233

Closed
cderv opened this issue Jan 10, 2018 · 13 comments · Fixed by #715
Labels
bug an unexpected problem or unintended behavior encoding 🌏

Comments

@cderv
Contributor

cderv commented Jan 10, 2018

Hi,

When deploying to an RStudio Connect server with rsconnect, I get this warning:

Warning message:
Error detecting locale: Error in read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'raw'
 (Using default: en_US) 

I believe something is wrong in how the locale is detected on Windows.

I found that rsconnect:::systemInfo is used to detect the locale on Windows. It calls the command systeminfo /FO csv through system(), which returns the result in CSV format. Currently, systemInfo parses that result with read.csv, which causes the error. If I use readr::read_csv instead, the error goes away.

raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
# I get a warning
info.csv <- read.csv(textConnection(raw))
#> Error in read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'raw'
# I get no warning
info_csv <- readr::read_csv(raw)
#  and locale can be detected
locale <- as.character(info_csv[[20]])
locale
#> [1] "fr;Fran<U+0087>ais (France)"
# but I get an encoding issue that cause error after
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> Warning in strsplit(locale, ";", fixed = TRUE): la chaîne de caractères
#> entrée 1 est incorrecte comme encodage UTF-8
#> [[1]]
#> [1] NA

# If I try to provide raw as text input, no warning
info.csv.txt <- read.csv(text = raw)
# I get a "strange" character
locale <- as.character(info.csv.txt[[20]])
locale
#> [1] "fr;Fran‡ais (France)"
# but it works
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> [[1]]
#> [1] "fr"
#> 
#> [[2]]
#> [1] "Fran‡ais (France)"

I am not sure what the encoding of the string returned by systeminfo /FO csv is, so I tested it with the tools I know:

readr::guess_encoding(raw)
#> # A tibble: 2 x 2
#>       encoding confidence
#>          <chr>      <dbl>
#> 1 windows-1252       0.38
#> 2 windows-1250       0.22
stringi::stri_enc_detect(raw)
#> [[1]]
#> [[1]]$Encoding
#> [1] "windows-1252" "windows-1250" "windows-1254" "UTF-16BE"    
#> [5] "UTF-16LE"     "GB18030"      "EUC-JP"       "EUC-KR"      
#> [9] "Big5"        
#> 
#> [[1]]$Language
#> [1] "fr" "ro" "tr" ""   ""   "zh" "ja" "ko" "zh"
#> 
#> [[1]]$Confidence
#> [1] 0.72 0.37 0.14 0.10 0.10 0.10 0.10 0.10 0.10
#> 
#> 
#> [[2]]
#> [[2]]$Encoding
#> [1] "windows-1252" "windows-1250" "UTF-16BE"     "UTF-16LE"    
#> [5] "EUC-JP"       "EUC-KR"       "windows-1254"
#> 
#> [[2]]$Language
#> [1] "fr" "ro" ""   ""   "ja" "ko" "tr"
#> 
#> [[2]]$Confidence
#> [1] 0.22 0.15 0.10 0.10 0.10 0.10 0.06
stringi::stri_enc_detect2(raw, locale = "fr")
#> [[1]]
#> [[1]]$Encoding
#>  [1] "macintosh"          "x-mac-turkish"      "ISO-8859-15"       
#>  [4] "windows-1258"       "ibm-1258_P100-1997" "ibm-1129_P100-1997"
#>  [7] "windows-1252"       "windows-1254"       "ibm-1252_P100-2000"
#> [10] "ibm-1254_P100-1995"
#> 
#> [[1]]$Language
#>  [1] NA NA NA NA NA NA NA NA NA NA
#> 
#> [[1]]$Confidence
#>  [1] 0.7500000 0.7500000 0.5833333 0.5833333 0.5833333 0.5833333 0.4027778
#>  [8] 0.4027778 0.4027778 0.4027778
#> 
#> 
#> [[2]]
#> [[2]]$Encoding
#>  [1] "ISO-8859-15"        "windows-1252"       "windows-1254"      
#>  [4] "windows-1258"       "ibm-1252_P100-2000" "ibm-1254_P100-1995"
#>  [7] "ibm-1258_P100-1997" "ibm-1129_P100-1997" "macintosh"         
#> [10] "x-mac-turkish"     
#> 
#> [[2]]$Language
#>  [1] NA NA NA NA NA NA NA NA NA NA
#> 
#> [[2]]$Confidence
#>  [1] 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172
#>  [8] 0.9655172 0.7413793 0.7413793

If we try to provide the encoding explicitly:

# If I specify the encoding in textConnection, no more warning
info.csv <- read.csv(textConnection(raw, encoding = "UTF-8"))
# I get also another "strange" character
locale <- as.character(info.csv[[20]])
locale
#> [1] "fr;Fran‡ais (France)"
# but it works
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> [[1]]
#> [1] "fr"
#> 
#> [[2]]
#> [1] "Fran‡ais (France)"

Do you have any idea what is going on here? What fix could be made?

Created on 2018-01-10 by the reprex package (v0.1.1.9000).

@bvprod2

bvprod2 commented Oct 30, 2018

Same issue here. It seems related to non-English languages.

@hfberg

hfberg commented Jun 8, 2020

I had the same issue with read.xlsx. It was solved with xlsx::read.xlsx. Thank you!

@Sade154

Sade154 commented Apr 11, 2021

Exactly the same issue with the French language when deploying to shinyapps.

@kevinushey
Contributor

I wonder why we use systeminfo here instead of Sys.getlocale(). It looks like this code is now quite old so any memory of why is probably long gone ...

@kippandrew
Contributor

My memory is very hazy, but I believe the issue is that Sys.getlocale() is platform-specific. In other words, on Windows you'll get a value that isn't meaningful to Linux, which shinyapps.io runs under the hood. systeminfo does provide a locale that is meaningful on Linux.

@Sade154

Sade154 commented Apr 30, 2021

The issue would be solved, I guess, if we could handle the weird encoding returned by system("systeminfo /FO csv"). On three different Windows laptops (from France), I got the same output:

system("systeminfo /FO csv", intern = TRUE, wait = TRUE)

[1] "\"Nom de l'h“te\",\"Nom du systŠme d'exploitation\",\"Version du systŠme\",\"Fabricant du systŠme d'exploitation\",\"Configuration du systŠme d'exploitation\ [TRUNCATED]"
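The garbled characters in this output look consistent with bytes from an OEM code page being displayed as Windows-1252: in code page 850, byte 0x87 is "ç", 0x8A is "è" and 0x93 is "ô", while Windows-1252 renders those same bytes as "‡", "Š" and "“". A minimal sketch of that reading, assuming the console code page on these machines is 850 (not confirmed anywhere in this thread) and that the local iconv build recognises the name "CP850":

```r
# Bytes for "Fran?ais" where the 5th byte is 0x87, as in the reported output.
bytes <- as.raw(c(0x46, 0x72, 0x61, 0x6e, 0x87, 0x61, 0x69, 0x73))
garbled <- rawToChar(bytes)

# Assumption: the bytes are OEM code page 850, not Latin1 or UTF-8.
decoded <- iconv(garbled, from = "CP850", to = "UTF-8")

# Code point of the fifth character: 231 is U+00E7, i.e. "ç".
fifth <- utf8ToInt(decoded)[5]
fifth
```

If this assumption holds, re-decoding the whole character vector the same way before parsing it as CSV should give readable French headers.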

@kippandrew
Contributor

That makes sense. One option you could try as a workaround: setting the rsconnect.locale option, which should bypass the automatic detection code.

@szmsu2011

That makes sense. One option you could try as a workaround: setting the rsconnect.locale option, which should bypass the automatic detection code.

Could you please give an example on how to set the rsconnect.locale to English?

I have tried all sorts of ways such as options(rsconnect.locale = "en_US") and even changing my Windows locale to English (from Chinese) but the default locale detected by shinyapps.io is still CN.

I found a hacky workaround: calling Sys.setlocale("LC_ALL", "C") before the app is called. But that broke all my date formatting from lubridate.

Thank you in advance.

@kevinushey
Contributor

You might need to set it within a .rsconnect_profile, especially if you're trying to deploy from RStudio. See ?rsconnect::options for more details.
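For readers looking for a concrete example of this workaround, here is a minimal sketch. It writes a .rsconnect_profile file containing the options() call into an app directory. The app_dir path is a placeholder (a temporary directory here), and whether a deployment actually picks the value up may depend on the rsconnect version, so treat ?rsconnect::options as the reference.

```r
# Placeholder app directory; replace with the directory of the app to deploy.
app_dir <- file.path(tempdir(), "my-app")
dir.create(app_dir, showWarnings = FALSE, recursive = TRUE)

# The profile is plain R code that sets the option named in this thread.
profile <- file.path(app_dir, ".rsconnect_profile")
writeLines('options(rsconnect.locale = "en_US")', profile)

# Sourcing the file by hand shows what it does to the session options.
source(profile)
getOption("rsconnect.locale")
```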

hadley added the "bug" label (an unexpected problem or unintended behavior) and removed the "windows" label on Feb 21, 2023
@cderv
Contributor Author

cderv commented Feb 27, 2023

FWIW systeminfo /FO csv will return in Latin1 encoding. This is default for CMD output.

Doing this works if we can assume that it will always return latin1:

systemInfo <- function() {
  raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
  Encoding(raw) <- rep_len("latin1", length(raw))
  info <- read.csv(textConnection(raw))
  return(info)
}

With CMD on Windows, we can also force the output to be UTF-8 by changing the active code page.

For example

systemInfo <- function() {
  commands <- c(
    "@ECHO OFF",
    "CHCP 65001 > nul",
    "systeminfo /FO csv"
  )
  bat <- tempfile(fileext = ".bat")
  on.exit(unlink(bat), add = TRUE)
  writeLines(commands, bat, useBytes = TRUE)
  raw <- system(bat, intern = TRUE, wait = TRUE)
  info <- read.csv(textConnection(raw))
  return(info)
}
Git patch
diff --git a/R/locale.R b/R/locale.R
index f2d8b6b..74c7d37 100644
--- a/R/locale.R
+++ b/R/locale.R
@@ -78,7 +78,16 @@ systemLocale <- function() {
 }

 systemInfo <- function() {
-  raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
+  commands <- c(
+    "@ECHO OFF",
+    "CHCP 65001 > nul",
+    "systeminfo /FO csv"
+  )
+  bat <- tempfile(fileext = ".bat")
+  on.exit(unlink(bat), add = TRUE)
+  writeLines(commands, bat, useBytes = TRUE)
+  raw <- system(bat, intern = TRUE, wait = TRUE)
   info <- read.csv(textConnection(raw))
   return(info)
 }

Hope it helps

@hadley
Member

hadley commented Feb 27, 2023

Possible alternative approach from @gaborcsardi:

utils::readRegistry("Control Panel\\International\\User Profile", hive = "HCU")$Languages
#> [1] "en-US" "de-DE" "es-ES" "hu"

We're just exploring how far back in time this key exists.

@cderv
Contributor Author

cderv commented Feb 27, 2023

Regarding alternatives: when targeting Windows only, PowerShell can be an option:

shell("(Get-WinSystemLocale).Name", "powershell", intern = TRUE)
#> [1] "fr-FR"
system2("powershell", c("-Command", "(Get-WinSystemLocale).Name"), stdout = TRUE)
#> [1] "fr-FR"

I think this has been available since Windows 8 / Server 2012. PowerShell is not used that much with R, but I think it has been available by default on Windows for some time. I use it in .ps1 scripts, but reading from the registry is probably better from R. Just sharing in case it helps.

@gaborcsardi

It needs Windows 8.1 it seems.

But this works on Windows 10 and back to Vista, everywhere:

utils::readRegistry(hive = "HCU", "Control Panel\\International")$LocaleName
#> en-US
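The registry value uses a hyphen ("en-US"), while the default shown in the original warning uses an underscore ("en_US"). A small sketch of bridging the two follows. normalize_locale and detect_windows_locale are hypothetical helper names, not rsconnect functions, and the "en_US" fallback is an assumption taken from that warning.

```r
# Hypothetical helper: turn a Windows locale name such as "fr-FR" into "fr_FR".
normalize_locale <- function(name, default = "en_US") {
  if (length(name) != 1 || is.na(name) || !nzchar(name)) {
    return(default)
  }
  sub("-", "_", name, fixed = TRUE)
}

# Hypothetical wrapper: read the registry on Windows, fall back elsewhere.
detect_windows_locale <- function(default = "en_US") {
  if (.Platform$OS.type != "windows") {
    return(default)
  }
  key <- tryCatch(
    utils::readRegistry("Control Panel\\International", hive = "HCU"),
    error = function(e) list()
  )
  normalize_locale(key$LocaleName, default)
}

normalize_locale("fr-FR")
```

Keeping the string conversion separate from the registry read means the conversion can be checked on any platform, not only on Windows.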

hadley added a commit that referenced this issue Feb 27, 2023
hadley added a commit that referenced this issue Feb 28, 2023