-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP1251 char set in file name #476
Comments
I believe the root cause of this is a change in the encoding handling of We are contemplating what to do to about this and wondering if the change in base R was intended. |
Here is how Lines 9 to 13 in 66df4b9
And here's where it is used (note the UTF-8 you see here pertains to the processing of strings inside the file): Lines 28 to 34 in 66df4b9
@jimhester Would it make sense to get the path prep out of compiled code altogether and deal with it on the R side? |
For example cairo_pdf() crash with 'Out of memory' error in same situation.
----- Original Message -----
From: Jennifer (Jenny) Bryan
To: tidyverse/readxl
Cc: yurasmol ; Author
Sent: Friday, May 04, 2018 1:08 AM
Subject: Re: [tidyverse/readxl] CP1251 char set in file name (#476)
I believe the root cause of this is a change in the encoding handling of base::normalizePath(). I think this is basically the same problem over in readr: tidyverse/readr#834
We are contemplating what to do to about this and wondering if the change in base R was intended.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Why it does not work for *.XLS, but works fine for *.XLSX file extensions?
Where is the difference?
----- Original Message -----
From: Jennifer (Jenny) Bryan
To: tidyverse/readxl
Cc: yurasmol ; Author
Sent: Friday, May 04, 2018 2:45 AM
Subject: Re: [tidyverse/readxl] CP1251 char set in file name (#476)
Here is how base::normalizePath() is made available when ingesting xls:
https://github.com/tidyverse/readxl/blob/66df4b9e65babd7e5b63db2f371fc54d29ef98b8/src/XlsWorkBook.h#L9-L13
And here's where it is used (note the UTF-8 you see here is about the file contents):
https://github.com/tidyverse/readxl/blob/66df4b9e65babd7e5b63db2f371fc54d29ef98b8/src/XlsWorkBook.h#L28-L34
@jimhester Would it make sense to get the path prep out of compiled code altogether and deal with it on the R side?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I'm not sure if that has the same origin or not. Quite a different symptom.
Internally, we call |
As I described in tidyverse/readr#834 (comment), I bet this is intended. as the change of And, I think it's readr's and readxl's responsibility to take care of the encoding of the string in C++ functions. See tidyverse/readr#838. (Sorry, I don't have enough time to explain about the detail for now... Hope the information above is useful for you!) |
On Windows 7 and R version 3.5.0 (3.4.4 works without errors)
with the file included in ZIP (I can not upload .XLS files in this form).
ISSUE.zip
library(readxl)
fName <- "ТЕСТ.xls"
RES <- read_excel(fName, sheet=1, col_names=TRUE, col_types=NULL, na=" ", skip=0)
Error message is:
Error in read_fun(path = path, sheet_i = sheet, limits = limits, ...)
Filed to open ТЕСТ.xls
!?!File with .XLSX extension works without errors:
fName <- "ТЕСТ.xlsx"
But I always prefer Excel 2003 .XLS format.
Session Info:
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251
[3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C
[5] LC_TIME=Russian_Russia.1251
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.0
The text was updated successfully, but these errors were encountered: