Updated and unable to read certain excel files now #496
Comments
libxls, which readxl embeds, got some big updates lately. These have been overwhelmingly positive but we've also seen some regressions. Is that really your error message? Specifically, I'm wondering if you've sanitized it. I'm wondering what the actual path is. I'm busy at the Joint Statistical Meetings this week. But when that's done, I can attempt to read this file with a standalone libxls tool to determine if the problem is in the R code/package or libxls. To get unstuck, consider going backwards (to previous CRAN version) or forwards (to current dev version on GitHub). |
Hi Jenny. No, I haven't sanitised the error message at all. It reads:
Thanks for the pointer. I'll try to install a different version. |
So your username is truly I really want to know if there is any chance you have a space or other challenging characters in this path.
|
I seem to have a similar issue:
At first I thought the path was wrong, but running:
So for me at least, this is not as simple. /Lasse |
@lasseklokker It looks like your readxl installation is broken. Please try re-installing readxl from a fresh R session. e.g. Restart R, don't load any packages and then |
@jimhester you are absolutely correct! It works now. I had some processes starting R in the background, which made that hard to realise on my own. I have fixed it now, thanks for the suggestion! |
Hi Jenny. The username is Dominic, so no special characters. I have tried to install the package from a fresh session, but that doesn't seem to work for me:
|
This is very peculiar and suggests something is goofy with your readxl installation. Can you run basic examples that just read example sheets that ship with the package? |
I've run all of the basic examples from the example sheets without any issues. More than that, I can import simple spreadsheets that I've created. I used to be able to import data from the Australian Bureau of Statistics using readxl but now, no dice.
I've reinstalled it a few times now. Any advice on next steps would be appreciated. |
Can you apply |
Yes.
[1] "iris" "mtcars" "chickwts" "quakes"
Does this mean that the issue has to do with the ABS spreadsheet that I'm trying to read? Some kind of quirk in that spreadsheet that was catered for in the previous version of readxl but wasn't carried over into the new one, perhaps? |
I have no idea how the target file could make That is, I don't know how to reconcile these two results:
and
What happens if you try |
> excel_sheets(readxl_example("datasets.xlsx"))
Looks like one works and the other doesn't, even in the same session. |
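For anyone following along, the comparison above can be made a little more systematic. This is only a minimal sketch: `try_sheets()` is a hypothetical helper (not part of readxl), and the second path is a placeholder for whichever workbook is failing. It calls `excel_sheets()` on both files in the same session and captures the error message instead of stopping.

```r
library(readxl)

# Hypothetical helper (not part of readxl): call excel_sheets() and return
# either the sheet names or the error message, without halting the session.
try_sheets <- function(path) {
  tryCatch(
    list(ok = TRUE, sheets = excel_sheets(path), error = NA_character_),
    error = function(e) {
      list(ok = FALSE, sheets = character(), error = conditionMessage(e))
    }
  )
}

# A workbook that ships with readxl; this should work on a healthy install.
bundled <- try_sheets(readxl_example("datasets.xlsx"))
bundled$sheets

# Placeholder path: replace with the workbook that fails to read.
problem <- try_sheets("path/to/problem-workbook.xls")
problem$ok
problem$error
```

If the bundled workbook reads fine and only the target file errors, that points at the file (or how libxls parses it) and not at a broken installation.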
Jenny - Can I ask, if you download the file and try to import it, does it work for you? |
OK so the function No, I cannot import this file with the dev version readxl. Yes I suspect the problem is this:
I will know for sure when I pull upstream changes from libxls and create a standalone xls2csv tool and can try to import this file without using readxl at all. More soon ... |
Is there anything that someone with intermediate R skills could do to help here, Jenny? |
Almost certainly no 😞 The practical move is to use the previous version of readxl, which embeds an earlier version of libxls. I have tracked down a few of these things in libxls but that is all in C and it can be fairly painful. Once I do a readxl development push, I am fairly hopeful I can solve or heave over the wall to libxls with a good example. But that's not a focus right this moment. It will come back around. |
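As a rough sketch of the "go backwards or forwards" advice, assuming the remotes package is available: the version number below is an assumption chosen for illustration, not one named in this thread, so check the CRAN archive for the release that last worked for you. The install lines are commented out so the snippet only reports the installed version until you pick one.

```r
# Check which readxl is currently installed before changing anything.
installed <- utils::packageVersion("readxl")
print(installed)

# Option 1 (go backwards): install an earlier CRAN release.
# The version string is an assumption; substitute the release you need.
# remotes::install_version("readxl", version = "1.0.0")

# Option 2 (go forwards): install the current development version from GitHub.
# remotes::install_github("tidyverse/readxl")
```

Run whichever option you choose from a fresh R session with no packages loaded, then restart R before testing the problem workbook again.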
Dear @dominicshore and @jennybc, I can't open some Excel files either. Please take a look at the following code and output:
The same error appears using R 3.5.0, but I have no problem if I use R version 3.4.4 or 3.4.3. So I think the problem is not inside the readxl function. @jennybc, I can send the Excel file to you if you think it can help you to solve the problem. |
@henrique-andrade I am pretty sure that is a separate issue already fixed in the devel version of readxl #477. It occurs due to a change in how base-R handles file paths with non-ascii characters. |
I had a similar situation
The first two files contain 120 worksheets each, and the last file is the combined version of the first two. I have no problem reading the first two files, but I get an error when I read the last one. I solved this problem by saving the third file as "xlsx". The file size is reduced from 7,387 KB to 3,975 KB, and when I read the "xlsx" file, it doesn't give any error. |
I am having similar issues where .xls files larger than 7MB aren't read in properly, but saving them as .xlsx makes it work. Is there a way to not have to do this? |
No, I didn’t find a better way.
|
Dear, @jimhester, |
Having embedded a newer version of libxls in dev readxl, I can now read the original sheet posted by @dominicshore:
readxl::read_excel("investigations/6202012.xls")
#> New names:
#> * `` -> `..1`
#> * `` -> `..3`
#> * `` -> `..4`
#> * `` -> `..5`
#> * `` -> `..6`
#> * … and 6 more
#> # A tibble: 755 x 12
#> ..1 `Time Series Wo… ..3 ..4 ..5 ..6 ..7 ..8 ..9 ..10
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 3 <NA> 6202.0 Labour F… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 <NA> Table 12. Labou… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 6 Rela… Summary Publica… Expl… Inqu… <NA> <NA> <NA> <NA> <NA> <NA>
#> 7 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 8 Data… <NA> <NA> Seri… Seri… Seri… Seri… No. … Unit Data…
#> 9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 10 Empl… <NA> <NA> Trend A844… 28522 43252 485 000 STOCK
#> # … with 745 more rows, and 2 more variables: ..11 <chr>, ..12 <chr>
Created on 2018-12-13 by the reprex package (v0.2.1.9000)
This issue attracted a lot of "me too" comments. If others test their problem sheets with the dev version of readxl and still have problems, please open a new issue and link or attach the problem workbook. |
I'm getting this error when trying to load xlsx file: Error parsing file 'Media data and other information.xlsx'. Any idea how to resolve this?? |
Based on the Perl stuff in your error, I would guess you're using gdata. |
I am having trouble opening/reading excel files that I download from the website of the Australian Bureau of Statistics using readxl.
I've downloaded Table 12 from the website but when I go to read the sheets of the workbook in R I get an error message:
In previous versions of readxl I have had no trouble reading these files into R, but I've recently updated my readxl version, after a hiatus of several months, and now it doesn't work.
I have tried to download the file using the download.file function, taking care to set mode = wb, but that makes no difference to being able to access the data in the workbook either.
Grateful for any pointers.
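For readers reproducing the download step, here is a minimal sketch of fetching a workbook in binary mode. `fetch_binary()` is a hypothetical wrapper, and the URL in the usage comment is a placeholder, not the actual ABS address. Note that `mode` takes a quoted string, `"wb"`.

```r
# Hypothetical wrapper around download.file(): fetch a workbook in binary
# mode so the bytes are not altered in transit, and return the local path.
fetch_binary <- function(url, destfile = tempfile(fileext = ".xls")) {
  status <- utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  # download.file() returns 0 on success and a non-zero code otherwise.
  if (status != 0) {
    stop("download failed with status ", status)
  }
  destfile
}

# Usage (placeholder URL, substitute the real address of the workbook):
# path <- fetch_binary("https://example.org/path/to/workbook.xls")
# readxl::excel_sheets(path)
```

As the post says, binary mode did not help in this case, which suggests the file itself was downloaded intact.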