Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to open a specific .xls: Unable to allocate memory #478

Closed
antuki opened this issue May 5, 2018 · 18 comments
Closed

Failed to open a specific .xls: Unable to allocate memory #478

antuki opened this issue May 5, 2018 · 18 comments
Labels
bug an unexpected problem or unintended behavior libxls xls 👵

Comments

@antuki
Copy link

antuki commented May 5, 2018

Hi,
I'm writing about an issue a bit similar to #373
When i download this file and I unzip it on my windows computer, The following line returns an error :

  > bdd <- read_excel("pop-sexe-age-quinquennal6814.xls", sheet = paste0("COM_", 1975))
  Error in sheets_fun(path) : 
    Failed to open pop-sexe-age-quinquennal6814.xls

But when I open the file in Excel and resave as .xlsx, it works !

   > bdd <- read_excel("pop-sexe-age-quinquennal6814.xlsx", sheet = paste0("COM_", 1975))

What should I do so that it works without opening the file in Excel ?

Thanks a lot !

  > sessionInfo()
  R version 3.4.1 (2017-06-30)
  Platform: i386-w64-mingw32/i386 (32-bit)
  Running under: Windows >= 8 x64 (build 9200)

  Matrix products: default

  locale:
  [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252          LC_MONETARY=French_France.1252
  [4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

  other attached packages:
   [1] forcats_0.3.0      stringr_1.3.0      dplyr_0.7.4        purrr_0.2.4        readr_1.1.1       
   [6] tidyr_0.8.0        tibble_1.4.2       ggplot2_2.2.1.9000 tidyverse_1.2.1    readxl_1.1.0      

  loaded via a namespace (and not attached):
   [1] jsonlite_1.5      reshape2_1.4.3    rstudioapi_0.7    magrittr_1.5      gtable_0.2.0     
   [6] rmarkdown_1.9     hms_0.4.2         xml2_1.2.0        pillar_1.2.2      htmltools_0.3.6  
  [11] haven_1.1.1       broom_0.4.4       cellranger_1.1.0  lattice_0.20-35   xaringan_0.6     
  [16] plyr_1.8.4        lubridate_1.7.4   knitr_1.20        pkgconfig_2.0.1   R6_2.2.2         
  [21] digest_0.6.15     xfun_0.1          colorspace_1.3-2  bindrcpp_0.2.2    rprojroot_1.3-2  
  [26] stringi_1.1.7     yaml_2.1.19       lazyeval_0.2.1    evaluate_0.10.1   httr_1.3.1       
  [31] compiler_3.4.1    bindr_0.1.1       cli_1.0.0         withr_2.1.2       backports_1.1.2  
  [36] munsell_0.4.3     psych_1.8.3.3     modelr_0.1.1      Rcpp_0.12.16      parallel_3.4.1   
  [41] assertthat_0.2.0  tools_3.4.1       foreign_0.8-70    scales_0.5.0.9000 crayon_1.3.4     
  [46] glue_1.2.0.9000   rlang_0.2.0.9001  mnormt_1.5-5      nlme_3.1-131      rvest_0.3.2      
  [51] rsconnect_0.8.8   grid_3.4.1
@antuki antuki changed the title Error in sheets_fun(path) with .xlsx but not with .xlsx Error in sheets_fun(path) with .xls but not with .xlsx May 5, 2018
@jennybc
Copy link
Member

jennybc commented May 11, 2018

I'm unable to download the sheet you link to, so I can't tell what the problem is. If it is path related, then I predict b10a1a8 will have fixed it. If you can install the development version of readxl from GitHub and try again, please let me know what happens.

@antuki
Copy link
Author

antuki commented May 11, 2018

Thank you for your answer.

I tried with both devtools and cran versions and i obtain the same issue.

But I tried to guess what is the problem with the file : It's a big file (100 Mo) with around 15 sheets. I've noticed that when I delete enough sheets (and the size is approximately 50 Mo) the function is working again. So I think it's a question of size of the xls file (bugging when the file is too big).

But it is strange because I didn't have this issue with this same file with the last version of readxl I had on my computer (I cannot remember which one it was but I did not have the same issue before updating all my packages last week).

@jennybc jennybc changed the title Error in sheets_fun(path) with .xls but not with .xlsx Failed to open a specific .xls May 11, 2018
@jennybc jennybc added bug an unexpected problem or unintended behavior xls 👵 libxls labels May 11, 2018
@jennybc
Copy link
Member

jennybc commented May 11, 2018

Is it only this file that gives you problems?

I finally succeeded with the download and also fail to read it with readxl.

I get a more informative failure when I use a stand-alone tool built from libxls

jenny@2015-mbp libxls-evanmiller-github $ ./xls2csv ~/rrr/readxl/investigations/pop-sexe-age-quinquennal6814.xlsFILE: /Users/jenny/rrr/readxl/investigations/pop-sexe-age-quinquennal6814.xls
Error reading XLS file: Unable to allocate memory

I can ask over there if the maintainer is interested in acting on this, but yes it looks like the file size is part of the puzzle.

@antuki
Copy link
Author

antuki commented May 11, 2018

I've tried readxl for the first time recently so I didn't try ot with a lot of files but yes it is the only example that I have in mind.

And just for your information, this file is from the Insee (french national institute of statistics) and gives the age of the population for each french municipality.

And yes I hope the maintainer will be interested in fixing this issue even if I can still use other packages dealing with xls file I'm interested in using this one !

Thanks

@cderv
Copy link

cderv commented May 13, 2018

FWIW @antuki, if the file is converted to xlsx with Microsoft Excel, it is readable by {readxl}. The issue is just for xls format.

readxl::excel_sheets("~/../Downloads/pop-sexe-age-quinquennal6814/pop-sexe-age-quinquennal6814.xlsx")
#>  [1] "Présentation"                "Documentation"              
#>  [3] "Modifications_territoriales" "DEP_1968"                   
#>  [5] "DEP_1975"                    "DEP_1982"                   
#>  [7] "DEP_1990"                    "DEP_1999"                   
#>  [9] "DEP_2009"                    "DEP_2014"                   
#> [11] "COM_1968"                    "COM_1975"                   
#> [13] "COM_1982"                    "COM_1990"                   
#> [15] "COM_1999"                    "COM_2009"                   
#> [17] "COM_2014"
readxl::read_excel("~/../Downloads/pop-sexe-age-quinquennal6814/pop-sexe-age-quinquennal6814.xlsx", sheet = paste0("COM_", 1975), skip = 12)[1:5, 1:5]
#> # A tibble: 5 x 5
#>   `Région \r\nen gé~ `Département\r\n~ `Commune\r\nen g~ `Indicateur de s~
#>   <chr>              <chr>             <chr>             <chr>            
#> 1 RR                 DR                CR                STABLE           
#> 2 84                 01                001               1                
#> 3 84                 01                002               1                
#> 4 84                 01                003               0                
#> 5 84                 01                004               1                
#> # ... with 1 more variable: `Département\r\nen géographie 2016` <chr>

Created on 2018-05-13 by the reprex package (v0.2.0).

As this one won't change every often I think, it could be suitable for you to just download then convert manually before importing into R.

However, the bugs remains with xls format.

@cturnerbridger
Copy link

Hi, I am also having this issue with large xls files. I know that you mentioned that it seems to be a libxls issue; however, the bug seemed to occur after the update of R version 3.5.0 (2018-04-23) as I was able to read the same files previously.

Hopefully that is useful information.

Christian

@jennybc
Copy link
Member

jennybc commented May 22, 2018

@cturnerbridger Is the xls in question available?

@antuki
Copy link
Author

antuki commented May 22, 2018

The same as @cturnerbridger : i was also able to read this file with a previous version

@jennybc
Copy link
Member

jennybc commented May 23, 2018

@antuki Is the xls in question available?

The main mechanism for determining exactly what in these files triggers some new behaviour in libxls and, therefore, readxl, is for us to be able to examine the files in question.

@antuki
Copy link
Author

antuki commented May 23, 2018

Yes it concerns the French xls file from the Insee.

@cturnerbridger
Copy link

@jennybc sorry I can't share the files exactly. I will try to set up a similar / reproducible example today (UK time). Thanks for your help

@KZARCA
Copy link

KZARCA commented May 26, 2018

Hi, I have a similar problem (Ubuntu 16.04, R 3.4.4), here is the link to the file. Does not work with readxl versions 1.0.0 and 1.1.0.

@courtiol
Copy link

Same issue here with the file MYE6PE3_mid-2001-mid-2012-unformatted-syoa-data-file.xls that is part of the following Zip file:

https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland/mid2001tomid2010/mid2001tomid2012ukpopulationestimatesv2.zip

In previous version, I used to be able to do:

read_excel("MYE6PE3_mid-2001-mid-2012-unformatted-syoa-data-file.xls",
             sheet = "Mid-2001", col_names = TRUE, skip = 1)

and now it no longer works...
Again saving as *.xlsx is a workaround.

Best,

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods  
[7] base     

other attached packages:
[1] bindrcpp_0.2.2 stringr_1.3.1 
[3] purrr_0.2.4    ggplot2_2.2.1 
[5] dplyr_0.7.5    readxl_1.1.0  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17    
 [2] rstudioapi_0.7  
 [3] bindr_0.1.1     
 [4] magrittr_1.5    
 [5] tidyselect_0.2.4
 [6] munsell_0.4.3   
 [7] colorspace_1.3-2
 [8] R6_2.2.2        
 [9] rlang_0.2.0     
[10] rematch_1.0.1   
[11] plyr_1.8.4      
[12] tools_3.5.0     
[13] grid_3.5.0      
[14] gtable_0.2.0    
[15] utf8_1.1.3      
[16] cli_1.0.0       
[17] yaml_2.1.19     
[18] lazyeval_0.2.1  
[19] assertthat_0.2.0
[20] tibble_1.4.2    
[21] crayon_1.3.4    
[22] glue_1.2.0      
[23] stringi_1.2.2   
[24] compiler_3.5.0  
[25] pillar_1.2.2    
[26] cellranger_1.1.0
[27] scales_0.5.0    
[28] pkgconfig_2.0.1 

@jennybc jennybc changed the title Failed to open a specific .xls Failed to open a specific .xls: Unable to allocate memory May 30, 2018
@jennybc
Copy link
Member

jennybc commented May 30, 2018

The .xls that fail with with "Unable to allocate memory" may be readable now that libxls/libxls#22 is closed. I'll pull libxls from upstream soon to find out.

@antuki
Copy link
Author

antuki commented Jul 17, 2018

Thank you @jennybc . I tried again with an "Insee" file today and the problem still remains.

@joelgombin
Copy link

FWIW, same problem here with this file. Works with readxl 1.0, doesn't work with readxl 1.1.

@cderv
Copy link

cderv commented Aug 6, 2018

FWIW, @antuki, @joelgombin , I believe the libxls upstream has not been pulled yet into readxl looking at the src folder last changed.
https://github.com/tidyverse/readxl/tree/master/src/libxls
those tests should wait until then to be sure the issue still remains.

@jennybc
Copy link
Member

jennybc commented Dec 14, 2018

Having embedded the latest libxls in the dev version of readxl, I can read the original xls:

readxl::read_excel(
  "investigations/pop-sexe-age-quinquennal6814.xls",
  sheet = paste0("COM_", 1975)
)
#> New names:
#> * `` -> `..2`
#> * `` -> `..3`
#> * `` -> `..4`
#> * `` -> `..5`
#> * `` -> `..6`
#> * … and 40 more
#> # A tibble: 38,227 x 46
#>    `Recensement de… ..2   ..3   ..4   ..5   ..6   ..7   ..8   ..9   ..10 
#>    <chr>            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 TABLEAU COMMUNA… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  2 France métropol… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  3 Découpage géogr… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  4 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  5 Note de lecture… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  6 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  7 Documentation    <NA>  Modi… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  8 © Insee          Sour… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  9 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#> 10 <NA>             <NA>  <NA>  <NA>  <NA>  AGE   00    00    05    05   
#> # … with 38,217 more rows, and 36 more variables: ..11 <chr>, ..12 <chr>,
#> #   ..13 <chr>, ..14 <chr>, ..15 <chr>, ..16 <chr>, ..17 <chr>,
#> #   ..18 <chr>, ..19 <chr>, ..20 <chr>, ..21 <chr>, ..22 <chr>,
#> #   ..23 <chr>, ..24 <chr>, ..25 <chr>, ..26 <chr>, ..27 <chr>,
#> #   ..28 <chr>, ..29 <chr>, ..30 <chr>, ..31 <chr>, ..32 <chr>,
#> #   ..33 <chr>, ..34 <chr>, ..35 <chr>, ..36 <chr>, ..37 <chr>,
#> #   ..38 <chr>, ..39 <chr>, ..40 <chr>, ..41 <chr>, ..42 <chr>,
#> #   ..43 <chr>, ..44 <chr>, ..45 <chr>, ..46 <chr>

Created on 2018-12-13 by the reprex package (v0.2.1.9000)

There have been several "me too" comments. I invite those folks to install the dev version of readxl and see if it fixes things for their file too. If not, please open a new issue with a link to your xls or attach it in a zip archive.

@jennybc jennybc closed this as completed Dec 14, 2018
@lock lock bot locked and limited conversation to collaborators Dec 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior libxls xls 👵
Projects
None yet
Development

No branches or pull requests

7 participants