I have attached the title page of an article. The DOI has a line break coinciding with a dash. It is extracted as the first rather than the second:
10.1146/annurev-financial-010421085556 ## missing a dash
10.1146/annurev-financial-010421-085556 ## Correct, returns a bibtex entry on CrossRef.
Here is the error:
> RefManageR::ReadPDFs('page1.pdf')
Getting Metadata for 1 pdfs...
## Ignore the following line, this is an artifact created by extracting the first page using pdftk
Command Line Error: Wrong page range given: the first page (2) can not be after the last page (1).
Getting 1 BibTeX entries from CrossRef...
Server error [404] for doi “10.1146/annurev-financial-010421085556”, you may want to try again, or BibTeX
unavailable for this doi
pdfinfo is version 3.03 and poppler-utils is version 0.86.1-0ubuntu1
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Hi, thanks for the report. Unfortunately, this issue occurs in poppler, which you can verify by running pdftotext from the command line. It must be removing the trailing hyphen that occurs as the last character on the line. I've pushed some fixes so that your example now runs without error and no longer shows the "Command Line Error: Wrong page range given" message, provided you install the latest version of the package from GitHub. I don't see any way to grab the correct DOI in this case without a change to poppler, sorry.
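To see what poppler produces before RefManageR gets involved, the extraction can be reproduced from R. This is only a rough sketch, not what ReadPDFs does internally: `find_dois` is a hypothetical helper, the DOI pattern is a simplification, and `page1.pdf` is the file name from the report above.

```r
# Hypothetical helper: pull DOI-like strings out of lines of extracted text.
# The pattern is a simplification, not RefManageR's own matching logic.
find_dois <- function(lines) {
  hits <- regmatches(lines, gregexpr("10\\.[0-9]{4,9}/[^[:space:]]+", lines))
  as.character(unlist(hits))
}

# Run pdftotext on the first page only and send the text to stdout ("-"),
# but only if both the file and the poppler utility are available.
if (file.exists("page1.pdf") && nzchar(Sys.which("pdftotext"))) {
  txt <- system2("pdftotext", c("-f", "1", "-l", "1", "page1.pdf", "-"),
                 stdout = TRUE)
  print(find_dois(txt))
}
```

If the hyphen is already missing in this output, the DOI was altered by pdftotext rather than by anything downstream.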
Thanks. The new error message is definitely an improvement, but because the function still writes a BibTeX entry (which I think is the correct decision), would it make sense for the function to note in this case that the entry may be wrong? Something like:
Writing 1 (possibly incorrect) Bibtex entries
In any case, ReadPDFs is a fantastic function, thank you.
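When a DOI fails to resolve like this, one user-side heuristic is to list the variants that differ from it by a single inserted hyphen and try those by hand. The sketch below assumes the only damage is one dropped hyphen in the part after the slash; `hyphen_candidates` is a hypothetical helper and not part of RefManageR.

```r
# Hypothetical helper (not part of RefManageR): list every DOI obtained by
# inserting one hyphen into the suffix of a DOI that failed to resolve.
hyphen_candidates <- function(doi) {
  slash <- regexpr("/", doi, fixed = TRUE)[1]
  if (slash < 1) return(character(0))
  prefix <- substr(doi, 1, slash)            # registrant part, e.g. "10.1146/"
  suffix <- substr(doi, slash + 1, nchar(doi))
  n <- nchar(suffix)
  if (n < 2) return(character(0))
  pos <- seq_len(n - 1)                      # insert after character `pos`
  left <- substring(suffix, 1, pos)
  right <- substring(suffix, pos + 1, n)
  # Skip positions that would put two hyphens side by side.
  keep <- !endsWith(left, "-") & !startsWith(right, "-")
  paste0(prefix, left[keep], "-", right[keep])
}

cands <- hyphen_candidates("10.1146/annurev-financial-010421085556")
# The DOI that resolves on CrossRef should be among the candidates.
"10.1146/annurev-financial-010421-085556" %in% cands
```

Each candidate would still have to be checked against CrossRef, and more than one could in principle resolve, so this narrows the search rather than recovering the DOI with certainty.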
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RefManageR_1.3.0
loaded via a namespace (and not attached):
[1] httr_1.4.4 compiler_4.2.1 magrittr_2.0.3 plyr_1.8.7
[5] R6_2.5.1 generics_0.1.3 tools_4.2.1 curl_4.3.2
[9] Rcpp_1.0.9 lubridate_1.8.0 xml2_1.3.3 stringi_1.7.8
[13] stringr_1.4.1 jsonlite_1.8.0 bibtex_0.4.2.3