Could not fetch PRB articles #1

Open
wenlibin02 opened this issue Dec 12, 2017 · 3 comments

@wenlibin02

I do have access to APS journals through my institute. The command and its output:

$ ./getpaper.sh -j PRB -v 95 -p 014105
Processing: JOURNAL PRB VOLUME 95 PAGE 014105
Getting BIBCODE from http://adsabs.harvard.edu/cgi-bin/nph-abs_connect?version=1&warnings=YES&partial_bibcd=YES&sort=BIBCODE&db_key=ALL&bibstem=phrvb&volume=95&page=014105&nr_to_return=1&start_nr=1
BIBCODE is 2017PhRvB..95a4105Y
Fetching bibtex file from ADS (http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2017PhRvB..95a4105Y&data_type=BIBTEXPLUS&db_key=ALL&nocookieset=1)
Processing 2017PhRvB..95a4105Y...
PRB 95 014105 is a valid reference
http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2017PhRvB..95a4105Y&link_type=EJOURNAL&db_key=ALL
Determining URL path for PDF...
Warning: User-Agent string does not contain "Lynx" or "L_y_n_x"!
Downloading PDF from  ...
Warning: User-Agent string does not contain "Lynx" or "L_y_n_x"!
Checking if PDF appears to be valid...
Error in downloading the PDF
Maybe you do not have access
Terminating...
If you confirm the citation information, and the paper is also in ADS, check for a new version of getpaper:
	https://github.com/goatface/getpaper
	(Many repositories are frequently changing their link structure.)
removed '/tmp/PRB.95.014105.pdf'
***********************************************
The following submission(s) failed to download:
JOURNAL	VOLUME	PAGE
PRB	95	014105
***********************************************

I see there was a problem, but let me try something else.

This is APS, so it must not be an open access article.  I'll try again and give you the captcha...
Calling ./getpaper.sh -j PRB -v 95 -p 014105 --retry
Processing: JOURNAL PRB VOLUME 95 PAGE 014105
Getting BIBCODE from http://adsabs.harvard.edu/cgi-bin/nph-abs_connect?version=1&warnings=YES&partial_bibcd=YES&sort=BIBCODE&db_key=ALL&bibstem=phrvb&volume=95&page=014105&nr_to_return=1&start_nr=1
BIBCODE is 2017PhRvB..95a4105Y
Fetching bibtex file from ADS (http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2017PhRvB..95a4105Y&data_type=BIBTEXPLUS&db_key=ALL&nocookieset=1)
Processing 2017PhRvB..95a4105Y...
PRB 95 014105 is a valid reference
http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2017PhRvB..95a4105Y&link_type=EJOURNAL&db_key=ALL
Determining URL path for PDF...
Warning: User-Agent string does not contain "Lynx" or "L_y_n_x"!
Downloading PDF from  ...
Checking if PDF appears to be valid...
Error in downloading the PDF
Maybe you do not have access
Terminating...
If you confirm the citation information, and the paper is also in ADS, check for a new version of getpaper:
	https://github.com/goatface/getpaper
	(Many repositories are frequently changing their link structure.)
***********************************************
The following submission(s) failed to download:
JOURNAL	VOLUME	PAGE
PRB	95	014105
***********************************************
@goatface
Owner

goatface commented Feb 10, 2018

I'm so sorry for the belated reply. I should find a way to get a notification for an issue being raised.

Thanks for your interest in my code.

Your lynx probably does not allow command execution. I also ran into this on my recent installs, and it is a TODO item for getpaper to detect when lynx is missing the required option.

Let me show you how I test this. First, make a simple script somewhere in your path; call it hello.sh, for instance:

#!/bin/bash
zenity --info --title="Hello World!" --text="This is a test"

Then be sure to chmod +x hello.sh and you're good. Next we need to set up the lynx configuration. I will show you two options. One is easier but requires superuser access and modifying system files; the other is a little more complicated but may feel safer. I will start with the latter.

Fill a text file with a few lines that will serve as a special configuration for lynx in a moment. You can use any text editor, or you can just paste this at the command line after fixing the path to hello.sh from above:

printf ".h1 Internal Behavior\n.h2 SOURCE_CACHE\nSOURCE_CACHE:FILE\n.h1 External Programs\n.h2 EXTERNAL\nEXTERNAL:http:/home/daid/scripts/hello.sh:TRUE" > /tmp/lynx.cfg
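If you would rather not hand-edit the path inside that one-liner, the same file can be produced by a small helper. This is only a sketch: `write_lynx_cfg` is a hypothetical name and not part of getpaper, and it writes the same lines as the printf command above with the script path taken from its first argument.

```shell
#!/bin/bash
# write_lynx_cfg is a hypothetical helper, not part of getpaper.
# Usage: write_lynx_cfg /path/to/script [/path/to/output.cfg]
write_lynx_cfg() {
    local script_path="$1"
    local cfg_path="${2:-/tmp/lynx.cfg}"   # same default location as above
    # Same content as the printf one-liner, one config line per row.
    cat > "$cfg_path" <<EOF
.h1 Internal Behavior
.h2 SOURCE_CACHE
SOURCE_CACHE:FILE
.h1 External Programs
.h2 EXTERNAL
EXTERNAL:http:${script_path}:TRUE
EOF
}
```

For example, `write_lynx_cfg "$HOME/scripts/hello.sh"` writes /tmp/lynx.cfg with the EXTERNAL line pointing at your own copy of hello.sh.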

The other option is to find /etc/lynx.cfg, edit it with sudo, and look for the part with .h2 EXTERNAL. That section is relatively well commented, but all you need to add is the http prefix and a script location. By default, the line often looks like this:

#EXTERNAL:ftp:wget %s &:TRUE

Change it to something like:

EXTERNAL:http:/home/daid/scripts/hello.sh:TRUE

Now open google.com or any other page with lynx to test it. If you used the local configuration, run:

lynx -cfg=/tmp/lynx.cfg www.google.com

Then hit A and y etc to allow all cookies and so on so google loads. Now hit the comma key , which is the default binding to execute a command. If everything is working, you should get your Zenity GUI popup of hello world.

In all probability, you did not get the Zenity popup, which shows that the issue is with your lynx build: it needs command execution built into it. (Or that I did not explain very clearly how to test this...I am planning to write an automated function to check this in the future.)

Until recently, I believed that it was normal for lynx to include this feature, but I found that some Linux distros do not enable it by default. So you will need to get lynx installed with this option. I don't have specific advice for you because I forget even what search terms I used or what this compile-time flag is called in lynx, but I will try to post an update in the future once I get it together.
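On a machine without Zenity or a display, the same test can be done with a script that leaves a marker file instead of opening a popup. This is a sketch only; the `LYNX_EXTERN_MARKER` variable and the default path /tmp/lynx_extern_ok are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical headless stand-in for hello.sh: instead of opening a Zenity
# popup, it leaves a marker file that can be checked after pressing "," in lynx.
# LYNX_EXTERN_MARKER and its default path are assumptions for illustration.
marker="${LYNX_EXTERN_MARKER:-/tmp/lynx_extern_ok}"

# Record when the external command ran so a stale marker is easy to spot.
date > "$marker"
```

Point the EXTERNAL line at this script instead of hello.sh, press the comma key in lynx, then quit and look for the marker file. If it exists with a fresh timestamp, lynx ran the external command.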

I know of no other way for getpaper to deal with the APS captcha: the lynx session has to stay active or the URLs to the captcha image become invalid. My workaround was this trick of command execution within lynx, which launches getpaper again in a special mode.

The problem is that we need the lynx session to stay alive. But how can we launch another command to parse the lynx output (1) while keeping the lynx session active and (2) at the correct time? We could launch lynx in the background, but the script lynx is running would have to pause at just the right moment and then do something correct with whatever we did in bash in the meantime. If I hadn't found this hack of executing commands in lynx, I would have concluded that I could not resolve the APS captcha issue, but ideas for doing this another way are welcome.

@wenlibin02
Author

Thank you for your reply and clear explanation. The configure option to compile lynx with external command support is "--enable-externs", as explained in the INSTALLATION file of the lynx source package. I tried to compile lynx manually with external command support but still got some errors. They do not look easy to resolve; I may look into it again when I have time.

site: http://lynx.invisible-island.net/current/index.html

@goatface
Owner

goatface commented Feb 21, 2018

Thanks. Yes you are right. I have added the name of the flag --enable-externs to the readme on the main page.

I am still wondering about a solution, but so far I have not found a way to easily test for this feature.

There is a -restrictions flag to lynx, which includes restricting externals. I haven't tested it yet, but it could be that it gives a useful error or output if one tries to restrict a feature that was not compiled in (but I am doubtful).

Also, is your lynx compile error related to the ncurses TERMINAL pointers?

Compiling Lynx sources
gcc -DHAVE_CONFIG_H  -DLOCALEDIR=\"/usr/share/locale\" -I. -I.. -Ichrtrans -I./chrtrans -I.. -I../src -I.././WWW/Library/Implementation    -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE  -DLINUX -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong  -c LYShowInfo.c
gcc -DHAVE_CONFIG_H  -DLOCALEDIR=\"/usr/share/locale\" -I. -I.. -Ichrtrans -I./chrtrans -I.. -I../src -I.././WWW/Library/Implementation    -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE  -DLINUX -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong  -c LYStrings.c
LYStrings.c: In function ‘expand_tiname’:
LYStrings.c:1011:14: error: dereferencing pointer to incomplete type ‘TERMINAL {aka struct term}’
  if (cur_term->type.Strings[code] != 0) {
              ^~
make[1]: *** [makefile:103: LYStrings.o] Error 1

Strange bug...feel free to discuss it with me here if so (which Linux distro, etc.); I would like to try to help. It's related to my project, so I am interested in workarounds. It used to be that if your system didn't provide lynx at all, I could show you how to compile it from source. But that seems difficult right now.

I found many posts about that issue on various distros, but I am not sure of its status.

In Arch Linux I found it under lynx-current (compiled with externs). In Gentoo on my old system it just worked. Xubuntu was also working with the apt-get out-of-box version this week. I'm asking my work to install lynx in Scientific Linux now...let's see!

Let me conclude by noting that this is actually two issues:

  • How can getpaper know if lynx was compiled with enable-externs?
  • If lynx was not compiled with the flag, how can we assist the user to install a suitable version of lynx?

I remain interested in resolving both issues.
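As a partial step toward the first question, getpaper could at least turn the symptom visible in the logs above (the empty URL in "Downloading PDF from  ...") into a clearer message. The sketch below does not detect the compile-time flag itself. `check_pdf_url` is a hypothetical helper, not an existing getpaper function, and the idea that getpaper holds the extracted URL in a single variable is an assumption.

```shell
#!/bin/bash
# check_pdf_url is a hypothetical helper, not an existing getpaper function.
# It returns 0 when a PDF URL was found and 1 (with a hint on stderr) when
# the URL is empty or blank, as in the "Downloading PDF from  ..." lines above.
check_pdf_url() {
    local url="$1"
    # Strip all whitespace so a blank string counts as empty.
    if [ -z "${url//[[:space:]]/}" ]; then
        {
            echo "No PDF URL was extracted from the lynx session."
            echo "If the journal shows a captcha, lynx must be built with --enable-externs."
        } >&2
        return 1
    fi
    return 0
}
```

Called just before the download step, for example as `check_pdf_url "$pdf_url" || exit 1` with `pdf_url` standing in for whatever variable holds the URL, this would point the user at the lynx build rather than at journal access.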
