Preprint search #38

Open
chartgerink opened this issue Nov 10, 2017 · 3 comments

@chartgerink
Contributor

A tweet suggested adding preprint search to the fulltext package, which might also fit perfectly here.

@sckott, would you consider it preferable to add it here or in fulltext? I am planning (and have been for too long...) to submit this package to rOpenSci for review anyway when it's somewhat more fleshed out, so would value your input to prevent redundancy.

@sckott

sckott commented Nov 10, 2017

Hi @chartgerink - glad to see you're adding it here. fulltext is meant to be a single place to search for articles and fetch full text. In that capacity, it might provide functionality to search for preprints on OSF, and would definitely fetch their full text via a DOI. In fulltext we probably wouldn't need to touch the OSF API at all if we can get the full-text URL from Datacite (though at first look they don't seem to provide a full-text URL).

It looks like a bit of a hot mess, like all publishers. I didn't see this URL in the Datacite record. Do you get a full-text URL via the OSF API?

e.g., for https://eartharxiv.org/fsvmw/

curl -v -L 'https://osf.io/project/at82u/files/osfstorage/59f7afbfb83f69026be35a66/?action=download' > my.pdf

Then there's a whole bunch of redirects before finally getting the PDF:

*   Trying 23.253.149.196...
* TCP_NODELAY set
* Connected to osf.io (23.253.149.196) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.osf.io
* Server certificate: COMODO RSA Domain Validation Secure Server CA
* Server certificate: COMODO RSA Certification Authority
> GET /project/at82u/files/osfstorage/59f7afbfb83f69026be35a66/?action=download HTTP/1.1
> Host: osf.io
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 10 Nov 2017 17:41:06 GMT
< Content-Type: text/html
< Content-Length: 178
< Location: https://osf.io/at82u/files/osfstorage/59f7afbfb83f69026be35a66/?action=download
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Expires: Mon, 01 Jan 1990 00:00:00 GMT
< Pragma: no-cache
< Strict-Transport-Security: max-age=16000000; preload;
<
* Ignoring the response-body
* Connection #0 to host osf.io left intact
* Issue another request to this URL: 'https://osf.io/at82u/files/osfstorage/59f7afbfb83f69026be35a66/?action=download'
* Found bundle for host osf.io: 0x7fb36fd0bc00 [can pipeline]
* Re-using existing connection! (#0) with host osf.io
* Connected to osf.io (23.253.149.196) port 443 (#0)
> GET /at82u/files/osfstorage/59f7afbfb83f69026be35a66/?action=download HTTP/1.1
> Host: osf.io
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 302 FOUND
< Date: Fri, 10 Nov 2017 17:41:06 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 459
< Location: https://files.osf.io/v1/resources/at82u/providers/osfstorage/59f7afbfb83f69026be35a66?action=download&version=1&direct
< X-Frame-Options: SAMEORIGIN
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Expires: Mon, 01 Jan 1990 00:00:00 GMT
< Pragma: no-cache
< Strict-Transport-Security: max-age=16000000; preload;
<
* Ignoring the response-body
* Connection #0 to host osf.io left intact
* Issue another request to this URL: 'https://files.osf.io/v1/resources/at82u/providers/osfstorage/59f7afbfb83f69026be35a66?action=download&version=1&direct'
*   Trying 104.239.182.178...
* TCP_NODELAY set
* Connected to files.osf.io (104.239.182.178) port 443 (#1)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate: *.osf.io
* Server certificate: COMODO RSA Domain Validation Secure Server CA
* Server certificate: COMODO RSA Certification Authority
> GET /v1/resources/at82u/providers/osfstorage/59f7afbfb83f69026be35a66?action=download&version=1&direct HTTP/1.1
> Host: files.osf.io
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 10 Nov 2017 17:41:06 GMT
< Content-Length: 441233
< Server: TornadoServer/4.3
< Content-Disposition: attachment;filename="Grohmann_2006_GNews_r_roughness.pdf"
< Content-Type: application/octet-stream
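
For scripting, the same download can be sketched in Python. This is only a minimal sketch and not part of fulltext or this package: `urllib.request.urlopen` follows the 301/302 hops on its own, and the hypothetical helper `filename_from_disposition` reads the file name from a `Content-Disposition` header like the one in the final response above. The fallback name `download.pdf` is an assumption.

```python
import os
from email.message import Message
from urllib.request import urlopen


def filename_from_disposition(header_value, default="download.pdf"):
    """Return the filename parameter of a Content-Disposition value.

    Hypothetical helper; falls back to `default` when no filename is given.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    # Drop any directory part so a server cannot choose the target path.
    return os.path.basename(name or "") or default


def download(url):
    """Fetch `url` (redirects are followed automatically) and save it locally."""
    with urlopen(url) as resp:
        header = resp.headers.get("Content-Disposition", "")
        name = filename_from_disposition(header)
        with open(name, "wb") as fh:
            fh.write(resp.read())
    return name
```

Calling `download(...)` with the first URL from the curl command should end at the same `files.osf.io` response, provided the redirect chain still behaves as it did in the log.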

@chartgerink
Contributor Author

@sckott The OSF API requires a lot of nested requests, it seems to me (although embedded requests seem to be available to reduce this). With the OSF API I get to the file with two requests (note the dependency on jq here*).

URI=`curl https://api.osf.io/v2/preprints/fsvmw/ | jq '.["data"]["relationships"]["primary_file"]["links"]["related"]["href"]' | sed 's/"//g'`
curl "$URI" | jq '.["data"]["links"]["download"]'

*npm install -g jq-cli-wrapper
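
The two requests above can also be sketched in Python without jq. This is a minimal sketch: the function names are hypothetical, and the JSON paths are simply the ones used in the jq filters above. That every OSF preprint provider returns the same structure is an assumption, not something confirmed here.

```python
import json
from urllib.request import urlopen

# Endpoint pattern taken from the curl call above.
API = "https://api.osf.io/v2/preprints/{}/"


def primary_file_href(preprint_doc):
    """Pick the primary-file link out of a parsed preprint response."""
    rel = preprint_doc["data"]["relationships"]["primary_file"]
    return rel["links"]["related"]["href"]


def download_href(file_doc):
    """Pick the download link out of a parsed file response."""
    return file_doc["data"]["links"]["download"]


def get_json(url):
    """GET `url` and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def preprint_download_url(preprint_id):
    """Two requests: preprint record first, then its primary file record."""
    preprint = get_json(API.format(preprint_id))
    file_doc = get_json(primary_file_href(preprint))
    return download_href(file_doc)
```

`preprint_download_url("fsvmw")` would mirror the shell pipeline; the two extractor functions are kept separate so they can be checked against saved responses without network access.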

@sckott

sckott commented Nov 13, 2017

nice, wish it was easier than that. Hopefully it's consistent across the various preprint entities they have :)
