Open PDF in Zotero #685

Open
dlukes opened this issue Sep 5, 2022 · 42 comments
Labels
enhancement New feature or request

Comments

@dlukes

dlukes commented Sep 5, 2022

Zotero has recently acquired a very nice PDF reader and annotator, which can be invoked via the command line. So I started investigating whether I can make citar use it.

Turns out -- yes, see below. I'm still posting this here because this was the first place I looked to see whether anyone had already figured it out, so I'm hoping it might help other people who want to achieve something similar. Additionally, it might be a source of inspiration to rework some of citar's internals to make this type of setup a bit easier (basically, all that's needed is some special-casing for URIs starting with zotero://...). But if that's something none of the maintainers consider worth their while at this point, that's perfectly fine too, feel free to close the issue right away :)

Big picture

The Zotero PDF reader is activated via URLs of the general form zotero://open-pdf/library/items/<ZOTERO_KEY>, which you should be able to just (xdg-)open on the command line. So we need to:

  1. generate such URLs during export from Zotero
  2. shepherd them safely through citar's attachment files machinery

Zotero side

I'm using Better BibTeX for the exports, because it allows you to customize the export process. You'll need to add a custom script to the export process via the postscript setting:

[Screenshot: the Better BibTeX postscript setting]

Here's the full script for copy-paste, with explanatory comments:

// I'm using this with BetterCSLJSON, but BetterBib(La)TeX is also possible, see
// https://retorque.re/zotero-better-bibtex/exporting/scripting/.
if (Translator.BetterCSLJSON) {
  entry.file = item.attachments.map(
    a => (
      // If this is a PDF attachment...
      /\.pdf$/i.test(a.localPath) ?
        // ... generate a link to open the PDF in Zotero, abusing the query string to
        // provide a human readable file name like so:
        // zotero://open-pdf/library/items/RANDOM_ZOTERO_KEY?human-readable-name.pdf
        `zotero://open-pdf/library/items/${a.key}?${a.localPath.split(a.key)[1].substring(1)}`
        // ... otherwise, just return the normal file path.
        : a.localPath
    // Escape \'s and ;'s to make the individual items play nice with
    // citar-file--parser-default and citar-file--split-escaped-string.
    ).replace(/([\\;])/g, "\\$1")
  ).join(";")
}
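To check what the postscript emits without round-tripping through Zotero, here is a rough Python port of the same logic. It assumes attachments are dicts carrying the key and localPath fields the script reads, and it uses an escaped dot (\.pdf$) for the extension test so that a name like "notapdf" is not mistaken for a PDF. The split_escaped helper is a hypothetical illustration of how a ;-joined, backslash-escaped field can be parsed back; it is not citar's actual citar-file--split-escaped-string.

```python
import re

# Case-insensitive ".pdf" suffix; the dot is escaped so it matches a literal ".".
PDF_RE = re.compile(r"\.pdf$", re.IGNORECASE)


def escape_field(value: str) -> str:
    """Backslash-escape "\\" and ";" so a value survives a ;-joined list."""
    return re.sub(r"([\\;])", r"\\\1", value)


def zotero_file_field(attachments: list) -> str:
    """Mimic the postscript: PDFs become zotero:// links, other files stay paths."""
    parts = []
    for a in attachments:
        path = a["localPath"]
        if PDF_RE.search(path):
            # Text after "<key>/" in the storage path is the human-readable
            # file name, smuggled into the URL's query string.
            name = path.split(a["key"])[1][1:]
            parts.append(f"zotero://open-pdf/library/items/{a['key']}?{name}")
        else:
            parts.append(path)
    return ";".join(escape_field(p) for p in parts)


def split_escaped(field: str) -> list:
    """Split on unescaped ";" and drop the escaping backslashes."""
    items, buf, i = [], [], 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            buf.append(field[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == ";":
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append("".join(buf))
    return items
```

A file name containing ";" survives the round trip because the escaping and splitting are inverses of each other.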

Citar side

Citar performs various checks and processing on the file paths associated with bibliography items, which lead to the zotero:// URLs being discarded or mangled. So these modifications need to be disabled (note that I'm using Doom Emacs; in vanilla Emacs, translate to two calls to advice-add, I think):

(defadvice! dlukes/citar-file-trust-zotero (oldfun &rest r)
  "Leave Zotero-generated file paths alone, especially zotero://..."
  :around '(citar-file-open citar-file--find-files-in-dirs)
  (cl-letf (((symbol-function 'file-exists-p) #'always)
            ((symbol-function 'expand-file-name) (lambda (first &rest _) first)))
    (apply oldfun r)))

And obviously, we'll need to tell citar to open PDFs in an external app (again, Doom Emacs):

(after! citar
  (add-to-list 'citar-file-open-functions '("pdf" . citar-file-open-external)))

(As an aside: thank you very much for citar!)

@dlukes dlukes added the enhancement New feature or request label Sep 5, 2022
@bdarcus
Contributor

bdarcus commented Sep 5, 2022

So I've not yet looked at this closely, but some quick thoughts.

  1. a number of us, including me, use Zotero, so definitely this is interesting.
  2. I think the technical question on our end is just how best, most generally, to handle this. Your suggestion may indeed be the answer, but any thoughts @aikrahguzar @roshanshariff?
  3. Do you think that BBT script might be general enough to add to BBT itself?
  4. We do have a wiki for ideas like this; feel free to add it there when it makes sense.

@dlukes
Author

dlukes commented Sep 5, 2022

Do you think that BBT script might be general enough to add to BBT itself?

That would be nice, but for CSL JSON specifically, the trouble is there's no standard field for putting the attachments AFAIK. And coming up with a non-standard field has backwards compatibility implications. In the course of researching this, I think I saw @retorquere make this argument against including non-standard fields in Better CSL JSON exports, but I can't find the reference at the moment. So that's a potential blocker.

(If/when a standard CSL JSON solution emerges -- an attachments field? -- I'm guessing it won't be a single string with multiple values separated by ;, but an array?)

For Better Bib(La)TeX, all that's needed is an option in the existing exporters to generate zotero://open-pdf/... URLs for PDFs instead of regular paths when exporting files, I think.

We do have a wiki for ideas like this; feel free to add it there when it makes sense.

Thank you! I'll wait a bit to see where the discussion goes first, but I'll keep it in mind.

@bdarcus
Contributor

bdarcus commented Sep 5, 2022

I'm not necessarily advocating for this (it was just a question), but note that CSL-JSON now has a "custom" property that could be used to dump non-standard data like this. Am not sure what Zotero does with that on import though.

@retorquere

Do you have a sample with these custom fields?

@bdarcus
Contributor

bdarcus commented Sep 5, 2022

Yes, there are examples embedded in the schema.

{
  "short_id": "xyz",
  "other-ids": ["alternative-id"]
}

@retorquere

Zotero doesn't do anything with those.

@bdarcus
Contributor

bdarcus commented Feb 19, 2023

Zotero doesn't do anything with those.

I should have asked this earlier, but now that I'm looking at it again:

"Nothing" in this context means "throws out these data, but doesn't cause an error"?

Maybe that's sufficient?

@retorquere

Yeah, no action, no error.

@bdarcus
Contributor

bdarcus commented Feb 19, 2023

Might be cool, then, for BBT and/or Zotero itself (cc @dstillman) to use that new custom property for exporting this kind of thing as a first step.

@retorquere

We'd still have to establish what the custom properties would look like.

Is CSL really the best vehicle for this?

@bdarcus
Contributor

bdarcus commented Feb 19, 2023

IDK; it just seems to be the primary export format for Zotero (edit: and, more generally, it has picked up traction beyond Zotero).

But certainly it need not stay that way.

@retorquere

If the custom fields appear in Zotero, BBT CSL formats will inherit them automatically, so that would kill two birds with one stone.

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

@dlukes - FYI, a few weeks ago I merged #736 with an option to open a Zotero entry via citar-open-entry.

Here's that function:

citar/citar.el

Lines 1514 to 1520 in ed53e67

(defun citar-open-entry-in-zotero (citekey)
  "Open a reference item for CITEKEY in Zotero.
This function assumes a setup where the bibliographic data,
including the citekeys, is maintained in Zotero with Better BibTeX."
  (citar-file-open-external
   (concat "zotero://select/items/@" citekey)))

The basic idea could be extended for files. But the current ability to open the entry should still be useful for now.

And yeah, the current file opening functions would need some adjustment.
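For comparison, the two URL forms in play can be written side by side. This is a small Python sketch, not citar code: zotero_select_url builds the same string as citar-open-entry-in-zotero above (Better BibTeX resolves the @citekey part), while zotero_open_pdf_url follows the form from the opening post, assuming the key passed in is the PDF attachment's own key rather than its parent item's. Both function names are hypothetical.

```python
def zotero_select_url(citekey: str) -> str:
    """Select the parent item in Zotero via its Better BibTeX citekey."""
    return "zotero://select/items/@" + citekey


def zotero_open_pdf_url(attachment_key: str) -> str:
    """Open a PDF in Zotero's reader; expects the attachment's own Zotero key."""
    return "zotero://open-pdf/library/items/" + attachment_key


print(zotero_select_url("toly2017"))
print(zotero_open_pdf_url("4XRNSQEB"))
```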

@retorquere - is it yet possible to open PDFs using the "select" links?

@retorquere

No, the relevant change hasn't (to my knowledge) been made to Zotero yet.

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

Is the issue using the citekeys to open the PDFs, or opening the PDFs at all?

Like, should this work?

zotero://open-pdf/library/items/TPUJP37W

It doesn't for me, but that doesn't necessarily mean anything.

@dlukes
Author

dlukes commented Apr 6, 2023

FYI, a few weeks ago I merged #736 with an option to open a Zotero entry via citar-open-entry.

Thanks, I happened to notice through a lucky coincidence quite soon after the merge and configured it as my citar-open-entry-function :)

It doesn't for me

It does for me -- running xdg-open zotero://open-pdf/library/items/TPUJP37W opens the associated PDF in Zotero.

@retorquere

The problem is that Zotero has two functions (Zotero.API.getResultsFromParams and Zotero.DataObjects.prototype.parseLibraryKey(Hash)) that translate the TPUJP37W part to an item ID without being given any context. I don't know what in the URL precedes it (technically, I don't even know where I'm being called from: it could be these URL-handling calls or entirely different places where these functions are used), so I don't know whether you are opening an item or an attachment, and when you send zotero://open-pdf/library/items/@citekey, I don't know whether I should translate that to the item itself or to one of the attachments. Zotero said they would add this context at some point, but I don't see in the code that this has happened yet.

@dlukes
Author

dlukes commented Apr 6, 2023

running xdg-open zotero://open-pdf/library/items/TPUJP37W opens the associated PDF in Zotero.

Oh right, sorry, I forgot part of the context here -- it doesn't work for opening the associated PDF of TPUJP37W as a parent item; it works if TPUJP37W is already the ID of a PDF item. Sorry for the confusion.

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

Oh right, sorry, I forgot part of the context here -- it doesn't work for opening the associated PDF of TPUJP37W as a parent item; it works if TPUJP37W is already the ID of a PDF item. Sorry for the confusion.

That explains why it didn't work for me. It makes sense, though.

... so I don't know whether you are opening an item or an attachment, and when you send zotero://open-pdf/library/items/@citekey, I don't know whether I should translate that to the item itself or to one of the attachments.

So really we'd need to find whatever PDFs are associated with the citekey-identified Zotero parent, and list those as discrete (Zotero) "file" resources in citar-open-*?

In that case, BBT wouldn't do the interpretation of what to open; the user would choose which one here.

Perhaps, then, it's best to include the Zotero item ID in the BibTeX etc. file?

Or, per the OP, include the actual select links to them in files?

EDIT: maybe we refactor a bit so default files use file://, and the checks only apply to those, but allow other schemes?

I guess, then, we might retitle the issue something like "Allow non-file URL schemes for library files"?
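A minimal Python sketch of that refactor idea: entries whose URI scheme is on a skip list are passed through untouched, and only plain paths get the existence check and directory lookup. SCHEME_SKIP and resolve_file are hypothetical names, loosely in the spirit of the citar-file-scheme-skip option from the commit that closes this issue; the lookup order here is an assumption, not citar's actual logic.

```python
import os
from typing import Optional, Sequence
from urllib.parse import urlsplit

# Hypothetical setting: URI schemes handed to the opener as-is.
SCHEME_SKIP = ("zotero",)


def resolve_file(entry: str, dirs: Sequence[str],
                 scheme_skip: Sequence[str] = SCHEME_SKIP) -> Optional[str]:
    """Return something openable for a bibliography file entry, or None.

    URIs whose scheme is in scheme_skip bypass validation and normalization;
    everything else is treated as a local path that must exist.
    """
    if urlsplit(entry).scheme in scheme_skip:
        return entry
    path = os.path.expanduser(entry)
    if os.path.isabs(path):
        return path if os.path.exists(path) else None
    # Relative paths: look them up in the configured library directories.
    for directory in dirs:
        candidate = os.path.join(os.path.expanduser(directory), path)
        if os.path.exists(candidate):
            return candidate
    return None
```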

@retorquere

BBT could offer a json-rpc call that translates a citekey to the item + attachments. The caller can then open the attachment using the native zotero IDs in the url.

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

That sounds perfect @retorquere.

@retorquere

Does this do what you need if you give it the betterbibtex json translator? https://retorque.re/zotero-better-bibtex/exporting/json-rpc/#itemexportcitekeys-translator-libraryid

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

What does the relevant returned JSON look like?

A list of items, each of which has a list of attachment Zotero IDs (and type and such)?

E.g. we could generate a list of zotero select URIs from it directly, for each attached PDF?

If yes, yes!

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

@dlukes - it might be that we could do a little citar-zotero thing here, relying on the json-rpc, and bundle that functionality there? Sort of an analog to citar-file.

There is a built-in jsonrpc package, though I haven't yet got it working.

If you feel like giving a go at a PR, let me know.

Otherwise, I'll take a look when I get a chance.

EDIT: there are other emacs zotero packages, but they seem to focus on the Zotero web API?

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

With this (two attachments, one HTML and one PDF), how do I know which are the PDFs, other than looking at the file extensions?

curl http://localhost:23119/better-bibtex/json-rpc -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"jsonrpc": "2.0", "method": "item.attachments", "params": {"citekey": "toly2017"} }'
{"jsonrpc":"2.0","result":[{"open":"zotero://open-pdf/library/items/PKZ88BS7","path":"/home/bruce/Zotero/storage/PKZ88BS7/14747731.2016.html"},{"open":"zotero://open-pdf/library/items/4XRNSQEB","path":"/home/bruce/Zotero/storage/4XRNSQEB/Toly_2017_Brexit, global cities, and the future of world order.pdf","annotations":[]}],"id":null}
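In this response there is no explicit type field, so the file extension is the only reliable signal (the PDF entry also carries an "annotations" key, but that is an inference from a single sample, not documented behavior). A minimal Python sketch that builds the same JSON-RPC body as the curl call and then keeps every attachment whose path ends in .pdf, so the user could choose among several PDFs as discussed above. The function names are hypothetical.

```python
import json


def attachments_request(citekey: str) -> dict:
    """JSON-RPC body for Better BibTeX's item.attachments, as in the curl call."""
    return {
        "jsonrpc": "2.0",
        "method": "item.attachments",
        "params": {"citekey": citekey},
    }


def pdf_open_urls(response_text: str) -> list:
    """Return the "open" URL of every attachment whose path ends in .pdf."""
    result = json.loads(response_text).get("result") or []
    return [
        att["open"]
        for att in result
        if att.get("path", "").lower().endswith(".pdf")
    ]
```

Fed the response shown above, pdf_open_urls keeps only the 4XRNSQEB entry and drops the HTML snapshot.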

@bdarcus
Contributor

bdarcus commented Apr 6, 2023

Does this do what you need if you give it the betterbibtex json translator?

What's the right value for the translator property? jzon?

@retorquere

jzon will do it, as will BetterBibTeX JSON.

@retorquere

What does the relevant returned JSON look like?

A list of items, each of which has a list of attachment Zotero IDs (and type and such)?

I thought it did, but they're stripped out. I'll see what I can do about that.

@dlukes
Author

dlukes commented Apr 6, 2023

Sorry, evening routine with the kids, then fell asleep trying to get the older one to hit the hay :)

maybe we refactor a bit so default files use file://, and the checks only apply to those, but allow other schemes?

Sounds good to me!

it might be that we could do a little citar-zotero thing here, relying on the json-rpc, and bundle that functionality there?

I think that's a good option too, although it feels somewhat more complicated than just having the URLs listed in the exported bibliography file, especially since Citar will still need that file anyway. But I understand the reluctance to add nonstandard fields to the CSL export willy-nilly.

From a performance perspective, just to make sure I'm understanding this correctly -- would this mean that whenever I search my bibliography, Citar would initiate either one JSON-RPC call exporting all the citekeys in the bibliography with item.export, or multiple JSON-RPC calls getting item.attachments for each citekey?

How fast/slow is either of these expected to be? It also seems slightly wasteful to redo this each time, so some sort of caching should probably be involved, which is of course notoriously tricky to get right. Whereas if this information was part of the exported bibliography file, then all of this would be implicitly handled by just keeping track of whether the file needs to be reloaded, which Citar already does.
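One way to get that "reload only when the file changed" behavior for an RPC-backed lookup is to key a cache on the bibliography file's modification time. A minimal Python sketch, assuming a hypothetical fetch callable that performs the slow BBT queries and returns a citekey-to-URLs mapping; the class name is illustrative and not part of citar or BBT.

```python
import os
from typing import Callable, Optional


class AttachmentCache:
    """Re-run an expensive fetch only when the bibliography file changes.

    fetch is a hypothetical callable returning {citekey: [zotero:// URLs]}.
    """

    def __init__(self, bib_path: str, fetch: Callable[[], dict]):
        self.bib_path = bib_path
        self.fetch = fetch
        self._mtime_ns: Optional[int] = None
        self._data: Optional[dict] = None

    def get(self) -> dict:
        mtime_ns = os.stat(self.bib_path).st_mtime_ns
        if self._data is None or mtime_ns != self._mtime_ns:
            # First call, or the file was re-exported: refresh the cache.
            self._data = self.fetch()
            self._mtime_ns = mtime_ns
        return self._data
```

Usage would be something like AttachmentCache(os.path.expanduser("~/.cache/zotero/My Library.json"), fetch_all).get() before each search, so that only a re-export pays the full RPC cost again.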

If you feel like giving a go at a PR, let me know.

I've got a lot on my plate right now, and I'm not particularly comfortable in Elisp. Plus the performance worries I detailed above. But if it turns out they're unfounded, I might try and cobble something together at some point. If I do, I'll post here to save on duplicate work.

@dlukes
Author

dlukes commented Apr 7, 2023

would this mean that whenever I search my bibliography, Citar would initiate either one JSON-RPC call exporting all the citekeys in the bibliography with item.export, or multiple JSON-RPC calls getting item.attachments for each citekey?

I ran a simple test of both these options:

import time
import json
from pathlib import Path

import httpx

with Path("~/.cache/zotero/My Library.json").expanduser().open("rb") as file:
    bib = json.load(file)
citekeys = [item["citation-key"] for item in bib]

def item_attachments():
    for citekey in citekeys:
        httpx.post(
            "http://localhost:23119/better-bibtex/json-rpc",
            json={
                "jsonrpc": "2.0",
                "method": "item.attachments",
                "params": {"citekey": citekey},
            },
            timeout=None,
        )

def item_export():
    httpx.post(
        "http://localhost:23119/better-bibtex/json-rpc",
        json={
            "jsonrpc": "2.0",
            "method": "item.export",
            "params": {"citekeys": citekeys, "translator": "jzon"},
        },
        timeout=None,
    )

print(f"My Library currently has {len(citekeys)} items.")
for test in (item_attachments, item_export):
    print(f"Timing {test.__name__}...")
    start = time.perf_counter()
    test()
    elapsed = time.perf_counter() - start
    print(f"  -> Ran in {elapsed:.2f} seconds.")

And unless I misunderstood or I'm doing something wrong, I'm afraid it looks unworkable from a performance standpoint:

My Library currently has 773 items.
Timing item_attachments...
  -> Ran in 75.50 seconds.
Timing item_export...
  -> Ran in 44.38 seconds.

@bdarcus
Contributor

bdarcus commented Apr 7, 2023 via email

bdarcus added a commit that referenced this issue Apr 7, 2023
* add citar-file-scheme-skip to define URI file schemes to pass on
  without validating or normalizing
* citar-file--find-files-in-dirs: use citar-file-scheme-skip

Close: #685
@retorquere

I thought it did, but they're stripped out. I'll see what I can do about that.

v6.7.68 will drop in 10 minutes or so, and that has a straight dump of the Zotero objects. All keys are in there, but it also has all the URIs ready to go.

@bdarcus
Contributor

bdarcus commented Apr 8, 2023

Hmm ... when I use the jzon translator value, it works as expected.

But if I do this, it appears to hang: it doesn't complete after a minute or so, at which point I cancel.

curl http://localhost:23119/better-bibtex/json-rpc -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"jsonrpc": "2.0", "method": "item.export", "params": {"citekeys": ["toly2017"], "translator": "BetterBibTeX JSON" }}'

@retorquere

A new release is building that fixes that.

@tbdcit

tbdcit commented May 17, 2024

Is there any progress on this issue, or a workaround?

I have added

if (Translator.BetterCSLJSON) {
  entry.file = item.attachments.map(
    a => (
      /\.pdf$/i.test(a.localPath) ?
        `zotero://open-pdf/library/items/${a.key}?${a.localPath.split('/').pop()}`
        : a.localPath
    ).replace(/([\\;])/g, "\\$1")
  ).join(';')
}

to the BBT export, which successfully adds the Zotero links to the exported JSON. I am struggling to get citar to open these links, though. I am not using Doom, and I am not sure how to write advice equivalent to the one suggested in the first post.

Any suggestions? If I can get something working I would be happy to add this to the Wiki.

@bdarcus
Contributor

bdarcus commented May 18, 2024

@tbdcit - No progress. I haven't really looked at this myself since my last comment.

But that advice in the OP is not specific to doom.

Edit: actually, it is specific to doom; sorry.

@dschaehi

dschaehi commented Nov 30, 2024

I found the following bash script [1], which takes a Better BibTeX citekey as input and opens the PDF. I think one could use this for Citar. What do you think, @bdarcus? xdg-open would of course need to be replaced with an OS-specific command (e.g., open on macOS).

#!/usr/bin/env bash
# Make the pipeline below fail if curl fails, not only if jq does.
set -o pipefail

if test -z "$1"
then
	echo >&2 "First arg is empty. Has to be the citation key."
	exit 1
fi

# see https://github.com/retorquere/zotero-better-bibtex/issues/1347
URL=$(curl -s http://localhost:23119/better-bibtex/json-rpc -X POST  \
	-H "Content-Type: application/json" \
	-H "Accept: application/json" \
	--data-binary '{"jsonrpc": "2.0", "method": "item.attachments", "params": ["'"$1"'"] }' \
	| jq -r '.result[0].open'
)
RES=$?
[[ -n $DEBUG ]] && echo >&2 ",zot-open-pdf: URL for \`$1\` from better-bibtex = \`$URL\`"
if [[ ! $RES == 0 ]] 
then
	echo >&2 "zot-open-pdf: pipeline exited with $RES. Failing."
	exit $RES
fi
# jq prints "null" when the citekey has no attachments.
if [[ -z $URL || $URL == null ]]
then
	echo >&2 "zot-open-pdf: no attachment found for \`$1\`."
	exit 1
fi
xdg-open "$URL"

[1] https://github.com/abgruszecki/dotfiles/blob/8ec410c2ff43b3f9f831e3b7b9056f9692e59c1e/bin-scripts/bin/%2Czot-open-pdf
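For comparison, a rough Python port of the script above. The endpoint and the positional params form are taken from the script; two things are changed as assumptions. It picks the first attachment whose path ends in .pdf instead of result[0], since the earlier item.attachments sample shows the first attachment can be an HTML snapshot, and it picks the opener by platform (open on macOS, xdg-open elsewhere; Windows is not handled). The function names are hypothetical, and only the pure helpers can be exercised without a running Zotero.

```python
import json
import subprocess
import sys
import urllib.request
from typing import Optional

RPC_URL = "http://localhost:23119/better-bibtex/json-rpc"


def opener_command(url: str, platform: str = sys.platform) -> list:
    """Command that hands a URL to the desktop (macOS open, else xdg-open)."""
    if platform == "darwin":
        return ["open", url]
    return ["xdg-open", url]


def first_pdf_url(result: list) -> Optional[str]:
    """The "open" URL of the first attachment whose path ends in .pdf."""
    for att in result:
        if att.get("path", "").lower().endswith(".pdf"):
            return att["open"]
    return None


def open_pdf(citekey: str) -> None:
    """Ask Better BibTeX for the attachments of citekey and open the first PDF."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": "item.attachments", "params": [citekey]}
    ).encode("utf-8")
    request = urllib.request.Request(
        RPC_URL,
        data=body,  # supplying a body makes urllib send a POST
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        result = json.load(response).get("result") or []
    url = first_pdf_url(result)
    if url is None:
        raise LookupError(f"no PDF attachment found for {citekey!r}")
    subprocess.run(opener_command(url), check=True)
```

With Zotero running, open_pdf("toly2017") would open the 4XRNSQEB PDF from the earlier sample rather than the HTML snapshot.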

@dschaehi

dschaehi commented Nov 30, 2024

I am not an elisp expert, so I used Claude 3.5 Sonnet to convert the script above to elisp, which seems to work on macOS (using the macOS open command):

(require 'json)
(require 'url)

(defcustom zot-max-retries 3
  "Maximum number of retries for connecting to Zotero."
  :type 'integer
  :group 'zotero)

(defcustom zot-retry-delay 3
  "Delay in seconds between retries."
  :type 'integer
  :group 'zotero)

(defun zot--is-zotero-running ()
  "Check if Zotero is running by attempting to connect to its API."
  (condition-case nil
      (let ((buffer (url-retrieve-synchronously
                     "http://localhost:23119/better-bibtex" t t 1)))
        ;; Don't leak a response buffer on every check.
        (when buffer
          (kill-buffer buffer)
          t))
    (error nil)))

(defun zot--start-zotero ()
  "Start Zotero application (macOS: uses open -a)."
  (start-process "zotero" nil "open" "-a" "Zotero"))

(defun zot-open-pdf (citekey)
  "Open PDF for given CITEKEY using Zotero Better BibTeX."
  (unless citekey
    (error "Citation key cannot be empty"))

  (unless (zot--is-zotero-running)
    (message "Zotero is not running. Starting Zotero...")
    (zot--start-zotero)
    (let ((retries 0))
      (while (and (< retries zot-max-retries)
                  (not (zot--is-zotero-running)))
        (message "Waiting for Zotero to start (attempt %d/%d)..."
                 (1+ retries) zot-max-retries)
        (sleep-for zot-retry-delay)
        (setq retries (1+ retries)))
      (unless (zot--is-zotero-running)
        (error "Could not connect to Zotero after %d attempts" zot-max-retries))))

  (let* ((url "http://localhost:23119/better-bibtex/json-rpc")
         (json-payload (json-encode
                        `(("jsonrpc" . "2.0")
                          ("method" . "item.attachments")
                          ("params" . [,citekey]))))
         (url-request-method "POST")
         (url-request-extra-headers
          '(("Content-Type" . "application/json")
            ("Accept" . "application/json")))
         ;; url.el expects the request body as encoded (unibyte) text.
         (url-request-data (encode-coding-string json-payload 'utf-8)))

    (with-current-buffer
        (url-retrieve-synchronously url)
      (goto-char url-http-end-of-headers)
      (let* ((json-object-type 'alist)
             (json-response (json-read))
             (result (alist-get 'result json-response))
             ;; Guard against an empty result (citekey without attachments).
             (pdf-url (and (> (length result) 0)
                           (alist-get 'open (aref result 0)))))
        (when (getenv "DEBUG")
          (message "zot-open-pdf: URL for `%s` from better-bibtex = `%s`"
                   citekey pdf-url))
        (if pdf-url
            (start-process "open-pdf" nil "open" pdf-url)
          (error "Could not find PDF URL for citation key %s" citekey))))))

I think this can be easily integrated into Citar.

--
EDIT: I added error handling for the case when Zotero is not launched.

@dschaehi

dschaehi commented Nov 30, 2024

(setq citar-open-entry-function 'citar-open-entry-in-zotero)
(setq citar-at-point-function 'embark-act)
(map! :map citar-embark-citation-map
      :n
      "<return>" nil
      "<return>" #'zot-open-pdf)

--
EDIT: Updated the config for Doom Emacs.

@bdarcus
Contributor

bdarcus commented Nov 30, 2024

I haven't looked at this closely, @dschaehi , but did you see this earlier comment, and my reply?

@dschaehi

dschaehi commented Dec 1, 2024

Oh, I only read the post quickly and didn't see the code because it was hidden.

In my case, the code I suggested above runs without any noticeable delay (I have more than 3000 items in my Zotero library).

@bdarcus
Contributor

bdarcus commented Dec 1, 2024

The delay would only be expected if there were an indicator for the Zotero files (which would need to be updated constantly). That doesn't seem to be the case ATM.

5 participants