Optimize and generalize has-notes
#623
Comments
Just using a hash table would work, but perhaps the API of
(defun citar-org-roam--has-note ()
  (let ((keys (make-hash-table :test #'equal)))
    ;; Store keys that have notes in hash table
    (dolist (record (org-roam-db-query
                     [:select ref :from refs :where (= type "cite")]))
      (puthash (car record) t keys))
    ;; Return predicate that queries hash table for given key
    (lambda (citekey _entry)
      (gethash citekey keys))))
The advantage is that the use of a hash table is just an internal implementation detail of this function, and is not part of its public API in any way. I hope that makes sense? I can clarify if needed.
Just to clarify, more general than a file; an associated note, which may be a file, but could also be a sub-file node. Your example makes sense to me, but how would I hook that up to citar? Would do something like this, and have
(setq citar-has-note-functions '(citar-file--has-note citar-org-roam--has-note))
Yes, exactly, |
Ok, I'll whip up a PR. Thanks. I'm thinking I'll make those functions public, though, since we'd want to encourage their use for this sort of thing. |
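A minimal sketch of how a list of such predicate factories could be combined into one predicate. This is an illustration only: every `my/` name is hypothetical and not part of citar, and the backends here are plain in-memory key lists rather than real note sources.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: all `my/' names are hypothetical, not citar API.
(require 'seq)

(defun my/make-backend (keys)
  "Return a zero-argument factory whose predicate checks membership in KEYS."
  (lambda ()
    ;; The hash table is built when the factory is called, not per lookup.
    (let ((table (make-hash-table :test #'equal)))
      (dolist (key keys)
        (puthash key t table))
      (lambda (citekey _entry)
        (gethash citekey table)))))

(defvar my/has-note-functions
  (list (my/make-backend '("smith2020" "doe2019"))
        (my/make-backend '("lee2021")))
  "Hypothetical list of factories, each returning a (CITEKEY ENTRY) predicate.")

(defun my/has-note ()
  "Return a predicate that is non-nil when any backend has a note."
  ;; Call each factory once, up front, so the setup cost is not
  ;; repeated for every candidate that gets tested.
  (let ((predicates (mapcar #'funcall my/has-note-functions)))
    (lambda (citekey entry)
      (seq-some (lambda (pred) (funcall pred citekey entry))
                predicates))))
```

Calling `(my/has-note)` once and reusing the returned closure keeps each lookup to one hash probe per backend.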
Revert the keys-with-notes defcustom and code, and instead generalize 'citar-has-note', which now iterates through 'citar-note-backends'. Fix #623
@roshanshariff - follow-up on performance. So this is much better and faster than the earlier code. But it's still a fair bit slower than checking the contents of the entry, after
So with this:
(defun my/np1 ()
  (lambda (_key entry)
    (assoc "has-note" entry)))
... this result, and comparison to using
ELISP> (benchmark-run-compiled 500 (citar--get-candidates nil (my/np1)))
(0.26864172399999997 0 0.0)
ELISP> (benchmark-run-compiled 500 (citar--get-candidates nil (citar-has-note)))
(5.593660889000001 15 2.1500094049999987)
I think the performance difference might be noticeable with large candidate lists and a lot of associated files, and filtering with
Any thoughts on what, if anything, to do with this? Like would it make sense to have two of these functions; one designed for creating the candidates, and the other for filtering after-the-fact? Maybe I can name them for the contexts they're optimized for; like
I'll have to think about it some more, but that benchmark isn't entirely fair. I think the call to
(let ((filter (citar-has-note)))
  (benchmark-run-compiled 500 (citar--get-candidates nil filter)))
Though even then I'm not sure if the
I think the question really comes down to whether you want the has-note/has-file metadata to be updated every time you filter, or only when the candidate cache is updated. Currently the cache has the formatted candidates, which necessarily includes the has-note and has-file tags. My feeling is this is a bad idea, because it forces a lot more cache invalidations; for correctness you have to invalidate whenever a note or file is created/deleted. This needs file watches on directories, or somehow knowing when a new Org-Roam note is created, which strikes me as unnecessary complexity.
I suspect that the slowest part of recreating the cache is actually parsing the bib file. If this is the case, then it would be better to just cache the parsed bib file and generate the rest of the metadata as-needed. Or perhaps include the string representation in the cache, but tack on the note and file indicators as-needed. You would only have to invalidate the cache when the actual bib file changes, which would require re-parsing anyway.
It'll take some benchmarks to figure out whether the candidate formatting is fast enough to not cache, and how much time it takes compared to parsing the bib file. One nice feature of formatting the candidates on the fly is that you can fill the width of the Emacs window, without having to regenerate the cache every time the window is resized...
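The corrected benchmark form above builds the predicate outside the timed form. A self-contained sketch of the same idea with synthetic data follows; the `my/` names, key format, and sizes are all made up for illustration, and no timings are claimed.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: synthetic data, hypothetical `my/' names.
(require 'benchmark)
(require 'seq)

(defun my/make-has-note (n)
  "Return a predicate over N synthetic keys; the table is built here."
  (let ((table (make-hash-table :test #'equal)))
    (dotimes (i n)
      (puthash (format "key%d" i) t table))
    (lambda (citekey _entry)
      (gethash citekey table))))

(defun my/filter-keys (keys pred)
  "Return the members of KEYS for which PRED is non-nil."
  (seq-filter (lambda (key) (funcall pred key nil)) keys))

(defvar my/sample-keys
  (mapcar (lambda (i) (format "key%d" i)) (number-sequence 0 1999 2))
  "Synthetic candidates: key0, key2, ..., key1998.")

(defun my/bench-with-setup ()
  "Time filtering with the predicate rebuilt on every repetition."
  (benchmark-run 20
    (my/filter-keys my/sample-keys (my/make-has-note 1000))))

(defun my/bench-filter-only ()
  "Time filtering alone; the predicate is built once, outside the timing."
  (let ((pred (my/make-has-note 1000)))
    (benchmark-run 20
      (my/filter-keys my/sample-keys pred))))
```

Comparing the two results separates the one-off cost of building the table from the per-candidate lookup cost.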
The original code had no cache, which led to an early feature request #68 #69. It definitely got much more responsive with the cache. But yeah, there's that trade-off. With the corrected benchmark:
ELISP> (benchmark-run-compiled 500 (citar--get-candidates nil filter))
(1.157143517 4 0.4320387100000005)
I agree.
So in this scenario the cache would be the hash created by parsebib (where key =
Adding the conses to the hash values seems straightforward:
(defvar citar-cache--db
  (parsebib-parse citar-bibliography))

(defun citar-cache--get-entry (citekey)
  "Return entry data for CITEKEY."
  (gethash citekey citar-cache--db))

(defun citar-cache--get-value (citekey field)
  "Return FIELD value for CITEKEY."
  (let ((entry (citar-cache--get-entry citekey)))
    (cdr (assoc field entry))))

(defun citar-cache--has-files-p (citekey)
  (citar-cache--get-value citekey "has-files"))

(defun citar-cache--has-notes-p (citekey)
  (citar-cache--get-value citekey "has-notes"))

(defun citar-cache--entry-update-metadata (citekey)
  "Update the cached bibliographic data CITEKEY with additional metadata."
  (let* ((hasnotes (when (citar-cache--has-notes-p citekey) "has-notes"))
         (hasfiles (when (citar-cache--has-files-p citekey) "has-files")))
    (dolist (metafield (list hasnotes hasfiles))
      (when metafield
        (push (cons metafield t) ; needs to be smarter
              (gethash citekey citar-cache--db))))))

(defun citar-cache--update-metadata ()
  "Update the cached bibliographic data with additional metadata."
  (let ((keys (hash-table-keys citar-cache--db)))
    (dolist (key keys)
      (citar-cache--entry-update-metadata key))))
By "string representation", you mean the candidates created by
My impulse says the first approach is better, mostly because the API becomes more intuitive and consistent, and it would isolate or abstract things specific to
OTOH, not so sure it's that easy to adapt
The affixation content isn't actually included in the candidates. Not sure, but that might provide an opportunity? I wonder if this case is suitable for programmed completion?
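On the "needs to be smarter" note in the code above: one possible way to keep repeated updates from pushing duplicate conses is to set the field through `alist-get` as a generalized place. A minimal sketch, assuming the entries are alists with string keys; `my/db` and `my/set-metadata` are hypothetical stand-ins, not citar code.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: `my/db' stands in for the parsed bibliography table.
(defvar my/db (make-hash-table :test #'equal)
  "Hypothetical cache mapping citekey to an alist of fields.")

(puthash "smith2020" (list (cons "title" "A Title")) my/db)

(defun my/set-metadata (citekey field value)
  "Set FIELD to VALUE in the entry for CITEKEY, replacing any old value.
If FIELD is absent a new cons is added; otherwise its cdr is updated,
so calling this repeatedly never grows the entry."
  (setf (alist-get field (gethash citekey my/db) nil nil #'equal)
        value))

;; Running the update twice leaves a single "has-notes" cons.
(my/set-metadata "smith2020" "has-notes" t)
(my/set-metadata "smith2020" "has-notes" t)
```

The `#'equal` test function matters here because the field names are strings, which the default `eq` comparison would not match.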
I'm experimenting on #625. |
Sorry for this breaking change, but I wanted to get the foundations right before tagging 1.0. This completely restructures the core of citar to borrow some code and ideas from the org-mode oc-basic package. In particular, it changes to using two primary caches: - bibliography - completion Both of these now use hash tables, rather than lists. Caching functionality is also changed, and the API now focuses on citekeys as arguments for key functions. Finally, citar--parse-bibliography should re-parse bibliography files upon change. Fix #623 Close #627
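A minimal sketch of re-parsing only when a bibliography file has changed, using the file's modification time as the invalidation signal. This is not how citar implements it; the `my/` names are hypothetical and the parser is passed in as an arbitrary function.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: hypothetical names, not citar's implementation.
(defvar my/bib-cache (make-hash-table :test #'equal)
  "Hypothetical cache mapping file name to (MTIME . PARSED-DATA).")

(defun my/file-mtime (file)
  "Return the modification time of FILE."
  (file-attribute-modification-time (file-attributes file)))

(defun my/cached-parse (file parse-fn)
  "Return the result of PARSE-FN on FILE, re-parsing only if FILE changed."
  (let ((mtime (my/file-mtime file))
        (cached (gethash file my/bib-cache)))
    (if (and cached (equal (car cached) mtime))
        ;; Same modification time as last parse: reuse the stored data.
        (cdr cached)
      ;; New or modified file: parse it and remember the new mtime.
      (let ((data (funcall parse-fn file)))
        (puthash file (cons mtime data) my/bib-cache)
        data))))
```

With this shape, note and file indicators can be computed separately and never force a re-parse.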
@roshanshariff - I got the notes API stuff merged (via #603), and it works, but the performance discrepancy here is obvious.
After a bit of experimenting, I think the issue is really just list vs hash performance. If I simply convert citar-keys-with-notes from list to hash, the performance is much better again. So my impulse is just to do that.
Do you have any better suggestions? For example, is it possible to generalize citar-has-note more? TIA.
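A small sketch of the list vs hash comparison, using synthetic keys. `member` walks the list until it finds a match, while `gethash` does a single table probe. The `my/` names and the key count are made up for illustration, and no timings are claimed.

```elisp
;; -*- lexical-binding: t; -*-
;; Sketch only: synthetic keys and hypothetical `my/' names.
(require 'benchmark)

(defvar my/keys
  (mapcar (lambda (i) (format "key%d" i)) (number-sequence 1 5000))
  "Synthetic list of citekeys that have notes.")

(defvar my/keys-table
  (let ((table (make-hash-table :test #'equal)))
    ;; `dolist' returns TABLE once every key has been stored.
    (dolist (key my/keys table)
      (puthash key t table)))
  "The same keys stored in a hash table.")

(defun my/in-list-p (key)
  "Return t if KEY is in `my/keys' (linear scan)."
  (and (member key my/keys) t))

(defun my/in-table-p (key)
  "Return t if KEY is in `my/keys-table' (hash lookup)."
  (gethash key my/keys-table))

(defun my/compare-lookups ()
  "Return benchmark results for a worst-case list lookup and a hash lookup."
  (list (benchmark-run 100 (my/in-list-p "key5000"))
        (benchmark-run 100 (my/in-table-p "key5000"))))
```

Evaluating `(my/compare-lookups)` returns the two `benchmark-run` results side by side for comparison.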