Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to reduce long waits on bibtex file parsing #279

Closed
titaniumbones opened this issue Sep 8, 2021 · 11 comments
Closed

Trying to reduce long waits on bibtex file parsing #279

titaniumbones opened this issue Sep 8, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@titaniumbones
Copy link
Contributor

Describe the bug

It takes quite a long time for bibtex files to be parsed when updated. I notice this as a Zotero user whose bibtex files are auto-exported when new sources get added. The parsing takes on the order of 30 seconds on a Thinkpad T490 running Arch Linux, on a main bibliography of about 2100 references (so, large, but not immense).

This period of time is long enough that I often forget what references I intend to insert :-)

I'm wondering if there are efficiencies somewhere that can improve parsing time.

To Reproduce
Steps to reproduce the behavior:

  1. Add a reference to bibtex library
  2. Perform some action that requires reparsing of the bibtex file
  3. Wait, and wait

Expected behavior
I suppose the right label here would actually be "desired behaviour" and hte answer would be "something that allows work to proceed while parsing is happening in the background, or else, cleverly reduces the number of parsing operations so that the whole cache doesn't have to be rebuilt from scratch.

Really, a proper database is what's asked for here, I guess, and that is difficult.

Emacs version:
28.0.50, org-mode and bibtex-actions both from git, though also both a week or two out of date.

@titaniumbones titaniumbones added the bug Something isn't working label Sep 8, 2021
@bdarcus
Copy link
Contributor

bdarcus commented Sep 8, 2021

Are you sure you're using bibtex-actions there?

That sounds like the sort of thing you might see with the oc-basic org insert processor, but not this package, which is now just using parsebib-parse directly. You should be able to test performance by just doing this from IELM:

(parsebib-parse '("/path/refs.bib"))

@bdarcus
Copy link
Contributor

bdarcus commented Sep 8, 2021

I just parsed this 25MB file:

https://github.com/fabrice102/bibboost/blob/master/test/crypto.bib

Seems to have about 50k entries.

This took awhile; 21 seconds to be precise. But it's a lot bigger than 2k.

My recollection is a 2k file on my laptop takes maybe a second, but probably less; certainly nothing close to 30.

Aside: with now using parsebib 3.0, bibtex-actions also supports CSL JSON. My guess is it will parse faster, but I haven't tested it.

Note to self: this package might be useful?

https://github.com/jwiegley/emacs-async

@bdarcus
Copy link
Contributor

bdarcus commented Sep 12, 2021

Also, see #278 (comment)

@andersjohansson
Copy link
Contributor

Same problem for me, but just parsing with parsebib-parse seems quick. Is it because formatting is applied to every entry in some way?

I've tested with both biblatex and json exports of my full library from zotero.

This runs quickly (3 seconds) and returns a hash-table with 8000 entries (my library)
(parsebib-parse '("~/bibliotek/bibliotek.json"))

This is slower (10 seconds) but seem to return a similar hash-table:
(parsebib-parse '("~/bibliotek/bibliotek.bib"))

A profiler sample when parsing json (trying to insert a citation which starts off parsing from bibtex-actions). This was canceled after about 20 seconds, since it never seems to be finished:

        3398  82% - command-execute
        3398  82%  - funcall-interactively
        3388  82%   - org-cite-insert
        3388  82%    - #<compiled 0x1d2987bf2bfd799e>
        3388  82%     - #<compiled -0x16a4b0e1abc7ba14>
        3388  82%      - oc-bibtex-actions-insert
        3388  82%       - bibtex-actions-select-keys
        3388  82%        - bibtex-actions--get-candidates
        3388  82%         - bibtex-actions-refresh
        3388  82%          - bibtex-actions--format-candidates
        2839  69%           - #<compiled 0x1dadd379ad466f22>
        2839  69%            - bibtex-actions-file--files-for-key
        2839  69%             - bibtex-actions-file--possible-names
        2839  69%              - bibtex-actions-get-entry
        2839  69%               - bibtex-actions--get-candidates
        2839  69%                - bibtex-actions-refresh
        2839  69%                 - bibtex-actions--format-candidates
        2539  61%                  - #<compiled 0x1dadd379ad466f22>
        2539  61%                   - bibtex-actions-file--files-for-key
        2539  61%                    - bibtex-actions-file--possible-names
        2539  61%                     - bibtex-actions-get-entry
        2539  61%                      - bibtex-actions--get-candidates
        2539  61%                       - bibtex-actions-refresh
        2539  61%                        - bibtex-actions--format-candidates
        1419  34%                         - parsebib-parse
        1419  34%                          - #<compiled -0x15195927c7afdef9>
        1399  34%                           - parsebib-parse-json-buffer
        1319  32%                            - parsebib-stringify-json
        1319  32%                             - #<compiled -0x58407f65c76a519>
        1309  31%                              - parsebib-stringify-json-field
        1259  30%                               - parsebib--json-stringify-name-field
        1249  30%                                - #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_13>
        1229  29%                                 - parsebib--process-template
         709  17%                                  - replace-regexp-in-string
         560  13%                                     #<compiled -0xa52f4eea2d86d02>
          80   1%                            + seq-filter
          20   0%                           + insert-file-contents
        1120  27%                         - #<compiled 0x1dadd379ad466f22>
        1120  27%                          - bibtex-actions-file--files-for-key
        1120  27%                           - bibtex-actions-file--possible-names
        1120  27%                            - bibtex-actions-get-entry
        1120  27%                             - bibtex-actions--get-candidates
        1120  27%                              - bibtex-actions-refresh
        1120  27%                               - bibtex-actions--format-candidates
        1120  27%                                - parsebib-parse
        1120  27%                                 - #<compiled 0x1893a25d62502107>
         940  22%                                  - parsebib-parse-json-buffer
          30   0%                                     parsebib-stringify-json
          20   0%                                     seq-filter
         300   7%                  - parsebib-parse
         300   7%                   - #<compiled -0x15195927c7afdef9>
         260   6%                    - parsebib-parse-json-buffer
          10   0%                       seq-filter
         549  13%           - parsebib-parse
         549  13%            - #<compiled -0x15c73bce1aafdef9>
         480  11%             - parsebib-parse-json-buffer
         260   6%              - parsebib-stringify-json
         260   6%               - #<compiled -0x58407f65c76a519>
         250   6%                - parsebib-stringify-json-field
         240   5%                 - parsebib--json-stringify-name-field
         240   5%                  - #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_13>
         240   5%                   - parsebib--process-template
         150   3%                    - replace-regexp-in-string
         110   2%                       #<compiled -0xa52f4eea2d86d02>
          20   0%              + seq-filter
          10   0%             + insert-file-contents
         660  16% + ...
          52   1% + timer-event-handler

Similar for bib-file. Canceled after 30 seconds:

       17051  89% - command-execute
       10661  55%  - funcall-interactively
       10584  55%   - org-cite-insert
       10584  55%    - #<compiled 0x1d2987bf2bfd799e>
       10584  55%     - #<compiled -0x16a4b0e1abc7ba14>
       10584  55%      - oc-bibtex-actions-insert
       10584  55%       - bibtex-actions-select-keys
       10584  55%        - bibtex-actions--get-candidates
       10584  55%         - bibtex-actions-refresh
       10584  55%          - bibtex-actions--format-candidates
        9574  50%           - #<compiled 0x1dadd379ad466f22>
        9574  50%            - bibtex-actions-file--files-for-key
        9574  50%             - bibtex-actions-file--possible-names
        9574  50%              - bibtex-actions-get-entry
        9574  50%               - bibtex-actions--get-candidates
        9574  50%                - bibtex-actions-refresh
        9574  50%                 - bibtex-actions--format-candidates
        9075  47%                  - #<compiled 0x1dadd379ad466f22>
        9075  47%                   - bibtex-actions-file--files-for-key
        9075  47%                    - bibtex-actions-file--possible-names
        9075  47%                     - bibtex-actions-get-entry
        9075  47%                      - bibtex-actions--get-candidates
        9075  47%                       - bibtex-actions-refresh
        9075  47%                        - bibtex-actions--format-candidates
        9025  47%                         - #<compiled 0x1dadd379ad466f22>
        9025  47%                          - bibtex-actions-file--files-for-key
        9025  47%                           - bibtex-actions-file--possible-names
        9025  47%                            - bibtex-actions-get-entry
        9025  47%                             - bibtex-actions--get-candidates
        9025  47%                              - bibtex-actions-refresh
        9005  47%                               - bibtex-actions--format-candidates
        9005  47%                                - parsebib-parse
        9005  47%                                 - #<compiled 0x18d43f40af502107>
        8736  45%                                  - parsebib-parse-bib-buffer
        8437  44%                                   - parsebib-read-entry
        6138  32%                                    - parsebib--parse-bibtex-field
        3629  18%                                     - parsebib--parse-bib-value
        1919  10%                                      - parsebib--expand-strings
        1889   9%                                       - #<compiled 0x104dd3f8f877f5fe>
          80   0%                                        - replace-regexp-in-string
          30   0%                                           #<compiled 0xa330a6151e313>
        1560   8%                                      - parsebib--match-delim-forward
        1360   7%                                       - parsebib--match-brace-forward
        1160   6%                                          forward-sexp
        1420   7%                                    - parsebib--match-paren-forward
        1390   7%                                     - parsebib--match-brace-forward
        1300   6%                                        forward-sexp
          40   0%                                     parsebib-find-next-item
          20   0%                               + bibtex-actions--local-files-to-cache
          50   0%                         + parsebib-parse
         499   2%                  + parsebib-parse
        1010   5%           + parsebib-parse
          41   0%   + eval-defun
          10   0%   + profiler-report
          10   0%   + buffer-flip
          10   0%     execute-extended-command
        6390  33%  + byte-code
        1865   9% + ...
         138   0% + timer-event-handler
          43   0%   mouse-fixup-help-message
          26   0% + redisplay_internal (C function)
          12   0% + flyspell-post-command-hook
          10   0%   org-table-header-set-header

@andersjohansson
Copy link
Contributor

BTW, this is on main: 75a7ffa

@bdarcus
Copy link
Contributor

bdarcus commented Sep 13, 2021

Thanks.

I'm working on changes (started as a bug fix, but is broader now) on this branch that might address this.

#281

@bdarcus
Copy link
Contributor

bdarcus commented Sep 13, 2021

Just to clarify, @andersjohansson, you see the same even without org-cite? E.g. if you do bibtex-actions-refresh?

I haven't fully tested #281, but if you have time to test that and let me know, that'd be great.

@andersjohansson
Copy link
Contributor

andersjohansson commented Sep 13, 2021

Oh, that seemed to work fine (loading in about 8 seconds) but now I can’t reproduce it so easily either. Launching emacs and creating an org-buffer (without a file) and then doing org-cite-insert loads the cache and lets me choose a citation after less than 10 seconds. Then I tried starting emacs and opening an org-file and then doing org-cite-insert (usual use case), and that went into the never-finishing loading of data...

I'll try out #281, I saw that you were doing something there after my first comment earlier, but was unsure of if it might solve this 😄

@andersjohansson
Copy link
Contributor

andersjohansson commented Sep 13, 2021

For my first test, it seems to work without getting stuck, but see #284.

@titaniumbones
Copy link
Contributor Author

Are you sure you're using bibtex-actions there?

That sounds like the sort of thing you might see with the oc-basic org insert processor, but not this package, which is now just using parsebib-parse directly. You should be able to test performance by just doing this from IELM:

So, it turned out that my configuration code was not getting loaded for silly reasons related to a bad regexp in my init file. Yes, in my case oc-basic was responsible for the extremely long waits.

It seems way more responsive now, though I haven't done any formal testing & it sounds like @andersjohansson is on the track of something more substantive.

I'll refrain from closing, as the conversation is of more value than my report...

@bdarcus
Copy link
Contributor

bdarcus commented Sep 14, 2021

I merged #281 given the serious bug, but it could use some more testing.

@bdarcus bdarcus closed this as completed Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants