Can I search ISSN, ISBN and DOI in a web-page, Not only URL? #280
Comments
Hi! It's an interesting idea, definitely in the spirit of Promnesia! For the backend it should be relatively easy, although it will require some rethinking because it currently targets mainly URLs. Hopefully extracting an ISBN/DOI is much easier than extracting a URL and should be a simple regex. The possible problems I can think of are mainly on the frontend:
But DOI detection could be opt-in to start with, so I don't find these too concerning :) Let me know if you want any guidance; there might be some rough edges, especially with all the extension shenanigans. And by the way, you'll be very welcome in https://memex.zulipchat.com/ -- there are spaces there to discuss Promnesia in particular, and you might get some input from other people as well (you can log in with GitHub, so you won't need to create a new account!). Also related: #271
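To make the "simple regex" idea concrete, here is a minimal Python sketch of pulling DOIs and ISBN-13s out of plain text. It is not Promnesia code: the names `DOI_RE`, `ISBN13_RE`, `isbn13_is_valid` and `extract_identifiers` are made up for illustration, and the patterns are deliberately loose approximations rather than complete grammars. The ISBN candidates are filtered by their check digit to cut down on false positives.

```python
import re

# Loose DOI shape (an assumption, not a full grammar): "10.", a 4-9 digit
# registrant prefix, "/", then a run of non-space, non-markup characters.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

# Candidate ISBN-13: 978/979 prefix, 13 digits total, optional hyphens/spaces.
ISBN13_RE = re.compile(r'\b97[89](?:[- ]?\d){10}\b')


def isbn13_is_valid(digits: str) -> bool:
    """Verify the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def extract_identifiers(text: str) -> dict:
    """Return {'doi': [...], 'isbn': [...]} found in free text."""
    # Strip trailing punctuation that usually belongs to the sentence, not the DOI.
    dois = [m.group(0).rstrip('.,;)') for m in DOI_RE.finditer(text)]
    isbns = []
    for m in ISBN13_RE.finditer(text):
        digits = re.sub(r'[- ]', '', m.group(0))
        if isbn13_is_valid(digits):
            isbns.append(digits)
    return {'doi': dois, 'isbn': isbns}
```

One caveat with this approach: a DOI can itself embed a valid ISBN, so the same page may yield both identifiers for one item and the caller would need to decide how to deduplicate them.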
Depending on the site, these are very often in a <meta> tag:
<meta content="9780191776267" property="book:isbn"/>
<meta content="10.1093/actrade/9780192840943.001.0001" name="dc.identifier"/>
An article from ACM similarly has:
<meta name="dc.Identifier" scheme="doi" content="10.1145/953051.801372">
This is typical for journal publishers' sites. It's less convenient if you're looking at other pages, but e.g. Abebooks has
However, I do get |
Right -- I guess this is because the URL extractor is on the relaxed side: we'd rather detect some non-URLs than not detect some URLs, since extra broken URLs only result in minor database bloat.
Actually, I'm already in the memex chat. But I don't have enough time to work on an implementation right now because of work.
Using |
I'm writing an indexer for org-roam and BibTeX to link org-roam notes to pages in the web browser.
Some Org files have citation syntax like the one below.
The bib file would look like this.
A BibTeX entry can have an ISBN, ISSN, DOI, or URL.
The indexer parses the BibTeX files first and links URL to ROAM_REFS and CUSTOM_ID of the Org file. I think this works quite well.
However, some entries are books that have only an ISBN.
I think the Promnesia extension needs to scrape identifiers (ISBN, DOI) from the web page to link it to org-roam files.
Book sites, except Amazon Kindle, provide the ISBN in the Open Graph meta tags of their web pages.
But I don't think it is a good idea. It means the Promnesia extension needs some identifier parsers, or the indexer needs to do extra scraping.
Can I add identifier scraping of web pages to Promnesia? Would it be a good idea?
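For illustration, reading identifiers out of meta tags like the ones quoted in the comments could look roughly like this on the Python side. This is only a sketch using the standard library's `html.parser`, not anything that exists in Promnesia: `IDENTIFIER_KEYS`, `MetaIdentifierParser` and `identifiers_from_html` are hypothetical names, and the set of keys only covers the two tag names that appear in this thread.

```python
from html.parser import HTMLParser

# Hypothetical whitelist: only the meta keys quoted in this thread,
# compared case-insensitively ("dc.Identifier" vs "dc.identifier").
IDENTIFIER_KEYS = {"book:isbn", "dc.identifier"}


class MetaIdentifierParser(HTMLParser):
    """Collect (key, content) pairs from identifier-bearing <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta .../> via handle_startendtag.
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property=..., Dublin Core style tags use name=...
        key = (attr.get("property") or attr.get("name") or "").lower()
        content = attr.get("content")
        if key in IDENTIFIER_KEYS and content:
            self.found.append((key, content.strip()))


def identifiers_from_html(html: str) -> list:
    """Return identifier meta tags in document order."""
    parser = MetaIdentifierParser()
    parser.feed(html)
    return parser.found
```

In the browser extension the equivalent lookup would presumably be a DOM query over meta elements instead; the sketch only shows that the matching logic itself is small.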