Description
⚠ This is a bigger "first issue". Only take it if you have enough time for it. ⚠
Context
Fake references are becoming more and more common. JabRef has the infrastructure to check for them, but it needs to be wired together.
A whole bib file should be checked.
There is a Python script, "RefChecker", that does this, but we want the check integrated in JabRef.
Related Work
See https://github.com/markrussinovich/refchecker
1. Implement logic
Goal
For each BibEntry, fetch authoritative metadata via its own identifiers and compare local vs fetched to classify into groups.
Groups to ensure (create if missing)
refcheck
├─ real paper
├─ unsure
└─ fake paper
Implementation can mirror org.jabref.gui.groups.GroupTreeViewModel#addSuggestedGroups.
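As a rough illustration of "create if missing", the sketch below uses a hypothetical `GroupNode` class as a stand-in for the real group tree. It is not JabRef's group API, and the root name used in the usage note is made up. The only point is that ensuring the groups should be idempotent, so running the check twice does not duplicate "refcheck".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a group tree node; NOT JabRef's group API.
class GroupNode {
    final String name;
    final List<GroupNode> children = new ArrayList<>();

    GroupNode(String name) {
        this.name = name;
    }

    // Returns the direct child with the given name, creating it if it is missing.
    GroupNode ensureChild(String childName) {
        Optional<GroupNode> existing = children.stream()
                .filter(child -> child.name.equals(childName))
                .findFirst();
        if (existing.isPresent()) {
            return existing.get();
        }
        GroupNode created = new GroupNode(childName);
        children.add(created);
        return created;
    }
}

class RefCheckGroups {
    static final List<String> VERDICT_GROUPS = List.of("real paper", "unsure", "fake paper");

    // Ensures "refcheck" and its three subgroups exist below root; safe to call repeatedly.
    static GroupNode ensureRefCheckGroups(GroupNode root) {
        GroupNode refcheck = root.ensureChild("refcheck");
        for (String groupName : VERDICT_GROUPS) {
            refcheck.ensureChild(groupName);
        }
        return refcheck;
    }
}
```

Calling `RefCheckGroups.ensureRefCheckGroups(root)` twice on the same root returns the same "refcheck" node both times.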
Algorithm (per BibEntry)
- Convert text to BibEntry
  - In Preferences > Web Search, there is a "Default plain citation parser" configured. This one should be used.
  - Use org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser to turn a text into a List of BibEntries.
- Resolve by DOI (preferred)
  - If StandardField.DOI is present: fetch authoritativeEntry via org.jabref.logic.importer.fetcher.DoiFetcher#performSearchById(doi).
  - Else try to find a DOI via org.jabref.logic.importer.fetcher.CrossRef#findIdentifier(entry); if found, fetch via DoiFetcher and store as authoritativeEntry.
- Fallback: resolve by arXiv (authoritativeEntry still null)
  - If an arXiv ID is present or found via org.jabref.logic.importer.fetcher.ArXivFetcher#findIdentifier(entry), fetch its metadata and store it in authoritativeEntry.
- Compare: local vs authoritativeEntry
  Use org.jabref.logic.database.DuplicateCheck#isDuplicate to determine whether local is a duplicate of authoritativeEntry.
  - If yes: add to group "real paper".
  - If not: add to group "fake paper".
  Then return.
Now: authoritativeEntry is null
- Search paper using fetcher
  Look up the paper using org.jabref.logic.importer.fetcher.CompositeSearchBasedFetcher. If something is found, check whether any returned entry is a duplicate of local.
  - If yes: add to group "real paper".
  - If not: add to group "fake paper".
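The control flow of the steps above (DOI first, then arXiv, then a general search) could be sketched roughly as follows. Everything here is a simplification under stated assumptions: entries are plain field maps instead of BibEntry, and `Resolver`, `Searcher`, and `Matcher` are hypothetical stand-ins for the fetchers and for DuplicateCheck#isDuplicate, so that the logic can be tested without network access. Returning `UNSURE` when the search finds nothing is also an assumption, since the algorithm does not say what to do in that case.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class RefCheckPipeline {
    enum Verdict { REAL_PAPER, UNSURE, FAKE_PAPER }

    // Hypothetical stand-ins; in JabRef these roles would be played by
    // DoiFetcher/CrossRef, ArXivFetcher, CompositeSearchBasedFetcher and DuplicateCheck.
    interface Resolver { Optional<Map<String, String>> resolve(Map<String, String> local); }
    interface Searcher { List<Map<String, String>> search(Map<String, String> local); }
    interface Matcher { boolean matches(Map<String, String> local, Map<String, String> other); }

    private final Resolver byDoi;
    private final Resolver byArXiv;
    private final Searcher searcher;
    private final Matcher matcher;

    RefCheckPipeline(Resolver byDoi, Resolver byArXiv, Searcher searcher, Matcher matcher) {
        this.byDoi = byDoi;
        this.byArXiv = byArXiv;
        this.searcher = searcher;
        this.matcher = matcher;
    }

    Verdict classify(Map<String, String> local) {
        // DOI is preferred; arXiv is only consulted if DOI resolution produced nothing.
        Optional<Map<String, String>> authoritative = byDoi.resolve(local);
        if (!authoritative.isPresent()) {
            authoritative = byArXiv.resolve(local);
        }
        if (authoritative.isPresent()) {
            // An authoritative entry exists: compare and return immediately.
            return matcher.matches(local, authoritative.get())
                    ? Verdict.REAL_PAPER
                    : Verdict.FAKE_PAPER;
        }
        // No authoritative entry: fall back to a general search.
        List<Map<String, String>> candidates = searcher.search(local);
        if (candidates.isEmpty()) {
            // ASSUMPTION: the issue does not define this case; UNSURE is one possible choice.
            return Verdict.UNSURE;
        }
        boolean anyMatch = candidates.stream().anyMatch(candidate -> matcher.matches(local, candidate));
        return anyMatch ? Verdict.REAL_PAPER : Verdict.FAKE_PAPER;
    }
}
```

Passing the fetchers in as small interfaces keeps the classification logic separate from the network calls, which should make it easier to write the tests first.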
The current proposal does not make use of the group "unsure". Maybe, the DuplicateCheck class needs to be adapted accordingly.
2. Add test
Part 1 needs tests. Think of TDD: add tests before or while coding.
3. Wire into CLI
A. Include refcheck --online/--offline <file.bib> in org.jabref.cli.ArgumentProcessor
B. Include refcheck --online/--offline <file.pdf> in org.jabref.cli.ArgumentProcessor
Note that --online and --offline are optional. If not given, the default plain citation parser is used.
For B
Import references from the PDF into a .bib file using "New library based on references". Users can pass --online or --offline; --online is the default if AI is available. It is an error if --online is given and no AI is available.
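The flag handling for B might be decided along these lines. This is only a sketch of the decision logic, independent of how org.jabref.cli.ArgumentProcessor actually parses arguments. `RefCheckModeSketch`, `resolveMode`, and the `aiAvailable` parameter are hypothetical names. Two behaviours are assumptions not stated above: falling back to offline when no flag is given and no AI is available, and rejecting both flags at once.

```java
import java.util.List;

class RefCheckModeSketch {
    enum Mode { ONLINE, OFFLINE }

    // Decides the parsing mode for "refcheck <file.pdf>" from the raw arguments.
    static Mode resolveMode(List<String> args, boolean aiAvailable) {
        boolean online = args.contains("--online");
        boolean offline = args.contains("--offline");
        if (online && offline) {
            // ASSUMPTION: the issue does not say what happens if both flags are given.
            throw new IllegalArgumentException("--online and --offline cannot be combined");
        }
        if (online) {
            if (!aiAvailable) {
                // Stated in the issue: error if --online and no AI available.
                throw new IllegalStateException("--online was requested, but no AI is available");
            }
            return Mode.ONLINE;
        }
        if (offline) {
            return Mode.OFFLINE;
        }
        // No flag given: online is the default only if AI is available.
        // ASSUMPTION: otherwise fall back to offline.
        return aiAvailable ? Mode.ONLINE : Mode.OFFLINE;
    }
}
```

For example, `resolveMode(List.of("refcheck", "refs.pdf"), true)` yields `ONLINE`, while the same call with `--online` and `aiAvailable == false` throws.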
4. Wire into GUI
Create "Tools" > "Ref Checker"
Content:
Tab "Citations" and Tab "PDF File"
Tab "Citations": Text field with citations
Tab "PDF File": Filename with "Browse" button
At the end of each tab: "Check". Then the functionality is called. On success, a new library is created in JabRef.
Code hints
Similar comparisons are done at
- org.jabref.gui.mergeentries.newmergedialog.FieldRowViewModel#autoSelectBetterValue // most similar approach
- org.jabref.logic.database.DuplicateCheck#isDuplicate / org.jabref.logic.database.DuplicateCheck#compareFieldSet // but cannot be used as we really want to rely on a "high quality" BibEntry
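To illustrate what relying on a "high quality" entry could mean, here is a small, hypothetical field-by-field comparison that treats the fetched entry as the trusted side. It is not the logic of FieldRowViewModel or DuplicateCheck. The normalization rules (dropping braces, collapsing whitespace, ignoring case) and the choice to skip fields missing from the trusted entry are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FieldComparisonSketch {
    // Drops BibTeX braces, collapses whitespace and lowercases, so cosmetic differences do not count.
    static String normalize(String value) {
        return value.replace("{", "").replace("}", "")
                .trim()
                .replaceAll("\\s+", " ")
                .toLowerCase(Locale.ROOT);
    }

    // Returns the sorted names of fields whose local value contradicts the trusted (fetched) value.
    // ASSUMPTION: fields that the trusted entry does not contain are skipped, not counted as mismatches.
    static List<String> mismatchedFields(Map<String, String> local, Map<String, String> trusted) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> field : local.entrySet()) {
            String trustedValue = trusted.get(field.getKey());
            if (trustedValue != null && !normalize(field.getValue()).equals(normalize(trustedValue))) {
                mismatches.add(field.getKey());
            }
        }
        Collections.sort(mismatches);
        return mismatches;
    }
}
```

A list of mismatching fields, rather than a single yes/no answer, might be one way to feed the "unsure" group, for example when only a minor field differs.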