Implement RefChecker in JabKit #13604

@koppor

⚠ This is a bigger "first issue". Only take it if you have enough time for it. ⚠

Context

Fake references are becoming more and more common. JabRef has the infrastructure to check for them, but the pieces need to be wired together.

A whole bib file should be checked.

There is a Python script, "RefChecker", that does this, but we want the check integrated into JabRef.

Related Work

See https://github.com/markrussinovich/refchecker

(LinkedIn-Post: https://www.linkedin.com/posts/markrussinovich_github-markrussinovichrefchecker-a-tool-activity-7355654490076696576-WycH?utm_source=share&utm_medium=member_desktop&rcm=ACoAAACCUVQBYmlu_A9exTiDRiuXB95v-LNYD4c)

1. Implement logic

Goal

For each BibEntry, fetch authoritative metadata via its own identifiers and compare local vs fetched to classify into groups.

Groups to ensure (create if missing)

refcheck
 ├─ real paper
 ├─ unsure
 └─ fake paper

Implementation can mirror org.jabref.gui.groups.GroupTreeViewModel#addSuggestedGroups.
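
The "create if missing" part could look roughly like the sketch below. It is only an illustration: `GroupNode` and `RefCheckGroups` are hypothetical stand-ins and not JabRef's group tree API. A real implementation would work on the existing group tree, as addSuggestedGroups does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a node in the group tree (not JabRef's real class).
class GroupNode {
    final String name;
    final List<GroupNode> children = new ArrayList<>();

    GroupNode(String name) {
        this.name = name;
    }

    Optional<GroupNode> child(String childName) {
        return children.stream().filter(c -> c.name.equals(childName)).findFirst();
    }

    // Returns the existing child with this name, or creates and attaches a new one.
    GroupNode ensureChild(String childName) {
        return child(childName).orElseGet(() -> {
            GroupNode created = new GroupNode(childName);
            children.add(created);
            return created;
        });
    }
}

class RefCheckGroups {
    static final List<String> SUBGROUPS = List.of("real paper", "unsure", "fake paper");

    // Idempotent: calling this twice does not duplicate any group.
    static GroupNode ensure(GroupNode root) {
        GroupNode refcheck = root.ensureChild("refcheck");
        SUBGROUPS.forEach(refcheck::ensureChild);
        return refcheck;
    }
}
```

The point of the sketch is idempotence: running the check a second time on the same library should reuse the existing groups instead of adding a second "refcheck" tree.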

Algorithm (per BibEntry)

  1. Convert text to BibEntry

    • In Preferences > Web Search, there is "Default plain citation parser" configured. This one should be used.
    • Use org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser to turn a text into a List of BibEntries
  2. Resolve by DOI (preferred)

    • If StandardField.DOI present:
      fetch authoritativeEntry via org.jabref.logic.importer.fetcher.DoiFetcher#performSearchById(doi).
    • Else try to find a DOI via org.jabref.logic.importer.fetcher.CrossRef#findIdentifier(entry); if found, fetch via DoiFetcher and store as authoritativeEntry
  3. Fallback: resolve by arXiv (authoritativeEntry still null)

    • If arXiv ID present or found via org.jabref.logic.importer.fetcher.ArXivFetcher#findIdentifier(entry), fetch its metadata and store it in authoritativeEntry
  4. Compare: local vs authoritativeEntry

    Use org.jabref.logic.database.DuplicateCheck#isDuplicate to determine if local is a duplicate of authoritativeEntry

    If yes: Add to group real paper. If not: Add to group fake paper

    return


Otherwise, authoritativeEntry is still null:

  1. Search paper using fetcher

    Look up paper using org.jabref.logic.importer.fetcher.CompositeSearchBasedFetcher.

    If something is found: check whether any returned entry is a duplicate of local. If yes: Add to group real paper. If not: Add to group fake paper
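
The control flow of steps 2 to 4 plus the search fallback can be sketched as follows. Everything here is a stand-in: `Entry`, `Verdict` and `RefCheckFlow` are hypothetical names, and the four functions passed to the constructor only play the roles of DoiFetcher/CrossRef, ArXivFetcher, CompositeSearchBasedFetcher and DuplicateCheck#isDuplicate. The issue does not say what happens when the search returns nothing; the sketch assumes "fake paper" in that case, which is open for discussion.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical stand-in for BibEntry, reduced to what the sketch needs.
class Entry {
    final String title;
    final String doi; // may be null

    Entry(String title, String doi) {
        this.title = title;
        this.doi = doi;
    }
}

enum Verdict { REAL_PAPER, FAKE_PAPER }

class RefCheckFlow {
    private final Function<Entry, Optional<Entry>> resolveByDoi;   // role of DoiFetcher / CrossRef
    private final Function<Entry, Optional<Entry>> resolveByArXiv; // role of ArXivFetcher
    private final Function<Entry, List<Entry>> search;             // role of CompositeSearchBasedFetcher
    private final BiPredicate<Entry, Entry> isDuplicate;           // role of DuplicateCheck#isDuplicate

    RefCheckFlow(Function<Entry, Optional<Entry>> resolveByDoi,
                 Function<Entry, Optional<Entry>> resolveByArXiv,
                 Function<Entry, List<Entry>> search,
                 BiPredicate<Entry, Entry> isDuplicate) {
        this.resolveByDoi = resolveByDoi;
        this.resolveByArXiv = resolveByArXiv;
        this.search = search;
        this.isDuplicate = isDuplicate;
    }

    Verdict check(Entry local) {
        // Steps 2 and 3: DOI is preferred; arXiv is only consulted if DOI gave nothing.
        Optional<Entry> authoritative = resolveByDoi.apply(local)
                .or(() -> resolveByArXiv.apply(local));
        if (authoritative.isPresent()) {
            // Step 4: compare local vs authoritativeEntry and return.
            return verdict(isDuplicate.test(local, authoritative.get()));
        }
        // Fallback: authoritativeEntry is null, so search and compare each hit.
        // Assumption: an empty search result is treated like "no duplicate found".
        boolean anyMatch = search.apply(local).stream()
                .anyMatch(candidate -> isDuplicate.test(local, candidate));
        return verdict(anyMatch);
    }

    private static Verdict verdict(boolean matches) {
        return matches ? Verdict.REAL_PAPER : Verdict.FAKE_PAPER;
    }
}
```

Passing the lookups in as functions keeps the flow testable without network access, which fits the TDD request in section 2.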


The current proposal does not make use of the group "unsure". Maybe, the DuplicateCheck class needs to be adapted accordingly.
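
One possible direction for "unsure", shown only as a sketch: replace the yes/no duplicate decision with a similarity score and two thresholds. `GradedComparison`, the thresholds 0.8 and 0.4, and the word-overlap (Jaccard) measure on titles are all assumptions made for illustration. None of this is how DuplicateCheck works today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

enum Classification { REAL_PAPER, UNSURE, FAKE_PAPER }

class GradedComparison {
    // Hypothetical thresholds; they would need tuning against real data.
    static final double REAL_THRESHOLD = 0.8;
    static final double FAKE_THRESHOLD = 0.4;

    // Jaccard similarity of the lower-cased word sets of two titles, in [0, 1].
    static double titleSimilarity(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 0.0; // two empty titles carry no evidence of a match
        }
        Set<String> intersection = new HashSet<>(wordsA);
        intersection.retainAll(wordsB);
        Set<String> union = new HashSet<>(wordsA);
        union.addAll(wordsB);
        return (double) intersection.size() / union.size();
    }

    // Splits on runs of non-word characters (ASCII word characters only).
    private static Set<String> words(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        result.remove("");
        return result;
    }

    static Classification classify(double similarity) {
        if (similarity >= REAL_THRESHOLD) {
            return Classification.REAL_PAPER;
        }
        if (similarity < FAKE_THRESHOLD) {
            return Classification.FAKE_PAPER;
        }
        return Classification.UNSURE;
    }
}
```

With these assumed thresholds, a title that differs from the fetched one in a single word out of five lands in "unsure" rather than being forced into "real paper" or "fake paper".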

2. Add test

For step 1, tests need to be crafted. Think of TDD: add tests before or while coding.

3. Wire into CLI

A. Include refcheck --online/--offline <file.bib> in org.jabref.cli.ArgumentProcessor
B. Include refcheck --online/--offline <file.pdf> in org.jabref.cli.ArgumentProcessor

Note that --online and --offline are optional. If not given, the default plain citation parser is used.

For B

Import references from the PDF into a .bib file using "New library based on references". Users can pass --online or --offline, with --online being the default if AI is available. It is an error if --online is given and no AI is available.
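
The flag handling for B could be sketched like this. `ParserMode` and `ParserModeResolver` are hypothetical names, and the sketch is independent of how ArgumentProcessor actually declares options. Falling back to offline when no flag is given and no AI is available is an assumption; the text above only defines the default for the case that AI is available.

```java
import java.util.Optional;

enum ParserMode { ONLINE, OFFLINE }

class ParserModeResolver {
    // requested is empty when neither --online nor --offline was given.
    static ParserMode resolve(Optional<ParserMode> requested, boolean aiAvailable) {
        if (requested.isEmpty()) {
            // --online is the default if AI is available.
            // Assumption: otherwise fall back to offline instead of failing.
            return aiAvailable ? ParserMode.ONLINE : ParserMode.OFFLINE;
        }
        if (requested.get() == ParserMode.ONLINE && !aiAvailable) {
            // Explicit --online without AI is an error.
            throw new IllegalStateException("--online was requested, but no AI is available");
        }
        return requested.get();
    }
}
```

Keeping this decision in one small function makes the four cases (no flag or explicit flag, AI available or not) easy to cover with unit tests.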

4. Wire into GUI

Create "Tools" > "Ref Checker"

Content:

Tab "Citations" and Tab "PDF File"

Tab Citations: Text field with citations

Tab "PDF File": Filename with "Browse" button

At the end of each tab: a "Check" button that triggers the check. On success, a new library is created in JabRef.

Code hints

Similar comparisons are done at

  • org.jabref.gui.mergeentries.newmergedialog.FieldRowViewModel#autoSelectBetterValue // most similar approach
  • org.jabref.logic.database.DuplicateCheck#isDuplicate / org.jabref.logic.database.DuplicateCheck#compareFieldSet // but cannot be used as we really want to rely on a "high quality" BibEntry

Status

In Progress
