[FEAT] Improve informativeness of "characters of junk seen at toplevel" warnings #488

Open
tvercaut opened this issue Jan 17, 2025 · 4 comments

Comments

@tvercaut

tvercaut commented Jan 17, 2025

While transitioning from bibtex to biber, I ran into warnings about "spurious" comments or characters in my bib files, which bibtex never complained about. I understand why the warnings are issued and would like to fix their causes. However, with large bib files it is extremely difficult to track down where a warning comes from. Ideally, the warning message would give the line number in the bib file and even highlight the offending character.

Below is a minimal reproducing example (running biber --tool on the bibtex file directly also shows the warning):

\documentclass{article}
\usepackage{biblatex}
\addbibresource{refs.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}

refs.bib:

@article{knuth:1984,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  volume={27},
  number={2},
  pages={97--111},
  year={1984},
  publisher={Oxford University Press}
}
}

@inproceedings{lesk:1977,
  title={Computer Typesetting of Technical Journals on {UNIX}},
  author={Michael Lesk and Brian Kernighan},
  booktitle={Proceedings of American Federation of
             Information Processing Societies: 1977
             National Computer Conference},
  pages={879--888},
  year={1977},
  address={Dallas, Texas}
}

This leads to the following warning:

WARN - BibTeX subsystem: /tmp/biber_tmp_mCT5/8fe4c9854184545302f21ebf0d47516c_15.utf8, line 13, warning: 1 characters of junk seen at toplevel
INFO - WARNINGS: 1
Biber warning: [173] Biber.pm:131> WARN - BibTeX subsystem: /tmp/biber_tmp_mCT5/8fe4c9854184545302f21ebf0d47516c_15.utf8, line 13, warning: 1 characters of junk seen at toplevel
@plk
Owner

plk commented Jan 18, 2025

There is very little we can do about this, because we parse bibtex files with an ancient C library that doesn't use modern STDIO etc. However, note that the last line mentions the tmp file: can you look at that line number in that file and find the issue there? If the file doesn't exist, use the --noremove-tmp-dir option to leave the tmp directory in place after a run so you can inspect the intermediate files the warning refers to.

@tvercaut
Author

Thanks. The temp file was indeed removed automatically. The --noremove-tmp-dir option keeps it, but the warning is still quite difficult to understand. The temp file content is as follows:

@article{knuth:1984,
  title = {Literate Programming},
  author = {Donald E. Knuth},
  journal = {The Computer Journal},
  volume = {27},
  number = {2},
  pages = {97--111},
  year = {1984},
  publisher = {Oxford University Press},

}

@inproceedings{lesk:1977,
  title = {Computer Typesetting of Technical Journals on {UNIX}},
  author = {Michael Lesk and Brian Kernighan},
  booktitle = {Proceedings of American Federation of Information Processing Societies: 1977 National Computer Conference},
  pages = {879--888},
  year = {1977},
  address = {Dallas, Texas},

}

The "junk" character (}) does not appear in the temp file anymore despite what the warning message says (line 13 refers to @inproceedings{lesk:1977, in the temp file).

@plk
Owner

plk commented Jan 21, 2025

This is quite hard to fix because biber reads the data in, re-encodes it to UTF8 and that's the file you are seeing. The warning comes from the ancient underlying C library which doesn't prevent the file being parsed and converted. There is not a lot we can do about this currently without using a different library for parsing the initial data and that's not a trivial change.
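
For anyone landing here with the same warning: since the re-encoded temp file no longer contains the stray character, one possible workaround is to scan the original .bib file directly. The following is a minimal Python sketch and is not part of biber. The helper name find_toplevel_junk is hypothetical, and the script assumes brace-delimited entries (@type{...}) and UTF-8 input. It treats every non-whitespace character outside an entry as junk, which may not match the rules of the underlying C library exactly.

```python
import sys


def find_toplevel_junk(text):
    """Return (line, column, char) for each non-whitespace character
    found outside any @type{...} entry.

    Assumption: entries are delimited by braces, and braces are matched
    by simple depth counting. Entries written as @type(...) are not handled.
    """
    junk = []
    line, col = 1, 0
    depth = 0          # brace nesting depth inside the current entry
    in_header = False  # True between '@' and the entry's opening brace
    for ch in text:
        if ch == "\n":
            line += 1
            col = 0
            continue
        col += 1
        if depth > 0:
            # Inside an entry: only track brace nesting.
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
        elif in_header:
            # Skip the entry type name until the opening brace.
            if ch == "{":
                depth = 1
                in_header = False
        elif ch == "@":
            in_header = True
        elif not ch.isspace():
            # Top level, not the start of an entry: report it.
            junk.append((line, col, ch))
    return junk


if __name__ == "__main__":
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        for line, col, ch in find_toplevel_junk(handle.read()):
            print(f"{path}:{line}:{col}: stray character {ch!r} outside any entry")
```

Saved under a name of your choice and run with the path to refs.bib from the example above as its argument, this should report the extra } at line 11, column 1 of the original file.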

@tvercaut
Author

Ok, thanks for the heads up. Feel free to close this feature request as "won't fix" or similar, or leave it open for future reference in case the C library gets updated down the road.
