
Will not open file (Charset Exception when reading file) #8870

Closed
2 tasks done
CodeSJS opened this issue May 30, 2022 · 15 comments · Fixed by #8947
Labels
component: import-load component: unicode unicode related issues [outdated] type: bug Confirmed bugs or reports that are very likely to be bugs

Comments

@CodeSJS

CodeSJS commented May 30, 2022

JabRef version

Latest development branch build (please note build date below)

Operating system

Windows

Details on version and operating system

JabRef 5.6 or 5.7 on Windows 10

Checked with the latest development build

  • I made a backup of my libraries before testing the latest development version.
  • I have tested the latest development version and the problem persists

Steps to reproduce the behaviour

I moved a .bib file to a new location, and suddenly JabRef will not load it. I will attach the log file; it's a Java error that I cannot decipher. I opened the .bib file in Emacs and had Emacs check the syntax, and it passed. I beautified the file in Emacs, and JabRef still will not load it.

If I remove two-thirds of the file, JabRef will load it. I thought there might be a malformed entry, and by trial and error I determined the exact stopping point. However, when I remove that entry, JabRef still fails.

The problem arose with JabRef 5.6. I installed JabRef 5.7, and the same problem occurred.

JabRef opens another file without a problem.

Appendix

Paste an excerpt of your log file here

Error opening file 'C:\Users\singer.2\OneDrive - The Ohio State University\Documents\Projects\TauFibrils\Notes\Bibliography\PeptideFibrils1.bib'
java.nio.charset.MalformedInputException: Input length = 1
	at java.base/java.nio.charset.CoderResult.throwException(Unknown Source)
	at java.base/sun.nio.cs.StreamDecoder.implRead(Unknown Source)
	at java.base/sun.nio.cs.StreamDecoder.read(Unknown Source)
	at java.base/java.io.InputStreamReader.read(Unknown Source)
	at java.base/java.io.BufferedReader.fill(Unknown Source)
	at java.base/java.io.BufferedReader.read(Unknown Source)
	at java.base/java.io.FilterReader.read(Unknown Source)
	at java.base/java.io.PushbackReader.read(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexParser.read(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexParser.skipWhitespace(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexParser.parseFileContent(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexParser.parse(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexImporter.importDatabase(Unknown Source)
	at [email protected]/org.jabref.logic.importer.fileformat.BibtexImporter.importDatabase(Unknown Source)
	at [email protected]/org.jabref.logic.importer.OpenDatabase.loadDatabase(Unknown Source)
	at [email protected]/org.jabref.cli.ArgumentProcessor.importAndOpenFiles(Unknown Source)
	at [email protected]/org.jabref.cli.ArgumentProcessor.processArguments(Unknown Source)
	at [email protected]/org.jabref.cli.ArgumentProcessor.<init>(Unknown Source)
	at [email protected]/org.jabref.gui.JabRefMain.start(Unknown Source)
	at [email protected]/com.sun.javafx.application.LauncherImpl.lambda$launchApplication1$9(Unknown Source)
	at [email protected]/com.sun.javafx.application.PlatformImpl.lambda$runAndWait$12(Unknown Source)
	at [email protected]/com.sun.javafx.application.PlatformImpl.lambda$runLater$10(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at [email protected]/com.sun.javafx.application.PlatformImpl.lambda$runLater$11(Unknown Source)
	at [email protected]/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(Unknown Source)
	at [email protected]/com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
	at [email protected]/com.sun.glass.ui.win.WinApplication.lambda$runLoop$3(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
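The trace shows the decoder inside `InputStreamReader` raising `MalformedInputException` while `BibtexParser` reads the file. The sketch below reproduces the same exception, under a stated assumption: the file contains a byte sequence that is not valid UTF-8. Here that byte is a Latin-1 `é` (`0xE9`), chosen purely as a hypothetical example; the actual offending bytes in this file are not known from the thread. A decoder obtained from `Charset.newDecoder()` reports malformed input by default, so reading throws instead of substituting a character. `StrictDecodeDemo` is an illustrative name, not JabRef code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class StrictDecodeDemo {

    /**
     * Decodes the bytes as UTF-8 through an InputStreamReader, like a file reader would.
     * A decoder from newDecoder() uses CodingErrorAction.REPORT by default, so invalid
     * input raises java.nio.charset.MalformedInputException (a subclass of IOException).
     */
    static String readStrictUtf8(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8.newDecoder())) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        }
    }
}
```

Calling `readStrictUtf8("Caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1))` throws `MalformedInputException` with the message `Input length = 1`, the same message as in the log above: the lone `0xE9` byte starts a three-byte UTF-8 sequence, but the next byte is a space, so the decoder rejects a one-byte malformed sequence.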


@Siedlerchr
Member

Hi, thanks for the report. This seems to be a problem with determining the right charset for reading the file. Can you check which charset Emacs shows for it? JabRef always tries to guess the charset and ideally uses UTF-8.
Can you send us the file so we can use it for debugging? (You can send it to [email protected] if you do not want to upload it.)
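As a rough illustration of what "guessing the charset" can mean, here is a minimal sketch. It is not JabRef's actual detector (the fix commit later in this thread mentions icu4j, which this sketch does not use), and `CharsetGuessSketch` and `guess` are hypothetical names. It checks for a byte order mark, otherwise accepts UTF-8 if the whole input decodes strictly, and otherwise falls back to ISO-8859-1, which maps every byte and therefore never fails.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetGuessSketch {

    /** Very small charset guesser: BOM first, then strict UTF-8, then a single-byte fallback. */
    static Charset guess(byte[] data) {
        // 1. A byte order mark is the strongest signal when present (UTF-32 BOMs are ignored here).
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        // 2. Prefer UTF-8 if the whole input decodes without a single malformed sequence.
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. ISO-8859-1 maps every byte to a character, so it can always be decoded.
            return StandardCharsets.ISO_8859_1;
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One known gap in a guesser like this: UTF-16BE text without a BOM that contains only ASCII characters is also byte-for-byte valid UTF-8 (each `0x00` high byte decodes as a NUL character), so step 2 wrongly accepts it as UTF-8.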

@Siedlerchr Siedlerchr added the [outdated] type: bug Confirmed bugs or reports that are very likely to be bugs label May 30, 2022
@Siedlerchr Siedlerchr changed the title Will not open file Will not open file (Charset Exception when reading file= May 30, 2022
@Siedlerchr Siedlerchr changed the title Will not open file (Charset Exception when reading file= Will not open file (Charset Exception when reading file) May 30, 2022
@CodeSJS
Author

CodeSJS commented May 30, 2022 via email

@ThiloteE
Member

This problem goes both ways: Emacs could have changed its "undecided-unix" export.
What happens if you import or open the non-working file with older versions of JabRef?

@ThiloteE
Member

ThiloteE commented May 31, 2022

More info is needed. Since you use OneDrive, a file transfer could have corrupted parts of your file; old HDDs can wear out and flip bits and bytes; and so on. There are many reasons why files can stop working.

  • Could you please provide the faulty .bib file?
  • What Emacs version are you working with?

I installed Emacs 26.3 to debug this and was not able to change the encoding from UTF-8 (the default) to something else. How did you do it? I tried the multilingual settings, but it seems I have to enter some commands. I also could not find the "undecided-unix" option.

On the JabRef side: I remember that the UTF-8 comment was removed in one of the recent versions. It might have been 5.5 or 5.6.

@CodeSJS
Author

CodeSJS commented May 31, 2022 via email

@ThiloteE
Member

ThiloteE commented Jun 1, 2022

Thank you, that helped a lot.

Two things:

  • I was not able to reproduce it. Doing as you suggested with a small library of mine and saving it as undecided-unix in Emacs 26.3 still allowed me to open the file with the JabRef 5.7 development version. So I suspect it really is something specific to your library.

  • You tried sending the library file via e-mail, but GitHub apparently does not accept files sent by e-mail reply. I think you will have to log in to GitHub and upload the file directly. If it contains sensitive information, you can also send it to [email protected].

@CodeSJS
Author

CodeSJS commented Jun 1, 2022

I will have to e-mail the file to [email protected]; GitHub will not accept a .bib file.

@CodeSJS
Author

CodeSJS commented Jun 1, 2022

I will make the subject line "attn ThiloteE".

@Siedlerchr
Member

You can simply add .txt or a similar extension; e.g., .bib.txt will be accepted by GitHub.

@ThiloteE
Member

ThiloteE commented Jun 1, 2022

So I was able to reproduce the problem: I could not open your file.

At which line do you think the error is?
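Instead of bisecting the file by hand, the position of the first invalid byte can be read off the decoder. A minimal sketch, assuming the file is meant to be UTF-8: `CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean)` stops at the first malformed sequence and leaves the input buffer's position at its first byte, so counting `\n` bytes before that position gives the line number. `FindMalformedLine` is a hypothetical helper, not part of JabRef.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class FindMalformedLine {

    /** Returns the 1-based line of the first malformed UTF-8 sequence, or -1 if the bytes are valid UTF-8. */
    static int firstMalformedLine(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 decodes to at most one char per byte, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        if (!result.isError()) {
            return -1;
        }
        // On error, in.position() points at the first byte of the bad sequence.
        int line = 1;
        for (int i = 0; i < in.position(); i++) {
            if (data[i] == '\n') {
                line++;
            }
        }
        return line;
    }
}
```

For a file on disk, the bytes can be loaded with `Files.readAllBytes(Path.of(...))`. Because CRLF line endings also contain `\n`, the count works for Windows and Unix files alike.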

Btw.:

The coding systems unix, dos, and mac are aliases for undecided-unix, undecided-dos, and undecided-mac, respectively. These coding systems specify only the end-of-line conversion, and leave the character code conversion to be deduced from the text itself.

Source: https://www.gnu.org/software/emacs/manual/html_node/emacs/Coding-Systems.html

Btw., my text editor (xed 3.2.2) was also unable to detect the encoding of the file.

@CodeSJS
Author

CodeSJS commented Jun 1, 2022 via email

@ThiloteE
Member

Could the JabRef behaviour explained in #8895 (comment) be the cause of this issue?

@ThiloteE
Member

ThiloteE commented Jun 14, 2022

Another cause could be:

JabRef#75 (comment)
refs #8506

@Siedlerchr
Member

I dug into this and know where the problem comes from. We need to explicitly check for UTF-16BE in the charset detector on import.
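One way such an explicit check could look, as a sketch under two assumptions: the file has no byte order mark, and it is mostly ASCII-range text (as BibTeX usually is). In UTF-16BE each such character is `0x00` followed by the ASCII byte, so zero bytes pile up at even offsets; in UTF-16LE they pile up at odd offsets. This is a swapped-in heuristic for illustration, not the detection logic that landed in the fix; the class name and the `threshold` parameter are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class Utf16Heuristic {

    /**
     * Guesses UTF-16BE or UTF-16LE for BOM-less input that is mostly ASCII-range text.
     *
     * @param threshold minimum share of 2-byte units with a zero in the expected slot,
     *                  e.g. 0.7 (a hypothetical value, not tuned on real data)
     */
    static Optional<Charset> guessUtf16(byte[] data, double threshold) {
        int units = data.length / 2;
        if (units == 0) {
            return Optional.empty();
        }
        int zeroAtEven = 0;
        int zeroAtOdd = 0;
        for (int i = 0; i + 1 < data.length; i += 2) {
            if (data[i] == 0) {
                zeroAtEven++;   // high byte of a big-endian unit
            }
            if (data[i + 1] == 0) {
                zeroAtOdd++;    // high byte of a little-endian unit
            }
        }
        double evenShare = (double) zeroAtEven / units;
        double oddShare = (double) zeroAtOdd / units;
        if (evenShare >= threshold && oddShare <= 1 - threshold) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (oddShare >= threshold && evenShare <= 1 - threshold) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();  // no clear pattern: leave the decision to other checks
    }
}
```

A check like this has to run before any "is this valid UTF-8?" test, because ASCII-only UTF-16BE bytes are also valid UTF-8 (the `0x00` bytes simply decode as NUL characters).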

Siedlerchr added a commit that referenced this issue Jul 3, 2022
Siedlerchr added a commit that referenced this issue Jul 11, 2022
* Fix charset detection with utf16 and others

Fixes #8895
Fixes #8870

* checkstyle

* Fix typo in method names

* change newlines

* get bytes

* Set newline character to LF

* Revert "get bytes"

This reverts commit 1082f8a.

* progress

* switch line sep to LF

* Please work

* Try jitpack

* Add manual build of icu4j

* Check if we have ascii in the list of charsets

* fix checkstyle

* Update external-libraries.md

* Encode with UTF-16BE

* Fix umlaut

* Hack to get test running

* Also compare meta data

* Add enforced ignorance of malformed characters

Source: http://biercoff.com/malformedinputexception-input-length-1-exception-solution-for-scala-and-java/

Co-authored-by: Christoph <[email protected]>

* checkstyle

* IntelliJ now also renders the file correctly

* Add test

Additionally

- Replace unknown characters
- Remove obsolete wrapping classes in test

* Refine CHANGELOG.md

* Remove non-working jpackage reference

Co-authored-by: Oliver Kopp <[email protected]>
Co-authored-by: Houssem Nasri <[email protected]>
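The commit notes "Add enforced ignorance of malformed characters" and "Replace unknown characters". One plain `java.nio` way to get that kind of behaviour is to configure the decoder with `CodingErrorAction.REPLACE` (or `IGNORE`) instead of the default `REPORT`. The sketch below illustrates the idea and is not the code from #8947; `LenientReader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientReader {

    /** Decodes UTF-8 but substitutes U+FFFD for malformed or unmappable input instead of throwing. */
    static String readLenientUtf8(byte[] data) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // IGNORE would drop the bytes instead
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), decoder)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        }
    }
}
```

For example, `readLenientUtf8("Caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1))` returns `"Caf\uFFFD au lait"`: the lone `0xE9` byte becomes the replacement character and decoding continues. The trade-off is silent data loss, so detecting the right charset in the first place still matters.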
@CodeSJS
Author

CodeSJS commented Oct 11, 2022 via email

@koppor koppor moved this to Done in Prioritization Nov 10, 2022
Projects
Archived in project

3 participants