I ran
java -cp cermine-impl-1.13-jar-with-dependencies.jar pl.edu.icm.cermine.ContentExtractor -path data_raw/pdfs
on a nested folder of PDFs and got a NullPointerException.
My Java version:
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) Client VM (build 25.251-b08, mixed mode)
Full error message:
Exception in thread "main" java.lang.NullPointerException
at com.itextpdf.text.pdf.parser.PdfImageObject.decodeImageBytes(PdfImageObject.java:298)
at com.itextpdf.text.pdf.parser.PdfImageObject.<init>(PdfImageObject.java:199)
at com.itextpdf.text.pdf.parser.PdfImageObject.<init>(PdfImageObject.java:168)
at com.itextpdf.text.pdf.parser.ImageRenderInfo.prepareImageObject(ImageRenderInfo.java:150)
at com.itextpdf.text.pdf.parser.ImageRenderInfo.getImage(ImageRenderInfo.java:140)
at pl.edu.icm.cermine.structure.ITextCharacterExtractor$BxDocumentCreator.renderImage(ITextCharacterExtractor.java:366)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$ImageXObjectDoHandler.handleXObject(PdfContentStreamProcessor.java:1311)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.displayXObject(PdfContentStreamProcessor.java:375)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$6100(PdfContentStreamProcessor.java:83)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$Do.invoke(PdfContentStreamProcessor.java:1023)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:310)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:448)
at pl.edu.icm.cermine.structure.ITextCharacterExtractor.extractCharacters(ITextCharacterExtractor.java:112)
at pl.edu.icm.cermine.ExtractionUtils.extractCharacters(ExtractionUtils.java:60)
at pl.edu.icm.cermine.InternalContentExtractor.doWork(InternalContentExtractor.java:346)
at pl.edu.icm.cermine.InternalContentExtractor.getImages(InternalContentExtractor.java:169)
at pl.edu.icm.cermine.ContentExtractor.getImages(ContentExtractor.java:290)
at pl.edu.icm.cermine.ContentExtractor.getImages(ContentExtractor.java:307)
at pl.edu.icm.cermine.ContentExtractor.main(ContentExtractor.java:805)
I just remembered filing a similar issue (#36) a few years ago. Back then I asked whether CERMINE has built-in exception handling. Is that the case? Otherwise I would try running it from Python so that PDFs that fail are skipped.
This is a great tool, by the way. We are about to submit our first publication based entirely on results obtained from CERMINE.
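For reference, the Python workaround mentioned above could look roughly like the sketch below. It is not a CERMINE feature. It copies each PDF into its own temporary folder, runs the same command from the report on that folder, and records the file as failed when the process exits with a non-zero code. That is what the JVM normally does after an uncaught exception in main. Several things here are assumptions: the jar sits in the working directory, `java` is on the PATH, and CERMINE writes its results next to the input PDF. The names `run_cermine` and `extract_each` are made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Command copied from the report above; the jar location is an assumption.
CERMINE_CMD = [
    "java", "-cp", "cermine-impl-1.13-jar-with-dependencies.jar",
    "pl.edu.icm.cermine.ContentExtractor", "-path",
]


def run_cermine(folder):
    """Run CERMINE on one folder and return the process exit code."""
    return subprocess.run(CERMINE_CMD + [str(folder)]).returncode


def extract_each(root, runner=run_cermine):
    """Process every PDF under root on its own; return (succeeded, failed)."""
    succeeded, failed = [], []
    for pdf in sorted(Path(root).rglob("*.pdf")):
        with tempfile.TemporaryDirectory() as tmp:
            work = Path(tmp)
            shutil.copy2(pdf, work / pdf.name)
            try:
                code = runner(work)
            except OSError:
                # The command could not be started, e.g. java is not on the PATH.
                code = -1
            if code != 0:
                failed.append(pdf)
                continue
            # Assumption: results are written next to the input PDF,
            # so everything except the copied PDF is treated as output.
            for out in work.iterdir():
                if out.is_file() and out.name != pdf.name:
                    shutil.copy2(out, pdf.parent / out.name)
            succeeded.append(pdf)
    return succeeded, failed
```

Calling `extract_each("data_raw/pdfs")` returns two lists of paths, so the failed PDFs can be logged or retried later. Starting one JVM per file is slower than a single batch run, but a crash on one PDF no longer stops the rest.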
PDF with the issue:
https://www.cloud.luerig.net/index.php/s/CKQRnDePF9aRFwo
The machine runs Windows 10. The Java version and the full error message are listed above.