Error and crash: Exception in thread "main" java.lang.IllegalArgumentException: Illegal group reference: group index is missing #36

Closed
mluerig opened this issue Dec 21, 2016 · 2 comments


mluerig commented Dec 21, 2016

I am running CERMINE on 10,000 or so PDFs, but some of them throw this error and the program stops running. Can I somehow fix this, or tell CERMINE to skip errors and continue?

File processed: /home/moritz/Desktop/pdf_extraction/pdfs/Other/Öztunç et al. 1991 - A new haloether from Laurencia possessing a lauroxacyclododecane ring. Structure and conformational studies.pdf
Exception in thread "main" java.lang.IllegalArgumentException: Illegal group reference: group index is missing
    at java.util.regex.Matcher.appendReplacement(Matcher.java:819)
    at pl.edu.icm.cermine.content.cleaning.ContentCleaner.cleanHyphenationAndBreaks(ContentCleaner.java:180)
    at pl.edu.icm.cermine.content.cleaning.ContentCleaner.cleanAllAndBreaks(ContentCleaner.java:236)
    at pl.edu.icm.cermine.metadata.model.DocumentMetadata.clean(DocumentMetadata.java:277)
    at pl.edu.icm.cermine.metadata.EnhancerMetadataExtractor.extractMetadata(EnhancerMetadataExtractor.java:106)
    at pl.edu.icm.cermine.metadata.EnhancerMetadataExtractor.extractMetadata(EnhancerMetadataExtractor.java:36)
    at pl.edu.icm.cermine.ExtractionUtils.cleanMetadata(ExtractionUtils.java:101)
    at pl.edu.icm.cermine.InternalContentExtractor.doWork(InternalContentExtractor.java:341)
    at pl.edu.icm.cermine.InternalContentExtractor.doWork(InternalContentExtractor.java:320)
    at pl.edu.icm.cermine.InternalContentExtractor.getContentAsNLM(InternalContentExtractor.java:286)
    at pl.edu.icm.cermine.ContentExtractor.getContentAsNLM(ContentExtractor.java:612)
    at pl.edu.icm.cermine.ContentExtractor.getContentAsNLM(ContentExtractor.java:628)
    at pl.edu.icm.cermine.ContentExtractor.main(ContentExtractor.java:724)
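On the "skip errors and continue" part of the question: the thread does not say whether CERMINE itself offers such an option, so the following is only a generic sketch of the idea, assuming the batch is driven from your own Java code. `BatchRunner`, `processAll`, and the `extractor` callback are hypothetical names for illustration and are not part of CERMINE's API; the callback stands in for whatever call actually processes one PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class BatchRunner {

    // Runs the (hypothetical) extractor on every path. A runtime failure on
    // one file is logged and recorded, and the loop moves on to the next file.
    static List<String> processAll(List<String> paths, Consumer<String> extractor) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            try {
                extractor.accept(path);
            } catch (RuntimeException e) {
                // IllegalArgumentException, as in the trace above, is a RuntimeException.
                System.err.println("Skipping " + path + ": " + e);
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in extractor that fails on one file, to show the loop surviving it.
        Consumer<String> fakeExtractor = path -> {
            if (path.startsWith("bad")) {
                throw new IllegalArgumentException("Illegal group reference: group index is missing");
            }
            System.out.println("File processed: " + path);
        };
        List<String> failed = processAll(Arrays.asList("a.pdf", "bad.pdf", "c.pdf"), fakeExtractor);
        System.out.println("Failed files: " + failed);
    }
}
```

Collecting the failed paths rather than only logging them makes it easy to rerun just those files once a fix is available.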

Öztunç et al. 1991 - A new haloether from Laurencia possessing a lauroxacyclododecane ring. Structure and conformational studies.pdf

@dtkaczyk

Hi @mluerig
Thanks for reporting this. I just committed a bug fix; it should be included in the newest snapshot available here. If you still experience any problems, please let us know.
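The committed fix itself is not shown in this thread. For readers hitting the same exception in their own code, here is a minimal sketch of one common way it arises and one common remedy. `Matcher.appendReplacement` treats `$` in the replacement string as a group reference, so a replacement built from document text that ends in `$` triggers the error, and `Matcher.quoteReplacement` makes the text literal. The class `ReplacementDemo`, the method `joinHyphenated`, and the pattern are illustrative assumptions, not CERMINE's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {

    // Illustrative pattern: a word hyphenated across a line break.
    static final Pattern HYPHEN_BREAK = Pattern.compile("(\\S+)-\\n(\\S+)");

    // Joins hyphenated words. With quote == false the joined text is passed
    // to appendReplacement as-is, so any '$' in it is read as a group reference.
    static String joinHyphenated(String text, boolean quote) {
        Matcher m = HYPHEN_BREAK.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String joined = m.group(1) + m.group(2);
            String replacement = quote ? Matcher.quoteReplacement(joined) : joined;
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "pay-\nment$ due";
        // Quoted: the '$' is kept as a literal character.
        System.out.println(joinHyphenated(text, true));
        try {
            // Unquoted: the replacement ends in '$', so this throws.
            joinHyphenated(text, false);
        } catch (IllegalArgumentException e) {
            System.out.println("Unquoted replacement failed: " + e.getMessage());
        }
    }
}
```

Quoting the replacement is only needed when the replacement text comes from data; a replacement that intentionally uses `$1`-style references should be left unquoted.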


dtkaczyk commented Jan 5, 2017

I am closing this now, assuming the bug is fixed. Please reopen if you still experience any problems.
