--auto-hybrid-regex bad offset into UTF string #143

Open
InvisOn opened this issue Aug 22, 2022 · 0 comments

When given a PCRE2 pattern, --auto-hybrid-regex should produce the same output as --pcre2. Instead, it fails with an error on some files.

I've included the following examples and files.

With --auto-hybrid-regex:

rga '(?=.*biotic interaction)(?=.*plant)' 'Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf' --auto-hybrid-regex

Faulty output:

Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf: preprocessor command failed: '"/home/aj/.cargo/bin/rga-preproc" "Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf"': PCRE2: error matching: bad offset into UTF string

With --pcre2:

rga '(?=.*biotic interaction)(?=.*plant)' 'Diversity and Distributions - 2008 - Catford - Reducing redundancy in invasion ecology by integrating hypotheses into a.pdf' --pcre2

Correct output:

Page 18: Vázquez, D.P. (2006) Biotic interactions and plant invasions.
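
For context, PCRE2 reports "bad offset into UTF string" when a match is started at a byte offset that does not fall on a UTF-8 character boundary. The matching line contains a multi-byte character ("á"), so one possible explanation is that an offset lands inside such a character. That is an assumption, not a confirmed diagnosis. The Python sketch below only illustrates what an invalid offset looks like; `is_utf8_boundary` is a hypothetical helper, not part of rga, ripgrep, or PCRE2.

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """Return True if offset points at the start of a UTF-8 character
    (or at the very end of the data)."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


# The line that --pcre2 reports as the correct match.
line = "Page 18: Vázquez, D.P. (2006) Biotic interactions and plant invasions."
data = line.encode("utf-8")

# Byte offset of the two-byte character "á".
start = data.index("á".encode("utf-8"))

print(is_utf8_boundary(data, start))      # first byte of "á"
print(is_utf8_boundary(data, start + 1))  # second (continuation) byte of "á"
```

An offset like `start + 1` is the kind of value that a UTF-aware matcher rejects, which would be consistent with the error appearing only on some files.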