lines: support "rules" field for multiple sets of parsing regexes #463
rmilecki commented on Feb 4, 2023
This is a refactored version of #407: rebased, with an updated description and a test case added. I think the added test gives a good idea of why we may need this feature. The thing is, it's not always possible to parse lines with varying columns using multiple regexes. Look at the example from the included test case:
If some company changes the order of columns, we need different sets of rules for parsing such lines. That's because it's impossible to write a generic […]

Let me know what you think about such a feature.
Thanks for this PR. It does what it is supposed to do. 👍 🎉

Maybe it was not the intention of this PR: the possibility to parse multiple sets of regexes while applying the extraction rules (first_line, last_line) and retaining the order of the input string in the output.

Technical background behind previous PR 378

Then there was the nested for loop applied on that particular line. (If it was not met, detailed feedback was provided: "Line: XXX matched, but is not between firstline and lastline, so ignoring".)

The fundamental difference seems to be that the code in this PR considers the order of the rules as they were written in the template file (assumption), instead of the order of the optimized string from the input file.

What to do with this PR? Can we merge it?
Sometimes companies use more than one format for line-parseable data. They may randomly generate invoices with e.g.:
1. Some extra columns that are used occasionally
2. Rearranged column order

Such format changes may be too invasive to support parsing with e.g. multiple "line" regexes.

This commit adds "rules" field support to the "lines" parser. It allows defining multiple sets of regexes ("start", "end", "line" & friends) for a single upper field.

Usage of "rules" is optional. Backward compatibility with existing templates is preserved.

Signed-off-by: Rafał Miłecki <[email protected]>
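
To make the idea of multiple rule sets concrete, here is a minimal standalone sketch in Python. It is not the code from this PR: the function name `parse_with_rules`, the sample invoices, and the regexes are all hypothetical, and only the "start", "end", and "line" keys are taken from the commit message. Each rule set is tried in the order it is listed, and a rule set whose "start" or "end" pattern is not found is skipped.

```python
import re


def parse_with_rules(text, rules):
    """Hypothetical helper: try each rule set in listed order and
    collect the named groups of every matching line."""
    results = []
    for rule in rules:
        start = re.search(rule["start"], text)
        if not start:
            continue  # this rule set does not apply to the document
        rest = text[start.end():]
        end = re.search(rule["end"], rest)
        if not end:
            continue
        # Only lines between the "start" and "end" matches are considered.
        block = rest[:end.start()]
        for raw in block.splitlines():
            match = re.search(rule["line"], raw.strip())
            if match:
                results.append(match.groupdict())
    return results


# Two made-up layouts of the same data with the column order rearranged.
INVOICE_A = "Item Qty Price\nWidget 2 10.00\nGadget 1 5.50\nTotal 25.50"
INVOICE_B = "Price Qty Item\n10.00 2 Widget\nTotal 10.00"

RULES = [
    {
        "start": r"Item\s+Qty\s+Price",
        "end": r"Total",
        "line": r"^(?P<name>\w+)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d+)$",
    },
    {
        "start": r"Price\s+Qty\s+Item",
        "end": r"Total",
        "line": r"^(?P<price>\d+\.\d+)\s+(?P<qty>\d+)\s+(?P<name>\w+)$",
    },
]

rows_a = parse_with_rules(INVOICE_A, RULES)
rows_b = parse_with_rules(INVOICE_B, RULES)
```

In this sketch the output follows the order of the rule sets, not the order of the lines in the input, which is the behaviour the review comment above assumes for this PR. Whether the actual implementation behaves the same way should be checked against the code and the added test.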
Functional Tests 👍