feat: add string containment functions #256
extensions/functions_string.yaml
Outdated
description: The substring to search for.
return: i8
-
name: regexp_strpos
For these regular expression ones, I'm not sure whether the input and regex pattern can have mismatched types, so I just kept them consistent for now.
Note that many of my comments apply throughout the file; this list is not exhaustive. Regexes in particular are a huge can of worms to open if we want any usable level of precision as to what needs to be supported and what doesn't. Maybe it's better to postpone the regex stuff to a later PR.
extensions/functions_string.yaml
Outdated
- options: [ CASE_SENSITIVE, CASE_INSENSITIVE ]
required: false
Regular expressions generally have a lot more options than just case sensitivity. Dot-all/multiline comes to mind. I'm not sure how else we could represent the options, but if we do it this way and ever want to add more options here, we'd have to break the signature of the function.
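To make the point about extra options concrete, here is a small sketch using Python's `re` module as a stand-in engine. The helper name `matches` is hypothetical and nothing here is part of the proposed YAML; it only shows that multiline and dot-all change results independently of case sensitivity.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the pattern is found anywhere in text (illustrative helper)."""
    return re.search(pattern, text, flags) is not None


text = "Alpha\nbeta"

# Case sensitivity is only one of several independent switches.
case_sensitive = matches(r"alpha", text)                    # False
case_insensitive = matches(r"alpha", text, re.IGNORECASE)   # True

# MULTILINE lets ^ match at the start of every line, not only the string.
without_multiline = matches(r"^beta", text)                 # False
with_multiline = matches(r"^beta", text, re.MULTILINE)      # True

# DOTALL lets . match a newline as well.
without_dotall = matches(r"Alpha.beta", text)               # False
with_dotall = matches(r"Alpha.beta", text, re.DOTALL)       # True
```

A two-value CASE_SENSITIVE/CASE_INSENSITIVE option cannot express the last two pairs, which is the signature concern raised above.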
extensions/functions_string.yaml
Outdated
-
name: regexp_strpos
description: >-
Return the position of the first occurrence of a regular expression pattern in another
"a regular expression pattern"
Which flavor? There are many, and some vary greatly in terms of performance.
extensions/functions_string.yaml
Outdated
- value: "string"
name: "pattern"
description: The regular expression pattern.
We might want to consider requiring the regex to be constant. If not, we need to consider what happens if the pattern fails to compile on a row-by-row basis.
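A rough sketch of the difference, again using Python's `re` as a stand-in engine. The function names, the 1-based position, the `0` for "no match", and the `None` for "invalid pattern" are all assumptions made for illustration, not behavior defined by this PR.

```python
import re


def regexp_strpos_rows(rows):
    """Per-row patterns: each row may fail to compile on its own.

    Assumed policy for illustration: 1-based position, 0 when there is no
    match, None when the pattern on that row is invalid.
    """
    results = []
    for text, pattern in rows:
        try:
            match = re.search(pattern, text)
        except re.error:
            # One bad pattern affects only this row; the spec would have to
            # say whether this is a null, an error, or something else.
            results.append(None)
            continue
        results.append(match.start() + 1 if match else 0)
    return results


def regexp_strpos_constant(texts, pattern):
    """Constant pattern: compiled once, so a bad pattern fails before any row is read."""
    compiled = re.compile(pattern)  # raises re.error here for an invalid pattern
    results = []
    for text in texts:
        match = compiled.search(text)
        results.append(match.start() + 1 if match else 0)
    return results
```

With a constant pattern the failure mode is a single up-front error, which is much easier to specify than per-row semantics.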
extensions/functions_string.yaml
Outdated
- value: "string"
name: "replacement"
description: The string to replace the regular expression match with.
Do we support backreferences in replacements? Named or only numbered?
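For reference, this is what numbered and named backreferences look like in one engine (Python's `re`). Other flavors spell these differently (for example `$1` in Java and JavaScript), so the syntax below is specific to Python and is not a proposal for this function.

```python
import re

source = "user@host"

# Numbered backreferences: \1 and \2 refer to capture groups by position.
numbered = re.sub(r"(\w+)@(\w+)", r"\2@\1", source)

# Named backreferences: \g<name> refers to a group declared with (?P<name>...).
named = re.sub(r"(?P<user>\w+)@(?P<host>\w+)", r"\g<host>@\g<user>", source)

# Both produce "host@user".
```

If backreferences are not supported, the replacement string would presumably be treated as a literal, and that choice would need to be stated in the description.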
Thanks for all the input! It definitely seems like there should be a longer discussion/writeup before the regex stuff is attempted. I'll remove those from this PR, and we can use your initial comments here later on.
extensions/functions_string.yaml
Outdated
return: i64
- name: replace
description: >-
Replace all occurrence of the substring with the replacement string.
Suggested change:
- Replace all occurrence of the substring with the replacement string.
+ Replace all occurrences of the substring with the replacement string.
I'll leave the regex discussions as unresolved, but aside from the above grammar nit and the overlapping/non-overlapping option/description for
Updated both! I decided to stick with just the description update for the overlap/non-overlap behavior for now, since I couldn't find any existing function that exposes this as an option.
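As a quick illustration of why the description needs to mention overlap, here is how Python's built-in `str.replace` behaves (it scans left to right and never revisits consumed characters). The helper `overlapping_positions` is hypothetical and only exists to show where overlapping occurrences would be.

```python
def overlapping_positions(text: str, sub: str) -> list[int]:
    """Return the start index of every occurrence of sub, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Advance by one character only, so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return positions


text = "aaa"

# "aa" occurs at index 0 and index 1 when overlaps are counted.
occurrences = overlapping_positions(text, "aa")

# str.replace is non-overlapping: only the match at index 0 is replaced,
# and the remaining "a" is left as-is.
replaced = text.replace("aa", "b")
```

Here `occurrences` is `[0, 1]` while `replaced` is `"ba"`, so "all occurrences" is ambiguous unless non-overlapping is stated.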
extensions/functions_string.yaml
Outdated
- value: "varchar<L1>"
name: "input"
description: The input string.
- value: "varchar<L1>"
Does the input string need to be the same length? If not, L1 shouldn't be repeated.
extensions/functions_string.yaml
Outdated
- value: "varchar<L1>"
name: "input"
description: Input string.
- value: "varchar<L1>"
Same as above.
Updated both
3dd34eb to 13eb466
Thanks @richtia!
Looks like a check is failing. @richtia, can you address? Thx!
cac2771 to 391224f
@jacques-n Yep. Just updated one of my commit messages due to linting issues.
extensions/functions_string.yaml
Outdated
- value: "varchar<L3>"
name: "replacement"
description: The replacement string.
return: "varchar<L1 - L2 + L3>"
Not sure I agree with that one. The actual length depends on the number of replacements; this arbitrarily assumes there was exactly one replacement. I'd just put L1 there.
Ahh...good point. Updated. Thanks!
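A small check of the single-replacement assumption, using Python's `str.replace` as a stand-in for the proposed function. `L1`, `L2`, and `L3` are taken here as the actual lengths of the input, substring, and replacement, which is a simplification of the varchar type parameters; `single_replacement_length` is a hypothetical helper.

```python
def single_replacement_length(l1: int, l2: int, l3: int) -> int:
    """Length predicted by the original return type, varchar<L1 - L2 + L3>."""
    return l1 - l2 + l3


# Two replacements: the formula undercounts.
text, sub, repl = "a-b-c", "-", "--"
two_hits = text.replace(sub, repl)  # "a--b--c"
two_hits_predicted = single_replacement_length(len(text), len(sub), len(repl))

# Zero replacements: the formula overcounts.
no_hits = "abc".replace(sub, repl)  # "abc"
no_hits_predicted = single_replacement_length(len("abc"), len(sub), len(repl))
```

In the first case the result has length 7 against a predicted 6; in the second it has length 3 against a predicted 4. This only illustrates why the expression is not exact; it does not settle what the declared return length should be.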
PR to add definitions for string containment functions.