Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for lookaheads in the regex parser #30433

Open
tspearconquest opened this issue Jan 28, 2022 · 6 comments
Open

Support for lookaheads in the regex parser #30433

tspearconquest opened this issue Jan 28, 2022 · 6 comments
Labels
enhancement new new issue not yet triaged

Comments

@tspearconquest
Copy link

tspearconquest commented Jan 28, 2022

Current Terraform Version

Terraform v1.1.3
on darwin_amd64
+ provider registry.terraform.io/hashicorp/azurerm v2.92.0

Use-cases

Many times it is required to input an IP or subnet somewhere in order to build a resource. Often, these inputs are unsanitized but it is desirable to perform input sanitization for variables containing IPs by using a regex.

I found the following post which provides a number of examples of regex patterns to return only correct matches for IPs and so I thought I'd try using length(regexall()) with a slight variation on the second shortest example:

https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp

My attempt to use it in a validation block:

variable "vnet_address_space" {
  description = "Virtual network address space"
  type        = string

  validation {
    condition     = length(regexall("^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])([.](?!$)|$)){4}[/](3[0-2]|[12][0-9]|[0-9])$", var.vnet_address_space)) > 0
    error_message = "The vnet address space must be a valid network address."
  }
}

But I got an error:

│ Error: Invalid function argument
│
│   on variables.tf line 26, in variable "vnet_address_space":
│   26:     condition     = length(regexall("^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])([.](?!$)|$)){4}[/](3[0-2]|[12][0-9]|[0-9])$", var.vnet_address_space)) > 0
│
│ Invalid value for "pattern" parameter: invalid regexp pattern: invalid or unsupported Perl syntax in (?!.

In the error above, the part: ^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])([.](?! is underlined, meaning the error is being caused by the capture group ([.](?!$)|$) which leads me to believe that the negation operator for a lookahead is failing

Attempted Solutions

I googled to see if the other way to negate a match would work. That is... [^] with the thing you don't want to match inside (the end of line character $ for me.) It doesn't match because the $ is literal instead of an end of line character when it is used inside a character class.

Proposal

References

@tspearconquest tspearconquest added enhancement new new issue not yet triaged labels Jan 28, 2022
@tspearconquest tspearconquest changed the title Support for negative lookaheads in the regex parser Support for lookaheads in the regex parser Jan 28, 2022
@tspearconquest
Copy link
Author

I wanted to see if I could work around this with a positive lookahead, but that also fails.

@apparentlymart
Copy link
Contributor

Hi @tspearconquest,

Terraform is using the "RE2" regular expression engine, which does indeed omit several of the more "expensive" features from other regex engines like Perl's.

I don't expect we will aim to grow the regex support here because regular expressions are not Terraform's focus and we don't wish to risk changing behavior of existing modules using the current engine.

However, focusing on your underlying use-case rather than the regex solution to it (thanks for including that context!), perhaps another way to get there would be to use cidrnetmask, which should succeed if given a valid IPv4 prefix and fail otherwise:

can(cidrnetmask(var.vnet_address_space))

This function parses its argument as CIDR notation in order to do its job, and so it already has a definition of the expected syntax built in to it.

@tspearconquest
Copy link
Author

Hi, thanks for pointing that out. I hadn't thought of it. This should fill the gap for the example, certainly.

Now unfortunately I must say the azurerm provider is inconsistent in that some resources may take an IP without a CIDR mask, while some others may require the CIDR mask, and some may take an IP in either form. This may be done because the underlying infrastructure purchased by the provider may accept a CIDR mask in some places while in others it doesn't.

We would like to be able to take either form in some variables where both forms are supported. The use of cidrnetmask() would prevent that due to it requiring a CIDR mask.

I could open an issue with the provider to ask for this support, but want to make the case that it could be a useful feature to add a function for this purpose to Terraform. Something which can validate both forms, as well as possibly take an argument in the form of a list of IP ranges considered valid, so that you can (for example) prevent ranges outside of your subnets from being allowed via the variable validation rule.

Would you be open to accepting a feature request for this?

@apparentlymart
Copy link
Contributor

apparentlymart commented Feb 1, 2022

Hi @tspearconquest,

The functions built in to the Terraform language are typically ones that have broad applicability, and the existing cidr... family of functions is already arguably a bit too specialized but they happened to come early in Terraform's life where we hadn't yet formed a strong opinion about what belongs in the core language and what doesn't. A function with as much flexibility and configurability as you're describing seems like it would be quite complicated and not really of a scale that we typically expect for built-in functions.

Our typical answer for the possibility of user-defined behavior like this today is to encourage writing a custom provider which offers a data source that provides whatever information is needed, but that approach won't help in this particular case because it isn't valid to refer to a data resource from a validation rule, and even if it were it would cause a dependency cycle to have the data resource refer to the variable and the variable also refer to the data resource.

Given all of that, I think what you've raised here would be a good additional example use-case for #2771, where we're considering how to allow provider plugins to contribute extension functions to the Terraform language. We're not currently actively working on that, but I'm going to create a link between these two issues so that we can see this additional use-case when considering that other issue in future, and take it into account when prioritizing what to work on next.

In the meantime, it is true that cidrnetmask does require full CIDR notation rather than just an IP address, but it does accept a /32 suffix to indicate that the entire address space is covered:

> cidrnetmask("10.1.1.1/32")
"255.255.255.255"

While I do see that this is getting considerably more fiddly than my original suggestion, you could potentially support both by trying both forms, like this:

can(cidrnetmask(var.vnet_address_space)) || can(cidrnetmask("${var.vnet_address_space}/32"))

Thanks!

@tspearconquest
Copy link
Author

Thank you as well. This looks like a suitable workaround for now. Thanks also for considering the idea and the additional insight. Do you want me to keep this open or close it out?

@AlexAtkinson
Copy link

I just destroyed my brain trying to get standard lookaheads working with Terraform regex before I realized the function is incorrectly named (perhaps change it to regexlite?).

For anyone looking for more complex validation reqs, have a look at this example that satisfies RDS Postgres; and the following CIDR examples I slapped in just to help out anyone here for that particular nugget.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new new issue not yet triaged
Projects
None yet
Development

No branches or pull requests

3 participants