Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rules] [strings] Add support for some core Strings theory rules that reason about Regular expressions #46

Merged
merged 6 commits into from
Oct 16, 2024

Conversation

vinisilvag
Copy link
Contributor

@vinisilvag vinisilvag commented Oct 3, 2024

This PR adds the re_inter, re_unfold_neg and re_unfold_neg_concat_fixed rules from the Strings theory.
For simplification reasons, the re_unfold_neg_concat_fixed rule was splitted in a prefix version and suffix version (resp. re_unfold_neg_concat_fixed_prefix and re_unfold_neg_concat_fixed_suffix), just like it was done for other previously implemented rules.

Future PR's will add the re_unfold_pos rule.

Rules reference can be found in the cvc5 documentation and in the CPC specification.

@vinisilvag vinisilvag changed the title [rules] [strings] Add support for some core Strings theory rules that reasons about Regular expressions [rules] [strings] Add support for some core Strings theory rules that reason about Regular expressions Oct 3, 2024
Comment on lines 350 to 357
if let Some(s_1) = args.first() {
match s_1.as_ref() {
Term::Const(Constant::String(s)) => Ok(s.len()),
_ => Err(CheckerError::LengthCannotBeEvaluated(r.clone())),
}
} else {
unreachable!()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it's simpler to use .first().unwrap(), since you're just pattern-matching on Some.

Suggested change
if let Some(s_1) = args.first() {
match s_1.as_ref() {
Term::Const(Constant::String(s)) => Ok(s.len()),
_ => Err(CheckerError::LengthCannotBeEvaluated(r.clone())),
}
} else {
unreachable!()
}
let s_1 = args.first().unwrap();
match s_1.as_ref() {
Term::Const(Constant::String(s)) => Ok(s.len()),
_ => Err(CheckerError::LengthCannotBeEvaluated(r.clone())),
}

Copy link
Contributor Author

@vinisilvag vinisilvag Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright!

Comment on lines 340 to 345
if let [r_1, r_2 @ ..] = &args[..] {
let r_2_concat = concat(pool, r_2.to_vec());
Ok(str_fixed_len_re(pool, r_1.clone())? + str_fixed_len_re(pool, r_2_concat)?)
} else {
unreachable!()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? If r_2.len() >= 2, concat will return a str.++ term, which will lead to an error when you call str_fixed_len_r(pool, r_2_concat). If it was instead an re.++ term, I suppose it would work. Either way, I think it would be simpler to just call str_fixed_len_re to each argument directly, and sum the results, e.g.

Suggested change
if let [r_1, r_2 @ ..] = &args[..] {
let r_2_concat = concat(pool, r_2.to_vec());
Ok(str_fixed_len_re(pool, r_1.clone())? + str_fixed_len_re(pool, r_2_concat)?)
} else {
unreachable!()
}
let mut lengths = args.iter().map(|a| str_fixed_len_re(pool, a.clone()));
lengths.try_fold(0, |acc, x| Ok(acc + x?))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not correct, indeed :(

I confused this function with another one I previously implemented and ended up calling the wrong one. Also, I did not call str_fixed_len_re for each argument directly because I got too focused on how CPC represents re.++ and str.++ terms ((str.++ a b c) is transformed into (str.++ a (str.++ b (str.++ c "")))). However, with the representation in arrays that we have internally, it is, in fact, the same as applying str_fixed_len_re directly to each argument.

I am going to change that there and in other places where a similar behavior occurs.

Comment on lines 359 to 397
Term::Op(Operator::ReUnion, args) => {
if let [r_1, r_2 @ ..] = &args[..] {
let n = str_fixed_len_re(pool, r_1.clone())?;
if r_2.len() == 1 {
match r_2.first().unwrap().as_ref() {
Term::Op(Operator::ReNone, _) => Ok(n),
_ => Err(CheckerError::LengthCannotBeEvaluated(r.clone())),
}
} else {
let new_r_2 = pool.add(Term::Op(Operator::ReUnion, r_2.to_vec()));
if str_fixed_len_re(pool, new_r_2)? == n {
Ok(n)
} else {
Err(CheckerError::LengthCannotBeEvaluated(r.clone()))
}
}
} else {
unreachable!()
}
}
Term::Op(Operator::ReIntersection, args) => {
if let [r_1, r_2 @ ..] = &args[..] {
let n = str_fixed_len_re(pool, r_1.clone())?;
if r_2.len() == 1 {
match r_2.first().unwrap().as_ref() {
Term::Op(Operator::ReAll, _) => Ok(n),
_ => Err(CheckerError::LengthCannotBeEvaluated(r.clone())),
}
} else {
let new_r_2 = pool.add(Term::Op(Operator::ReIntersection, r_2.to_vec()));
if str_fixed_len_re(pool, new_r_2)? == n {
Ok(n)
} else {
Err(CheckerError::LengthCannotBeEvaluated(r.clone()))
}
}
} else {
unreachable!()
}
Copy link
Collaborator

@bpandreotti bpandreotti Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to the ReConcat case, in these two cases you are constructing a re.union/re.inter term with the remaining arguments just to call str_fixed_re_len recursively, which is not very efficient (for each argument, you allocate a vector with all the remaining arguments). I think it would be best to call str_fixed_re_len on each argument directly to get their lenghts, and then check if they are all the same.

Also, since these two cases are almost identical, differing only in which terms they ignore (re.none or re.all), it might make sense to put this logic in a separate function and call it in each case.

@bpandreotti bpandreotti merged commit fdd8ac7 into ufmg-smite:main Oct 16, 2024
3 checks passed
Mallku2 pushed a commit that referenced this pull request Nov 24, 2024
… reason about Regular expressions (#46)

* Add some core Strings theory rules that reason about RegEx

* Rename prefix and suffix functions

* Replace sanity checks with unreachable!()

* PR modifications

* Modifications in str_fixed_len_re

* Fix linting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants