-
-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add min_fold_count_limit optimization #423
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some thoughts
// We do not apply the min_fold_count_limit optimization if we have an upper bound, | ||
// in the form of max_fold_count_limit, because if we have a required upperbound, | ||
// then we can't stop at the lower bound, if we are required to observe whether we hit the | ||
// upperbound, it's no longer safe to stop at the lower bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it too much to repeat the same exact thing another time at the end here after the comma?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworded to now be:
If we must collect the fold up to our upperbound of `max_fold_count_limit`,
then we won't use our lowerbound of `min_fold_count_limit`, as by definition
the upperbound will be larger than the lowerbound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting tripped up on the "by definition" bit. What definition? Don't we get the upper and lower bounds from separate filters, and it's technically possible that we have a nonsensical query with self-contradicting filters that end up violating this "definition"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t that be an impossible filter, and since we solve for the solution space of filters, it would be impossible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being impossible means it produces no results, not that it can't be executed by a user. The code still has to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can assert on it anyways, if that would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can assert on it — that's what I'm saying. A query with self-contradicting filters is valid. It's okay to execute it, and Trustfall must not crash on it. We may one day choose to trigger a lint on such queries, but even then, the query is allowed to be executed, and must still produce the correct — empty — result.
Come to think of it, it's probably best to make some query test cases like that. How about these:
{
Number(min: 30, max: 30) {
... on Composite {
value @output
primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
}
}
}
args:
{
"min": 2,
"max": 3,
}
- This one should return
{ value: 30 }
because the fold count is 3, which is >= 2 and <= 3. - The new fold count optimization shouldn't kick in because we still need to check the other filter.
{
Number(min: 30, max: 30) {
... on Composite {
value @output
primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
}
}
}
args:
{
"min": 3,
"max": 2,
}
- This one should return no results because the filter conditions are
>= 3 && <= 2
. - The new fold count optimization probably shouldn't kick in here either, for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nitpicks on wording. Writing clear and concise comments is an art worth practicing.
The code change looks broadly correct and the test case changes look great.
// We do not apply the min_fold_count_limit optimization if we have an upper bound, | ||
// in the form of max_fold_count_limit, because if we have a required upperbound, | ||
// then we can't stop at the lower bound, if we are required to observe whether we hit the | ||
// upperbound, it's no longer safe to stop at the lower bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?
Co-authored-by: Predrag Gruevski <[email protected]>
Co-authored-by: Predrag Gruevski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nitpicks to make the Rust more idiomatic, otherwise this is good to go.
Co-authored-by: Predrag Gruevski <[email protected]>
… into min_fold_count_limit
&& tagged_fold_count.kind == FoldSpecificFieldKind::Count | ||
}) | ||
}); | ||
let safe_to_skip_part_of_fold = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is safe_to_skip_part_of_fold
too non-specific as to which part we can skip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name isn't great. But see the comment above on an idea how we can skip it entirely.
Meta: try naming things for what they represent and not for what should be done with them. Right now, you tend to give a lot of imperative names like that, but those tend to age poorly because they might end up being used differently than originally intended. Names that describe what the value represents, not how it's going to be used, will age much better. Try thinking of some names like that and maybe put a few ideas in reply if you want feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, to name this variable something better, I think fold_values_not_observed
or fold_values_ignored
would suffice since that's what this boolean represents is that the fold values are not observed, neither the final count of values not the individual values by @fold
'ing on them.
// This optimization is only valid if we know that every | ||
// filter applied to the folded element count is a comparison | ||
// that we can determine the number of elements needed to satisfy | ||
// the filter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this portion of this comment doesn't really make sense to me:
a comparison that we can determine the number of elements needed to satisfy the filter.
To simplify, you can out "this optimization is only valid if" and replace it with "every filter on the fold count must ..."
But also, I'm not sure this comment is at the right level of abstraction. This comment makes a general statement about "this optimization" where the optimization is never defined — the function's docs say "further optimizations." It doesn't say anything about this match branch, so it would be hard for the reader to connect all the clues to figure out what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, you're right. Maybe something like If we find a filter that we can't partially evaluate, fail early.
would be a better comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better! Small nit: "partial evaluation" is also the term for an unrelated idea, which might confuse the reader. Perhaps "filter that would require evaluating the fold past its minimum count" or something?
I don't love my suggestion, perhaps you can come up with something even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're saying, If we find a filter that requires us to evaluate the entire fold, fail early.
is better I think since it doesn't require knowing what we are partially evaluating. Although this gives the mistaken impression that every fold count filter that gets here will need to be fully evaluated which isn't necessarily the case. Another try would be If we don't know how many elements of the fold are required to satisfy this filter, fail early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter is good, I like that. "Requires us to evaluate the entire fold" is technically incorrect as you point out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay great, updated to that
// Queries that do not observe the fold count nor any fold contents may be able to | ||
// be optimized by only partially expanding the fold, just enough to check any filters | ||
// that may be applied to the fold count. | ||
// | ||
// For example, if `@filter(op: ">", value: ["$ten"])` is our only filter on the count | ||
// of the fold, we can stop computing the rest of the fold after seeing we have 11 elements. | ||
Some(min_fold_count_limit) if safe_to_skip_part_of_fold => { | ||
iterator.take(*min_fold_count_limit).collect() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth tweaking the implementation here a bit to simplify the logic?
Here's what I noticed: safe_to_skip_part_of_fold = false
with min_fold_count_limit = Some(...)
is the same as safe_to_skip_part_of_fold = true
with min_fold_count_limit = None
. Also, safe_to_skip_part_of_fold
has a pretty awkward name, which is usually a sign that the concept it represents is muddy as well.
In that light, it seems weird that we're splitting the logic so that the caller of this function prepares the two values, but isn't allowed to combine them. Perhaps we can eliminate safe_to_skip_part_of_fold
completely by making the caller turn the min_fold_count_limit
to None
if the fold expansion couldn't be short-circuited?
In that case, the bulk of this comment would be best moved outside into the caller as well, since that's where most of the logic is on whether this optimization can be applied. This function just ends up doing what it's told.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I removed this if safe_to_skip_part_of_fold
from collect_fold()
and put it in the caller, and combined the boolean checks, this way collect_fold()
only has to execute on the option, it doesn't have to deal with extra conditions if the option already has one built in.
&& tagged_fold_count.kind == FoldSpecificFieldKind::Count | ||
}) | ||
}); | ||
let safe_to_skip_part_of_fold = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name isn't great. But see the comment above on an idea how we can skip it entirely.
Meta: try naming things for what they represent and not for what should be done with them. Right now, you tend to give a lot of imperative names like that, but those tend to age poorly because they might end up being used differently than originally intended. Names that describe what the value represents, not how it's going to be used, will age much better. Try thinking of some names like that and maybe put a few ideas in reply if you want feedback?
Stop expanding
@fold
early if the fold contains no@output
fields and the count of the fold is not outputted, while also not invaliding other filters by skipping the rest of the iterator such as filtering for an upperbound. (@filter(< 10)
)