-
Notifications
You must be signed in to change notification settings - Fork 46
Add type-driven fuzzer #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK. Will ACK the top commit after the likely rebase of #104 that is waiting on response to review.
I looked into optimizing the smart fuzz target by getting rid of the loop in place of looking up available fragments for a set of types and properties. I could get a 3x increase in execution per second, with so far (after running both targets a few hours) a comparable coverage.
It's available as a new commit there https://github.com/darosior/bitcoin/tree/miniscript_pr_105 (on top of this branch added to a rebased version of bitcoin/bitcoin#24149).
The current mapping definition may be a bit clunky, but generally what do you think of this approach?
int candidates = 0; | ||
|
||
// Rules for constructing leaf nodes | ||
if ("Bzud"_mst << typ && ++candidates && !(fragcode--)) return {{Fragment::JUST_0}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit strange that any type is considered a subtype of the empty type by operator<<
. (just a random comment, not asking for any change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to think about it is that types are sets of potential values a variable can take. A subtype is called so because it corresponds to a subset of the potential values. Since the empty type here corresponds to every possible valid expression, every other type is indeed a subset of it.
I added a commit to https://github.com/darosior/bitcoin/tree/miniscript_pr_105 which aims to cover more properties ( |
@darosior I've been working on a generalization of the approach you're using here, to build a full table of the form "type X can be constructed using any recipe of the form "fragment Y with subtypes Z1/Z2/Z3/...", for all possible types X, automatically using the existing ComputeType rules. These are the rules I'm inferring; can you see anything obviously missing?
|
Interesting observation: many of these rules are bad; e.g. |
Here is the pruned list of rules (removing all rules which require a subnode of type that is non-constructible, and removing all rules that contruct a type that is never needed in order to produce a
|
8ee9655
to
9329230
Compare
Rebased on top of #104, and switched to an approach inspired by @darosior's mentioned above. The code now infers now automatically (based on the ComputeType behavior) rules for which fragments/subtypes to use for constructing any desired type. This should result in constructing type-valid miniscript nodes for 100% of all fuzzer iterations. |
1805865
to
d14fd08
Compare
Rebased. |
I plan to get to this soon.
------- Original Message -------
Le mardi 5 avril 2022 à 4:47 PM, Pieter Wuille ***@***.***> a écrit :
… Rebased.
—
Reply to this email directly, [view it on GitHub](#105 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AFLK3F7DF57UQTJZE737Y3DVDRHBZANCNFSM5PRHZMNA).
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I think you are missing
I manually reconstructed the list of constructed type in the same fashion from my (incorrect btw) mapping in https://github.com/darosior/bitcoin/tree/miniscript_pr_105. I did not find any other missing instance, and all the "recipes" from your curated list are comprehensive as far as my own list goes. |
@darosior Nothing in my list needs |
Oh, right. No, i don't. But we should fuzz them too as they are possibly-top-level fragment that could be used? |
@darosior That's unnecessary - they can all be constructed using just the To be clear, the fuzzer aims to produce either a B, a V, a K, or a W. That together covers every potentially type-valid miniscript expression. It's just that the rules for producing those may require more specific types, and those are the ones listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome. ACK 9a1c84f.
It makes sense and the bit of complexity introduced for the smarter node generation is well documented (thanks for that). From my testing there are still some small gaps in the coverage after a few hours of run, but i think it's due to the need for more cycles in order to pas the IsValidTopLevel()
check:
miniscript/bitcoin/test/fuzz/miniscript_random.cpp
Lines 393 to 394 in fd55524
// The rest of the checks only apply when testing a valid top-level script. | |
if (!node->IsValidTopLevel()) return; |
* are added. */ | ||
std::sort(types.begin(), types.end()); | ||
|
||
/** Helper function to determine whether a recipe a is a superset of recipe b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the logic, but i find the term "superset" from the comment confusing. This function will return true
for A = {frag, "B"_mst}
and B = {frag, "Bd"_mst}
(since "Bd"_mst
is a subtype of "B"_mst
. A here is not a superset of B.
I don't have any good suggestion. Maybe reusing the wording from the call site:
Check if the recipe a is "super" to b (i.e., same fragment type, but the same [...]
@darosior I think the cause of the gaps may be a bug I just found: the order of the types in the recipe is used in reverse order when constructing miniscript objects - which often results in not obtaining the desired type at all. Fixing. |
Arf, i hadn't looked for that.. Will give the new coverage when you push the fix.
…-------- Original Message --------
On Apr 13, 2022, 7:45 PM, Pieter Wuille wrote:
***@***.***(https://github.com/darosior) I think the cause of the gaps may be a bug I just found: the order of the types in the recipe is used in reverse order when constructing miniscript objects - which often results in not obtaining the desired type at all. Fixing.
—
Reply to this email directly, [view it on GitHub](#105 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AFLK3FYVJ63X43KO2N47TRDVE4BZZANCNFSM5PRHZMNA).
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Pushed a new version. Changes:
|
Diff: diff --git a/bitcoin/test/fuzz/miniscript_random.cpp b/bitcoin/test/fuzz/miniscript_random.cpp
index 6005391..f6115d4 100644
--- a/bitcoin/test/fuzz/miniscript_random.cpp
+++ b/bitcoin/test/fuzz/miniscript_random.cpp
@@ -313,7 +313,10 @@ std::optional<NodeInfo> ConsumeNodeStable(FuzzedDataProvider& provider) {
/* This structure contains a table which for each "target" Type a list of recipes
* to construct it, automatically inferred from the behavior of ComputeType.
- * Each recipe is a Fragment together with a list of required types for its subnodes.
+ * Note that the Types here are not the final types of the constructed Nodes, but
+ * just the subset that are required. For example, a recipe for the "Bo" type
+ * might construct a "Bondu" sha256() NodeInfo, but cannot construct a "Bz" older().
+ * Each recipe is a Fragment together with a list of required types for its subnodes.
*/
struct SmartInfo
{
@@ -322,7 +325,7 @@ struct SmartInfo
SmartInfo()
{
- /* Construct a set of interesting types to reason with (sections of BKVWzondu). */
+ /* Construct a set of interesting type requirements to reason with (sections of BKVWzondu). */
std::vector<Type> types;
for (int base = 0; base < 4; ++base) { /* select from B,K,V,W */
Type type_base = base == 0 ? "B"_mst : base == 1 ? "K"_mst : base == 2 ? "V"_mst : "W"_mst;
@@ -346,17 +349,12 @@ struct SmartInfo
}
}
- /* Sort the types. That has the effect that if type A is a subtype of type B,
- * type B will always be ordered before type A. As we'll be constructing recipes
- * using these types, in order, in what follows, more general recipes will
- * be constructed before more specific ones, and we can avoid the need to
- * retroactive delete earlier more specific recipes when more general ones
- * are added. */
- std::sort(types.begin(), types.end());
-
- /** Helper function to determine whether a recipe a is a superset of recipe b
- * (i.e., same fragment type, but the same or weaker type requirements for each
- * of its subnodes). */
+ /* We define a recipe a to be a super-recipe of recipe b if they use the same
+ * fragment, the same number of subexpressions, and each of a's subexpression
+ * types is a supertype of the corresponding subexpression type of b.
+ * Within the set of recipes for the construction of a given type requirement,
+ * no recipe should be a super-recipe of another (as the super-recipe is
+ * applicable in every place the sub-recipe is, the sub-recipe is redundant). */
auto is_super_of = [](const recipe& a, const recipe& b) {
if (a.first != b.first) return false;
if (a.second.size() != b.second.size()) return false;
@@ -366,16 +364,23 @@ struct SmartInfo
return true;
};
+ /* Sort the type requirements. Subtypes will always sort later (e.g. Bondu will
+ * sort after Bo or Bu). As we'll be constructing recipes using these types, in
+ * order, in what follows, we'll construct super-recipes before sub-recipes.
+ * That means we never need to go back and delete a sub-recipe because a
+ * super-recipe got added. */
+ std::sort(types.begin(), types.end());
+
// Iterate over all possible fragments.
for (int fragidx = 0; fragidx <= int(Fragment::MULTI); ++fragidx) {
- int sub_count = 0; //!< How minimum number of child nodes this recipe has
- int sub_range = 1; //!< How many different consecutive couns of child nodes this recipe has.
+ int sub_count = 0; //!< The minimum number of child nodes this recipe has.
+ int sub_range = 1; //!< The maximum number of child nodes for this recipe is sub_count+sub_range-1.
size_t data_size = 0;
size_t n_keys = 0;
uint32_t k = 0;
Fragment frag{fragidx};
- // Figure out how many subnodes the recipe needs, and k/data/keys args to pass to ComputeType. */
+ // Based on the fragment, determine #subs/data/k/keys to pass to ComputeType. */
switch (frag) {
case Fragment::PK_K:
case Fragment::PK_H:
@@ -421,7 +426,7 @@ struct SmartInfo
sub_count = 3;
break;
case Fragment::THRESH:
- // Thresh is run for 1 and 2 arguments. Larger numbers use ad-hoc code to extend.
+ // Thresh logic is executed for 1 and 2 arguments. Larger numbers use ad-hoc code to extend.
sub_count = 1;
sub_range = 2;
k = 1;
@@ -445,23 +450,19 @@ struct SmartInfo
if ((res << "K"_mst) + (res << "V"_mst) + (res << "B"_mst) + (res << "W"_mst) != 1) continue;
recipe entry{frag, subt};
+ auto super_of_entry = [&](const recipe& rec) { return is_super_of(rec, entry); };
// Iterate over all supertypes of res (because if e.g. our selected fragment/subnodes result
// in a Bondu, they can form a recipe that is also applicable for constructing a B, Bou, Bdu, ...).
for (Type s : types) {
if ((res & "BKVWzondu"_mst) << s) {
auto& recipes = table[s];
- // Now determine if we already have a recipe that's "super" to this, because then adding
- // a more specific one is unnecessary.
- bool have_super = false;
- for (const auto& recipe : recipes) {
- if (is_super_of(recipe, entry)) {
- have_super = true;
- break;
- }
+ // If we don't already have a super-recipe to the new one, add it.
+ if (!std::any_of(recipes.begin(), recipes.end(), super_of_entry)) {
+ recipes.push_back(entry);
}
- if (!have_super) recipes.push_back(entry);
}
}
+
if (subs <= 2) break;
}
if (subs <= 1) break;
@@ -499,8 +500,8 @@ struct SmartInfo
/* Find which types are constructible. A type is constructible if there is a leaf
* node recipe for constructing it, or a recipe whose subnodes are all constructible.
* Types can be non-constructible because they have no recipes to begin with,
- * can only be constructed using recipes that involve otherwise non-constructible
- * types, or because they require infinite recursion. */
+ * because they can only be constructed using recipes that involve otherwise
+ * non-constructible types, or because they require infinite recursion. */
std::set<Type> constructible_types{};
auto known_constructible = [&](Type type) { return constructible_types.count(type) != 0; };
// Find the transitive closure by adding types until the set of types does not change.
@@ -629,9 +630,14 @@ std::optional<NodeInfo> ConsumeNodeSmart(FuzzedDataProvider& provider, Type type
/**
* Generate a Miniscript node based on the fuzzer's input.
+ *
+ * - ConsumeNode is a function object taking a Type, and returning an std::optional<NodeInfo>.
+ * - root_type is the required type properties of the constructed NodeRef.
+ * - strict_valid sets whether ConsumeNode is expected to guarantee a NodeInfo that results in
+ * a NodeRef whose Type() matches the type fed to ConsumeNode.
*/
template<typename F>
-NodeRef GenNode(F ConsumeNode, Type root_type = ""_mst) {
+NodeRef GenNode(F ConsumeNode, Type root_type = ""_mst, bool strict_valid = false) {
/** A stack of miniscript Nodes being built up. */
std::vector<NodeRef> stack;
/** The queue of instructions. */
@@ -647,7 +653,11 @@ NodeRef GenNode(F ConsumeNode, Type root_type = ""_mst) {
auto subtypes = std::move(node_info)->subtypes;
todo.back().second = std::move(node_info);
todo.reserve(todo.size() + subtypes.size());
- for (auto type : subtypes) todo.emplace_back(type, std::nullopt);
+ // As elements on the todo stack are processed back to front, construct
+ // them in reverse order (so that the first subnode is generated first).
+ for (size_t i = 0; i < subtypes.size(); ++i) {
+ todo.emplace_back(*(subtypes.rbegin() + i), std::nullopt);
+ }
} else {
// The back of todo has fragment and number of children decided, and
// those children have been constructed at the back of stack. Pop
@@ -671,7 +681,11 @@ NodeRef GenNode(F ConsumeNode, Type root_type = ""_mst) {
node = MakeNodeRef(info.fragment, std::move(info.keys), info.k);
}
// Verify acceptability.
- if (!node || !node->IsValid() || !(node->GetType() << type_needed)) return {};
+ if (!node || !(node->GetType() << type_needed)) {
+ assert(!strict_valid);
+ return {};
+ }
+ if (!node->IsValid()) return {};
// Move it to the stack.
stack.push_back(std::move(node));
todo.pop_back();
@@ -845,5 +859,5 @@ FUZZ_TARGET_INIT(miniscript_random_smart, initialize_miniscript_random)
FuzzedDataProvider provider(buffer.data(), buffer.size());
TestNode(GenNode([&](Type needed_type) {
return ConsumeNodeSmart(provider, needed_type);
- }, PickValue(provider, BASE_TYPES)), provider);
+ }, PickValue(provider, BASE_TYPES), true), provider);
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK a9e8533
Diff looks good to me, two nits but nothing important. The coverage is now outstanding.
// Remove all recipes which involve non-constructible types. | ||
type_it->second.erase(std::remove_if(type_it->second.begin(), type_it->second.end(), | ||
[&](const recipe& rec) { | ||
return !std::all_of(rec.second.begin(), rec.second.end(), known_constructible); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There is a spurious space at the end of this line.
// As elements on the todo stack are processed back to front, construct | ||
// them in reverse order (so that the first subnode is generated first). | ||
for (size_t i = 0; i < subtypes.size(); ++i) { | ||
todo.emplace_back(*(subtypes.rbegin() + i), std::nullopt); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Seems simpler to just keep the loop in order here
for (auto type : subtypes) todo.emplace_back(type, std::nullopt);
And instead iterate back to front when popping the childs off the stack below?
// Gather children from the back of stack. Since they are added back to front to the stack, iterate the same way here.
std::vector<NodeRef> sub;
sub.reserve(info.n_subs);
for (size_t i = 0; i < info.n_subs; ++i) {
sub.push_back(std::move(*(stack.end() - 1 - i)));
}
Anyways not a big deal, and good catch it completely slipped my review when we were working on the "not smart" version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but didn't as that'd mean the data in the fuzzer seed for the last subexpression would come first. Not that it really matters, but I found it nicer to do everything left to right.
// Sort recipes for determinism, and place those using fewer subnodes first. | ||
// This avoids runaway expansion (when reaching the end of the fuzz input, | ||
// all zeroes are read, resulting in the first available recipe being picked). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we need to be wary not to introduce recipes with "inter dependencies" as the ones with fewer subnodes, right?
For instance if we were to only read zeroes, and the smallest recipe to create a Bd
was and_b(Bd,Wd)
then we'd have an infinite loop. As far as i can tell they would not be removed above as Bd
would be known_constructible
by other means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed - if the smallest one was recursive like that, we'd have a problem. That happens to not be the case for any type, currently.
I think if the types/fragments ever change, and this were to become the case, that can still be addressed then. It should be very obvious once it happens, I think.
c65113d Rename miniscript_random fuzz test -> miniscript (Pieter Wuille) f990c4b Add miniscript_script fuzz test (Pieter Wuille) 6f49714 Add miniscript_string fuzz test (Pieter Wuille) 2cf7c86 Switch pubkey encoding to 2 hex characters (their idx) (Pieter Wuille) b4c84ba Simplify miniscript_random initialization (Pieter Wuille) Pull request description: Builds on top of #105. This adds script and string roundtrip fuzz tests, all in the same source file (reusing some logic), and unified the names of the tests: * miniscript_stable (the old miniscript_random) * miniscript_smart (the old miniscript_random_smart) * miniscript_script (round-trip from/to script) * miniscript_string (round-trip from/to string) ACKs for top commit: darosior: ACK c65113d Tree-SHA512: e348506996a24d28ac2c282e59b6cb8b1c6dd3e2d89527ea5a3c91a62518d8c54d7c7242a03e78116324a6d28703b162d75b119dba6a0d6975c5c4fd7f471fa3
Builds on top of #104.
This adds a second random node fuzzer, which uses type requirements to more efficiently explore the space of valid miniscripts. The other one is kept, as it is less complicated and thus easier to make sure it doesn't miss anything.