Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add faillible version of size_hint to properly handle recursive structures #185

Merged
merged 2 commits into from
Sep 20, 2024

Conversation

sosthene-nitrokey
Copy link
Contributor

The default implementation forwards to the current size_hint, so that it can be relied upon also for structures that still use the default implementation.

The default implementation of size_hint is not changed, so that it does not break existing implementations

Fix #144

@fitzgen
Copy link
Member

fitzgen commented Jul 25, 2024

I'd have hoped that LLVM would inline the recursive size_hint calls up to the depth limit and effectively boiled it all away at compile time. Can you try adding inline(always) to the derived size_hint methods and seeing if that triggers the hoped-for behavior? I'd prefer not to complicate the trait interface anymore than it already is, if possible.

@fitzgen
Copy link
Member

fitzgen commented Jul 25, 2024

My other thought is that, if we do need to define a new size_hint method, we don't need to make the whole thing fallible, just the potentially-recursive parts fallible. That means we can still provide a minimum bound (not a maximal minimal bound, but a minimal bound) even when we've reached the recursion depth. This would involve splitting the size hint into something like my_size_hint and members_size_hint (terrible names) where the former includes only the things that are guaranteed never to be recursive and the latter includes anything that might be recursive. Sort of like in a GC the difference between tracing your own shallow edges, vs recursively tracing the transitive closure of everything reachable from yourself, if that makes sense.

@sosthene-nitrokey
Copy link
Contributor Author

sosthene-nitrokey commented Jul 26, 2024

It's unlikely that LLVM would be able to inline a function call that is that slow, even though it could just be a constant calculation,
and it would not happen in all optimization profiles. I just tested and #[inline(always)] does not solve the issue, even though it makes the computation faster.

Regarding keeping the lower bound: yes that is possible, by keeping it in the error type, but that would not be that useful, since this would still just return the first value that fails, and it would also complicate the implementation.

@sosthene-nitrokey
Copy link
Contributor Author

Rebased on top of main.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some inline comments that should be addressed before this can land. Thanks!

Comment on lines 14 to 17
#[inline]
fn size_hint(depth: usize) -> (usize, Option<usize>) {
<A as Arbitrary<'a>>::size_hint(depth)
Self::faillible_size_hint(depth).unwrap_or_default()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old size_hint can be default-implemented with fallible_size_hint or the default right? No need to repeat this incantation in every implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because of backwards compatibility.

If we make the default size_hint delegate to faillible_size_hint, this means we can't provide a default implementation for faillible_size_hint that calls back to size_hint, otherwise there would be an infinite loop.

We could have a default faillible_size_hint implementation that returns Ok((0, None)), but then, when libraries migrate to calling faillible_size_hint, it will break for out of date dependencies that only implement size_hint.

src/foreign/core/num.rs Outdated Show resolved Hide resolved
Comment on lines +237 to +239
/// ## How to implement this
///
/// If the size hint calculation is a trivial constant and does not recurse into any other `size_hint` call, you should implement it in `size_hint`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not muddy the waters by suggesting that sometimes people use the old size hint method and sometimes the new one.

Let's deprecate the old one and always recommend using the new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #185 (comment)

We could say that this is "just" an optimization and that it's ok to break it with a minor release.
Arguably a lot of users use the derive macro or don't implement size_hint, so it could be justified. And marking it as deprecated would make this pretty visible to downstream.

I also don't like the duplicate API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other comment about defaults and backwards compat makes sense, thanks for pointing that out.

That said, we should suggest that everyone move over to try_size_hint and ignore size_hint and/or implement it with try_size_hint().unwrap_or_default() and then avoid mentioning the deprecated method again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things with my approach is that it does not require dependents to change what they call.

For example libfuzzer_sys does not care about the size hint being fallible, so it would just have been able size_hint and let the implementation deal with recursion internally.

I think the ideal setup would be to only have try_size_hint be part of the trait, and have an infallible size_hint in an extension trait (so users are not given the opportunity to override it) that does not even take a depth parameter for use in libfuzzer_sys for example. But that seems like adding more complexity and bring in naming conflicts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backward compatibility issues will also mean that users will need to implement size_hint to call try_size_hint instead, and have to silence the deprecated warning to do so.

I think the issue this fixes most likely only affects users of the derive macros, and therefore I wanted to minimize the impact on other users. Especially given that a lot of types that can implement arbitrary will likely implement it on single, non recursive types and therefore they could still implement only size_hint.

Also, it's still possible to add deprecation attributes in the future if this does not happen to be the case.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@sosthene-nitrokey sosthene-nitrokey force-pushed the wide-recursive branch 2 times, most recently from 8562300 to a8907c9 Compare September 13, 2024 16:14
…tures

The default implementation forwards to the current size_hint, so that
it can be relied upon also for structures that still use the default implementation.

The default implementation of `size_hint` is not changed, so that it does not break existing
implementations
src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fitzgen fitzgen merged commit 1cc0e46 into rust-fuzz:main Sep 20, 2024
5 checks passed
maurer added a commit to maurer/arbitrary that referenced this pull request Jan 7, 2025
`Arbitrary` had a new `try_size_hint` method added in rust-fuzz#185 - while this
was semver safe for `arbitrary` because a default implementation was
provided which forwarded along, the new version of `arbitrary-derive`
which follows the same versioning as `arbitrary` started generating
`try_size_hint` methods, which causes v1.2.3 to fail to compile, as it
tries to grab `arbitrary-derive` "1.2.3" from crates.io, which gets
responded to with a "1.4.0" version.

Given that:
1. It is reasonable to evolve your traits this way, and it doesn't break
   semver on the base package
2. The patchlevel is specified on `arbitrary-derive`, so it seems the
   intent is to require a matching version
3. You likely want to keep the same version for your proc-macro as your
   main crate

This proposes using exact equality in the `Cargo.toml` so that future
trait updates of this nature won't break the ability to build older
packages via Cargo.
maurer added a commit to maurer/arbitrary that referenced this pull request Jan 7, 2025
`Arbitrary` had a new `try_size_hint` method added in rust-fuzz#185 - while this
was semver safe for `arbitrary` because a default implementation was
provided which forwarded along, the new version of `arbitrary-derive`
which follows the same versioning as `arbitrary` started generating
`try_size_hint` methods, which causes v1.2.3 to fail to compile, as it
tries to grab `arbitrary-derive` "1.2.3" from crates.io, which gets
responded to with a "1.4.0" version.

Given that:
1. It is reasonable to evolve your traits this way, and it doesn't break
   semver on the base package
2. The patchlevel is specified on `arbitrary-derive`, so it seems the
   intent is to require a matching version
3. You likely want to keep the same version for your proc-macro as your
   main crate

This proposes using exact equality in the `Cargo.toml` so that future
trait updates of this nature won't break the ability to build older
packages via Cargo.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

size_hint computation time blows up for "wide" recursive enums and structs
2 participants