Mocking the system clock #5123

Merged
merged 3 commits into from
Nov 11, 2021

Conversation

yordanmadzhunkov
Contributor

@yordanmadzhunkov yordanmadzhunkov commented Nov 3, 2021

I accidentally requested a review from everybody. Feel free to remove yourself if you feel enough colleagues are already involved. For those who want to participate, here is the motivation:

We want to test our code with fuzz tests, because running it on random input lets us:

  • maximize code coverage
  • identify bugs in the code and record the randomized input, so that a developer can later replay and inspect that specific test

In order to get those benefits, we need repeatable tests, which require a high level of determinism. Putting Utc::now into the blockchain header and computing its hash is not deterministic. There are other sources of non-deterministic behaviour, like threads and random seeds, which will not be addressed in this review.

The goal of this PR is to make it possible to inject values into the system clock before invoking the production code. The actual code in production should have no idea that we are mocking the system clock.
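
To illustrate the determinism problem, here is a minimal sketch using only the standard library. The `Header` type, its fields, and the helper functions are hypothetical stand-ins and not nearcore's real types; `SystemTime` is used in place of chrono's `Utc::now` so that the example has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical, heavily simplified block header (not nearcore's real type).
#[derive(Hash)]
struct Header {
    height: u64,
    timestamp_ns: u64,
}

/// Production-style constructor: reads the system clock, so two calls
/// generally produce different timestamps and therefore different hashes.
fn header_from_system_clock(height: u64) -> Header {
    let timestamp_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system time is before the UNIX epoch")
        .as_nanos() as u64;
    Header { height, timestamp_ns }
}

/// Test-friendly constructor: the caller supplies the timestamp.
fn header_at(height: u64, timestamp_ns: u64) -> Header {
    Header { height, timestamp_ns }
}

/// Hash a header. `DefaultHasher::new()` starts from fixed keys, so equal
/// headers produce equal hashes within one build of the program.
fn hash_header(header: &Header) -> u64 {
    let mut hasher = DefaultHasher::new();
    header.hash(&mut hasher);
    hasher.finish()
}
```

With `header_at`, a fuzz test can replay a recorded input and obtain the same hash; with `header_from_system_clock`, the hash depends on when the test happens to run.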

Mocking the system clock.

@matklad
Contributor

matklad commented Nov 3, 2021

Oh, fascinating, we are just having an interesting discussion of what time we should use in the first place at #5097 cc @pmnoxx

core/primitives/src/time.rs Outdated Show resolved Hide resolved
@matklad
Contributor

matklad commented Nov 3, 2021

🤔 I have mixed feelings here.

On the positive side, yes, mockable/virtualized time is an essential feature of the system, and we absolutely must have it. That being said, I have a bunch of design-level doubts regarding this specific implementation.

In an ideal world, for virtualizing time I would love us to have just a single call to "get current time" at the start of the event loop, and then to pass the current time as a parameter to any function that needs it. That way, we don't even need to mock the time, we just pass it into the system as a parameter.

Obviously, in our code base we are calling ::now all over the place, and that ivory tower architecture isn't applicable. Still, I guess as a first step I would prefer to see a refactor which just minimizes the number of calls to now, and replaces them with taking now: Instant as a parameter.

Things like

            timer: DoomslugTimer {
                started: Instant::now(),
                last_endorsement_sent: Instant::now(),

look quite suboptimal to me -- logically, this is a single point in time.
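
A minimal sketch of the "take now: Instant as a parameter" refactor, assuming a simplified `DoomslugTimer` with only the two fields quoted above. The `endorsement_due` helper is hypothetical and only shows that time-dependent logic can take `now` as an argument too.

```rust
use std::time::{Duration, Instant};

/// Simplified stand-in for the timer quoted above; only the two quoted
/// fields are modelled.
struct DoomslugTimer {
    started: Instant,
    last_endorsement_sent: Instant,
}

impl DoomslugTimer {
    /// The caller reads the clock once and passes the value in, so both
    /// fields are guaranteed to describe the same point in time.
    fn new(now: Instant) -> Self {
        DoomslugTimer { started: now, last_endorsement_sent: now }
    }

    /// Hypothetical helper: the current time is an argument, not a global.
    fn endorsement_due(&self, now: Instant, delay: Duration) -> bool {
        now.duration_since(self.last_endorsement_sent) >= delay
    }
}
```

A test can then construct the timer from any `Instant` and move "time" forward by adding a `Duration` to it, without any mocking machinery.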

Specific design issues with the current solution:

  1. My understanding is that, to use the mocking infra, one pushes several time moments to the singleton, which are then popped via now_or_mocked calls. The problem here is that it creates very tight coupling between tests and implementations. For example, if I refactor the above code to call now once, I can subtly break the tests, because the sequence of calls now shifts.

  2. Similarly, the logic is "if we have mocked time, return it, otherwise return a real time". This makes it impossible to find the cases where time should be mocked, but isn't. I.e., if I add a new ::now call, the last one silently becomes unmocked. A better logic would be "if time is mocked { get mocked instance or panic } else { return real time }". I.e., if you mock time once, you should always mock it. For both 1) and 2), I think a better API would be not to specify the sequence of times up front, but to provide a way to directly control the current time. Something à la https://docs.rs/tokio/1.13.0/tokio/time/fn.pause.html and https://docs.rs/tokio/1.13.0/tokio/time/fn.advance.html.

  3. The current impl mocks time unconditionally and adds a lot of runtime logic to each call of now (mutex lock-unlock). I am not sure about the performance implications, but the runtime complexity obviously goes up. I suggest implementing a "fast path" for unmocked time. We can do this by hiding mocking behind a compile-time feature flag or (I'd probably default to this) a very simple runtime check, like a relaxed load of an atomic bool.

  4. We mock our own time, but we do nothing with time used elsewhere. As a specific example, the following sleep is still a real sleep:
    https://github.com/near/nearcore/pull/5123/files#diff-230419b9bd9ab591be1524325d3c9ef96c379196c067d5783d2fcbaa66c88b8fL1752
    Similarly, we don't seem to plug into Tokio's advance / pause. Not sure what the right solution is here.

  5. This currently makes mocking thread-local, but we use quite a bit of threading in various parts of code. The solution here is simple in theory, but a chore to implement -- rather than relying on a singleton global time function, pass in the source of time explicitly throughout.
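
The "directly control current time" idea from points 1 and 2 could look roughly like the sketch below. It is loosely modelled on the shape of tokio's pause/advance, but it is a standalone illustration: the function names and the thread-local `MOCKED_NOW` are assumptions, it does not integrate with tokio or with thread::sleep (point 4), and being thread-local it still has the limitation described in point 5.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

thread_local! {
    /// `None` means "use the real clock"; `Some(t)` means time is frozen at `t`.
    static MOCKED_NOW: Cell<Option<Instant>> = Cell::new(None);
}

/// Freeze time on the current thread at the current real instant.
fn pause() {
    MOCKED_NOW.with(|m| m.set(Some(Instant::now())));
}

/// Move the frozen clock forward; panics if time has not been paused.
fn advance(by: Duration) {
    MOCKED_NOW.with(|m| {
        let current = m.get().expect("advance() called without pause()");
        m.set(Some(current + by));
    });
}

/// Return to the real clock.
fn resume() {
    MOCKED_NOW.with(|m| m.set(None));
}

/// Replacement for `Instant::now()`: the frozen value if paused,
/// otherwise the real time.
fn now() -> Instant {
    MOCKED_NOW.with(|m| m.get()).unwrap_or_else(Instant::now)
}
```

Because a paused clock answers every call with the same value, adding or removing a call to `now()` in production code does not shift anything in the test, and no call can become silently unmocked.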

Collaborator

@bowenwang1996 bowenwang1996 left a comment

Please change the title of your PR to a descriptive one. Thanks!

@yordanmadzhunkov yordanmadzhunkov changed the title from yordan/mock-time-wip to Mocking the system clock on Nov 4, 2021
@yordanmadzhunkov
Contributor Author

@matklad
I also have mixed feelings.
Specific design issues with the current solution:

  1. You understood it correctly. Refactoring the example code to call now once can break the tests, but only if they check the number of times the timer is called.

  2. I will implement
    "if time is mocked { get mocked instance or panic } else { return real time}"

  3. Good point. It looks like the real clock's ::now method also has a built-in mutex. At this point I am not worried about performance issues.

  4. I am also not sure what the right approach is.

  5. The motivation for making time mocking local to a specific thread is that tests run in parallel in separate threads. If time mocking is not thread-specific, different tests interfere with each other.
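
The stricter rule from point 2, combined with the per-thread state from point 5, can be sketched as follows. The type and function names are illustrative and are not taken from the PR; the queue-of-instants model follows the description in the review above.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::time::Instant;

/// Per-thread mock state; the names are illustrative, not the PR's.
struct MockClockPerThread {
    is_mock: bool,
    instants: VecDeque<Instant>,
}

thread_local! {
    static CLOCK: RefCell<MockClockPerThread> =
        RefCell::new(MockClockPerThread { is_mock: false, instants: VecDeque::new() });
}

/// Switch this thread to mocked time and queue the values to hand out.
fn mock_instants(values: Vec<Instant>) {
    CLOCK.with(|c| {
        let mut c = c.borrow_mut();
        c.is_mock = true;
        c.instants = values.into();
    });
}

/// Switch this thread back to the real clock.
fn reset_mock() {
    CLOCK.with(|c| {
        let mut c = c.borrow_mut();
        c.is_mock = false;
        c.instants.clear();
    });
}

/// Mocked: pop the next queued value, or `None` if the queue is exhausted.
/// Unmocked: the real time.
fn try_now() -> Option<Instant> {
    CLOCK.with(|c| {
        let mut c = c.borrow_mut();
        if c.is_mock {
            c.instants.pop_front()
        } else {
            Some(Instant::now())
        }
    })
}

/// Strict variant: once time is mocked, an unplanned call is a test failure
/// instead of a silent fallback to the real clock.
fn now() -> Instant {
    try_now().expect("mocked clock has no more queued instants")
}
```

Since the state lives in a `thread_local!`, tests running in parallel on different threads do not see each other's queues; the cost is that threads spawned by the code under test do not see the mock either.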

@yordanmadzhunkov
Contributor Author

@matklad
Actually, on point 4, I think the correct approach is to create a clock interface and use dependency injection. This way, we can inject a mock clock in tests and simply inject the real system clock in production. This will require a lot of work to pass the clock instance between functions, classes, etc.
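
A minimal sketch of such a clock interface, assuming a hypothetical `Clock` trait with a real and a fake implementation. None of these names come from the PR, and `elapsed_since` merely stands in for any production function that needs the current time.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical clock interface: everything that needs the time asks this.
trait Clock {
    fn now(&self) -> Instant;
}

/// Production implementation: delegates to the real system clock.
struct SystemClock;

impl Clock for SystemClock {
    fn now(&self) -> Instant {
        Instant::now()
    }
}

/// Test implementation: time only moves when the test says so.
struct FakeClock {
    current: Cell<Instant>,
}

impl FakeClock {
    fn new(start: Instant) -> Self {
        FakeClock { current: Cell::new(start) }
    }

    fn advance(&self, by: Duration) {
        self.current.set(self.current.get() + by);
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Instant {
        self.current.get()
    }
}

/// Stand-in for production code: it receives a clock and does not know
/// whether the clock is real or fake.
fn elapsed_since(clock: &dyn Clock, start: Instant) -> Duration {
    clock.now().duration_since(start)
}
```

This `FakeClock` uses a `Cell` and is therefore single-threaded; sharing one fake clock across threads would need something like a `Mutex` or an atomic inside it.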

@pmnoxx
Contributor

pmnoxx commented Nov 4, 2021

I like the idea of adding mocking to time. While writing tests, I had to add a bit of code to support testing code that gets executed rarely; having mocks for time would be useful.

However:

  • A test usually spawns multiple threads. Most likely you will have to change the code that starts the actors so that each child thread uses the same time mock as its parent.
    I would use something like:
thread_local! {
    // Mutex rather than RefCell, so that the shared mock can be handed to another thread.
    pub static SINGLETON: Option<Arc<Mutex<MockClockPerThread>>> = ...;
}

Every time you start a new actor, you would make it point to the same Arc<Mutex<MockClockPerThread>> as its parent.

  • We also don't mock time for Actix, timeouts, etc. It would be great to have an option to speed up the progression of time in order to test timeouts, etc.

  • I'm starting to think that it may be a good idea to refactor our code to use one library for time. That would make the code much cleaner, and we wouldn't have to mock different types of time.
    I would be in favor of having our own clock.

struct Clock { 
    // number of ns since 1970 / utc time
    ns: u64
}
struct Duration { 
    // difference in ns
    ns: i64
}

Functions:

  • now()
  • duration_since(...)
  • sub(...) // signed difference
  • we can add BorshSerialize / BorshDeserialize, so this can be used to replace u64 as time storage.

This would allow us to use a single source of time for all code, and it would also be easier to mock.
Currently we sometimes use u64, sometimes Instant::now, sometimes DateTime<Utc>, etc.
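
The proposed types could be fleshed out roughly as below. This is a sketch under assumptions: the comment above only names the operations, so the exact semantics here (for example, `duration_since` saturating at zero) are guesses, and the Borsh derives are left out to keep the example dependency-free.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// A point in time: nanoseconds since 1970-01-01 UTC.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Clock {
    ns: u64,
}

/// A signed difference between two points in time, in nanoseconds.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Duration {
    ns: i64,
}

impl Clock {
    /// Current time read from the system clock.
    fn now() -> Clock {
        let since_epoch = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system time is before the UNIX epoch");
        Clock { ns: since_epoch.as_nanos() as u64 }
    }

    /// Non-negative distance from `earlier` to `self`. Assumption: saturates
    /// to zero if `earlier` is actually later than `self`.
    fn duration_since(self, earlier: Clock) -> Duration {
        Duration { ns: self.ns.saturating_sub(earlier.ns) as i64 }
    }

    /// Signed difference `self - other`; negative if `self` is earlier.
    /// Assumes the difference fits in an i64 (roughly +/- 292 years).
    fn sub(self, other: Clock) -> Duration {
        Duration { ns: (self.ns as i128 - other.ns as i128) as i64 }
    }
}
```

Because `Clock` is a plain `u64` wrapper, a test can construct any point in time directly, for example `Clock { ns: 1_000 }`, without touching the system clock.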

Collaborator

@bowenwang1996 bowenwang1996 left a comment

@yordanmadzhunkov after a discussion with @matklad we agree that the design in this implementation is not optimal. However, to not get blocked on this for fuzzing, I suggest that you figure out the minimal amount of change we need to make the fuzzing possible (deterministic). There are likely a few places where we can refactor the functions to have time passed in as a parameter so that we can avoid doing a big change like this at once.

runtime/runtime-params-estimator/src/main.rs Outdated Show resolved Hide resolved
@@ -4,6 +4,7 @@ use crate::testbed::RuntimeTestbed;
use indicatif::{ProgressBar, ProgressStyle};
use near_crypto::{InMemorySigner, KeyType};
use near_primitives::hash::CryptoHash;
use near_primitives::time::Instant;
Contributor

ditto here: basically, all the code in runtime-params-estimator should be exempt from time mocking. This is essentially benchmarking code, and it has its own requirements for time, unrelated to the rest of the code.

Contributor

@matklad matklad left a comment

Approving to unblock, but there are three fixes we need to do here:

  • remove the unneeded cargo dependency
  • remove the changes to the estimator
  • make sure that is_mock can't leak between tests

The thread_local suggestion is optional -- I do like to minimize the amount of machinery we use, but as this is an implementation detail, it doesn't really matter.

Two other nice-to-haves:

  • are we sure that this change is enough to enable the fuzzing we want? Given that we are not mocking tokio's time or thread::sleeps I wouldn't be surprised if this isn't actually enough.
  • are we sure that this is the minimal change? From my conversation with @bowenwang1996, I recall the idea was to just refactor some specific part of the code-base to take time as an argument, without introducing thread-locals, but also without fixing every usage of time. If that turns out to be infeasible, I think it's OK to do what we are doing here, but, still, let's try to minimize the blast radius of the design which we agree is not the long-term one.

core/primitives/src/time.rs Outdated Show resolved Hide resolved
core/primitives/Cargo.toml Outdated Show resolved Hide resolved
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Approve to unblock, but please address Aleksey's comments.

test-utils/loadtester/src/stats.rs Outdated Show resolved Hide resolved
@pmnoxx
Contributor

pmnoxx commented Nov 9, 2021

I created my own Clock implementation for the purpose of mocking time. Please consider it as an alternative:
#5175

});
}

pub fn utc(&mut self) -> DateTime<chrono::Utc> {
Contributor

How about we change the signature to pub fn utc_now() -> DateTime<chrono::Utc>?

  • self is not used, so this will allow us to use Clock::utc_now() without having to create Clock {}
  • renaming utc to utc_now should make the code more readable: without prior knowledge, a reader will know that we are getting the current time.

Contributor

@pmnoxx pmnoxx left a comment

LGTM. The current design will work fine to mock time for tests executing on the same thread. Though, it's easy for someone to start using Instant::now() or Utc::now() by accident, that's why it may be a good idea to introduce one wrapper.

Though, this is not a blocking issue.

})
}

pub fn instant(&mut self) -> Instant {
Contributor

How about we change the signature to pub fn instant_now() -> Instant?

@pmnoxx
Contributor

pmnoxx commented Nov 10, 2021

@yordanmadzhunkov I pushed a few huge changes to chain/network, so you may have to resolve merge conflicts.

Good job on the PR.
I look forward to seeing the next PR, where you propagate Clock throughout our code base.
After that, we can work on cleaning up the code to make sure we only have one structure representing time instead of 4.

@pmnoxx
Contributor

pmnoxx commented Nov 10, 2021

@chefsale @frol Can you take a look?

Contributor

@matklad matklad left a comment

👍

A couple more nitpicky comments:

  • some pub can still be removed
  • Rather than making the user import MockClockGuard, the API could look like this:
let _guard: MockClockGuard = Clock::mock();

🤔 thinking more about this, I think it even makes sense to make add_utc and such methods of the guard object, rather than of the Clock itself.

Anyway, I don't think we should try to polish the Rust API in this PR (we must polish it sometime later though: time mocking is a fundamental facility used throughout the codebase, so it's important that the API is just right, otherwise changing it later would be a pain).
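
A rough sketch of the guard-based API suggested above, which also addresses the earlier concern that is_mock must not leak between tests. `Clock::mock()`, `MockClockGuard`, and `add_utc` are the names used in this thread; everything else is an assumption, including the thread-local `MOCK_UTC` and the use of `std::time::SystemTime` in place of chrono's `DateTime<Utc>` to keep the example dependency-free.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::time::SystemTime;

thread_local! {
    /// `None`: real clock. `Some(queue)`: mocked, values are popped in order.
    static MOCK_UTC: RefCell<Option<VecDeque<SystemTime>>> = RefCell::new(None);
}

struct Clock;

/// While this guard is alive, time on the current thread is mocked.
/// Dropping it restores the real clock, even if the test panics.
struct MockClockGuard {
    _private: (),
}

impl Clock {
    /// Start mocking time on the current thread.
    fn mock() -> MockClockGuard {
        MOCK_UTC.with(|m| *m.borrow_mut() = Some(VecDeque::new()));
        MockClockGuard { _private: () }
    }

    /// Whether time is currently mocked on this thread.
    fn is_mock() -> bool {
        MOCK_UTC.with(|m| m.borrow().is_some())
    }

    /// Mocked: the next queued value, panicking if none is left.
    /// Unmocked: the real system time.
    fn utc() -> SystemTime {
        MOCK_UTC.with(|m| {
            let mut slot = m.borrow_mut();
            match slot.as_mut() {
                Some(queue) => queue.pop_front().expect("mocked clock has no queued time"),
                None => SystemTime::now(),
            }
        })
    }
}

impl MockClockGuard {
    /// Queue a value for a later `Clock::utc()` call; only reachable
    /// through the guard, so it cannot be called while unmocked.
    fn add_utc(&self, t: SystemTime) {
        MOCK_UTC.with(|m| {
            if let Some(queue) = m.borrow_mut().as_mut() {
                queue.push_back(t);
            }
        });
    }
}

impl Drop for MockClockGuard {
    fn drop(&mut self) {
        MOCK_UTC.with(|m| *m.borrow_mut() = None);
    }
}
```

A test would write `let guard = Clock::mock();`, queue values with `guard.add_utc(...)`, and rely on the end of scope to switch the thread back to real time.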

5 participants