Mocking the system clock #5123

yordanmadzhunkov · 2021-11-03T13:19:42Z

I accidentally requested a review from everybody. Feel free to remove yourself if you feel enough colleagues are already involved. For those who want to participate, here is the motivation:

We want to test our code with fuzz tests, because randomized input lets us:

maximize code coverage
identify bugs in the code and record the randomized input, so that a developer can later replay/inspect that specific test

To get those benefits, we need repeatable tests, which require a high level of determinism. Putting Utc::now in the blockchain header and computing its hash is not deterministic. There are other sources of non-deterministic behaviour, like threads and random seeds, which will not be addressed in this review.

The goal of this PR is to make it possible to inject values into the system clock before invoking the production code. The production code should have no idea that we are mocking the system clock.

Mocking the system clock.

matklad · 2021-11-03T13:25:19Z

Oh, fascinating, we are just having an interesting discussion of what time we should use in the first place at #5097 cc @pmnoxx

core/primitives/src/time.rs

matklad · 2021-11-03T14:42:07Z

🤔 I have mixed feelings here.

On the positive side, yes, mockable/virtualized time is an essential feature of the system, and we absolutely must have it. That being said, I have a bunch of design-level doubts regarding this specific implementation.

In an ideal world, for virtualizing time I would love us to have just a single call to "get current time" at the start of the event loop, and then to pass the current time as a parameter to any function that needs it. That way, we don't even need to mock the time, we just pass it into the system as a parameter.

Obviously, in our code base we are calling ::now all over the place, and that ivory tower architecture isn't applicable. Still, I guess as a first step I would prefer to see a refactor which just minimizes the number of calls to now, and replaces them with taking now: Instant as a parameter.

Things like

            timer: DoomslugTimer {
                started: Instant::now(),
                last_endorsement_sent: Instant::now(),

look quite suboptimal to me -- logically, this is a single point in time.
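As a rough illustration of the "take now: Instant as a parameter" refactor, here is a minimal sketch. The struct reuses the field names from the snippet above; the constructor and the helper method are hypothetical and not taken from the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the timer quoted above. Only the two field
/// names come from the discussion; everything else is illustrative.
#[derive(Debug)]
pub struct DoomslugTimer {
    pub started: Instant,
    pub last_endorsement_sent: Instant,
}

impl DoomslugTimer {
    /// The caller reads the clock once and passes that single reading in,
    /// so both fields describe the same point in time.
    pub fn new(now: Instant) -> Self {
        DoomslugTimer { started: now, last_endorsement_sent: now }
    }

    /// Time-dependent logic also receives `now` instead of calling
    /// `Instant::now()` itself, which keeps it deterministic in tests.
    pub fn since_last_endorsement(&self, now: Instant) -> Duration {
        now.saturating_duration_since(self.last_endorsement_sent)
    }
}
```

With this shape a test can pick any `Instant` it likes and no mocking machinery is needed for this type.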

Specific design issues with the current solution:

1. My understanding is that, to use the mocking infra, one pushes several time moments to the singleton, which are then popped via now_or_mocked calls. The problem here is that it creates very tight coupling between tests and implementations. For example, if I refactor the above code to call now once, I can subtly break the tests, because the sequence of calls now shifts.
2. Similarly, the logic is "if we have mocked time, return it, otherwise return a real time". This makes it impossible to find the cases where time should be mocked, but isn't. I.e., if I add a new ::now call, the last one now becomes silently unmocked. A better logic would be "if time is mocked { get mocked instance or panic } else { return real time }". I.e., if you mock time once, you should always mock it.

For both 1) and 2), I think a better API would be not to specify the sequence of times up-front, but to provide a way to directly control the current time. Something a-la https://docs.rs/tokio/1.13.0/tokio/time/fn.pause.html and https://docs.rs/tokio/1.13.0/tokio/time/fn.advance.html.

3. The current impl mocks time unconditionally and adds a lot of runtime logic to each call of now (mutex lock-unlock). I am not sure about the performance implications, but the runtime complexity obviously goes up. I suggest implementing a "fast path" for unmocked time. We can do this by hiding mocking under a compile-time feature flag or (I'd probably default to this) a very simple runtime check, like a relaxed load of an atomic bool.
4. We mock our own time, but we do nothing with time used elsewhere. As a specific example, the following sleep is still a real sleep:
https://github.com/near/nearcore/pull/5123/files#diff-230419b9bd9ab591be1524325d3c9ef96c379196c067d5783d2fcbaa66c88b8fL1752
Similarly, we don't seem to plug into Tokio's advance / pause. Not sure what the right solution is here.
5. This currently makes mocking thread-local, but we use quite a bit of threading in various parts of the code. The solution here is simple in theory, but a chore to implement -- rather than relying on a singleton global time function, pass in the source of time explicitly throughout.
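To make the "directly control current time" idea concrete, here is a small sketch loosely modelled on the pause/advance shape of the Tokio functions linked above. It is not the PR's code: `Clock`, `pause`, `advance`, and `resume` are hypothetical names, and the state is kept per thread purely as an assumption for the example.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

thread_local! {
    // `None` means "not mocked": reads fall through to the real clock.
    // `Some(t)` means time is frozen at `t` for this thread.
    static MOCKED_NOW: Cell<Option<Instant>> = Cell::new(None);
}

/// Hypothetical wrapper; every name here is illustrative.
pub struct Clock;

impl Clock {
    /// Freeze time for the current thread at the current real instant.
    pub fn pause() {
        MOCKED_NOW.with(|m| m.set(Some(Instant::now())));
    }

    /// Move the frozen time forward. Panics if time is not paused, so a
    /// test cannot silently mix mocked and real time.
    pub fn advance(by: Duration) {
        MOCKED_NOW.with(|m| {
            let now = m.get().expect("advance() called without pause()");
            m.set(Some(now + by));
        });
    }

    /// Return to the real clock.
    pub fn resume() {
        MOCKED_NOW.with(|m| m.set(None));
    }

    /// Once paused, every read returns the frozen value, no matter how
    /// many times production code asks for the time.
    pub fn now() -> Instant {
        MOCKED_NOW.with(|m| m.get()).unwrap_or_else(Instant::now)
    }
}
```

Because the test sets the time rather than queueing a sequence of readings, adding or removing a `now` call in production code does not shift anything. The unmocked path here is a thread-local read rather than a mutex lock; it does nothing about real sleeps or Tokio timers.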

bowenwang1996

Please change the title of your PR to a descriptive one. Thanks!

yordanmadzhunkov · 2021-11-04T13:22:59Z

@matklad
I also have mixed feelings.
Specific design issues with the current solution:

1. You understood it correctly. Refactoring the example code to call now once can break the tests (but only if the test checks the number of times the timer is called).
2. I will implement "if time is mocked { get mocked instance or panic } else { return real time }".
3. Good point. It looks like the real clock's ::now method also has a built-in mutex. At this point I am not worried about performance issues.
4. I am also not sure what the right approach is.
5. The motivation for making time mocking local to a specific thread is that tests run in parallel in separate threads. If time mocking is not thread-specific, different tests interfere with each other.

yordanmadzhunkov · 2021-11-04T13:26:06Z

@matklad
Actually, on point 4, I think the correct approach is to create a clock interface and use dependency injection. This way, tests can inject a mock and production can simply inject the real system clock. This will require a lot of work to pass the clock instance between functions, classes, etc.
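A clock interface with dependency injection could look roughly like the sketch below. The trait and type names (`Clock`, `SystemClock`, `FakeClock`, `is_expired`) are made up for illustration and do not come from the PR.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical clock interface: production code depends only on this.
pub trait Clock {
    fn now(&self) -> Instant;
}

/// Real clock, injected in production.
pub struct SystemClock;

impl Clock for SystemClock {
    fn now(&self) -> Instant {
        Instant::now()
    }
}

/// Test clock whose time only moves when the test says so.
pub struct FakeClock {
    current: Cell<Instant>,
}

impl FakeClock {
    pub fn new(start: Instant) -> Self {
        FakeClock { current: Cell::new(start) }
    }

    pub fn advance(&self, by: Duration) {
        self.current.set(self.current.get() + by);
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Instant {
        self.current.get()
    }
}

/// Illustrative production function: it cannot tell which clock it got.
pub fn is_expired(clock: &dyn Clock, deadline: Instant) -> bool {
    clock.now() >= deadline
}
```

The cost mentioned above is visible even here: every function that needs the time has to accept a clock argument.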

pmnoxx · 2021-11-04T20:54:52Z

I like the idea of adding mocking to time. While writing tests I had to add a bit of code to support testing code that gets executed rarely; having mocks for time would be useful.

However:

A test usually spawns multiple threads. Most likely you will have to change the code that starts all actors in such a way that a child thread uses the same time mock as its parent.
I would use something like:

thread_local! {
    pub static Singletor: Option<Arc<RefCell<MockClockPerThread>>> = ...;
}

Every time you start a new actor, you would make it point to the same Arc<RefCell<...>> as its parent.
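The parent-to-child hand-off can be sketched as follows. This is only an illustration with hypothetical names (`MockClock`, `install`, `spawn_with_clock`), and it swaps in `Arc<Mutex<...>>` for the `Arc<RefCell<...>>` above, because `RefCell` is not `Sync` and an `Arc<RefCell<...>>` therefore cannot be moved into another thread.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical shared mock state.
pub struct MockClock {
    now: Instant,
}

pub type SharedClock = Arc<Mutex<MockClock>>;

thread_local! {
    // Each thread holds a handle to the shared clock, or `None` for real time.
    static CLOCK: RefCell<Option<SharedClock>> = RefCell::new(None);
}

pub fn new_shared(start: Instant) -> SharedClock {
    Arc::new(Mutex::new(MockClock { now: start }))
}

/// Make `clock` the time source for the current thread.
pub fn install(clock: SharedClock) {
    CLOCK.with(|c| *c.borrow_mut() = Some(clock));
}

fn current() -> Option<SharedClock> {
    CLOCK.with(|c| c.borrow().clone())
}

pub fn advance(clock: &SharedClock, by: Duration) {
    clock.lock().unwrap().now += by;
}

pub fn now() -> Instant {
    match current() {
        Some(clock) => {
            let guard = clock.lock().unwrap();
            guard.now
        }
        None => Instant::now(),
    }
}

/// Spawn a thread that inherits the parent's mock clock, if any.
pub fn spawn_with_clock<T, F>(f: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let inherited = current();
    thread::spawn(move || {
        if let Some(clock) = inherited {
            install(clock);
        }
        f()
    })
}
```

Any code that spawns threads directly with `thread::spawn` would still fall back to real time, which is why the actor start-up code would need to change.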

We also don't mock time for Actix, timeouts, etc. It would be great to be able to add an option to increase speed of progression of the time to test timeouts, etc.
I'm starting to think that it may be a good idea to refactor our code to use one library for time. That would make the code much cleaner, and we wouldn't have to mock different types of time.
I would be in favor of having our own clock.

struct Clock { 
    // number of ns since 1970 / utc time
    ns: u64
}
struct Duration { 
    // difference in ns
    ns: i64
}

functions:

now()
duration_since(...)
sub(...) // signed difference
We can add BorshSerialize / BorshDeserialize; this can be used to replace u64 as time storage.

This would allow us to use a single source of time for all code, and it would also be easier to mock.
Sometimes we use u64, sometimes Instant::now, Datetime<Utc>::new(), etc.
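A minimal sketch of such a type, using only the standard library, might look like this. The two structs follow the outline above; `from_ns`, `as_ns`, and the exact method signatures are assumptions added for the example, and the Borsh derives are left out to keep it dependency-free.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Nanoseconds since 1970-01-01 UTC, as proposed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Clock {
    ns: u64,
}

/// Signed difference between two `Clock` values, in nanoseconds.
/// Note: this deliberately is not `std::time::Duration`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Duration {
    ns: i64,
}

impl Clock {
    /// Hypothetical constructor, handy for tests and deserialization.
    pub fn from_ns(ns: u64) -> Self {
        Clock { ns }
    }

    pub fn now() -> Self {
        let since_epoch = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is set before 1970");
        // u64 nanoseconds cover dates until roughly the year 2554.
        Clock { ns: since_epoch.as_nanos() as u64 }
    }

    /// Signed difference `self - earlier`; negative if `earlier` is later.
    /// Assumes the true difference fits in an i64.
    pub fn duration_since(self, earlier: Clock) -> Duration {
        Duration { ns: (self.ns as i128 - earlier.ns as i128) as i64 }
    }
}

impl std::ops::Sub for Clock {
    type Output = Duration;

    fn sub(self, rhs: Clock) -> Duration {
        self.duration_since(rhs)
    }
}

impl Duration {
    pub fn as_ns(self) -> i64 {
        self.ns
    }
}
```

Since the value is a plain integer, a test can construct any timestamp directly instead of going through `now()`.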

bowenwang1996

@yordanmadzhunkov after a discussion with @matklad we agree that the design in this implementation is not optimal. However, to not get blocked on this for fuzzing, I suggest that you figure out the minimal amount of change we need to make the fuzzing possible (deterministic). There are likely a few places where we can refactor the functions to have time passed in as a parameter so that we can avoid doing a big change like this at once.

runtime/runtime-params-estimator/src/main.rs

matklad · 2021-11-05T16:04:56Z

runtime/runtime-params-estimator/src/testbed_runners.rs

@@ -4,6 +4,7 @@ use crate::testbed::RuntimeTestbed;
 use indicatif::{ProgressBar, ProgressStyle};
 use near_crypto::{InMemorySigner, KeyType};
 use near_primitives::hash::CryptoHash;
+use near_primitives::time::Instant;


ditto here: basically, all the code in runtime-params-estimator should be exempt from time mocking. This is essentially benchmarking code, and it has its own requirements for time, unrelated to the rest of the code.

matklad

Approving to unblock, but there are three fixes we need to do here:

remove unneeded cargo dependency
remove changes to the estimator
make sure that is_mock can't leak between tests

The thread_local suggestion is optional -- I do like to minimize the amount of machinery we use, but as this is an implementation detail, it doesn't really matter.

Two other nice-to-haves:

are we sure that this change is enough to enable the fuzzing we want? Given that we are not mocking tokio's time or thread::sleeps I wouldn't be surprised if this isn't actually enough.
are we sure that this is the minimal change? From my conversation with @bowenwang1996, I recall the idea was to just refactor some specific part of the code-base to take time as an argument, without introducing thread-locals, but also without fixing every usage of time. If that turns out to be infeasible, I think it's OK to do what we are doing here, but, still, let's try to minimize the blast radius of the design which we agree is not the long-term one.

core/primitives/src/time.rs

core/primitives/Cargo.toml

core/primitives/src/time.rs

bowenwang1996

Approve to unblock, but please address Aleksey's comments.

test-utils/loadtester/src/stats.rs

chain/network/src/routing.rs

pmnoxx · 2021-11-09T08:25:03Z

I created my own implementation of Clock for the purpose of mocking time. Please consider it as an alternative:
#5175

pmnoxx · 2021-11-09T10:29:24Z

core/primitives/src/time.rs

+        });
+    }
+
+    pub fn utc(&mut self) -> DateTime<chrono::Utc> {


How about we change the signature to pub fn utc_now() -> DateTime<chrono::Utc>?

self is not used, so this would allow us to call Clock::utc_now() without having to create a Clock {}.

Renaming utc to utc_now should also make the code more readable: without prior knowledge, the reader will know that we are getting the current time.

pmnoxx

LGTM. The current design will work fine to mock time for tests executing on the same thread. Though, it's easy for someone to start using Instant::now() or Utc::now() by accident, that's why it may be a good idea to introduce one wrapper.

Though, this is not a blocking issue.

pmnoxx · 2021-11-09T10:29:58Z

core/primitives/src/time.rs

+        })
+    }
+
+    pub fn instant(&mut self) -> Instant {


How about we change the signature to pub fn instant_now() -> Instant?

pmnoxx · 2021-11-10T02:55:06Z

@yordanmadzhunkov I pushed a few huge changes to chain/network, you may have to resolve merge conflicts.

Good job on the PR.
I look forward to seeing the next PR, where you add propagation of Clock throughout our code base.
After that we can work on cleaning up the code, to make sure we only have one structure representing time instead of 4.

pmnoxx · 2021-11-10T02:56:18Z

@chefsale @frol Can you take a look?

matklad

👍

A couple more nitpicky comments:

some pub can still be removed
Rather than making the user import MockClockGuard, the API could look like this:

let _guard: MockClockGuard = Clock::mock();

🤔 thinking more about this, I think it even makes sense to make add_utc and such methods of the guard object, rather than of the Clock itself.

Anyway, I don't think we should try to polish the Rust API in this PR (we must polish it sometime later though: time mocking is a fundamental facility used throughout the codebase, and it's important that the API is just right, otherwise changing it later would be a pain).
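For reference, the guard-based shape suggested above could be sketched like this. It is a hypothetical illustration, not the merged API: `Clock::mock`, `MockClockGuard::add_instant`, and `Clock::instant` are assumed names, and the queue-based state is just one way to back them.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::time::Instant;

thread_local! {
    // `None`: not mocked. `Some(queue)`: mocked, every read must pop a value.
    static MOCK: RefCell<Option<VecDeque<Instant>>> = RefCell::new(None);
}

pub struct Clock;

/// Hypothetical guard: mocking is active exactly as long as it is alive.
pub struct MockClockGuard {
    _private: (),
}

impl Clock {
    /// Start mocking on this thread and hand back the guard.
    pub fn mock() -> MockClockGuard {
        MOCK.with(|m| *m.borrow_mut() = Some(VecDeque::new()));
        MockClockGuard { _private: () }
    }

    /// "If time is mocked { get mocked instance or panic } else { real time }".
    pub fn instant() -> Instant {
        MOCK.with(|m| {
            let mut state = m.borrow_mut();
            match state.as_mut() {
                Some(queue) => queue
                    .pop_front()
                    .expect("mocked clock ran out of instants"),
                None => Instant::now(),
            }
        })
    }
}

impl MockClockGuard {
    /// Values can only be queued through the guard, i.e. while mocking is on.
    pub fn add_instant(&self, value: Instant) {
        MOCK.with(|m| {
            let mut state = m.borrow_mut();
            if let Some(queue) = state.as_mut() {
                queue.push_back(value);
            }
        });
    }
}

impl Drop for MockClockGuard {
    /// Dropping the guard switches mocking off, so the mocked state does
    /// not leak into the next test that runs on the same thread.
    fn drop(&mut self) {
        MOCK.with(|m| *m.borrow_mut() = None);
    }
}
```

A test would then only write `let guard = Clock::mock();` followed by calls on `guard`, without importing anything else.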

yordanmadzhunkov requested review from ailisp, bowenwang1996, chefsale, frol, khorolets, matklad, mfornet, mzhangmzz, olonho and pmnoxx as code owners November 3, 2021 13:19

matklad reviewed Nov 3, 2021

bowenwang1996 reviewed Nov 3, 2021

yordanmadzhunkov requested a review from mm-near as a code owner November 4, 2021 10:17

yordanmadzhunkov changed the title ~~yordan/mock-time-wip~~ Mocking the system clock Nov 4, 2021

bowenwang1996 reviewed Nov 4, 2021

matklad reviewed Nov 5, 2021

matklad approved these changes Nov 5, 2021

bowenwang1996 approved these changes Nov 5, 2021

matklad reviewed Nov 8, 2021

pmnoxx reviewed Nov 9, 2021

pmnoxx approved these changes Nov 9, 2021

matklad approved these changes Nov 10, 2021

Mock time for tests

343f155

yordanmadzhunkov force-pushed the yordan/mock-time-wip branch from 0afbdb2 to 343f155 Compare November 10, 2021 14:07

yordanmadzhunkov closed this Nov 10, 2021

yordanmadzhunkov reopened this Nov 10, 2021

yordanmadzhunkov requested a review from matklad November 10, 2021 14:58

mzhangmzz approved these changes Nov 10, 2021

bowenwang1996 added 2 commits November 10, 2021 15:01

Merge branch 'master' into yordan/mock-time-wip

85dc8aa

Merge branch 'master' into yordan/mock-time-wip

abfb38c

bowenwang1996 merged commit 28c1037 into master Nov 11, 2021

bowenwang1996 deleted the yordan/mock-time-wip branch November 11, 2021 00:04

This was referenced Nov 15, 2021

Implement time travel for NEAR testing infrastructure #3661

Closed

[Sandbox] Allow sandbox to produce blocks or "fast-forward" #4686

Open

pmnoxx mentioned this pull request Nov 20, 2021

please ignore #5397

Closed
