
[WIP] Using scheduled start-time instead of actual start-time #56

Closed

Conversation

pveentjer
Contributor

@pveentjer pveentjer commented Jan 27, 2025

Currently the FailoverTestRig uses the actual start time of each request to determine its latency:

    int workCount = 0;
    final long now = clock.nanoTime();

    if (moreToGenerate && now - nextMessageAt >= 0)
    {
        generationTimestamps[freePosition++] = now; // <--- the actual time is recorded, not the scheduled time

        ...
    }

    private int trySend()
    {
        if (!synced || sendPosition >= freePosition)
        {
            return 0;
        }

        final int sequence = sendPosition;
        final long timestamp = generationTimestamps[sendPosition]; // <------ actual time is used as the send timestamp

        if (transceiver.trySendEcho(sequence, timestamp))
        {
            sendPosition++;

            return 1;
        }

        return 0;
    }

The consequence is that the system can still suffer from a form of coordinated omission, and therefore the latency numbers will look better than they actually are.

Explanation:

The load generator runs at 100 requests/second, so a request is scheduled every 10 ms.

Every request to the remote system is handled in exactly 3 ms.

Imagine the clock is in ms and currently reads 50000, and the next request is scheduled at 50010. If there is a 100 ms stall just before calling now, then now will return 50100, so 50100 is stored in generationTimestamps and that is the value passed to the echo message. Since a request takes 3 ms, the measured latency of that request will be 50103-50100=3 ms, because it is based on the actual start time of that request.

But the scheduled start time of the request was 50010, so the actual latency is 50103-50010=93 ms. That is roughly a 30x difference.

It doesn't only affect this call, but all the calls that should have been made during the 100 ms pause. These calls still get made (which is good), but assuming there are no further stalls and they are all sent at 50100, the current code measures a latency of 3,3,3,...,3 for the 9 other calls, while in reality it should be 83,73,63,...,13,3. So the measured latencies of the calls that should have happened during the stall are incorrect.
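For illustration, here is a small standalone sketch of the arithmetic above (the class name, variable names and the burst-at-50100 assumption are mine, not taken from the FailoverTestRig code). It prints the measured latency next to the real latency for the 10 requests that were due during the stall:

    // Illustrative only: all names are hypothetical and all values are in ms.
    public final class StallLatencyExample
    {
        public static void main(final String[] args)
        {
            final long intervalMs = 10;          // 100 requests/second
            final long serviceTimeMs = 3;        // each request is handled in 3 ms
            final long firstScheduledMs = 50010; // first request due during the stall
            final long stallEndsMs = 50100;      // the 100 ms stall ends here

            // Assume all 10 pending requests are sent the moment the stall ends.
            for (int i = 0; i < 10; i++)
            {
                final long scheduledMs = firstScheduledMs + i * intervalMs;
                final long completedMs = stallEndsMs + serviceTimeMs;

                final long measuredMs = completedMs - stallEndsMs; // always 3
                final long realMs = completedMs - scheduledMs;     // 93, 83, ..., 3

                System.out.printf("scheduled=%d measured=%d real=%d%n", scheduledMs, measuredMs, realMs);
            }
        }
    }

Running it shows measured=3 for every request, while the real latency decays from 93 down to 3.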

This can be seen as a form of coordinated omission. The main difference is that it isn't caused by the remote system but by the local system, and the local system can stall as well.

The fix is to use the scheduled start time of a request, not its actual start time, to determine the latency.
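As a minimal sketch of that idea, the generation loop could record nextMessageAt instead of now and then step the schedule forward by a fixed interval. The field messageIntervalNs is an assumed name for the gap between requests; the actual change in this PR may differ:

    if (moreToGenerate && now - nextMessageAt >= 0)
    {
        // Record the scheduled start time, not the (possibly delayed) actual time.
        generationTimestamps[freePosition++] = nextMessageAt;

        // Step the schedule by a fixed interval rather than deriving it from 'now',
        // so requests that were due during a stall keep their intended timestamps.
        nextMessageAt += messageIntervalNs;

        ...
    }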

@pveentjer changed the title from "Using intended start time instead of actual start time" to "[WIP] Using intended start time instead of actual start time" on Jan 27, 2025
@@ -256,8 +266,6 @@ private void runTest(final int durationSeconds, final int messageRate)

workCount += transceiver.receive();

pveentjer (Contributor Author) commented on the diff:

Why is this second trySend done? You only want to send when a request is scheduled and that is taken care of by the above if-statement.

@pveentjer changed the title from "[WIP] Using intended start time instead of actual start time" to "[WIP] Using scheduled start-time instead of actual start-time" on Jan 28, 2025
@pveentjer requested a review from vyazelenko on January 28, 2025 07:30
@pveentjer closed this on Jan 28, 2025