Http Stress Status Report #42211
Tagging subscribers to this area: @dotnet/ncl |
Fixes: stress client double read of content; stress client hangs at start and stop; leveraged HttpVersionPolicy; increased pipeline timeout since we doubled the runs; fixed base Docker images to avoid the missing IO.Pipelines Kestrel exception. Overhauled tracing: added server file logging; added log file rotation. Minor renames. Contributes to: #42211 and #42198
The description of #49999 has some useful links to stack traces showing how failures occur today. The "new" HTTP 1.1 connection timeout seems to be a dominant issue on CI, failing on almost every Windows run.
In the last 20 CI runs, there were 3 occurrences of the "System.Net.Sockets.SocketException (10054)" Windows + HTTP 2.0 case. I wouldn't be surprised if it's also a firewall issue that will eventually be solved by #52381. If not, we should look into it.
We decided that we need to rule out the Windows Firewall as a cause for further failure types documented in #42211.
I browsed through the CI failures of the last 25 days, and it seems that we are facing the same errors as last September. The only thing worth mentioning is that I've noticed a
@antonfirsov can you please add a note about your analysis from last week?
Report 7/29
Manual observations after a brief look through the CI output between 7/13 and 7/29, excluding the HTTP3 failures:
Going back a bit more I see signs of #55261 (see here), but it doesn't seem to happen since 7/8 (EDIT: Because the test was disabled on 7/8). |
Report 8/5
Brief look through the CI output between 7/29 and 8/5:
Report 8/13
No new failure types in HTTP 1.1 & 2.0 in the last week.
Report 8/24
The occurrence is on 13.8., isn't that before the fix got merged? |
Sorry, I linked the wrong build; this one happened on the 18th:
@Vijay-Nirmal please file a separate issue with additional information. This issue is tracking general stress test runs, which may or may not be related to what you see. It is fine to link this issue. Thanks! |
Report 2024-01-18
Looking at Linux runs since 2024-01-05. Windows builds are broken across all branches, see #95750.
Report 2024-02-15
Looking at Linux runs since 2024-01-18. Windows builds are broken across all branches, see #95750.
Report 2024-02-29
Looking at Linux runs since 2024-01-15. Windows builds are broken across all branches, see #95750.
Report 2024-03-19
Looking at Linux runs since 2024-02-29. Windows builds are broken across all branches, see #95750.
Report 2024-03-26
Looking at Linux runs since 2024-03-19. Windows runs have been up on main since Sunday.
Report 2024-04-02
Looking at all runs since 2024-03-26.
Report 2024-04-09
Looking at runs since 2024-04-02.
Report 2024-04-26
Looking at runs since 2024-04-09.
Report 2024-05-09
Looking at runs since 2024-04-26.
Report 2024-05-14
Looking at runs since 2024-05-09.
Report 2024-05-21
Looking at runs since 2024-05-14.
Report 2024-05-30
Looking at runs since 2024-05-21.
Report 2024-06-06
Looking at runs since 2024-05-30.
Report 2024-06-11
Looking at runs since 2024-06-06.
Looks like: #98220, Ahmet has a PR up for this in: #103081, but I see it happens in reading the content after |
Report 2024-06-18
Looking at runs since 2024-06-11.
Report 2024-06-25
Looking at runs since 2024-06-18.
Report 2024-07-10
Looking at runs since 2024-06-25.
Report 2024-08-20
Looking at runs since 2024-07-21. Windows builds have been broken since August 11: #106694
Report 2024-08-27
Looking at runs since 2024-08-20.
Report 2024-09-03
Looking at runs since 2024-08-27.
Report 2024-09-10
Looking at runs since 2024-09-03.
Report 2024-10-08
Looking at runs since 2024-09-10.
Report 2024-10-22
Looking at runs since 2024-09-08.
Report 2024-10-29
Looking at runs since 2024-10-22.
Http Stress Status Report
What we've run so far:
HTTP 2.0 Error Statistics
The operation was canceled.
A task was canceled.
The response ended prematurely while waiting for the next frame from the server.
Broken pipe
An existing connection was forcibly closed by the remote host.
An established connection was aborted by the software in your host machine.
HTTP 1.1 Error Statistics
No connection could be made because the target machine actively refused it.
What we need to run:
Existing issues, root caused:
TaskCanceledException as a reaction to GO_AWAY: HTTP/2 stress test TaskCanceledException when client hasn't cancelled #42472
Discovered exceptions, not-investigated:
HTTP 2.0 System.Threading.Tasks.TaskCanceledException: The operation was canceled. HTTP/2 stress test TaskCanceledException when client hasn't cancelled #42472
The discovered exceptions confirm what we've collected so far from the pipelines: #40388.
Distributable tasks by priority:
Tips and Tricks for investigations:
docker container prune && docker image prune -a
-b might be omitted for subsequent re-runs (skips the runtime build)
artifacts/bin/testhost/net5.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/ with the globally installed runtime (/usr/share/dotnet/shared/Microsoft.NETCore.App/your-latest-5.0-version)
corerun didn't work for me since the app depends on ASP.NET Core SDK
System.Net.Http and copy System.Net.Http.dll to the global runtime again
System.Net.Http/tests/StressTests/HttpStress
dotnet run -runMode server -aspnetlog
-aspnetlog: console logging of server errors
-serverUri https://localhost:5002: bind to a different port (when running multiple tests in parallel)
dotnet run -runMode client
-serverUri https://localhost:5002: connect to a different port (when running multiple tests in parallel)
-ops 1 2 3: run only operation 1, 2 and 3 (GET, PUT Slow, etc...)
-trace: saves internal client/server traces in a log file, very verbose, useable only for very short runs
If you have any improvements to the stress app or the containers, please create a PR and don't keep it just for yourself.
If you have more tips and tricks for running the tests, please share them.
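The server and client invocations from the tips above can be wrapped in a small helper when running a second stress instance on its own port. This is only a sketch: the `stress_cmd` function and the `STRESS_URI` and `DRY_RUN` variables are hypothetical names introduced here, and only flags already listed above (`-runMode`, `-serverUri`, `-aspnetlog`, `-ops`) are used. It assumes the working directory is the HttpStress folder.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stress app: builds an HttpStress
# command line and either prints it (default) or executes it.
STRESS_URI="${STRESS_URI:-https://localhost:5002}"  # assumed free port
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print only, 0 = execute

stress_cmd() {
    mode="$1"; shift
    # Prepend the common arguments; extra flags are passed through unchanged.
    set -- dotnet run -runMode "$mode" -serverUri "$STRESS_URI" "$@"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Server with console logging of server errors.
stress_cmd server -aspnetlog
# Client restricted to operations 1, 2 and 3.
stress_cmd client -ops 1 2 3
```

With `DRY_RUN=0` the server call blocks, so in practice the two invocations would go into separate terminals, each using the same `STRESS_URI`.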