Workflow failed - km_mt_stress_tests #3607

Open
github-actions bot opened this issue Jun 7, 2024 · 25 comments · May be fixed by #3866
Labels
blocked (Blocked on another issue that must be done first), bug (Something isn't working), ci/cd (Issue is specific to CI/CD), duplicate (This issue or pull request already exists), P1, triaged (Discussed in a triage meeting)

Comments

@github-actions
Contributor

github-actions bot commented Jun 7, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

github-actions bot added the bug (Something isn't working) and ci/cd (Issue is specific to CI/CD) labels Jun 7, 2024
dv-msft self-assigned this Jun 10, 2024
dv-msft added this to the 2406 milestone Jun 10, 2024
@shpalani
Collaborator

Known issue: ebpf-for-windows.msi installation failed in CI/CD.

dahavey added the triaged (Discussed in a triage meeting) and duplicate (This issue or pull request already exists) labels Jun 10, 2024
@dv-msft
Collaborator

dv-msft commented Jun 10, 2024

This is a duplicate of #3602. Keeping this issue open to avoid CI/CD noise.

dv-msft added the P1 label Jun 10, 2024
Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

shankarseal modified the milestones: 2406, 2407 Jun 29, 2024
Contributor Author

github-actions bot commented Jul 4, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Jul 6, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

shankarseal assigned matthewige and unassigned dv-msft Jul 25, 2024
shankarseal modified the milestones: 2407, 2408 Jul 25, 2024
Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

@shpalani
Collaborator

shpalani commented Jul 29, 2024

Failure Log:

[01:47:12] :: Starting test *** native_invoke_v4_v6_programs_restart_extension_test ***
[01:47:12] :: test threads per program    : 2
[01:47:12] :: test duration (in minutes)  : 5
[01:47:12] :: test verbose output         : false
[01:47:12] :: test extension restart      : false
[01:47:12] :: waiting on 2 test threads...
[01:47:12] :: **_load_attach_program(0) FATAL ERROR: bpf_prog_attach(cgroup_count_connect4.sys) failed.** program:count_tcp_connect4, errno:22
[01:47:12] :: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[01:47:12] :: ebpf_stress_tests_km is a Catch2 v3.6.0 host application.
[01:47:12] :: Run with -? for options
[01:47:12] :: -------------------------------------------------------------------------------
[01:47:12] :: native_invoke_v4_v6_programs_restart_extension_test
[01:47:12] :: -------------------------------------------------------------------------------
[01:47:12] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(1469)
[01:47:12] :: ...............................................................................
[01:47:13] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(822): FAILED:
[01:47:13] ::   REQUIRE( result == 0 )
[01:47:13] :: with expansion:
[01:47:13] ::   -22 == 0
[01:47:13] :: 

[01:47:13] :: *** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km failed.
*** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km failed.
At C:\actions_runner_2019_1\_work\ebpf-for-windows\ebpf-for-windows\x64\Release\vm_run_tests.psm1:33 char:5
+     Invoke-Command -VMName $VMName -Credential $TestCredential -Scrip ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (*** ERROR *** C...ests_km failed.:String) [], RuntimeException
    + FullyQualifiedErrorId : *** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km fail 
   ed.
    + PSComputerName        : vm1_ws2019
 
Error: Process completed with exit code 1.

Contributor Author

github-actions bot commented Aug 3, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Aug 6, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

@matthewige
Collaborator

matthewige commented Aug 19, 2024

From https://github.com/microsoft/ebpf-for-windows/actions/runs/10449590580/job/28932788838
[02:40:35] :: native_invoke_v4_v6_programs_restart_extension_test
[02:40:35] :: -------------------------------------------------------------------------------
[02:40:35] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(1469)
[02:40:35] :: ...............................................................................
[02:40:35] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(822): FAILED:
[02:40:35] :: REQUIRE( result == 0 )
[02:40:35] :: with expansion:
[02:40:35] :: -22 == 0
[02:40:35] ::

This test case has 2 threads, one loading/attaching a connect v4 program and the other loading a v6 program.

Program attach is failing (line 822 mentioned above):
result = bpf_prog_attach(program_fd, UNSPECIFIED_COMPARTMENT_ID, attach_type, 0);
if (result != 0) {
    LOG_ERROR(
        "{}({}) FATAL ERROR: bpf_prog_attach({}) failed. program:{}, errno:{}",
        __func__,
        thread_index,
        file_name.c_str(),
        program->program_name,
        errno);
    REQUIRE(result == 0);
}

From the failure traces:
1132290 [0]0F40.0A84::2024/08/19-02:40:27.557685400 [NetEbpfExtProvider]{"api":"FwpmTransactionBegin","status":"0xC022000E(NT=The call is not allowed from within an explicit transaction.)","meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtApiError","time":"2024-08-19T09:40:27.5576854Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x4"}}
1132291 [0]0F40.0A84::2024/08/19-02:40:27.557712600 [NetEbpfExtProvider]{"ErrorMessage":"net_ebpf_extension_add_wfp_filters returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577126Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132292 [0]0F40.0A84::2024/08/19-02:40:27.557715200 [NetEbpfExtProvider]{"ErrorMessage":"_net_ebpf_extension_sock_addr_on_client_attach returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577152Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132293 [0]0F40.0A84::2024/08/19-02:40:27.557735600 [NetEbpfExtProvider]{"ErrorMessage":"_net_ebpf_extension_sock_addr_on_client_attach returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577356Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132294 [0]0F40.0A84::2024/08/19-02:40:27.557737900 [NetEbpfExtProvider]{"Message":"attach_callback returned failure. Attach attempt rejected.","value":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericMessage","time":"2024-08-19T09:40:27.5577379Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x4"}}

On code inspection, multiple threads can invoke net_ebpf_extension_add_wfp_filters() concurrently (for multiple programs of the same type or of different program types, all of which could hit this issue). We use a single global filter engine handle, so two threads can both call FwpmTransactionBegin() on that same handle. The second call would be rejected while the first transaction is still open, which matches the 0xC022000E failure observed above.

We'll need to fix this, probably by adding some serialization and/or using different filter handles per operation.

shankarseal modified the milestones: 2408, 2409 Aug 26, 2024
@matthewige
Collaborator

After some offline discussion, we found that the code paths needed to fix this are also being modified in PR 3571 (#3751). I will wait until that is completed before taking up this fix.

matthewige added the blocked (Blocked on another issue that must be done first) label Aug 26, 2024
Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Sep 4, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Sep 6, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Oct 3, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

Contributor Author

github-actions bot commented Oct 4, 2024

Failed Run
Codebase
Test name - km_mt_stress_tests

5 participants