Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should undecryptable packets arrive at the epoll? #2503

Closed
Chardrazle opened this issue Oct 28, 2022 · 9 comments
Closed

Should undecryptable packets arrive at the epoll? #2503

Chardrazle opened this issue Oct 28, 2022 · 9 comments
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@Chardrazle
Copy link

We are currently upgrading from version 1.4.3 to 1.5.0, and just wanted to check if there were any behavioural changes that might explain what we see - before delving deeper into the investigation.
We have some (relatively) simple unit-tests that set up a local sender/receiver. One of those tests is for encryption and has a missing passphrase in the receiver.
In the version 1.4.3 build the packet does not make it past the epoll, but in the 1.5.0 version it does. So an undecryptable packet is presented to the handler.

In both cases the logs can be seen to:

.../srt/srt-src/srtcore/crypto.cpp(822)	ERROR	decrypt	/SRT:RcvQ:w1*E:SRT.cn: SECURITY STATUS: NOSECRET - can't decrypt w_packet.
.../srt/srt-src/srtcore/crypto.cpp(824)	TRACE	decrypt	/SRT:RcvQ:w1 D:SRT.cn: Packet still not decrypted, status=NOSECRET - dropping size=1456
.../srt/srt-src/srtcore/core.cpp(10170)	TRACE	processData	/SRT:RcvQ:w1 D:SRT.qr: @29409669:ERROR: packet not decrypted, dropping data.

At first this looked like symptoms similar to issue #1977, but that appears to be for earlier versions.
(The SRTO_ENFORCEDENCRYPTION is switched off, though.)

When trying to identify a difference, a preliminary trace through the code found that the iteration loop in epoll.cpp (~607):
for (CEPollDesc::enotice_t::iterator it = ed.enotice_begin(), it_next = it; it != ed.enotice_end(); it = it_next)
was not entered for the 1.4.3 version, so the total counter was not incremented.
There is a 'notice' in this container for the 1.5.0 -based build, resulting in a non-zero return.

Of course, this may just be highlighting that we're doing something wrong elsewhere, hence the question rather than issue :)

@Chardrazle Chardrazle added the Type: Question Questions or things that require clarification label Oct 28, 2022
@ethouris
Copy link
Collaborator

No, that part is only to copy the readiness information that was earlier set through the call to (somewhere from a function in core.cpp file):

uglobal().m_EPoll.update_events(m_SocketID, m_sPollID, SRT_EPOLL_IN, true);

This looks like that the above call was executed after the decryption failed and the packet was requested to be dropped - that is, the readiness was set even if there was no packet ready to read.

We do have unit tests for encryption handling, it's test/test_enforced_encryption.cpp. Would you be able to modify it to represent your case? Likely to the TestConnect method there's needed also a mechanism of sending and receiving one packet of data under a condition of expected connection success and non-blocking mode.

Note that this test checks things in various combinations of encryption settings on both sides together with the value of SRTO_ENCFORCEDENCRYPTION socket option.

@Chardrazle
Copy link
Author

Chardrazle commented Nov 1, 2022

I've modified, well right now it's just a horrible kludge of, the test code you mentioned.
(The test is run with --gtest_filter, as it's so hard-coded at the moment.)
The test case D.3 is the closest to what we have (from the matrix).
In our system we have threads for the sender and receiver, with epoll instances on both. So, to roughly behave the same, after the current connection part of the test executes (in TestConnect) I've made the code create two new epoll instances, and link those to the sender (m_caller_socket) and receiver (accepted_socket), respectively. (We use srt_epoll_wait).
I then send a single packet, wait a while, and call the epoll associated with the receiver. This reports the event, and a subsequent srt_recvmsg2 acquires the (still encrypted) packet data.

I copied the kludge into the 1.4.3 code and the epoll does not report the event. So it looks like something changed between versions.

The area around the snippet you posted; i.e. this part (1.5.0):

            /*
             * Set EPOLL_IN to wakeup any thread waiting on epoll
             */
            self->uglobal().m_EPoll.update_events(self->m_SocketID, self->m_sPollID, SRT_EPOLL_IN, true);

Putting a breakpoint on that call fires in the test for the 1.5.0 version.
The previous 1.4.3 version:

            /*
             * Set EPOLL_IN to wakeup any thread waiting on epoll
             */
            self->s_UDTUnited.m_EPoll.update_events(self->m_SocketID, self->m_sPollID, SRT_EPOLL_IN, true);

Does not trigger. Perhaps it's related to the determination of the rxready state?

@Chardrazle
Copy link
Author

In light of the above findings, this is now a suspected regression. Can the label be updated, or should another issue be raised?

@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core and removed Type: Question Questions or things that require clarification labels Jan 9, 2023
@maxsharabayko maxsharabayko added this to the v1.5.2 milestone Jan 9, 2023
@maxsharabayko
Copy link
Collaborator

@Chardrazle Could you please share the test you have to reproduce the issue?

TODO

@Chardrazle
Copy link
Author

Chardrazle commented Jan 9, 2023

As mentioned, the test code was hacked just for the specific case. But, FWIW, the attached diff can be applied.
patch test_enforced_encryption.cpp 2503_srt_3pp_143_150_kludge.diff.txt
2503_srt_3pp_143_150_kludge.diff.txt

It's likely the code (incorrectly) breaks other unit-tests because it was purely based on exercising the D.3 matrix case.
I.e. ./test-srt --gtest_filter="*CASE_D_3_Non*"
The diff can be applied to both the 1.5.0 and 1.4.3 code, to see the different behaviour.

We have now moved over to SRTO_ENFORCEDENCRYPTION enabled, so have avoided the issue.

@ethouris
Copy link
Collaborator

I tried this, applied the patch and ran according to the instruction. Is it expected to fail? This is the result with the latest version:

[ RUN      ] TestEnforcedEncryption.CASE_D_3_NonBlocking_Enforced_Off_Off_Pwd_Set_None
13:43:44.481716/SRT:RcvQ:w1!W:SRT.cn: @110200653: HS KMREQ: Peer declares encryption, but agent does not - still allowing connection.
13:43:44.481890/SRT:RcvQ:w1!W:SRT.cn: processSrtMsg_KMREQ: Agent does not declare encryption - won't decrypt incoming packets!
Epoll wait result: 1 FOUND: @110200654 in [R]
ACCEPT: done, result=110200653
[T] ACCEPT SUCCEEDED: @110200653
13:43:44.482725/SRT:RcvQ:w2!W:SRT.cn: processSrtMsg_KMRSP: received failure report. STATE: UNSECURED
W: 1
13:43:44.833677/SRT:RcvQ:w1*E:SRT.cn: SECURITY STATUS: NOSECRET - can't decrypt w_packet.
13:43:44.833717/SRT:RcvQ:w1!W:SRT.qr: @110200653: Decryption failed. Seqno %825919141, msgno 1. Dropping 1.
R: -1
[       OK ] TestEnforcedEncryption.CASE_D_3_NonBlocking_Enforced_Off_Off_Pwd_Set_None (2396 ms)

To make sure, I retrtied by installing release from label v1.5.1 and applying the fix on the test again. This time the test failed:

14:03:25.519297/SRT:RcvQ:w2!W:SRT.cn: processSrtMsg_KMRSP: received failure report. STATE: UNSECURED
W: 1
14:03:25.871103/SRT:RcvQ:w1*E:SRT.cn: SECURITY STATUS: NOSECRET - can't decrypt w_packet.
R: 1
/drive/h/sektor/repos/srt.gitd/main.v/test/test_enforced_encryption.cpp:523: Failure
Expected: (epoll_res_r) <= (0), actual: 1 vs 0
It's wrongly reported, so let's take a look...
/drive/h/sektor/repos/srt.gitd/main.v/test/test_enforced_encryption.cpp:525: Failure
Expected equality of these values:
  srt_recvmsg2(accepted_socket, buffer, sizeof buffer, nullptr)
    Which is: 1316
  -1
[  FAILED  ] TestEnforcedEncryption.CASE_D_3_NonBlocking_Enforced_Off_Off_Pwd_Set_None (1900 ms)

It's then quite probable that this problem did exist, but has been fixed in the latest master. Could you try to confirm this, please?

@Chardrazle
Copy link
Author

Yes, I can confirm the same findings as yourself 👍
I cannot recall if I ever tested against latest master; I probably only tried the tagged versions.
I just tried the nearest commit to the date I raised the defect and that shows the issue. So it looks like the fix went in some time between then and now.

@ethouris
Copy link
Collaborator

Ok, I'm closing this issue then - there's nothing more to do, and it is still visible as required the 1.5.1 release.

@ethouris
Copy link
Collaborator

Just for the record: the #2599 PR has fixed the problem (likely, that is, maybe the fix wasn't exactly intended for this problem, but it did fix it).

The change was related to the new receiver buffer logics.

With the old receiver buffer logics packets that failed to get decrypted had the encryption flags not cleared and as such the packet didn't constitute any valid data and hence it wasn't also triggering read-readiness.

In the new receiver buffer the encryption logics have been taken away and therefore the decryption failure had to be programmed differently - that is, undecrypted packets (after adding to the buffer) had to be removed from the buffer, and not only themselves, but the whole message, if the packet was a part of a message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants