-
Notifications
You must be signed in to change notification settings - Fork 13.3k
LwIP issue, or how a simple "if" can go wrong
There once was an issue of apparently random exceptions coming from the WebServer. It was hard to track, but was ultimately resolved by re-building LwIP library from source provided by Espressif. It was quite strange that lwip.a
built by Espressif with xt-xcc
resulted in issues, while the one built with gcc
did not. However, the issue was resolved, everyone was happy, and I have already lost too much time to analyze this further.
So time went by, we have upgraded to SDK 1.5.1, and with the Espressif-provided LwIP the issue, naturally, reappeared. I couldn't fix this again by building LwIP from source, because LwIP source for this SDK version was not released yet. So I decided to dig into it again.
The offending piece of code was found within tcp_input
function. Here is the source (manually pre-processed for clarity):
err_t err;
if(pcb->recv != NULL) {
err = pcb->recv(pcb->callback_arg, pcb, NULL, ERR_OK);
} else {
err = ERR_OK;
}
if (err == ERR_ABRT) {
goto aborted;
}
/// some code
tcp_output(pcb);
/// more code
aborted:
/// jump here if connection was aborted from callback
This part of code gets executed when TCP connection is closed by the remote side. LwIP calls recv
callback with NULL pbuf
pointer, indicating that connection is closed. Application code may simply return ERR_OK
, or it may abort the connection (calling tcp_abort(pcb)
) and return ERR_ABRT
. LwIP will check the return value of callback and will either go through normal close (calling tcp_output
), or bail out if ERR_ABRT
was returned.
It is particularly important to bail out in this case, because tcp_abort
frees memory allocated for pcb
, so no further operations on pcb
may be performed, including tcp_output
.
Let's look at the code generated for this function:
|4022f600 loc_4022f600:
│4022f600 $a7 = *(u32*)($a12 + 0x8c) ; l32i $a7, $a12, 0x8c ; $a7 = pcb->recv
│4022f603 if (!$a7) goto recv_is_null ; beqz.n $a7, recv_is_null
│4022f605 $a2 = *(u32*)($a12 + 0x18) ; l32i.n $a2, $a12, 0x18
│4022f607 $a3 = $a12 ; mov.n $a3, $a12
│4022f609 $a4 = 0x0 ; movi.n $a4, 0x0
│4022f60b $a5 = 0x0 ; movi.n $a5, 0x0
│4022f60d call $a7 ; callx0 $a7 ; call recv callback
│4022f610 goto if_block_done ; j if_block_done ; return value (err) is in $a2
│4022f613 ; xref: 0x4022f603 j
│4022f613 recv_is_null:
│4022f613 $a2 = 0x0 ; movi.n $a2, 0x0 ; err = 0 (ERR_OK)
│4022f615 ; xref: 0x4022f610 j
│4022f615 if_block_done:
│4022f615 $a4 = $a2 + 0x8 ; addi.n $a4, $a2, 0x8 ; err += 8
│4022f617 if (!$a4) goto label_aborted ; beqz $a4, label_aborted
This looks pretty straightforward. First we check if the callback function pointer is non-zero. If it's not, we branch to recv_is_null
and set err
variable in register a2
to zero (which is ERR_OK
). If a callback is set, we set up arguments and invoke it. Return value is placed into the same a2
register.
Then the code compares err
with ERR_ABRT
(which is defined as -8
) by adding 8 and comparing to 0. This is fine until you remember that err_t
is a typedef for int8_t
. So if recv callback returns ERR_ABRT
, the value of a2
will be 248. Add 8 to that, and you get 256, not 0. Uh-oh! The code fails to properly handle ERR_ABRT
, goes on to call tcp_output
on a destroyed pcb
, and we are in trouble.
Looking through the code, sent
callback is also affected by the same issue. However, in another part of tcp_input
compiler decided to generate a different sequence of instructions:
│4022f4b9 $a11 = -0x8 ; movi.n $a11, -0x8
│4022f4bb if ($a2 == $a11) goto aborted ; beq $a2, $a11, aborted
Until we find a proper fix for this issue (on the xt-xcc side), we will convert return values of callbacks to int32_t.