SEGV after infinite loop retrying proxy_socks5_read_connection_response #2352

Closed
emdee-is opened this issue Oct 13, 2022 · 24 comments

@emdee-is

emdee-is commented Oct 13, 2022

I have weird networking problems with SOCKS proxies.

This is a very rare SEGV caught by gdb running python (gdb -ex r --args /usr/bin/python3.9). This is an infinite loop leading to a segv under toxygen when connected to a SOCKS5 proxy (Tor). This is happening at a time when toxic on the same machine is completely blocked from Tox chat and NGC. SEGVs are rare under toxygen - I've very rarely seen one.

The tracebacks are Python running under gdb. toxygen is just wrapping libtoxcore, and any errors in the wrapping are almost always immediately fatal when calling, so I don't suspect toxygen.

So the trace is a mixture of:

  1. a recent c-toxcore compiled TRACE - those messages start
    TRAC>

  2. Python logging, which all looks normal. It starts:
    modname DEBUG|INFO debugmessage
    or DBUG+ or sometimes no prefix at all - [

  3. gdb traceback at the end

Comments:

@Green-Sky I'm not someone who knows the code, but I'd say the code in TCP_client.c#950 is retrying proxy_socks5_read_connection_response an infinite number of times, and at the very least, as a failsafe, it must give up and fail after a set amount of time or number of tries. There should be no infinite loops.

@JFreegman Why is net_packet_type_name returning "unknown"?
There's no 0x05 entry in typedef enum Net_Packet_Type.

Traceback:

https://bin.nixnet.services/?bf35a2c2c484f266#9JRCgpKJ62Rsruu99Fdr4gzYGpZwWVtyncGWYWdWMosc

@JFreegman
Member

@JFreegman Why is net_packet_type_name returning "unknown"?
There's no 0x05 entry in typedef enum Net_Packet_Type.

If it's a UDP packet it's coming from the testnet. You're probably running toxic with the testnet enabled which is why it's not working, and it's probably trying to send packets to your other client.

@emdee-is
Author

emdee-is commented Oct 13, 2022

No it's not Toxic - I double checked. I checked to see if it is old Echobot or the like but there's nothing running. It must be coming over the Mainnet as I'm behind a firewall.

As I'm still seeing 05 packets, maybe the libtoxcore could define 0x05 on mainnet so that LOG_TRACE could be explicit when dropping them, and maybe log the IP. Then when the log says unknown it really is unknown.

But the SEGV is unrelated to this, as is the occurrence of an infinite networking loop.

@emdee-is
Author

emdee-is commented Oct 13, 2022

When I see 200 lines like

TRAC> TCP_common.c#203:read_TCP_packet recv buffer has 0 bytes, but requested 10 bytes

doesn't that mean something is retrying too hard and should have failed?

@Green-Sky
Member

Green-Sky commented Oct 13, 2022

TCP_common.c#203:read_TCP_packet recv buffer has 0 bytes, but requested 10 bytes

The tcp code right now does not timeout established connections, that don't receive data. The same happens on tcp_relays. (#2332)

@emdee-is
Author

@Green-Sky " tcp code right now does not timeout established connections," - no matter what the root cause of the problem is, I think the code should be bulletproofed/tightened so that both the client and server TCP code have timeouts. It can't be that hard to do (YMMV :0) and #2332 gives a simple reproducible test case, so if that could be done it's a great step. That can be done right away and may help or solve the problem anyway.

The TCP code probably doesn't get as much testing/fuzzing as the UDP and it may get attacked, like Tor gets attacked, so it should be hardened.

@nurupo "did you modify the toxcore you are using in any way? " no - it's -DDebug and some CMakeLists things turned on, but a clean TokTok/c-toxcore checkout from a couple of weeks ago, one that has been running toxic and toxygen fine for weeks/months.

It's a recent problem - last few days, and the logs always have these errors:


TRAC> network.c#789:loglogdata [05 = <unknown>           ] T=>   3= 5.79.75.37:33445 (0: OK) | 0000000000000000...00
TRAC> TCP_common.c#203:read_TCP_packet recv buffer has 0 bytes, but requested 2 bytes
  1. I checked one of the IPs and it's Tha's and he's certain it's not TEST_NET.
  2. JF sez 05 is a known packet type code for TCP but network.c#789 is telling me otherwise. So I think this code needs to explicitly drop any 05 packets and log it as such.
  3. The log sez 0000000000000000...00 - does that mean someone is intentionally sending a null or short or zeros packet? Bear in mind that this could be intentionally bad behaviour - "this might be an attack scenario ".
  4. JF sez that 05 packets get dropped by relays, but I disagree and feel these packets came from Tox over Tox from TCP relays - I'm behind a firewall. So could the tcp_server code explicitly drop 05 packets and log it as such.

@nurupo " it's actually not toxcore that caused the crash but toxygen" No I don't think so. I know the stacktrace sez it SEGVed in the logger, but the logger is just print statements and hasn't changed in months. You can't ignore the logger worked fine for the preceding 200 lines of errors! My feeling is, and I've seen this before, the code infinite looped until resource exhaustion or something, which caused the SEGV - the code is spending most of its time in the logger, so it got caught there. I can CNP the logger code but the code is a CTYPEs wrapping of a Python callback that allocates no resources.

We're very lucky because:

  1. I got a gdb stack trace
  2. tox-bootstrapd goes into an infinite loop  #2332 is reproducible, although perhaps unrelated. Although "trace output is supposed to spam your log into oblivion" the logs are indicating problems and SEGVs are a very big problem.
  3. "these are very weird logs" :-)

PS: in writing TCP client or server code remember that "reasonable" timeouts may have to be seriously extended when it's over a proxy.

@nurupo
Member

nurupo commented Oct 14, 2022

According to that stack trace, it crashes when calling log_callback callback function provided by Toxygen:

tox->log_callback(tox, (Tox_Log_Level)level, file, line, func, message, userdata);

Either the log_callback function pointer, as provided by Toxygen to toxcore, is invalid, or it crashes somewhere inside that function. Either way, the crash is caused by Toxygen.
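One general way a ctypes wrapper can end up handing a C library an invalid function pointer is by letting the callback object be garbage-collected while the C side still holds its address; the ctypes documentation warns about exactly this. The sketch below is not Toxygen's code and is not a diagnosis of this crash. It assumes a callback type that mirrors the arguments visible in the call above, and `LogSink` is a hypothetical holder whose only job is to keep the ctypes object alive.

```python
import ctypes

# Assumed ctypes mirror of the arguments in the call above:
# (tox, level, file, line, func, message, userdata). Illustrative only.
LOG_CB = ctypes.CFUNCTYPE(
    None,
    ctypes.c_void_p,   # tox
    ctypes.c_int,      # level
    ctypes.c_char_p,   # file
    ctypes.c_uint32,   # line
    ctypes.c_char_p,   # func
    ctypes.c_char_p,   # message
    ctypes.c_void_p,   # userdata
)


class LogSink:
    """Hypothetical holder that owns the callback while C may still call it."""

    def __init__(self):
        self.lines = []
        # Keep the ctypes object on self; a temporary would be freed and
        # leave the C library with a dangling function pointer.
        self.c_callback = LOG_CB(self._on_log)

    def _on_log(self, tox, level, file, line, func, message, userdata):
        # Do not let an exception escape into C; just record the line.
        try:
            self.lines.append(
                f"{file.decode()}#{line}:{func.decode()} {message.decode()}"
            )
        except Exception:
            self.lines.append("<undecodable log line>")
```

The object stored in `c_callback` is what would be passed to the C library, and the `LogSink` instance has to outlive every call the library can make.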


As for the read_TCP_packet recv buffer has 0 bytes, but requested 10 bytes log spam: toxcore sends a SOCKS5 proxy an authentication request (for no authentication), the proxy replies that it accepted the request, then toxcore sends the connection request and the proxy just doesn't reply anything back. According to your logs, it looks like toxcore keeps waiting for the proxy to reply to the connection request, never receiving the reply, so it keeps checking for a reply for the full remainder of the 10 second TCP connection timeout.

@emdee-is

When I see 200 lines like

TRAC> TCP_common.c#203:read_TCP_packet recv buffer has 0 bytes, but requested 10 bytes

doesn't that mean something is retrying too hard and should have failed?

No it doesn't. The reason why you are seeing so many of such messages is because there are A LOT of SOCKS5 connections going on, and toxcore keeps checking all of them for the data on every iteration, until the TCP connection is timed out, which is 10 seconds. It's not an infinite loop in the regular sense of a hot while 1 loop, it just checks if the proxy replied every tox iteration, without blocking anything else toxcore might be doing, unlike what a hot loop would do. Everything is working as intended.
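The difference between polling once per iteration and a hot loop can be sketched in a few lines. This is a simplified model, not toxcore's implementation: `PendingProxyConnection`, `poll` and `iterate` are hypothetical names, and the only figure taken from this thread is the 10 second timeout.

```python
from dataclasses import dataclass

CONNECT_TIMEOUT = 10.0  # seconds, the figure quoted in this thread


@dataclass
class PendingProxyConnection:
    """Hypothetical stand-in for one TCP client waiting on its proxy."""
    started_at: float
    received: bytes = b""

    def poll(self, now: float, needed: int = 10) -> str:
        """Called once per iteration; never blocks or loops internally."""
        if len(self.received) >= needed:
            return "connected"
        if now - self.started_at >= CONNECT_TIMEOUT:
            return "timed_out"
        # Corresponds to one "recv buffer has 0 bytes" trace line.
        return "waiting"


def iterate(connections, now):
    """One pass over every pending connection, like one iteration."""
    return [conn.poll(now) for conn in connections]
```

Each pending connection produces at most one "waiting" result per pass, so many connections polled many times per second add up to a lot of trace lines without any single connection retrying in a tight loop.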

@Green-Sky

The tcp code right now does not timeout established connections, that don't receive data. The same happens on tcp_relays.

Yes it does time out. As mentioned on IRC, TCP client has a clear timeout: if it doesn't establish a connection in 10 seconds -- the connection gets closed.


Now to the weird stuff going on in the log.

A SOCKS5 negotiation goes like this:

1. Tox => Proxy: 05 01 00                   # hey, SOCKS5 proxy, i want to connect using the no-auth method
2. Proxy => Tox: 05 00                      # ok, go ahead, i support no-auth
3. Tox => Proxy: 05 01 00 01 <IPv4> <PORT>  # cool, connect me to IP:PORT please
4. Proxy => Tox: 05 00 00 01 <IPv4> <PORT>  # there you go, I have connected to it with my IP:PORT, whatever you send next will be proxied now
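The client side of the four steps above can be written out as bytes. This is a minimal sketch of the SOCKS5 no-auth CONNECT exchange for an IPv4 target, not toxcore's code; the function names are made up for illustration.

```python
import socket
import struct


def socks5_greeting() -> bytes:
    # VER=5, NMETHODS=1, METHOD=0 (no authentication)
    return bytes([0x05, 0x01, 0x00])


def socks5_connect_request(ipv4: str, port: int) -> bytes:
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=1 (IPv4), then address and port
    header = bytes([0x05, 0x01, 0x00, 0x01])
    return header + socket.inet_aton(ipv4) + struct.pack("!H", port)


def socks5_reply_ok(reply: bytes) -> bool:
    # Both the method-selection reply (step 2) and the connect reply
    # (step 4) start with VER=5 followed by 0x00 on success.
    return len(reply) >= 2 and reply[0] == 0x05 and reply[1] == 0x00
```

For an IPv4 target the connect request is always 10 bytes long, and so is the reply in step 4, which matches the "requested 10 bytes" in the trace lines.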

Does this look familiar? For example, take a look at the TCP communication with the 85.143.221.42:33445 node from your log:

1. TRAC> network.c#789:loglogdata [05 = <unknown>           ] T=>   3= 85.143.221.42:33445 (0: OK) | 0000000000000000...00
2. TRAC> network.c#789:loglogdata [05 = <unknown>           ] =>T   2= 85.143.221.42:33445 (0: OK) | 0000000000000000...00
3. TRAC> network.c#789:loglogdata [05 = <unknown>           ] T=>  10= 85.143.221.42:33445 (0: OK) | 010001558fdd2a82...a5

First toxcore sends 3 bytes of 05 00 00, then receives 05 00, then sends 05 01 00 01 558fdd2a 82a5. 558fdd2a is 85.143.221.42, 82a5 is 33445.
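The address and port in that 10-byte request can be checked by hand. A short sketch, assuming the request is laid out as a 4-byte header, 4 address bytes and 2 port bytes in network order:

```python
import socket

# The 10 bytes of the third logged packet, reassembled from the trace.
request = bytes.fromhex("05010001558fdd2a82a5")

address = socket.inet_ntoa(request[4:8])        # 4 address bytes
port = int.from_bytes(request[8:10], "big")     # 2 port bytes, network order

print(f"{address}:{port}")  # prints 85.143.221.42:33445
```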

Note that the first message is logged as 05 00 00 when it is in fact 05 01 00 that is being sent. That's because it's less than 5 bytes long. If a packet is less than 5 bytes, it's logged as zeros instead of actual value in the packet (kind of weird, but it looks like the code expects to see two 4-byte integers in there instead of the SOCKS5 protocol data, and in that context it makes sense. perhaps logging needs to account for the cases like this too?):

c-toxcore/toxcore/network.c

Lines 636 to 645 in 8054854

static uint32_t data_0(uint16_t buflen, const uint8_t *buffer)
{
    uint32_t data = 0;
    if (buflen > 4) {
        net_unpack_u32(buffer + 1, &data);
    }
    return data;
}
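The effect of that length check on the log can be reproduced with a small Python mirror of data_0. This is a sketch that assumes net_unpack_u32 reads a big-endian (network order) 32-bit value; the Python function is illustrative and not part of toxcore.

```python
def data_0(buffer: bytes) -> int:
    """Python mirror of the C data_0: bytes 1..4 as a big-endian u32."""
    if len(buffer) > 4:
        return int.from_bytes(buffer[1:5], "big")
    return 0  # short packets fall through and are reported as zero


greeting = bytes.fromhex("050100")               # 3 bytes, too short
request = bytes.fromhex("05010001558fdd2a82a5")  # 10 bytes

print(f"{data_0(greeting):08x}")  # prints 00000000
print(f"{data_0(request):08x}")   # prints 01000155
```

The second value matches the first eight hex digits of the 10-byte line in the trace, while the 3-byte greeting shows up as zeros even though its real content is 05 01 00.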

Note that (2) is similarly logged as zeros, however because toxcore replies with (3), we can deduce that the node has in fact sent us 05 00, as otherwise toxcore wouldn't send (3) to it.

Also note that step 4 of SOCKS5 negotiation is missing, the node never replies back, which is the cause of what @emdee-is called an "infinite loop" -- toxcore is checking on the connection request reply from the proxy, with read_TCP_packet recv buffer has 0 bytes, but requested 10 bytes being printed into logs.

Anyway, to cut it with "Note ...", it looks like your toxcore is trying to connect to multiple DHT nodes using the said DHT nodes as SOCKS5 proxies. This should not be happening. It should be trying to connect to multiple DHT nodes using the user-provided SOCKS5 proxy, it should NOT be using the DHT nodes it's trying to connect to as SOCKS5 proxies. Also, it's weird that the nodes reply back with 05 00 as if they actually have a SOCKS5 proxy server running. Obviously it's something in toxcore that is replying, because 85.143.221.42:33445 is no SOCKS5 proxy address, that's toxcore, but why it replies back and why with the correct SOCKS5 server reply? The server reply is not even implemented in toxcore, toxcore doesn't implement SOCKS5 proxy server functionality, only the client functionality. Also why is all this happening inside of do_gc_tcp() call? Is this a NGC bug? Sounds like a NGC bug. Somehow TCP_Proxy_Info.ip_port that is being passed into new_TCP_connection() contains the ip_port of a DHT node instead of a proxy. I don't see how Toxygen could cause that since Tox_Options allows to specify only 1 proxy but according to your logs toxcore uses many different ones. @emdee-is Did you perhaps modify your toxcore? If not, @JFreegman should take a look at the ip_port used in init_gc_tcp_connection() and how could it be that of a different DHT node every time it's called.

So a list of issues discovered:

  • Toxygen does something that causes the crash during the log callback.
  • Toxcore packet logging assumes that the 1st byte is the packet type followed by two 4-byte integers in network-byte order. This is obviously not the case when toxcore communicates with SOCKS5 and HTTP proxies, because proxy servers don't use the Tox protocol, they use SOCKS5 and HTTP protocols, and those protocols don't start the packet with a Tox packet id followed by two integers. Perhaps unknown packet types need to be logged as is? Or we should somehow signify to the logger that these are not Tox protocol packets so that it logs them differently?
  • Toxcore replies to a SOCKS5 client authentication negotiation request with the correct SOCKS5 server reply. Would be nice if it didn't do that.
  • A bug (in NGC?) with the wrong proxy ip port being passed when creating TCP_client connections.

@JFreegman
Member

JFreegman commented Oct 14, 2022

@nurupo

A bug (in NGC?) with the wrong proxy ip port being passed when creating TCP_client connections.

I checked this locally and everything seems to be doing what it's supposed to be doing. I can't reproduce any of this buggy behaviour locally. Also, I'd wonder how people could be connecting successfully to groupchats using proxies if it were not setting a valid SOCKS5 proxy.

@nurupo
Member

nurupo commented Oct 14, 2022

Sorry, looks like the logs had me confused. When they said:

TRAC> network.c#789:loglogdata [05 = <unknown>           ] T=>   3= 85.143.221.42:33445 (0: OK) | 0000000000000000...00
TRAC> network.c#789:loglogdata [05 = <unknown>           ] =>T   2= 85.143.221.42:33445 (0: OK) | 0000000000000000...00
TRAC> network.c#789:loglogdata [05 = <unknown>           ] T=>  10= 85.143.221.42:33445 (0: OK) | 010001558fdd2a82...a5

I thought toxcore was communicating directly with 85.143.221.42:33445, when in fact it was communicating with 127.0.0.1:9050. The 85.143.221.42:33445 that is printed in logs is the address that the TCP connection should *eventually* communicate with, not necessarily the current address it's communicating with if it's using a proxy. Wish the logs would make that clear. So there is no bug in NGC and toxcore doesn't try to connect to DHT nodes as SOCKS5 proxies, it's just the logs that make it appear like that's what is happening. This also explains why we are getting the perfect SOCKS5 server reply back, because we are, in fact, communicating with a SOCKS5 proxy, not toxcore.

I have edited my last comment, crossing out the mistaken parts. The other parts are still relevant though.

@emdee-is
Author

emdee-is commented Oct 14, 2022

@nurupo - thanks for your detailed analysis and I'm glad the bug report found some real issues. Just a few quick comments that may help:

  1. this is an edge case under pathological network conditions. I've rarely/never had SEGVs with toxcore under toxygen, and I've almost never seen these network errors before, even though I have gotten all toxcore TRACE info for months.
  2. toxygen has been stable with NGC for years from the libcore perspective. Lots of little UI issues, but it's the only desktop client with NGC.
  3. Toxygen makes one Tox instance from one Tox_options instance, so the SOCKS info is set only once.
  4. I and others have noticed that NGC connects better and faster than connecting chat to its members (me always over SOCKS). In fact I can be in a state where I can /whisper to someone in the NGC group when I can't connect with them in chat, which always amazes me.
  5. Ideally we would have a SOCKS_TRACE analogue to LOGGER_TRACE that looked up the right SOCKS 5 protocol Type, so that the TRACES are meaningful for SOCKS usage. Otherwise people will end up saying: "these logs confused me so much".

@JFreegman
Member

I and others have noticed that NGC connects better and faster than connecting chat to its members (me always over SOCKS). In fact I can be in a state where I can /whisper to someone in the NGC group when I can't connect with them in chat, which always amazes me.

This is because when you connect to a groupchat (that is, you establish a connection with any one peer in a group) you get many redundant opportunities to directly connect to all the peers in the chat through the sync protocol, whereas with peers in your friends list, you're forced to wait for a DHT lookup response.

@nurupo
Member

nurupo commented Oct 14, 2022

So, what is happening is that only the first three of the four SOCKS5 negotiation packets are getting exchanged, i.e. the SOCKS5 proxy doesn't reply back to toxcore when toxcore requests the proxy to connect to 85.143.221.42:33445. I wonder why. It should reply back with success/failure at least. It could be that with the Tor proxy being Tor, it is taking a long time to establish a connection to the destination IP:port, and Toxygen crashed before Tor managed to establish it and reply back to toxcore with the success/failure. Or perhaps it took Tor over 10 seconds to do so, at which point toxcore timed out and closed the SOCKS5 connection.

@emdee-is
Author

@nurupo "SOCKS5 proxy doesn't reply back to toxcore when toxcore requests the proxy" I think that's the right general direction; I've had 2 more crashes today, and the first one kindly gets toxygen's logger off the hook:

                     
Thread 1 "python3.9" received signal SIGSEGV, Segmentation fault.               
0x00007ffff7af74b0 in pthread_mutex_lock () from /lib64/libc.so.6               
(gdb) where                                                                     
#0  0x00007ffff7af74b0 in pthread_mutex_lock () at /lib64/libc.so.6             
#1  0x00007fffed0fa3b0 in tox_lock ()                                           
    at /var/local/src/toxygen/toxygen/libs/libtoxcore.so                        
#2  0x00007fffed0f0137 in tox_bootstrap ()                                      
    at /var/local/src/toxygen/toxygen/libs/libtoxcore.so  

I'll put the full log and analysis up tomorrow, but I just wanted to clarify that these take place in the initial bootstrapping phase of starting up, long before any NGC groups would be accessed by the user.

The next one is almost a replay of the original with it crashing in the logger callback during bootstrapping, but a SIGILL this time:

Thread 13 "ToxIterateThrea" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffedc877640 (LWP 173276)]
0x00007ffff6dbc712 in ?? ()
(gdb) where
#0  0x00007ffff6dbc712 in  ()
#1  0x00007fffed0e88e9 in tox_log_handler
    (context=0x555556c30d30, level=LOGGER_LEVEL_TRACE, file=0x7fffed19fb99 "TCP_common.c", line=203, func=0x7fffed19fc50 <__func__.2> "read_TCP_packet", message=0x7ffedc875b70 "recv buffer has 0 bytes, but requested 10 bytes", userdata=0x0) at /var/local/src/c-toxcore/toxcore/tox.c:78
#2  0x00007fffed0be2a1 in logger_write
    (log=0x5555569919b0, level=LOGGER_LEVEL_TRACE, file=0x7fffed19fb99 "TCP_common.c", line=203, func=0x7fffed19fc50 <__func__.2> "read_TCP_packet", format=0x7fffed19fb40 "recv buffer has %d bytes, but requested %d bytes")
    at /var/local/src/c-toxcore/toxcore/logger.c:118
#3  0x00007fffed0e14c8 in read_TCP_packet
    (logger=0x5555569919b0, ns=0x555556c30d50, sock=..., data=0x7ffedc8760c0 "\300\033\310VUU", length=10, ip_port=0x555557263ac8)
    at /var/local/src/c-toxcore/toxcore/TCP_common.c:203
#4  0x00007fffed0deeb5 in proxy_socks5_read_connection_response
    (logger=0x5555569919b0, tcp_conn=0x555557263ab0)
    at /var/local/src/c-toxcore/toxcore/TCP_client.c:265

each after dozens of previous add_tcp_relay attempts. Again I don't suspect the logger because 1) it's just print() statements, and 2) it's printed hundreds of lines by the time we get here. So my feeling is that the SOCKS code is not robust enough to handle these pathological cases, or these cases are unusual for the proxy.

There are warnings in the Tor log that the network is having problems:

Guard kik*** is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are ...

The warnings are not that uncommon, but like it says, it could be an attack. There's nothing like it in the rest of my normal web usage with HTTP/POP...

I don't want to beat a dead horse, and I really don't want to waste your time, but 3 crashes in 3 days is unheard of - I haven't had one in months - and it is possible Tox over Tor is targeted. Similar logs of short reads. Although JF, as you reminded me, LOG_TRACE is supposed to generate tonnes of logging, and nurupo you read them as being "normal", this is actually not the case for me - I've rarely seen the logger say anything, although it's been wired up for a couple of months. So I think there is a real problem.

Thanks for your help and I'll put the logs for these 2 up tomorrow.

@emdee-is
Author

As an aside to troubleshooting the issue: now that we know that this behaviour exists, should the code, instead of "hammering" at the proxy a couple of hundred times in 10 seconds, maybe back off as the null reads or writes mount up, and wait longer but with less volume?
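The back-off idea can be sketched as a schedule of waits between polls. This is only an illustration of the suggestion, not something toxcore does; `backoff_delays` is a hypothetical helper and all four parameter values are made-up assumptions apart from the 10 second budget mentioned in this thread.

```python
def backoff_delays(initial=0.05, factor=2.0, cap=2.0, budget=10.0):
    """Yield waits between polls until the total budget is spent.

    initial, factor and cap are illustrative assumptions, not toxcore values.
    """
    elapsed = 0.0
    delay = initial
    while elapsed < budget:
        # Never wait longer than the cap or past the end of the budget.
        wait = min(delay, cap, budget - elapsed)
        yield wait
        elapsed += wait
        delay *= factor


delays = list(backoff_delays())
```

With these numbers the proxy is polled roughly ten times in ten seconds instead of a couple of hundred. The trade-off is that a reply arriving late in the window may sit unread for up to the cap before it is noticed.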

@emdee-is
Author

(I'll speak of first, second, and third logs/events.) This doesn't look right. I'm at the gdb command prompt of the third SEGV - hunting and pecking as I don't really know gdb.

(gdb) up
#3  0x00007fffed0e14c8 in read_TCP_packet (logger=0x5555569919b0, 
    ns=0x555556c30d50, sock=..., data=0x7ffedc8760c0 "\300\033\310VUU", 
    length=10, ip_port=0x555557263ac8)
    at /var/local/src/c-toxcore/toxcore/TCP_common.c:203
203             LOGGER_TRACE(logger, "recv buffer has %d bytes, but requested %d bytes", count, length);                                                        
(gdb) info locals
count = 0
__func__ = "read_TCP_packet"
len = 33576277
(gdb) 

Does that mean we tried to read a 33M packet, or is len not meaningful?

@emdee-is
Author

emdee-is commented Oct 17, 2022

directly connect to all the peers in the chat through the sync protocol, whereas with peers in your friends list, you're forced to wait for a DHT lookup response.

@JFreegman I was in a NGC at the time: I could whisper to you and @Green-Sky but not chat to him or Tha - their chat was not coming online, but I could /whisper (or the toxygen equivalent) back and forth to them. (Did you get the URL I sent you in /whisper? toxygen is a little weird in how it shows /whisper receipts.)

@nurupo So the DHT can be blocked when the NGC sync protocol is resilient?

@Green-Sky
Member

Green-Sky commented Oct 17, 2022

I'll put the full log and analysis up tomorrow, but I just wanted to clarify that these take place in the initial bootstrapping phase of starting up

are you sure the save is valid? i had the same kind of issues when i fed a garbage save to toxcore.

edit: to be specific, a garbage save pointer that was not null.

@Green-Sky
Member

also, do me a favor and compile and run it with ASAN (-fsanitize=address) and without gdb. (asan does not work in gdb)

@emdee-is
Author

Maybe - remember I don't know C - running a Python app under gdb was a huge step for me and I'm still exploring a system that when you ask for help on the commands gives you >200 lines of "help" -:)

It would be useful if I knew what ASAN is, what it does, and where to get it. I asked gentoo if it could install it for me and it offered:

Description: Traditional game of Brunei

@Green-Sky Does that len = 33576277 for a TCP packet look right to you?

@emdee-is
Author

emdee-is commented Oct 17, 2022

@Green-Sky

are you sure the save is valid?

I have no doubts about the savefile, and I can check them section by section using my tox_profile utility. There's crap in there like dead and duplicated nodes in the DHT section #2347 #2343 but these SEGVs happen long after startup and always during bootstrap. My guess is someone has found a way of using guard or exit nodes to be malicious and it affects the DHT and not NGC - remember when I could /whisper to you but not chat with you?

Bear in mind that Tox over Tor is a whole different ballgame than Tox over UDP, and is only lightly tested: not at all really - just simple SOCKS.

@Green-Sky
Member

@Green-Sky

The tcp code right now does not timeout established connections, that don't receive data.

I don't know the code and it seems to me reading this thread that @nurupo is saying otherwise.

Could I ask you to just comment briefly to either retract or clarify so I can understand better? I'm not finger pointing, just unclear, and it might be relevant to resource exhaustion, DOS etc.

@nurupo has done an excellent analysis here. not much to add except that i remembered it wrong :)

@Green-Sky
Member

Green-Sky commented Oct 17, 2022

Maybe - remember I don't know C - running a Python app under gdb was a huge step for me and I'm still exploring a system that when you ask for help on the commands gives you >200 lines of "help" -:)

It would be useful if I knew what ASAN is, what it does, and where to get it. I asked gentoo if it could install it for me and it offered:

Description: Traditional game of Brunei

@Green-Sky Does that len = 33576277 for a TCP packet look right to you?

ASAN or libasan or google address sanitizer is a library that intercepts calls to memory, tracks allocation etc. It cannot be run with gdb though. You will have to recompile toxcore with it enabled. It seems the toxcore cmake is missing an option for this, so you will have to add a link_libraries(-fsanitize=address) anywhere in the cmake to force it to link with it.
Additionally, since it's a library and the host process needs to be hooked, you will need to preload libasan into python or something.
LD_PRELOAD=/lib/x86_64-linux-gnu/libasan.so.5 python stuff.py or similar on my machine.

edit: asan will tell you the place allocations happen, access violations, double frees, (exclusive!!) use-after-frees etc.

@emdee-is
Author

I think I'll leave that for much later - my C and CMake skills are primitive.
I'm still learning GDB...

Maybe write that up with instructions in INSTALL.md and CMakeLists.txt so it's easier to try to do for C noobs like me. Right now I think the best thing to do is to run down #2332 in case it's related as it's reproducible and a blocker.

(Thanks for the "remembered it wrong" - I was just confused)

(I don't think len = 33576277 is important; length=10 was the relevant one.)

@emdee-is
Author

Here's one of the SEGVs I had recently:
https://bin.nixnet.services/?37e882d46f463f68#57Fq2XvAJdJW4QyMjcsUaMfYi84BdrZ5TGhQ1MwWVmQz

It starts:

Thread 1 "python3.9" received signal SIGSEGV, Segmentation fault.
0x00007fffe3f2f4a9 in m_friend_exists (m=0x7ffff7ddc210, friendnumber=4)
    at /var/local/src/c-toxcore/toxcore/Messenger.c:534
534         return (unsigned int)friendnumber < m->numfriends && m->friendlist[friendnumber].status != 0;
(gdb) where
#0  0x00007fffe3f2f4a9 in m_friend_exists (m=0x7ffff7ddc210, friendnumber=4)
    at /var/local/src/c-toxcore/toxcore/Messenger.c:534
#1  0x00007fffe3f2e3dd in get_real_pk

I poked around up the stack and down and found this:

(gdb) up
#1  0x00007fffe3f2e3dd in get_real_pk (m=0x7ffff7ddc210, friendnumber=4, 
    real_pk=0x7ffff72f6e70 "") at /var/local/src/c-toxcore/toxcore/Messenger.c:88
88          if (!m_friend_exists(m, friendnumber)) {
(gdb) info locals
No locals.
(gdb) print friendnumber
$1 = 4
(gdb) print real_pk
$2 = (uint8_t *) 0x7ffff72f6e70 ""
(gdb) 

(gdb) down
#0  0x00007fffe3f2f4a9 in m_friend_exists (m=0x7ffff7ddc210, friendnumber=4)
    at /var/local/src/c-toxcore/toxcore/Messenger.c:534
534         return (unsigned int)friendnumber < m->numfriends && m->friendlist[friendnumber].status != 0;
(gdb) print m->numfriends
$3 = 4158503296

I have 4158503296 friends. I'm popular!

Does this mean I have a corrupt friends list?
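Printing the gdb values in hexadecimal makes that question easier to judge. The sketch below only does arithmetic on numbers that appear in this trace; the interpretation in the comments is an observation, not a confirmed diagnosis.

```python
m = 0x7FFFF7DDC210        # the `m` pointer from the backtrace
numfriends = 4158503296   # value printed by gdb for m->numfriends

print(hex(numfriends))        # prints 0xf7ddb980
print(hex(m & 0xFFFFFFFF))    # prints 0xf7ddc210

# The bounds check in m_friend_exists passes for friendnumber=4,
# so the code goes on to read m->friendlist[4].
print(4 < numfriends)         # prints True
```

The "friend count" sits within a few kilobytes of the low 32 bits of the `m` pointer itself, which is what an address fragment looks like rather than a counter. That would be consistent with `m` pointing at overwritten or invalid memory, so the friend list on disk need not be the problem.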

Recall earlier that I asked about:

TRAC> Messenger.c#2709:do_messenger Friend num in DHT 2 != friend num in msger 3

which sounded like a warning to me.

Could that be related? It's in most or all of the SEGV logs.

@iphydf iphydf added this to the v0.2.x milestone Nov 13, 2023
@emdee-is
Author

closing this as old, and likely memory corruption that was my fault.

@iphydf iphydf modified the milestones: v0.2.x, v0.2.19 Feb 13, 2024
This issue was closed.