'Too many open files' on Android 6.0-8.1 #2796

Closed
imReker opened this issue Sep 3, 2021 · 9 comments

imReker commented Sep 3, 2021

LocalDnsWorker.accept throws Broken pipe when UDP is filtered or the network is disconnected.
If DNS queries keep arriving after that, the number of Unix socket handles in the VpnService process exceeds the handle limit, which is 1024 (32768 on Android 9.0 and newer). The VpnService process then gets Too many open files and Bad file descriptor exceptions everywhere.
Meanwhile, because the UDP DNS query on the Java side times out, sslocal sends a TCP DNS query over a 'Java protected' socket, which creates the same number of socket handles in sslocal. (Why does sslocal make a TCP query again?)
As a result, both VpnService and sslocal crash at random times.

Logs:
org.shadowsocks.xx_issue_19bc73ad993aad4d5fe278892d584231_error_session_61279182004D00013C85A04AC568A81B_DNE_5_v2.log
org.shadowsocks.xx_issue_274bc2d242720049275714683d3d4cc5_error_session_6127B31401C900010D2CF9C39D05D8E2_DNE_0_v2.log
org.shadowsocks.xx_issue_2d5e1ddcbf72ff6f25953b540bd48ff5_error_session_6127C4D9026F00011AC0A04AC568A81B_DNE_0_v2.log

@imReker imReker added the bug label Sep 3, 2021

imReker commented Sep 3, 2021

I dumped the file descriptor info. A DNS query from an app makes shadowsocks create at least 2 handles in VpnService: the first is the local_dns_path socket for the UDP query, and the second is the protect_path socket for the TCP query.

fd list size = 928

fd list- 1bf: SOCK: socket:[23488393] UNIX / -- /
fd list- 1c0: SOCK: socket:[23472492] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1c2: SOCK: socket:[23490568] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1c3: SOCK: socket:[23478133] UNIX / -- /
fd list- 1c4: SOCK: socket:[23466596] UNIX / -- /
fd list- 1c6: SOCK: socket:[23478364] UNIX / -- /
fd list- 1c7: SOCK: socket:[23466600] UNIX / -- /
fd list- 1c8: SOCK: socket:[23468862] UNIX / -- /
fd list- 1c9: SOCK: socket:[23490569] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1ca: SOCK: socket:[23466603] UNIX / -- /
fd list- 1cb: SOCK: socket:[23478366] UNIX / -- /
fd list- 1cc: SOCK: socket:[23488399] UNIX / -- /
fd list- 1ce: SOCK: socket:[23466607] UNIX / -- /
fd list- 1cf: SOCK: socket:[23484823] UNIX / -- /
fd list- 1d1: SOCK: socket:[23476731] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d2: SOCK: socket:[23490570] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1d3: SOCK: socket:[23467117] UNIX / -- /
fd list- 1d5: SOCK: socket:[23476736] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d6: SOCK: socket:[23488418] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/protect_path -- /dev/socket/dnsproxyd
fd list- 1d7: SOCK: socket:[23480411] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1d8: SOCK: socket:[23479978] UNIX / -- /
fd list- 1d9: SOCK: socket:[23461630] UNIX / -- /
fd list- 1da: SOCK: socket:[23467746] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1db: SOCK: socket:[23479988] UNIX / -- /
fd list- 1dc: SOCK: socket:[23467121] UNIX / -- /
fd list- 1dd: SOCK: socket:[23467126] UNIX / -- /
fd list- 1de: SOCK: socket:[23467748] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1df: SOCK: socket:[23480413] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e0: SOCK: socket:[23466611] UNIX / -- /
fd list- 1e1: SOCK: socket:[23479119] UNIX / -- /
fd list- 1e2: SOCK: socket:[23480797] UNIX / -- /
fd list- 1e3: SOCK: socket:[23478626] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e4: SOCK: socket:[23480440] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e5: SOCK: socket:[23466617] UNIX / -- /
fd list- 1e6: SOCK: socket:[23477117] UNIX / -- /
fd list- 1e7: SOCK: socket:[23432212] UNIX / -- /
fd list- 1e8: SOCK: socket:[23480445] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
fd list- 1e9: SOCK: socket:[23480447] UNIX /data/user_de/0/org.shadowsocks.xx/no_backup/local_dns_path -- /dev/socket/dnsproxyd
.................
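
A count like the one in this dump can be approximated from inside the process by listing /proc/self/fd. The following is a minimal Java sketch, not the code that produced the output above. The names FdDump, countOpenFds and countSockets are made up for illustration, and it assumes a Linux-style /proc that the process is allowed to read. Note that java.nio.file is only available on Android from API 26, so on the older affected versions the link target would have to be read another way.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of shadowsocks-android.
final class FdDump {
    /** Number of open descriptors of this process, or -1 if /proc/self/fd is unreadable. */
    static int countOpenFds() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    /** Number of descriptors whose link target starts with "socket:", or -1 if unreadable. */
    static int countSockets() {
        File[] entries = new File("/proc/self/fd").listFiles();
        if (entries == null) {
            return -1;
        }
        int sockets = 0;
        for (File entry : entries) {
            try {
                String target = Files.readSymbolicLink(entry.toPath()).toString();
                if (target.startsWith("socket:")) {
                    sockets++;
                }
            } catch (IOException | UnsupportedOperationException e) {
                // The descriptor was closed while iterating, or is not a link; skip it.
            }
        }
        return sockets;
    }

    public static void main(String[] args) {
        System.out.println("fd list size = " + countOpenFds());
        System.out.println("sockets      = " + countSockets());
    }
}
```

Logging these two numbers periodically would show how close the process gets to the 1024 limit mentioned earlier in this issue.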


Mygod commented Sep 5, 2021

These seem normal. These fds are not closed? Does your server connection work properly?


imReker commented Sep 6, 2021

> These seem normal. These fds are not closed? Does your server connection work properly?

Most of them are eventually closed, unless the process crashes first with Too many open files.
In some terrible network environments, the UDP packet loss rate can be very high, which triggers this issue.

The key point of this issue is the different timeouts on the Java and Rust sides:
LocalDnsWorker uses Java's getAllByName, whose timeout is defined by the system, usually 90 seconds. On the Rust side, the timeout is 5 seconds.
So, when the network is very slow or UDP is filtered, a DNS query sent to the Java side waits on system I/O until the 90s timeout, but the Rust side fails after 5s and returns the failure to the app that made the DNS request. The app then requests again, and if it retries without any interval or count limit, LocalDnsWorker.accept will create thousands of socket FDs within those 90 seconds.

But I still don't know why the protect_path sockets leak too.
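
To illustrate the timeout mismatch described above, here is a minimal Java sketch of one way to bound how long a caller waits on a blocking getAllByName call. It is not the project's code: TimedResolver, its methods, and the worker count are hypothetical, and the timeout is whatever the caller passes in. One caveat: a thread already blocked inside the system resolver may not react to cancellation, so this only bounds the caller's wait, while the fixed pool size bounds how many lookups can be stuck at once.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper for illustration; not part of shadowsocks-android.
final class TimedResolver {
    private final ExecutorService pool;
    private final long timeoutMs;

    TimedResolver(int workers, long timeoutMs) {
        // Daemon threads so stuck lookups never keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(workers, r -> {
            Thread t = new Thread(r, "dns-lookup");
            t.setDaemon(true);
            return t;
        });
        this.timeoutMs = timeoutMs;
    }

    /** Runs the task on the pool and waits at most timeoutMs for its result. */
    <T> T call(Callable<T> task) throws Exception {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Drops the task if still queued; interrupts it if already running.
            future.cancel(true);
            throw e;
        }
    }

    /** Blocking system lookup, but the caller gives up after timeoutMs. */
    InetAddress[] resolve(String host) throws Exception {
        return call(() -> InetAddress.getAllByName(host));
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With something like `new TimedResolver(8, 5000)`, the Java side would report failure on roughly the same 5s schedule as the Rust side instead of holding the request open for the system-defined timeout.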


Mygod commented Sep 6, 2021

It is technically not a leak if they are eventually closed? Although I am down to tweak timeouts. Where did you find the 90s timeout?


imReker commented Sep 6, 2021

The 90s timeout is an empirical value taken from the logs; it's not accurate.
It's not a traditional 'leak', but I think it is still an issue because of the mismatched timeouts and the thousands of DNS retries they cause. Maybe 'denial of service' is more accurate?
As a current workaround, I keep a counter in LocalDnsWorker.accept: when more than 200 DNS queries are pending, accept just returns an empty response to sslocal (this limit could be enforced in sslocal as well).
I think the correct fix is to replace getAllByName with dnsjava, which allows setting a timeout on a query. But we would need to modify it so that the caller can set a Network for it to create sockets on.
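
The pending-query counter workaround described above could look roughly like the Java sketch below. This is an illustration under assumptions, not the actual patch: PendingQueryLimiter and its methods are hypothetical names, the resolver is passed in as a plain function, and the "empty response" is represented by an empty byte array as a placeholder for whatever sslocal actually expects.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical limiter for illustration; not part of shadowsocks-android.
final class PendingQueryLimiter {
    private static final byte[] EMPTY_RESPONSE = new byte[0]; // placeholder, see lead-in

    private final int maxPending;
    private final AtomicInteger pending = new AtomicInteger();

    PendingQueryLimiter(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Reserves a slot if fewer than maxPending queries are in flight. */
    boolean tryAcquire() {
        while (true) {
            int current = pending.get();
            if (current >= maxPending) {
                return false;
            }
            if (pending.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Frees a slot reserved by a successful tryAcquire. */
    void release() {
        pending.decrementAndGet();
    }

    int pendingCount() {
        return pending.get();
    }

    /** Resolves the query, or answers immediately with the placeholder when over the cap. */
    byte[] handle(byte[] query, Function<byte[], byte[]> resolver) {
        if (!tryAcquire()) {
            return EMPTY_RESPONSE;
        }
        try {
            return resolver.apply(query);
        } finally {
            release();
        }
    }
}
```

Constructed with the limit from the comment above, `new PendingQueryLimiter(200)`, the accept path would stop allocating new sockets once 200 lookups are outstanding, which keeps the FD count well below 1024 regardless of how fast the app retries.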


Mygod commented Sep 7, 2021

Sounds good. I will take a look sometime.

Does this issue go away if you use the "All" Route?


imReker commented Sep 7, 2021

Currently I use ACL with Bypass LAN. I think this issue doesn't exist with the 'All' route, since DNS queries are not passed to the Java side (so no extra FDs are created) and they have a 5s timeout.

Also, maybe Unix socket connection reuse (ref #2751) is still needed? Because Rust makes 2-3 DNS queries per connection, there is still a very small chance of creating over 1000 FDs before the 5s timeout.


imReker commented Sep 9, 2021

@Mygod
I modified a little of dnsjava's code (mainly Network-related work and Java 8/Android adaptation) and it works!
The only downside is that dnsjava 3.4.1 doesn't support Android 6.x because of Java NIO. (Older versions support Android 6.x but use blocking sockets, so they may still cause the same issue.)
I'll run a stress test again tomorrow.

@shadowsocks shadowsocks deleted a comment from yndue736 May 31, 2022

Mygod commented Dec 20, 2022

Closing, as the affected Android versions are too old.

@Mygod Mygod closed this as not planned Dec 20, 2022