fix(net): Fix TCP Unresponsiveness and Inability to Close Connections #791

Merged
merged 13 commits into from
May 11, 2024

Conversation

Samuka007
Member

Fix several issues that can leave a TCP socket stuck in the accept state or unable to send a complete close (FIN) message.

Changes:

  • Change SocketHandleItem::wait_queue to an Arc, so that the smoltcp sockets behind a listening POSIX socket share one wait queue and a wake-up is not lost on accept.
  • Change the TCP socket close logic: remove the socket from the socket set after polling the interface (iface) instead of before.
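
The shared wait queue idea can be sketched outside the kernel as follows. This is a minimal illustration, assuming a condvar-backed stand-in for the kernel wait queue; `WaitQueue`, `HandleItem`, and `make_backlog` are hypothetical names, not the types used in this PR.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Stand-in for a kernel wait queue: a count of pending events plus a condvar.
#[derive(Default)]
pub struct WaitQueue {
    pending: Mutex<usize>,
    cond: Condvar,
}

impl WaitQueue {
    /// Record one event and wake one sleeper.
    pub fn wakeup(&self) {
        *self.pending.lock().unwrap() += 1;
        self.cond.notify_one();
    }

    /// Block until at least one event is pending, then consume it.
    pub fn wait(&self) {
        let mut pending = self.pending.lock().unwrap();
        while *pending == 0 {
            pending = self.cond.wait(pending).unwrap();
        }
        *pending -= 1;
    }
}

/// Hypothetical per-handle bookkeeping: the queue is shared, not owned.
pub struct HandleItem {
    pub wait_queue: Arc<WaitQueue>,
}

/// Build `n` handle items that all point at the same queue.
pub fn make_backlog(n: usize) -> (Arc<WaitQueue>, Vec<HandleItem>) {
    let shared = Arc::new(WaitQueue::default());
    let items = (0..n)
        .map(|_| HandleItem { wait_queue: Arc::clone(&shared) })
        .collect();
    (shared, items)
}

/// A waiter sleeps on the shared queue; an event on the last handle wakes it.
pub fn accept_is_woken_by_any_handle() -> bool {
    let (shared, items) = make_backlog(3);
    let waiter = thread::spawn(move || shared.wait());
    items[2].wait_queue.wakeup();
    waiter.join().is_ok()
}
```

With one queue per handle, a waiter sleeping on the first handle would miss an event delivered to the third; sharing the queue through `Arc` removes that case in this sketch.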

Todo:

  • Removing the socket immediately after closing it may be incorrect. The close operation only closes the transmit half of the full-duplex connection; the socket can still receive from the remote end.
  • In the listening POSIX TCP socket TcpSocket, the handles list might have a problem: the socket behind a handle could time out and close before accept is called, which could leave the listener unresponsive.
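
The half-close behavior mentioned in the first Todo item can be observed from user space. The sketch below uses the Rust standard library on a host OS over loopback, not DragonOS or smoltcp; `half_close_roundtrip` is a hypothetical helper written for illustration.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Send `request`, close only the transmit half, then keep receiving.
pub fn half_close_roundtrip(request: &[u8]) -> std::io::Result<Vec<u8>> {
    // Port 0 lets the OS pick a free loopback port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Peer: read until the client's FIN (EOF), then reply with the bytes reversed.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        let mut buf = Vec::new();
        conn.read_to_end(&mut buf)?;
        buf.reverse();
        conn.write_all(&buf)
    });

    let mut client = TcpStream::connect(addr)?;
    client.write_all(request)?;
    // Sends FIN: this side will transmit nothing more...
    client.shutdown(Shutdown::Write)?;
    // ...but the receive half stays open, so the reply still arrives.
    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?;

    server.join().expect("server thread panicked")?;
    Ok(reply)
}
```

If the client's socket were torn down at the moment of `shutdown`, the reply would have nowhere to go, which is the concern the Todo item raises about removing the socket right after close.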

@dragonosbot

@Samuka007: no appropriate reviewer found, use r? to override

@dragonosbot dragonosbot added the S-等待审查 Status: Awaiting review by the assignee and other relevant parties. label Apr 29, 2024
@github-actions github-actions bot added the Bug fix A bug is fixed in this pull request label Apr 29, 2024
@Samuka007
Member Author

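#743 For clients that do not block on read (for example curl-8.7.1 and Firefox-125.0.2, which open only one connection per request), repeated connections now complete correctly and efficiently. For Chromium-based browsers such as Edge, several TCP connections stay in the Established state. Because the test program http_server is single-threaded and sets no timeout on blocking reads, no application other than the one holding those connections can connect to the server. This is expected behavior.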
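
For reference, a bounded read is one way a single-threaded server can avoid being held by an idle connection. The sketch below uses the Rust standard library on a host OS and is not the http_server test program; `ReadOutcome`, `read_with_timeout`, and `idle_client_times_out` are hypothetical names.

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Outcome of one bounded read attempt.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Data(Vec<u8>),
    Closed,
    TimedOut,
}

/// Read once from `conn`, giving up after `limit` (which must be non-zero).
pub fn read_with_timeout(conn: &mut TcpStream, limit: Duration) -> std::io::Result<ReadOutcome> {
    conn.set_read_timeout(Some(limit))?;
    let mut buf = [0u8; 1024];
    match conn.read(&mut buf) {
        Ok(0) => Ok(ReadOutcome::Closed),
        Ok(n) => Ok(ReadOutcome::Data(buf[..n].to_vec())),
        // The error kind for an expired timeout is platform-dependent.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(ReadOutcome::TimedOut)
        }
        Err(e) => Err(e),
    }
}

/// Accept one connection that never sends anything and show the read gives up.
pub fn idle_client_times_out() -> std::io::Result<ReadOutcome> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // Connects but stays silent, like a browser's spare connection.
    let _idle = TcpStream::connect(listener.local_addr()?)?;
    let (mut conn, _) = listener.accept()?;
    read_with_timeout(&mut conn, Duration::from_millis(50))
}
```

On `TimedOut` the server can drop the connection and return to `accept`, so other clients are not locked out.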

@fslongjin fslongjin requested a review from GnoCiYeH April 29, 2024 11:53
Comment on lines 888 to 895
if let Some(Endpoint::Ip(Some(ip))) = self.endpoint() {
    PORT_MANAGER.unbind_port(self.metadata.socket_type, ip.port)?; // NOTICE
    PORT_MANAGER.bind_port(
        self.metadata.socket_type,
        ip.port,
        *(sock_ret.clone()),
    )?;
}
Member

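The arguments to PORT_MANAGER look problematic to me: it stores an Arc<dyn Socket>, and that Arc is created at bind time. I think the socket should be associated with the port when bind is called, and the binding should not be updated here.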

Member Author

Should it bind the PID rather than a reference to a particular socket?
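
One possible reading of this suggestion, sketched under assumptions: the port table records the owning PID at bind time, so accept never needs to rebind. `PortTable` and `PortError` are hypothetical and do not mirror the real PORT_MANAGER signatures, which also take a socket type.

```rust
use std::collections::HashMap;

/// Reasons a bind or unbind request can be refused.
#[derive(Debug, PartialEq)]
pub enum PortError {
    InUse,
    NotOwner,
    NotBound,
}

/// Hypothetical port table keyed by port number, storing the owning PID.
#[derive(Default)]
pub struct PortTable {
    owners: HashMap<u16, u32>,
}

impl PortTable {
    /// Claim `port` for `pid`; called once, when the socket is bound.
    pub fn bind_port(&mut self, port: u16, pid: u32) -> Result<(), PortError> {
        if self.owners.contains_key(&port) {
            return Err(PortError::InUse);
        }
        self.owners.insert(port, pid);
        Ok(())
    }

    /// Release `port`; only the process that bound it may do so.
    pub fn unbind_port(&mut self, port: u16, pid: u32) -> Result<(), PortError> {
        match self.owners.get(&port).copied() {
            None => Err(PortError::NotBound),
            Some(owner) if owner != pid => Err(PortError::NotOwner),
            Some(_) => {
                self.owners.remove(&port);
                Ok(())
            }
        }
    }

    /// Look up which process currently holds `port`, if any.
    pub fn owner_of(&self, port: u16) -> Option<u32> {
        self.owners.get(&port).copied()
    }
}
```

Because the table holds no socket reference, swapping the underlying smoltcp socket during accept leaves the port entry untouched in this sketch.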

@fslongjin
Member

@dragonosbot author I made some changes; the PORT_MANAGER-related parts still need to be modified.

@dragonosbot dragonosbot added S-等待作者修改 Status: Waiting on the author for some action (such as code changes or more information). and removed S-等待审查 Status: Awaiting review by the assignee and other relevant parties. labels Apr 29, 2024
@fslongjin fslongjin merged commit 37cef00 into DragonOS-Community:master May 11, 2024
7 checks passed
Chiichen added a commit that referenced this pull request Sep 2, 2024
* docs(sched): Scheduler subsystem documentation and CFS documentation (#807)

* Scheduler subsystem documentation and CFS documentation

* fix(net): Fix TCP Unresponsiveness and Inability to Close Connections (#791)

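* fix(net): Improve stability. Call the close method when a RawSocket or UdpSocket is closed, matching smoltcp's behavior. Implement Drop for SocketInode so that the corresponding socket is closed correctly and its port released however the program exits.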

* fix(net): Correct socket close behavior.

* fix: disable mm debug log to prevent system lockup due to thingbuf issue (#808)

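* Add a one-click installation script with Gentoo support (#809)

* feat(driver/pci): add pci bus into sysfs (#792)

Add PCI devices to sysfs

* doc: Add Gentoo Linux In build_system.md (#810)

* Add Gentoo Linux notes to the installation documentation

* doc: add v0.1.10 changelog (#813)

* Finish the v0.1.10 changelog

* fix(driver/apic_timer): Fix a bug where the local APIC timer initialization order prevented interrupts from being received on some cloud servers (#815)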

* chore: move setup_arch_post timepoint to before clocksource_boot_finish (#820)

This commit adjusts the timing of the setup_arch_post event to occur before the clocksource_boot_finish event, allowing the time subsystem to properly register architecture-specific clock sources.

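* feat(log): Unify kernel logging under the new logger (#814)

* fix(log): Fix the problems in PR #814 (#821)

* feat(driver/pci): Improve the PCI root struct and add port-I/O access to PCI configuration space (#818)

* feat(driver/pci): Improve the PCI root struct and add port-I/O access to PCI configuration space

* Add the Rust sparse index option (#826)

* fix(time): Fix issue #816 (#830)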

* chore(tools): add the gentoo grub_auto_install support (#827)

* 20240524 3:40

* 20240527 0010

* Fix mmap not deferring memory allocation

* feat(procfs): update procfs (#831)

Show in procfs whether a process is a kthread
Report the number of file descriptors a process already holds

* Revert "Merge branch 'patch-add-file-mapping' into patch-fix-mmap"

This reverts commit 8eb687c, reversing
changes made to 33e9f0b.

* 20240528 1800

* Revert "Revert "Merge branch 'patch-add-file-mapping' into patch-fix-mmap""

This reverts commit 9261cb7.

* feat(mm): Fix mmap not deferring memory allocation (#837)

* 20240524 3:40

* 20240527 0010

* Fix mmap not deferring memory allocation

* Revert "Merge branch 'patch-add-file-mapping' into patch-fix-mmap"

This reverts commit 8eb687c, reversing
changes made to 33e9f0b.

* update-20240529-0347

* fix(driver): fix memory security problem in tty device ioctl (#833)

* add soft link to musl-gcc

* fix the tty_ioctl

* modified

* modified

* update 20240604 0233

* feat(user): user management tool (#825)

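* User management tool

* Refactor

* Switch to multiple bin file entry points

* Make each bin file's usage message show its own program name instead of a fixed one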

* update 20240606 1800

* update 20240607 0200

* update 20240617 1747

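* Rewrite the construction logic for page protection flags

* update20240620 1726

* Add protection_map for Riscv64

* Add a simple implementation of file mapping for the FAT file system and add the msync system call

* Add a unified interface to trait FileSystem

* Implement the file-mapping interfaces for MountFS

* Format the code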

* feat(time): Add syscall support for utime* (#838)

* feat(vfs): Add syscall support for utime*

impl sys_utimensat
impl sys_utimes
add utimensat test
fix some warning

* fix(vfs): Verify pointer validity

* fix: remove bad cfg

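* Change pagecache storage from HashMap to XArray

* Fix the mprotect system call not setting vm_flags correctly (#847)

* fix(time): modify update wall time (#836)

Change the time subsystem's update_wall_time function to update by reading the current cycle count and computing the delta, instead of taking the delta as an argument

* chore: Adjust triagebot.toml for the new organizational structure (#848)

* doc: Improve README.md (#849)

* doc: Improve README.md

* chore: Update the Sphinx configuration to match the Read the Docs update (#850)

This change follows the Read the Docs blog post of July 15

https://about.readthedocs.com/blog/2024/07/addons-by-default/

* feat(driver/net): Implement the Loopback network interface (#845)

* Initial implementation of the loopback device

* fix: Pin the toolchain version for make check in the build-scripts and tools directories (#861)

* fix: TCP poll did not correctly handle the listen state of POSIX sockets (#859)

* Wrap the Page struct in a read-write lock

* chore: Update the toolchain to 2024-07-23 (#864)

* chore: Update the toolchain to 2024-07-23

* Make PageCache store pages directly instead of physical addresses

* Optimize how protection_map is initialized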

* feat(fs): add eventfd syscall support (#858)

* feat(fs): add eventfd syscall support

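* refactor: Remove the obsolete va-pa conversion functions and use MMArch consistently (#862)

* Add a shrink_list method to release pages

* Default to nightly-2024-07-23 and rename config to config.toml (#872)

* fix: Fix the kernel faulting continuously on some machines after the upgrade to the 2024-07-23 toolchain. (#870)

* fix: Fix the kernel faulting continuously on some machines after the upgrade to the 2024-07-23 toolchain.

* Add a page reclamation mechanism

* Add a page reclamation kernel thread

* feat(cred): Initial implementation of Cred (#846)

* Initial implementation of Cred

* Add seteuid and setegid

* Add a cred test program

* Change the return value of Cred::fscmp to the CredFsCmp enum

* Complete the root user information

* fix: Fix the keycode parser not correctly handling control characters such as Ctrl+C (#877)

* fix: Fix the keycode parser not correctly handling control characters such as Ctrl+C

* fix: Resolve a potential panic in ntty

* Change the lock used by the page-fault handler to irq_save; add a dirty-page writeback mechanism

* Improve the code structure and add some comments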

* ci: enable ci workflow on branches other than master (#891)

* Fix path errors in the unlink and unlinkat system calls (#892)

* fix: socket shutdown wrong implement (#893)

* feat: Add support for the tokio async runtime (#894)

* fix the EventFdFlags error

* feat: support tokio (Single thread version)

Fix deadlock issue on closing file.
Add function for PipeInode and EventFdInode.

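* Optimize PageCache creation

* fix: Processes could not be killed while blocked on a pipe read/write (#889)

* Change the entry point to the linker; correct the linker load address

* Fix merge errors

* Fix the do_cow_page deadlock

* Align the address in PageFaultMessage

* Add the random-bytes pointer to auxv; fix incorrect AtType numbering

* Add a simple implementation of 16-byte alignment for the user stack

* Pass check fmt

* Improve the user stack byte-alignment mechanism

* Build successfully for riscv64

* Change the test program path

* Add the dynamic library libgcc_s.so.1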

---------

Co-authored-by: GnoCiYeH <[email protected]>
Co-authored-by: Samuel Dai <[email protected]>
Co-authored-by: LoGin <[email protected]>
Co-authored-by: donjuanplatinum <[email protected]>
Co-authored-by: 曾俊 <[email protected]>
Co-authored-by: Mingtao Huang <[email protected]>
Co-authored-by: BrahmaMantra <[email protected]>
Co-authored-by: laokengwt <[email protected]>
Co-authored-by: Jomo <[email protected]>
Co-authored-by: linfeng <[email protected]>
Co-authored-by: SMALLC <[email protected]>
Co-authored-by: linfeng <[email protected]>
Co-authored-by: Chiichen <[email protected]>
Co-authored-by: Samuel Dai <[email protected]>
@Samuka007 Samuka007 deleted the fix-bug-tcp branch November 7, 2024 06:51
BrahmaMantra pushed a commit to BrahmaMantra/DragonOS that referenced this pull request Dec 9, 2024
…DragonOS-Community#791)

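* fix(net): Improve stability. Call the close method when a RawSocket or UdpSocket is closed, matching smoltcp's behavior. Implement Drop for SocketInode so that the corresponding socket is closed correctly and its port released however the program exits.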

* fix(net): Correct socket close behavior.
Labels
Bug fix A bug is fixed in this pull request S-等待作者修改 Status: Waiting on the author for some action (such as code changes or more information).