curvefs getfattr fail and retry 1000000 times #1427

Closed
cw123 opened this issue May 12, 2022 · 6 comments
Labels: bug (Something isn't working), high (high priority), need test (Completion of development, requires QA verification)

Comments

Contributor

cw123 commented May 12, 2022

Describe the bug
Running getfattr in a curvefs mount directory failed after 23 minutes. The client log shows the request being retried 1,000,000 times before it failed. See the screenshots below for the failure and the related client logs.
The cluster is fairly large, with more than 20 million files. Could the failure be related to the scale?
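
For scale, the two figures above imply a rough average time per retry. This is a back-of-the-envelope sketch, assuming the whole 23 minutes was spent on those 1,000,000 retries and that they ran one after another; the function name is made up for illustration.

```python
def avg_ms_per_retry(total_minutes: float, retries: int) -> float:
    """Average wall-clock time per retry, in milliseconds."""
    return total_minutes * 60 * 1000 / retries


# Figures from the report: 23 minutes, 1,000,000 retries.
print(f"{avg_ms_per_retry(23, 1_000_000):.2f} ms per retry")
```

Under those assumptions the average is about 1.38 ms per retry, which suggests most retries came back quickly instead of waiting out a multi-second timeout.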

To Reproduce

Expected behavior

Versions
OS: linux
Compiler:
branch: master
commit id: 5c985ff

Additional context/screenshots
[Two screenshots attached: the failed getfattr run and the related client log]

cw123 added the bug (Something isn't working) label May 12, 2022
Contributor

SeanHai commented May 12, 2022

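Attempt to reproduce: ran the command again: getfattr -n curve.dir.rfbytes -d ILSVRC2012_img_val_dir
1. At first, every request timed out after 8s, and no BatchGetInodeAttr request was visible at http://10.166.16.62:7801/rpcz
Screenshot 2022-05-12 20 47 42

2. After a while, responses came back but carried no inodeAttr; BatchGetInodeAttr requests were now visible at http://10.166.16.62:7801/rpcz
Screenshot 2022-05-12 20 50 53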

Contributor

SeanHai commented May 13, 2022

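Next steps to test:
1. Mount the file system at a new mount point and run the test again.
2. If step 1 reproduces the problem, add logging to the corresponding client version and check whether every request carries the inodeId. (The first send must carry it, otherwise no rpc would be sent. The logging mainly belongs in the rpc retry path, to see whether the request changes during retries.)
3. If step 2 shows that the inodeId is carried every time, add logging on the metaserver side, upgrade the server, and test again.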
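
The logging proposed in step 2 could look roughly like the sketch below. It is not the curvefs client code: `send_with_retry_logging`, `send`, and `max_retry` are hypothetical names, and the request is modelled as a plain list of inode IDs. The idea is to snapshot the request before the first send and compare it on every retry.

```python
import logging
from typing import Callable, List, Optional

log = logging.getLogger("retry_debug")


def send_with_retry_logging(
    inode_ids: List[int],
    send: Callable[[List[int]], Optional[dict]],
    max_retry: int = 3,
) -> Optional[dict]:
    """Call send() up to max_retry times, logging what each attempt carries."""
    original = tuple(inode_ids)  # snapshot taken before the first send
    for attempt in range(1, max_retry + 1):
        current = tuple(inode_ids)
        log.info("attempt %d carries %d inodeId(s)", attempt, len(current))
        if current != original:
            # The request no longer matches what was sent the first time.
            log.error("request changed during retry: %s -> %s", original, current)
        response = send(inode_ids)
        if response:  # a non-empty response ends the loop
            return response
    return None
```

If the error line shows up on a retry, the request is being modified between attempts; if it never does, the investigation moves to the metaserver side as in step 3.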

baijiaruo added the high (high priority) label May 16, 2022
Contributor

SeanHai commented May 16, 2022

The batchGetInodeAttr limit is 100, and raft apply costs about 2s.
The performance of rocksdb with a large data set needs to be tested.
Screenshot 2022-05-16 20 05 51
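
To see what those two numbers mean for a large directory, here is a rough estimate. It assumes the batches are issued serially and each one pays a single raft apply of about 2s; whether the client actually sends them in parallel is not stated in this thread. The 50,000-inode directory size is hypothetical, and the helper names are made up for illustration.

```python
import math
from typing import Iterator, List

BATCH_LIMIT = 100    # batchGetInodeAttr limit quoted in this comment
APPLY_SECONDS = 2.0  # approximate raft apply cost quoted in this comment


def batches(inode_ids: List[int], limit: int = BATCH_LIMIT) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `limit` inode IDs."""
    for start in range(0, len(inode_ids), limit):
        yield inode_ids[start:start + limit]


def estimated_seconds(num_inodes: int) -> float:
    """Serial estimate: one raft apply per batch, no retries."""
    return math.ceil(num_inodes / BATCH_LIMIT) * APPLY_SECONDS


# Hypothetical directory size, chosen only for illustration.
print(estimated_seconds(50_000))
```

With these assumptions a 50,000-inode directory needs 500 batches, or about 1000 seconds, before any timeout or retry is counted.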

getInode cost at metaserver:
截屏2022-05-16 20 46 45

getDentry cost at metaserver:
截屏2022-05-16 20 46 26

Contributor

SeanHai commented May 17, 2022

Screenshot 2022-05-17 10 38 22
Screenshot 2022-05-17 10 38 36

Contributor

SeanHai commented May 20, 2022

The timeout goes away when storage.rocksdb.block_cache_capacity is set to a larger value.
Screenshot 2022-05-20 11 27 12
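
One way a larger cache can make this much difference is sketched below as a toy model. It is not RocksDB itself: it assumes an LRU-style cache and a workload that reads the same set of keys repeatedly in order, and `lru_hits` is a name made up for illustration. When the cache is smaller than the working set, every entry is evicted just before it is needed again.

```python
from collections import OrderedDict


def lru_hits(capacity: int, working_set: int, passes: int) -> int:
    """Count cache hits when `working_set` keys are read in order, `passes` times."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for _ in range(passes):
        for key in range(working_set):
            if key in cache:
                hits += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used key
    return hits


print(lru_hits(capacity=50, working_set=100, passes=10))   # cache too small
print(lru_hits(capacity=100, working_set=100, passes=10))  # cache fits the set
```

In this model the undersized cache gets 0 hits out of 1000 reads, while the cache that fits the working set gets 900. Whether the metaserver workload here behaves this way would need to be confirmed with the rocksdb statistics.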

Contributor

SeanHai commented May 20, 2022

The empty rpc response is a bug; details are in the PR.

SeanHai added the need test (Completion of development, requires QA verification) label May 23, 2022
Wine93 closed this as completed Jun 20, 2022
4 participants