curvefs client : fix bug of getleader always fails causes stack overflow #1070

Merged (1 commit), Feb 17, 2022

@xu-chaojie (Member) commented Feb 10, 2022

What problem does this PR solve?

Issue Number: close #1067

Problem Summary:

What is changed and how it works?

What's Changed:

How it Works:

Side effects (Breaking backward compatibility? Performance regression?):

Check List

  • Relevant documentation/comments are changed or added
  • I acknowledge that all my contributions will be made under the project's license

@xu-chaojie (Member Author): recheck

```diff
-excutor->DoAsyncRPCTask(taskDone);
+TaskExecutorDone *taskDone = new TaskExecutorDone(
+    excutor, done);
+done_guard.release();
```

Contributor: It seems this done_guard is useless.

Member Author: Done.

```diff
-excutor->DoAsyncRPCTask(taskDone);
+TaskExecutorDone *taskDone = new TaskExecutorDone(
+    excutor, done);
+done_guard.release();
```

Contributor: Ditto.

Member Author: Done.

```
ExcuteTask(channel.get(), done);
done_guard.release();
return;
return retCode;
```

Contributor: Is retCode always -1?

Member Author: Done.

```cpp
        bthread_usleep(opt_.retryIntervalUS);
        continue;
    }
    ExcuteTask(channel.get(), done);
```

Contributor: ExcuteTask has a return value, but you didn't check it.

Member Author: Done.

```cpp
    excutor, done);
done_guard.release();
brpc::ClosureGuard taskDone_guard(taskDone);
int ret = excutor->DoAsyncRPCTask(taskDone);
```

Contributor: DoAsyncRPCTask will block forever if the target copyset doesn't have a valid leader. In that case the caller's thread is blocked too, so the call is not actually asynchronous. Is this what you expected, or should the whole process be made asynchronous?

Member Author: If getleader fails, the cluster is abnormal, so it is reasonable for all the async tasks to stall here.

```diff
@@ -586,8 +586,15 @@ void MetaServerClientImpl::UpdateInodeAsync(const Inode &inode,
     MetaServerOpType::UpdateInode, task, inode.fsid(), inode.inodeid());
 auto excutor = std::make_shared<UpdateInodeExcutor>(opt_,
     metaCache_, channelManager_, taskCtx);
-TaskExecutorDone *taskDone = new TaskExecutorDone(excutor, done);
-excutor->DoAsyncRPCTask(taskDone);
+TaskExecutorDone *taskDone = new TaskExecutorDone(
```

Contributor: Please state the reason for the stack overflow in the issue or the commit message.

Member Author (Feb 17, 2022): When getleader always fails, TaskExcutor::DoAsyncRPCTask and TaskExecutorDone::Run call each other cyclically, resulting in a stack overflow.

@xu-chaojie changed the title three times on Feb 17, 2022, appending the stack-overflow explanation from the comment above; @ilixiaocui restored the original title, "curvefs client : fix bug of getleader always fails causes stack overflow", the same day.

```diff
@@ -303,7 +304,11 @@ void TaskExecutorDone::Run() {
 needRetry = excutor_->OnReturn(code_);
 if (needRetry) {
     excutor_->PreProcessBeforeRetry(code_);
-    excutor_->DoAsyncRPCTask(this);
+    code_ = excutor_->DoAsyncRPCTask(this);
+    if (code_ < 0) {
```

Contributor: AsyncTask always returns MetaStatusCode::OK:

```cpp
return MetaStatusCode::OK;
```

```cpp
return MetaStatusCode::OK;
```

Member Author: AsyncTask returns -1 when retryTimes runs out.

@xu-chaojie merged commit 8854d9f into opencurve:master on Feb 17, 2022.

Linked issue: curve-fuse getleader always fails causes stack overflow
3 participants