Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UBSAN error in AlterTableWithConcurrentTxnTest.TServerLeaderChange #12692

Closed
mbautin opened this issue May 27, 2022 · 1 comment
Closed

UBSAN error in AlterTableWithConcurrentTxnTest.TServerLeaderChange #12692

mbautin opened this issue May 27, 2022 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@mbautin
Copy link
Contributor

mbautin commented May 27, 2022

Jira Link: DB-584

Description

https://gist.githubusercontent.com/mbautin/f861305bf8ab284b2918c3cbc3e19c54/raw

@mbautin mbautin added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels May 27, 2022
@mbautin mbautin self-assigned this May 27, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 27, 2022
@mbautin
Copy link
Contributor Author

mbautin commented May 27, 2022

Other affected tests:

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label May 31, 2022
@d-uspenskiy d-uspenskiy self-assigned this Jun 3, 2022
d-uspenskiy added a commit that referenced this issue Jun 3, 2022
…utdown

Summary:
The diff consists of 3 parts.

**Part #1:** Multiple call of the `PerformFuture::Get` method
In case of session close when temp tables exists the following stack is possible

```
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // Second call of the PerformFuture::Get on same object
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::~PgDocOp()
yb::pggate::PgDocReadOp::~PgDocReadOp()
...
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie                                           // Caused by interruption by the SIGQUIT signal
...
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // First call of the PerformFuture::Get
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*)
yb::pggate::PgDml::FetchDataFromServer()
yb::pggate::PgDml::Fetch(int, unsigned long*, bool*,
yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*,
YBCPgDmlFetch
ybcFetchNextHeapTuple
ybc_getnext_heaptuple
ybc_systable_getnext
systable_getnext
findDependentObjects
performDeletion
RemoveTempRelations
RemoveTempRelationsCallback
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
PostgresServerProcessMain
main
```

In this stack the `PerformFuture::Get` method is called on same object twice. But the `PerformFuture` object is designed to call `Get` method only once. After the first call the `PerformFuture::Valid` method should return `false` and `Get` method should not be called. But in the above stack the second call of the `PerformFuture::Get`method is done before the first one is returned (due to interruption by the `SIGQUIT` signal).

To handle this situation `session_` field is set to `null` at the very beginning of the `PerformFuture::Get`. In case `session_` is `null` the `PerformFuture::Valid` returns `false`.

**Part #2:** Access deleted `PgApiImpl` object
The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call the code which access the `PgApiImpl` object fields. Here is the stack trace

```
std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
   yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgStatement::~PgStatement()
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
...
```

Solution is to move PgMemctx map into the `PgApiImpl` object (this also fixes #7216)

**Part #3:** Access shut down `PgClient`object

The `~PgApiImpl()` explicitly shut downs `pg_client_` object but the `pg_session_` is still alive. And when it will be destroyed the pg_client_ object may be used.

The stack is

```
yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgApiImpl::~PgApiImpl()
...
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie
...
```

Solution is to explicitly destroy `pg_session_` before shutting down of `pg_client_`

Test Plan:
Jenkins

The following tests had crashed without the fix

```
./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables
./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables
./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart
./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange
```

Reviewers: dsrinivasan, jason, alex

Reviewed By: alex

Subscribers: mbautin, bogdan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D17259
d-uspenskiy added a commit that referenced this issue Aug 22, 2022
… on postgres shutdown

Summary:
The diff consists of 3 parts.

**Part #1:** Multiple call of the `PerformFuture::Get` method
In case of session close when temp tables exists the following stack is possible

```
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // Second call of the PerformFuture::Get on same object
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::~PgDocOp()
yb::pggate::PgDocReadOp::~PgDocReadOp()
...
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie                                           // Caused by interruption by the SIGQUIT signal
...
std::__1::future<yb::pggate::PerformResult>::get()
yb::pggate::PerformFuture::Get()                   // First call of the PerformFuture::Get
yb::pggate::PgDocResponse::Get()
yb::pggate::PgDocOp::GetResult(std::__1::list<yb::pggate::PgDocResult, std::__1::allocator<yb::pggate::PgDocResult> >*)
yb::pggate::PgDml::FetchDataFromServer()
yb::pggate::PgDml::Fetch(int, unsigned long*, bool*,
yb::pggate::PgApiImpl::DmlFetch(yb::pggate::PgStatement*, int, unsigned long*,
YBCPgDmlFetch
ybcFetchNextHeapTuple
ybc_getnext_heaptuple
ybc_systable_getnext
systable_getnext
findDependentObjects
performDeletion
RemoveTempRelations
RemoveTempRelationsCallback
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
PostgresServerProcessMain
main
```

In this stack the `PerformFuture::Get` method is called on same object twice. But the `PerformFuture` object is designed to call `Get` method only once. After the first call the `PerformFuture::Valid` method should return `false` and `Get` method should not be called. But in the above stack the second call of the `PerformFuture::Get`method is done before the first one is returned (due to interruption by the `SIGQUIT` signal).

To handle this situation `session_` field is set to `null` at the very beginning of the `PerformFuture::Get`. In case `session_` is `null` the `PerformFuture::Valid` returns `false`.

**Part #2:** Access deleted `PgApiImpl` object
The `YBCDestroyPgGate` function destroys the `PgApiImpl` object and then calls the `ClearGlobalPgMemctxMap` function. But this function may call the code which access the `PgApiImpl` object fields. Here is the stack trace

```
std::__1::unique_ptr<yb::pggate::PgClient::Impl, std::__1::default_delete<yb::pggate::PgClient::Impl> >::operator->() const
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
   yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgStatement::~PgStatement()
yb::pggate::PgDml::~PgDml()
yb::pggate::PgDmlRead::~PgDmlRead()
yb::pggate::PgSelect::~PgSelect()
...
yb::pggate::PgMemctx::Clear()
yb::pggate::PgMemctx::~PgMemctx()
...
yb::pggate::ClearGlobalPgMemctxMap()
...
```

Solution is to move PgMemctx map into the `PgApiImpl` object (this also fixes #7216)

**Part #3:** Access shut down `PgClient`object

The `~PgApiImpl()` explicitly shut downs `pg_client_` object but the `pg_session_` is still alive. And when it will be destroyed the pg_client_ object may be used.

The stack is

```
yb::pggate::PgClient::Impl::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>)
yb::pggate::PgTxnManager::ExitSeparateDdlTxnMode(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>)
yb::pggate::PgTxnManager::AbortTransaction()
yb::pggate::PgTxnManager::~PgTxnManager()
...
yb::pggate::PgSession::~PgSession()
...
yb::pggate::PgApiImpl::~PgApiImpl()
...
YBCDestroyPgGate
YBOnPostgresBackendShutdown
quickdie
...
```

Solution is to explicitly destroy `pg_session_` before shutting down of `pg_client_`

Original commit: 77396db / D17259

Test Plan:
Jenkins: rebase: 2.14

The following tests had crashed without the fix

```
./yb_build.sh asan --gtest_filter PgFKeyTest.AddFKCorrectnessOnTempTables
./yb_build.sh asan --gtest_filter PgCatalogPerfTest.CacheRefreshRPCCountWithPartitionTables
./yb_build.sh asan --gtest_filter PgMiniTest.BigInsertWithRestart
./yb_build.sh asan --gtest_filter AlterTableWithConcurrentTxnTest.TServerLeaderChange
```

Reviewers: dsrinivasan, alex, jason

Reviewed By: jason

Subscribers: yql, bogdan, mbautin

Differential Revision: https://phabricator.dev.yugabyte.com/D19048
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants