RELEASE_ASSERT when creating upstream connection and socket call failed with ENOBUFS #24073

Open
sharmamona opened this issue Nov 18, 2022 · 2 comments

sharmamona commented Nov 18, 2022

Title: Envoy is hitting RELEASE_ASSERT when creating upstream connection and socket call failed with ENOBUFS

Description:

When creating an upstream connection, if the socket(2) call fails, Envoy crashes on the following RELEASE_ASSERT.

From the Envoy service log:
assert failure: SOCKET_VALID(result.return_value_). Details: socket(2) failed, got error: No buffer space available

https://github.com/envoyproxy/envoy/blob/main/source/common/network/socket_interface_impl.cc#L71

Error handling needs to be added for this case: instead of crashing, Envoy could just drop the request.
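A minimal sketch of that suggestion, using plain POSIX `socket(2)` rather than Envoy's own socket interface. The `SocketResult` type and the `tryCreateSocket` and `openUpstreamSocket` functions are hypothetical names for illustration: the helper hands the captured errno back to the caller instead of asserting, and the caller logs the error and reports failure so that only the current request is dropped.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical result type; Envoy's real code uses its own IoHandle and
// syscall result types instead.
struct SocketResult {
  int fd;      // descriptor on success, -1 on failure
  int errnum;  // errno captured right after socket(2), 0 on success
  bool ok() const { return fd >= 0; }
};

// Creates a socket and reports failure to the caller instead of asserting.
SocketResult tryCreateSocket(int domain, int type, int protocol) {
  int fd = ::socket(domain, type, protocol);
  if (fd < 0) {
    return SocketResult{-1, errno};
  }
  return SocketResult{fd, 0};
}

// Hypothetical caller: on failure, log the error and return false so that
// only this request fails, rather than aborting the whole process.
bool openUpstreamSocket(int domain, int& out_fd) {
  SocketResult r = tryCreateSocket(domain, SOCK_STREAM, 0);
  if (!r.ok()) {
    std::cerr << "socket(2) failed: " << std::strerror(r.errnum) << '\n';
    return false;
  }
  out_fd = r.fd;
  return true;
}
```

How the failed request is surfaced to the downstream client is left to the caller in this sketch.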

Repro steps:

We are running Envoy as an edge reverse proxy on a custom OS with a FreeBSD-based networking stack. The socket buffers are allocated from a UMA (https://www.freebsd.org/cgi/man.cgi?query=uma&sektion=9) zone. The stress test induces network failures, which exhaust the socket memory pool (mostly because of a distributed storage protocol). So it is not the case that the system or the Envoy process is generally OOM; in fact, the system recovers well once the network failures are cleared. Only Envoy crashes.
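The distinction above (socket buffer exhaustion versus general OOM) could be expressed as an errno policy. This is a hypothetical sketch of one possible policy, not Envoy's behavior: `SocketErrorAction` and `classifySocketError` are illustrative names, ENOBUFS fails only the current request, and ENOMEM plus every other errno keep the current fail-fast behavior.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical policy for socket(2) failures.
enum class SocketErrorAction { FailRequest, Abort };

SocketErrorAction classifySocketError(int err) {
  switch (err) {
    case ENOBUFS:
      // Socket buffer pool exhausted; in the setup described above the
      // system recovers once network failures clear, so fail only this request.
      return SocketErrorAction::FailRequest;
    case ENOMEM:
      // General memory exhaustion: keep treating it as fatal.
      return SocketErrorAction::Abort;
    default:
      // Other errors (e.g. EAFNOSUPPORT) more likely indicate a bug or
      // misconfiguration, so keep the existing fail-fast behavior.
      return SocketErrorAction::Abort;
  }
}
```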

Call Stack:

(gdb) bt
#0 0x000000cab299cc7b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:38
#1 0x0000000000fa643b in Envoy::SignalAction::sigHandler(int, siginfo_t*, void*) () at source/common/signal/signal_action.cc:53
#2 <signal handler called>
#3 0x000000cab2bdee15 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4 0x000000cab2be028b in __GI_abort () at abort.c:90
#5 0x0000000000f80bc3 in Envoy::Network::SocketInterfaceImpl::socket(Envoy::Network::Socket::Type, Envoy::Network::Address::Type, Envoy::Network::Address::IpVersion, bool, Envoy::Network::SocketCreationOptions const&) const () at source/common/network/socket_interface_impl.cc:63
#6 0x0000000000f80f9a in Envoy::Network::SocketInterfaceImpl::socket(Envoy::Network::Socket::Type, std::shared_ptr<Envoy::Network::Address::Instance const>, Envoy::Network::SocketCreationOptions const&) const
() at source/common/network/socket_interface_impl.cc:92
#7 0x0000000000db2bb1 in Envoy::Network::ClientConnectionImpl::ClientConnectionImpl(Envoy::Event::Dispatcher&, std::shared_ptr<Envoy::Network::Address::Instance const> const&, std::shared_ptr<Envoy::Network::Address::Instance const> const&, std::unique_ptr<Envoy::Network::TransportSocket, std::default_delete<Envoy::Network::TransportSocket> >&&, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const> const&) ()
#8 0x0000000000da15de in Envoy::Network::DefaultClientConnectionFactory::createClientConnection(Envoy::Event::Dispatcher&, std::shared_ptr<Envoy::Network::Address::Instance const>, std::shared_ptr<Envoy::Network::Address::Instance const>, std::unique_ptr<Envoy::Network::TransportSocket, std::default_delete<Envoy::Network::TransportSocket> >&&, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const> const&) ()
#9 0x0000000000d9619c in Envoy::Event::DispatcherImpl::createClientConnection(std::shared_ptr<Envoy::Network::Address::Instance const>, std::shared_ptr<Envoy::Network::Address::Instance const>, std::unique_ptr<Envoy::Network::TransportSocket, std::default_delete<Envoy::Network::TransportSocket> >&&, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const> const&) ()
#10 0x0000000000a7c691 in Envoy::Upstream::HostImpl::createConnection(Envoy::Event::Dispatcher&, Envoy::Upstream::ClusterInfo const&, std::shared_ptr<Envoy::Network::Address::Instance const> const&, std::vector<std::shared_ptr<Envoy::Network::Address::Instance const>, std::allocator<std::shared_ptr<Envoy::Network::Address::Instance const> > > const&, Envoy::Network::UpstreamTransportSocketFactory&, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const>, std::shared_ptr<Envoy::Upstream::HostDescription const>) ()
#11 0x0000000000a7cb8f in Envoy::Upstream::HostImpl::createConnection(Envoy::Event::Dispatcher&, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const>) const ()
#12 0x0000000000a46392 in Envoy::Http::Http1::ActiveClient::ActiveClient(Envoy::Http::HttpConnPoolImplBase&, Envoy::OptRef<Envoy::Upstream::Host::CreateConnectionData>) ()
at ./source/common/conn_pool/conn_pool_base.h:277
#13 0x0000000000a46760 in std::_Function_handler<std::unique_ptr<Envoy::ConnectionPool::ActiveClient, std::default_delete<Envoy::ConnectionPool::ActiveClient> > (Envoy::Http::HttpConnPoolImplBase*), Envoy::Http::Http1::allocateConnPool(Envoy::Event::Dispatcher&, Envoy::Random::RandomGenerator&, std::shared_ptr<Envoy::Upstream::Host const>, Envoy::Upstream::ResourcePriority, std::shared_ptr<std::vector<std::shared_ptr<Envoy::Network::Socket::Option const>, std::allocator<std::shared_ptr<Envoy::Network::Socket::Option const> > > > const&, std::shared_ptr<Envoy::Network::TransportSocketOptions const> const&, Envoy::Upstream::ClusterConnectivityState&)::{lambda(Envoy::Http::HttpConnPoolImplBase*)#1}>::_M_invoke(std::_Any_data const&, Envoy::Http::HttpConnPoolImplBase*&&) ()
at external/com_google_absl/absl/types/internal/optional.h:181
#14 0x0000000000a4533f in Envoy::Http::FixedHttpConnPoolImpl::instantiateActiveClient() ()
#15 0x0000000000a5b03f in Envoy::ConnectionPool::ConnPoolImplBase::tryCreateNewConnection(float) () at source/common/conn_pool/conn_pool_base.cc:146
#16 0x0000000000a5c30b in Envoy::ConnectionPool::ConnPoolImplBase::tryCreateNewConnections() () at source/common/conn_pool/conn_pool_base.cc:119
#17 0x0000000000a5c660 in Envoy::ConnectionPool::ConnPoolImplBase::newStreamImpl(Envoy::ConnectionPool::AttachContext&, bool) () at source/common/conn_pool/conn_pool_base.cc:296
#18 0x0000000000a4a44b in Envoy::Http::HttpConnPoolImplBase::newStream(Envoy::Http::ResponseDecoder&, Envoy::Http::ConnectionPool::Callbacks&, Envoy::Http::ConnectionPool::Instance::StreamOptions const&) ()
at source/common/http/conn_pool_base.cc:64
#19 0x0000000000e4f417 in Envoy::Extensions::Upstreams::Http::Http::HttpConnPool::newStream(Envoy::Router::GenericConnectionPoolCallbacks*) () at ./envoy/upstream/thread_local_cluster.h:28
#20 0x0000000000e62d94 in Envoy::Router::Filter::decodeHeaders(Envoy::Http::RequestHeaderMap&, bool) ()
#21 0x0000000000ccb0b5 in Envoy::Http::FilterManager::decodeHeaders(Envoy::Http::ActiveStreamDecoderFilter*, Envoy::Http::RequestHeaderMap&, bool) ()
#22 0x0000000000cb0aa5 in Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeHeaders(std::unique_ptr<Envoy::Http::RequestHeaderMap, std::default_delete<Envoy::Http::RequestHeaderMap> >&&, bool) ()
at ./source/common/http/filter_manager.h:730
#23 0x0000000000d03adf in Envoy::Http::Http1::ServerConnectionImpl::onHeadersCompleteBase() ()
#24 0x0000000000d05e30 in Envoy::Http::Http1::ConnectionImpl::onHeadersCompleteImpl() () at source/common/http/http1/codec_impl.cc:886
#25 0x0000000000d06b8e in Envoy::Http::Http1::ConnectionImpl::onHeadersComplete() () at source/common/http/http1/codec_impl.cc:716
#26 0x0000000000f40a74 in http_parser_execute () at external/envoy/bazel/external/http_parser/http_parser.c:1849
#27 0x0000000000cfafb7 in Envoy::Http::Http1::ConnectionImpl::dispatchSlice(char const*, unsigned long) ()
#28 0x0000000000d02a9a in Envoy::Http::Http1::ConnectionImpl::dispatch(Envoy::Buffer::Instance&) () at source/common/http/http1/codec_impl.cc:640
#29 0x0000000000d031c4 in Envoy::Http::Http1::ServerConnectionImpl::dispatch(Envoy::Buffer::Instance&) () at source/common/http/http1/codec_impl.cc:1224
#30 0x0000000000cb3223 in Envoy::Http::ConnectionManagerImpl::onData(Envoy::Buffer::Instance&, bool) () at source/common/http/conn_manager_impl.cc:388
#31 0x0000000000db5516 in Envoy::Network::FilterManagerImpl::onContinueReading(Envoy::Network::FilterManagerImpl::ActiveReadFilter*, Envoy::Network::ReadBufferSource&) ()
#32 0x0000000000dada54 in Envoy::Network::ConnectionImpl::onReadReady() () at source/common/network/connection_impl.cc:651
#33 0x0000000000daf665 in Envoy::Network::ConnectionImpl::onFileEvent(unsigned int) () at source/common/network/connection_impl.cc:602
#34 0x0000000000d9497f in std::_Function_handler<void (unsigned int), Envoy::Event::DispatcherImpl::createFileEvent(int, std::function<void (unsigned int)>, Envoy::Event::FileTriggerType, unsigned int)::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) ()
#35 0x0000000000d997d4 in Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb(unsigned int) ()
#36 0x0000000000f99f92 in event_process_active_single_queue.isra () at source/common/signal/signal_action.cc:136
#37 0x0000000000f9a71f in event_base_loop () at source/common/signal/signal_action.cc:136
#38 0x00000000007be04d in Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&, std::function<void ()> const&) ()
#39 0x000000000125a305 in Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::function<void ()>, absl::optional<Envoy::Thread::Options> const&)::{lambda(void*)#1}::_FUN(void*) ()
#40 0x000000cab2994d3b in start_thread (arg=0xcab347f700) at pthread_create.c:308
#41 0x000000cab2c9717d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Note: creating a public issue for this crash as per the discussion on the envoysecurity group.

sharmamona added the bug and triage labels Nov 18, 2022
wbpcode (Member) commented Nov 24, 2022

When a socket call fails, we have always treated it as an extremely unusual case, and failing fast is the best choice for now (just like a memory allocation failure).
It's a little hard to find a better, sound solution for this problem.

cc @mattklein123

wbpcode added the investigate and area/connection labels and removed the triage and bug labels Nov 24, 2022
mattklein123 (Member) commented

We discussed this on the security mailing list. In general we treat OOM conditions as fatal and crash. In this case, the poster says that in certain system configurations socket buffers are disjoint from main memory, so recovery is theoretically possible. I'm fine if someone wants to take this on, but I'm not sure how difficult it will be to fix and test.
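On the testing side, one way to exercise the error path without actually exhausting socket buffers is to inject the syscall. This sketch uses hypothetical names and plain POSIX calls rather than Envoy's own syscall abstractions: `openStreamSocket` takes the socket function as a parameter, and `failingSocket` is a fake that always fails with ENOBUFS.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical test seam: production code passes ::socket, tests pass a fake.
using SocketFn = std::function<int(int, int, int)>;

// Returns a descriptor on success or -errno on failure; never aborts.
int openStreamSocket(const SocketFn& socket_fn, int domain) {
  int fd = socket_fn(domain, SOCK_STREAM, 0);
  if (fd < 0) {
    return -errno;
  }
  return fd;
}

// Fake socket(2) that always fails with ENOBUFS, reproducing the reported
// condition deterministically.
int failingSocket(int /*domain*/, int /*type*/, int /*protocol*/) {
  errno = ENOBUFS;
  return -1;
}
```

A test can then assert that `openStreamSocket(failingSocket, AF_INET)` returns `-ENOBUFS` while the same call with `::socket` still yields a usable descriptor.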

mattklein123 added the help wanted label and removed the investigate label Nov 26, 2022