Envoy with Postgres filter, envoy is crashing #35908

Closed
rohitkanchan opened this issue Aug 29, 2024 · 18 comments
Labels: area/postgres, bug, stale

Comments

@rohitkanchan

If you are reporting any crash or any potential security issue, do not
open an issue in this repo. Please report the issue via emailing
[email protected] where the issue will be triaged appropriately.

Title: Envoy with Postgres is crashing

Description:

I deployed Envoy with the Postgres v3alpha filter in a VM, and it crashed after a few hours. Logs are attached. Expected behavior: Envoy should not crash; it should keep running.

Repro steps:

Deploy Envoy with the Postgres and TcpProxy filters and the v3.StartTlsConfig transport socket. Run the Docker image in a virtual machine and leave it running. After 4-5 hours Envoy crashes and stops running.

Note: The Envoy_collect tool
gathers a tarball with debug logs, config and the following admin
endpoints: /stats, /clusters and /server_info. Please note if there are
privacy concerns, sanitize the data prior to sharing the tarball/pasting.

Admin and Stats Output:

Include the admin output for the following endpoints: /stats,
/clusters, /routes, /server_info. For more information, refer to the
admin endpoint documentation.

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Config:

Include the config used to configure Envoy.

Config:
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 5432 # Frontend port
    filter_chains:
    - filters:
      - name: envoy.filters.network.postgres_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.postgres_proxy.v3alpha.PostgresProxy
          stat_prefix: imperva
          terminate_ssl: true
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: tcp
          cluster: backend_cluster
      transport_socket:
        name: envoy.transport_sockets.starttls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.starttls.v3.StartTlsConfig
          tls_socket_config:
            common_tls_context:
              tls_certificates:
              - certificate_chain:
                  filename: "/etc/envoy/fullchain.pem"
                private_key:
                  filename: "/etc/envoy/privkey.pem"
  clusters:
  - name: backend_cluster
    connect_timeout: 0.25s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address:
                port_value: 5432

Logs:

Include the access logs and the Envoy logs.
DEFAULT 2024-08-27T04:00:44.727761510Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.719][19][critical][backtrace] [./source/server/backtrace.h:119] #2: Envoy::Extensions::NetworkFilters::PostgresProxy::ZeroTCodes<>::validate() [0x55b591400f7a]
DEFAULT 2024-08-27T04:00:44.751002873Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.750][19][critical][backtrace] [./source/server/backtrace.h:119] #3: Envoy::Extensions::NetworkFilters::PostgresProxy::MessageImpl<>::validate() [0x55b591400b55]
DEFAULT 2024-08-27T04:00:44.778504515Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.778][19][critical][backtrace] [./source/server/backtrace.h:119] #4: Envoy::Extensions::NetworkFilters::PostgresProxy::DecoderImpl::onDataInSync() [0x55b5913f688c]
DEFAULT 2024-08-27T04:00:44.805880148Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.805][19][critical][backtrace] [./source/server/backtrace.h:119] #5: Envoy::Extensions::NetworkFilters::PostgresProxy::PostgresFilter::onWrite() [0x55b59140ab34]
DEFAULT 2024-08-27T04:00:44.833436028Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.832][19][critical][backtrace] [./source/server/backtrace.h:119] #6: Envoy::Network::FilterManagerImpl::onWrite() [0x55b5918fe311]
DEFAULT 2024-08-27T04:00:44.861139329Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.860][19][critical][backtrace] [./source/server/backtrace.h:119] #7: Envoy::Network::ConnectionImpl::write() [0x55b5918ef60a]
DEFAULT 2024-08-27T04:00:44.888796366Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.888][19][critical][backtrace] [./source/server/backtrace.h:119] #8: Envoy::TcpProxy::Filter::onUpstreamData() [0x55b58f85994e]
DEFAULT 2024-08-27T04:00:44.916880944Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.916][19][critical][backtrace] [./source/server/backtrace.h:119] #9: Envoy::Tcp::ActiveTcpClient::ConnReadFilter::onData() [0x55b5914bac6d]
DEFAULT 2024-08-27T04:00:44.943397145Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.942][19][critical][backtrace] [./source/server/backtrace.h:119] #10: Envoy::Network::FilterManagerImpl::onContinueReading() [0x55b5918fe225]
DEFAULT 2024-08-27T04:00:44.969863385Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.969][19][critical][backtrace] [./source/server/backtrace.h:119] #11: Envoy::Network::ConnectionImpl::onReadReady() [0x55b5918f50cb]
DEFAULT 2024-08-27T04:00:44.997142728Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:44.996][19][critical][backtrace] [./source/server/backtrace.h:119] #12: Envoy::Network::ConnectionImpl::onFileEvent() [0x55b5918f12e5]
DEFAULT 2024-08-27T04:00:45.023768062Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.023][19][critical][backtrace] [./source/server/backtrace.h:119] #13: std::__1::__function::__func<>::operator()() [0x55b5918fb086]
DEFAULT 2024-08-27T04:00:45.050198286Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.049][19][critical][backtrace] [./source/server/backtrace.h:119] #14: std::__1::__function::__func<>::operator()() [0x55b5918ce866]
DEFAULT 2024-08-27T04:00:45.077442460Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.077][19][critical][backtrace] [./source/server/backtrace.h:119] #15: Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x55b5918cfd75]
DEFAULT 2024-08-27T04:00:45.104793330Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.104][19][critical][backtrace] [./source/server/backtrace.h:119] #16: event_process_active_single_queue [0x55b59280dc60]
DEFAULT 2024-08-27T04:00:45.131326158Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.130][19][critical][backtrace] [./source/server/backtrace.h:119] #17: event_base_loop [0x55b59280c5a1]
DEFAULT 2024-08-27T04:00:45.157976815Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.157][19][critical][backtrace] [./source/server/backtrace.h:119] #18: Envoy::Server::WorkerImpl::threadRoutine() [0x55b590ab695f]
DEFAULT 2024-08-27T04:00:45.185891786Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.185][19][critical][backtrace] [./source/server/backtrace.h:119] #19: Envoy::Thread::PosixThreadFactory::createPthread()::$2::invoke() [0x55b59288b863]
DEFAULT 2024-08-27T04:00:45.186991950Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.185][19][critical][backtrace] [./source/server/backtrace.h:121] #20: [0x7f7bd57acac3]
DEFAULT 2024-08-27T04:00:45.187304685Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:127] Caught Aborted, suspect faulting address 0x6500000001
DEFAULT 2024-08-27T04:00:45.187435666Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
DEFAULT 2024-08-27T04:00:45.187586025Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: 7b8baff/1.31.0/Clean/RELEASE/BoringSSL
DEFAULT 2024-08-27T04:00:45.187721302Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 55b58f24b000-55b592ce0000 /usr/local/bin/envoy
DEFAULT 2024-08-27T04:00:45.187819445Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7f7bd575a520]
DEFAULT 2024-08-27T04:00:45.187909486Z [resource.labels.instanceId: haproxy-dev-vm] ConnectionImpl 0x27763f7e4480, connecting: 0, bind_error: 0, state(): Open, read_buffer_limit: 1048576
DEFAULT 2024-08-27T04:00:45.188018622Z [resource.labels.instanceId: haproxy-dev-vm] socket:
DEFAULT 2024-08-27T04:00:45.188112094Z [resource.labels.instanceId: haproxy-dev-vm] ListenSocketImpl 0x27763f725480, transport_protocol:
DEFAULT 2024-08-27T04:00:45.188199034Z [resource.labels.instanceId: haproxy-dev-vm] connection_info_provider:
DEFAULT 2024-08-27T04:00:45.188231588Z [resource.labels.instanceId: haproxy-dev-vm] ConnectionInfoSetterImpl 0x27763f7de218, remote_address: 10.138.33.3:5432, direct_remote_address: 10.138.33.3 5432, local_address: 172.17.0.2:58976, server_name:
DEFAULT 2024-08-27T04:00:45.188344531Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x0
DEFAULT 2024-08-27T04:00:45.188444912Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
DEFAULT 2024-08-27T04:00:45.188539354Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: 7b8baff/1.31.0/Clean/RELEASE/BoringSSL
DEFAULT 2024-08-27T04:00:45.188631242Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.186][19][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 55b58f24b000-55b592ce0000 /usr/local/bin/envoy
DEFAULT 2024-08-27T04:00:45.188723722Z [resource.labels.instanceId: haproxy-dev-vm] [2024-08-27 04:00:45.187][19][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7f7bd575a520]
DEFAULT 2024-08-27T04:00:45.188835221Z [resource.labels.instanceId: haproxy-dev-vm] Our FatalActions triggered a fatal signal.
DEFAULT 2024-08-27T04:00:45.202467819Z [resource.labels.instanceId: haproxy-dev-vm] Aug 27 04:00:45 haproxy-dev-vm kernel: [597442.542112] traps: wrk:worker_1[89419] general protection fault ip:7f7bd5740898 sp:7f7bd16c68b0 error:0 in libc.so.6[7f7bd5740000+195000]

Note: If there are privacy concerns, sanitize the data prior to
sharing.

Call Stack:

If the Envoy binary is crashing, a call stack is required.
Please refer to the Bazel Stack trace documentation.

I have already included the backtrace in the Logs section above.
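The backtrace above prints absolute addresses together with an "Address mapping" line for /usr/local/bin/envoy. As a rough aid for reading it, the sketch below orders the frames and converts each in-binary address into an offset from the mapping base. The function name `parse_backtrace` and the regular expressions are illustrative assumptions based only on the log lines shown in this issue; this is not a replacement for tools/stack_decode.py, which the log itself recommends for getting line numbers.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this issue.
FRAME_RE = re.compile(r"#(\d+): (.*?)\s*\[0x([0-9a-f]+)\]")
MAPPING_RE = re.compile(r"Address mapping: ([0-9a-f]+)-([0-9a-f]+) (\S+)")


def parse_backtrace(lines):
    """Return (frame_index, symbol, offset) tuples sorted by frame index.

    offset is the frame address minus the mapping base, or None when the
    address lies outside the mapped binary (for example a libc frame).
    """
    base = end = None
    frames = []
    for line in lines:
        mapping = MAPPING_RE.search(line)
        if mapping:
            base = int(mapping.group(1), 16)
            end = int(mapping.group(2), 16)
            continue
        frame = FRAME_RE.search(line)
        if frame:
            frames.append((int(frame.group(1)), frame.group(2), int(frame.group(3), 16)))

    result = []
    for index, symbol, addr in sorted(frames):
        inside = base is not None and base <= addr < end
        result.append((index, symbol, addr - base if inside else None))
    return result
```

Frames that come back with a `None` offset (such as #0 and #20 here) fall outside the Envoy binary, which is consistent with the kernel line reporting the fault inside libc.so.6.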

@rohitkanchan added the bug and triage labels on Aug 29, 2024
@ggreenway
Contributor

cc @fabriziomello @cpakulski

@ggreenway added the area/postgres label and removed the triage label on Aug 30, 2024
@rohitkanchan
Author

Is there any update on this?

@cpakulski
Contributor

Did it crash again?

@rohitkanchan
Author

rohitkanchan commented Sep 6, 2024 via email

@cpakulski
Contributor

@rohitkanchan Can you try to narrow it to a specific SQL query which causes the crash?

@rohitkanchan
Author

rohitkanchan commented Sep 10, 2024 via email

@cpakulski
Contributor

@rohitkanchan I suspect that the postgres filter is receiving non-postgres traffic. I added some protection code to validate whether a packet is a legitimate postgres request, but maybe it is not sufficient. The security posture of this filter is "trusted downstream" only, so in general you should not expose it to a wider audience.
There are several methods we could use to check whether non-postgres traffic is being received. Since you say that it crashes without any queries, can you remove the postgres filter and check the stats of backend_cluster? I assume that backend_cluster is used only by the postgres filter chain. You should see that backend_cluster receives no traffic. If it does receive traffic while you are not generating any SQL queries, it means that some other app is sending something to port 5432. WDYT?
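One possible way to run the check suggested above is to take two snapshots of the backend_cluster counters from Envoy's admin /stats endpoint and compare them. This is only a sketch: the config posted in this issue does not define an admin interface, so the address http://127.0.0.1:9901 in the usage comment is an assumption, and the helper names `parse_stats`, `fetch_cluster_stats` and `traffic_delta` are made up for illustration.

```python
import urllib.parse
import urllib.request


def parse_stats(text):
    """Parse the plain-text /stats output ("name: value" per line) into a dict.

    Lines whose value is not a plain integer (for example histograms) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def fetch_cluster_stats(admin_url, cluster):
    """Fetch counters for one cluster, using the admin endpoint's regex filter."""
    query = urllib.parse.urlencode({"filter": rf"^cluster\.{cluster}\."})
    with urllib.request.urlopen(f"{admin_url}/stats?{query}", timeout=5) as resp:
        return parse_stats(resp.read().decode())


def traffic_delta(before, after):
    """Return the counters that changed between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Hypothetical usage, assuming an admin listener on 127.0.0.1:9901:
#   before = fetch_cluster_stats("http://127.0.0.1:9901", "backend_cluster")
#   ... wait a while without issuing any SQL queries ...
#   after = fetch_cluster_stats("http://127.0.0.1:9901", "backend_cluster")
#   print(traffic_delta(before, after))
```

If counters such as cluster.backend_cluster.upstream_cx_total or cluster.backend_cluster.upstream_cx_rx_bytes_total grow while no SQL client is active, something else is connecting through the listener.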

@rohitkanchan
Author

rohitkanchan commented Sep 10, 2024 via email


This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale label on Oct 11, 2024
@rohitkanchan
Author

rohitkanchan commented Oct 11, 2024 via email

@github-actions bot removed the stale label on Oct 11, 2024
@cpakulski
Contributor

cpakulski commented Oct 15, 2024

@rohitkanchan Can you check if it generates the same backtrace each time it crashes?

@rohitkanchan
Author

rohitkanchan commented Oct 16, 2024 via email

@cpakulski
Contributor

OK. Let us try one more thing. Can you modify config and change the listener's port from 5432 to something different, like 12345 for example?

Instead of
port_value: 5432 # Frontend port

change it to

port_value: 12345 # Frontend port

Thanks.

@rohitkanchan
Author

rohitkanchan commented Oct 21, 2024 via email


This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale label on Nov 21, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions bot closed this as not planned on Nov 28, 2024
@rohitkanchan
Author

rohitkanchan commented Nov 29, 2024 via email

@cpakulski
Contributor

That pretty much proves that there was some unexpected traffic hitting port 5432.
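To see what kind of unexpected traffic reaches a port, one option is to look at the first bytes a client sends and check whether they resemble a PostgreSQL startup packet. The sketch below does that using the request codes from the PostgreSQL frontend/backend protocol (protocol 3.0, SSLRequest, CancelRequest, GSSENCRequest). It is not the validation code in Envoy's postgres filter; the function name `classify_first_bytes` and the length bound are assumptions made for this example.

```python
import struct

# Request codes defined by the PostgreSQL frontend/backend protocol.
SSL_REQUEST = 80877103
CANCEL_REQUEST = 80877102
GSSENC_REQUEST = 80877104
PROTOCOL_MAJOR_V3 = 3        # protocol 3.0 is encoded as 196608 (3 << 16)
MAX_STARTUP_LEN = 10000      # sanity bound chosen for this sketch


def classify_first_bytes(data):
    """Classify the first bytes received from a client on a postgres port."""
    if len(data) < 8:
        return "too short"
    # Startup-phase packets begin with Int32 length, then Int32 code/version.
    length, code = struct.unpack("!II", data[:8])
    if code == SSL_REQUEST and length == 8:
        return "SSLRequest"
    if code == GSSENC_REQUEST and length == 8:
        return "GSSENCRequest"
    if code == CANCEL_REQUEST and length == 16:
        return "CancelRequest"
    if code >> 16 == PROTOCOL_MAJOR_V3 and 8 <= length <= MAX_STARTUP_LEN:
        return "StartupMessage"
    return "not postgres"
```

For example, an HTTP probe beginning with `GET / HTTP/1.1` decodes to an implausible length and code and is reported as "not postgres", which is the kind of traffic a port scanner or health checker might send to a well-known port such as 5432.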

3 participants