-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
health: ensure /v1/health/service/:service endpoint returns the most recent results when a filter is used with streaming #12640
Conversation
@@ -241,16 +241,15 @@ func (h *Health) ServiceNodes(args *structs.ServiceSpecificRequest, reply *struc | |||
return err | |||
} | |||
|
|||
reply.Index, reply.Nodes = index, nodes | |||
if len(args.NodeMetaFilters) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it is not safe to mutate reply
until just before returning. This is not the first time this kind of bug has manifested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. This makes me sad. This basically means the entire endpoint is not thread-safe if the response (aka reply
) pointer is mutated? It would only apply to those endpoints that handle blocking queries, correct? There are a lot of endpoints that are implemented this way afaict and seems like a trap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only one thread/goroutine involved, but it loops around during retry without resetting the reply
var, so depending upon how the access goes and how the body of the blocking query function proceeds you can get "carry over" between attempts that you didn't intend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example from the last time this kind of thing specifically caused a bug: #10239
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return err | ||
case passed: | ||
} else if passed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the filter no longer matched an entry we did not GC the prior record.
I'm assuming the test failures mean the previous behavior was codified incorrectly? |
5057681
to
19d53aa
Compare
…recent results when a filter is used This is two bugs in two subsystems (blocking queries ; streaming) that make the overall endpoint behave brokenly in the same way to the end user. Simple reproduction (streaming): 1. Register a service with a tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "a" ], "EnableTagOverride": true }' 2. Do an initial filter query that matches on the tag. curl -sLi --get 'http://localhost:8500/v1/health/service/test' --data-urlencode 'filter=a in Service.Tags' 3. Note you get one result. Use the `X-Consul-Index` header to establish a blocking query in another terminal, this should not return yet. curl -sLi --get 'http://localhost:8500/v1/health/service/test?index=$INDEX' --data-urlencode 'filter=a in Service.Tags' 4. Re-register that service with a different tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "b" ], "EnableTagOverride": true }' 5. Your blocking query from (3) should return with a header `X-Consul-Query-Backend: streaming` and empty results if it works correctly `[]`. To reproduce for non-streaming, simply add `&near=_agent` to your read queries and ensure `X-Consul-Query-Backend: blocking-query` shows up in the results.
af60ad3
to
5011885
Compare
@@ -679,6 +679,85 @@ func TestHealth_ServiceNodes(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestHealth_ServiceNodes_BlockingQuery_withFilter(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: we can't test streaming here because streaming happens elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some live review, LGTM... I have separate sadness about net/rpc.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/653596. |
🍒✅ Cherry pick of commit 11213ae onto |
…recent results when a filter is used with streaming (#12640) The primary bug here is in the streaming subsystem that makes the overall v1/health/service/:service request behave incorrectly when servicing a blocking request with a filter provided. There is a secondary non-streaming bug being fixed here that is much less obvious related to when to update the `reply` variable in a `blockingQuery` evaluation. It is unlikely that it is triggerable in practical environments and I could not actually get the bug to manifest, but I fixed it anyway while investigating the original issue. Simple reproduction (streaming): 1. Register a service with a tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "a" ], "EnableTagOverride": true }' 2. Do an initial filter query that matches on the tag. curl -sLi --get 'http://localhost:8500/v1/health/service/test' --data-urlencode 'filter=a in Service.Tags' 3. Note you get one result. Use the `X-Consul-Index` header to establish a blocking query in another terminal, this should not return yet. curl -sLi --get 'http://localhost:8500/v1/health/service/test?index=$INDEX' --data-urlencode 'filter=a in Service.Tags' 4. Re-register that service with a different tag. curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \ --header 'Content-Type: application/json' \ --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "b" ], "EnableTagOverride": true }' 5. Your blocking query from (3) should return with a header `X-Consul-Query-Backend: streaming` and empty results if it works correctly `[]`. Attempts to reproduce with non-streaming failed (where you add `&near=_agent` to the read queries and ensure `X-Consul-Query-Backend: blocking-query` shows up in the results).
…the most recent results when a filter is used with streaming Backport of #12640 to 1.11.x
…the most recent results when a filter is used with streaming Backport of #12640 to 1.10.x
The primary bug here is in the streaming subsystem that makes the overall v1/health/service/:service request behave incorrectly when servicing a blocking request with a filter provided.
There is a secondary non-streaming bug being fixed here that is much less obvious related to when to update the
reply
variable in ablockingQuery
evaluation. It is unlikely that it is triggerable in practical environments and I could not actually get the bug to manifest, but I fixed it anyway while investigating the original issue.Simple reproduction (streaming):
Register a service with a tag.
Do an initial filter query that matches on the tag.
Note you get one result. Use the
X-Consul-Index
header to establisha blocking query in another terminal, this should not return yet.
Re-register that service with a different tag.
Your blocking query from (3) should return with a header
X-Consul-Query-Backend: streaming
and empty results if it workscorrectly
[]
.Attempts to reproduce with non-streaming failed (where you add
&near=_agent
to the read queries and ensureX-Consul-Query-Backend: blocking-query
shows up in the results).TODO: