
health: ensure /v1/health/service/:service endpoint returns the most recent results when a filter is used with streaming #12640

Merged (2 commits) on Apr 27, 2022

rboyer (Member) commented Mar 29, 2022

The primary bug here is in the streaming subsystem: it makes the overall /v1/health/service/:service request behave incorrectly when servicing a blocking request that has a filter.

This PR also fixes a secondary, much less obvious, non-streaming bug related to when the reply variable is updated during a blockingQuery evaluation. It is unlikely to be triggerable in practical environments, and I could not actually get it to manifest, but I fixed it anyway while investigating the original issue.

Simple reproduction (streaming):

  1. Register a service with a tag.

     curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \
         --header 'Content-Type: application/json' \
         --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "a" ], "EnableTagOverride": true }'
    
  2. Do an initial filter query that matches on the tag.

     curl -sLi --get 'http://localhost:8500/v1/health/service/test' --data-urlencode 'filter=a in Service.Tags'
    
  3. Note that you get one result. Use the X-Consul-Index response header value as $INDEX
     to establish a blocking query in another terminal; this should not return yet.

     curl -sLi --get "http://localhost:8500/v1/health/service/test?index=$INDEX" --data-urlencode 'filter=a in Service.Tags'
    
  4. Re-register that service with a different tag.

     curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \
         --header 'Content-Type: application/json' \
         --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "b" ], "EnableTagOverride": true }'
    
  5. If it works correctly, your blocking query from (3) should return with the header
     X-Consul-Query-Backend: streaming and an empty result ([]).

Attempts to reproduce with the non-streaming backend failed (add &near=_agent to the read queries and confirm that the X-Consul-Query-Backend: blocking-query header shows up in the response).

TODO:

  • tests
    • RPC (blocking query)
    • API (blocking query ; streaming)
  • backports?
  • changelog

rboyer requested a review from a team March 29, 2022 14:25
rboyer self-assigned this Mar 29, 2022
    @@ -241,16 +241,15 @@ func (h *Health) ServiceNodes(args *structs.ServiceSpecificRequest, reply *struc
    return err
    }

    reply.Index, reply.Nodes = index, nodes
    if len(args.NodeMetaFilters) > 0 {
Member Author:

In general it is not safe to mutate reply until just before returning. This is not the first time this kind of bug has manifested.

Contributor:

Ugh. This makes me sad. This basically means the entire endpoint is not thread-safe if the response (aka reply) pointer is mutated? It would only apply to those endpoints that handle blocking queries, correct? A lot of endpoints are implemented this way AFAICT, and it seems like a trap.

Member Author:

There's only one thread/goroutine involved, but the blocking query loops around on retry without resetting the reply var, so depending on how the function body reads and writes reply, you can get unintended "carry over" between attempts.
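A minimal, self-contained Go model of this carry-over, assuming a deliberately simplified retry loop rather than Consul's actual blockingQuery implementation. Reply, snapshot, buggyFn, fixedFn, and runAttempts are all hypothetical names. The loop reuses one reply value across attempts, as described above: the buggy function never clears reply.Nodes, so a node that stopped matching the filter survives into the final answer, while the fixed function builds its result in locals and assigns reply only once, just before returning.

```go
package main

import "fmt"

// Reply is a hypothetical stand-in for an RPC reply struct.
type Reply struct {
	Index uint64
	Nodes []string
}

// snapshot is what one attempt observes: an index plus the tags of each
// service instance at that index.
type snapshot struct {
	index uint64
	tags  map[string][]string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// buggyFn writes into reply as it goes. reply.Nodes is never cleared, so
// entries from an earlier attempt survive (and new matches are appended).
func buggyFn(s snapshot, filterTag string, reply *Reply) {
	reply.Index = s.index
	for id, tags := range s.tags {
		if contains(tags, filterTag) {
			reply.Nodes = append(reply.Nodes, id)
		}
	}
}

// fixedFn computes into locals and assigns reply once at the end, so each
// attempt fully replaces whatever a previous attempt left behind.
func fixedFn(s snapshot, filterTag string, reply *Reply) {
	var nodes []string
	for id, tags := range s.tags {
		if contains(tags, filterTag) {
			nodes = append(nodes, id)
		}
	}
	reply.Index, reply.Nodes = s.index, nodes
}

// runAttempts models the retry loop: one reply value, reused and never reset.
func runAttempts(snaps []snapshot, filterTag string, fn func(snapshot, string, *Reply)) Reply {
	var reply Reply
	for _, s := range snaps {
		fn(s, filterTag, &reply)
	}
	return reply
}

func main() {
	// Index 10: ID1 is tagged "a" (matches). Index 11: re-tagged "b" (no match).
	snaps := []snapshot{
		{index: 10, tags: map[string][]string{"ID1": {"a"}}},
		{index: 11, tags: map[string][]string{"ID1": {"b"}}},
	}
	fmt.Println("buggy:", runAttempts(snaps, "a", buggyFn))
	fmt.Println("fixed:", runAttempts(snaps, "a", fixedFn))
}
```

Both variants report index 11, but only the fixed one also reports an empty node list, which is the kind of "correct index, stale body" mismatch that makes this class of bug hard to spot.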

Member Author:

Example from the last time this kind of thing specifically caused a bug: #10239

    return err
    case passed:
    } else if passed {

Member Author:

When the filter no longer matched an entry we did not GC the prior record.
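The GC issue can be illustrated with a small, hypothetical Go model of a filtered client-side view fed by per-instance events. This is not Consul's streaming code: event, filteredView, and the gcOnMiss switch are invented for illustration, with gcOnMiss=false standing in for the behavior described above (the prior record is kept) and gcOnMiss=true for deleting it. Replaying the reproduction (tag "a", then re-tag to "b") leaves ID1 in the result only when the stale record is not removed.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical, simplified stream event for one service instance.
type event struct {
	deregister bool
	id         string
	tags       []string
}

// filteredView is a hypothetical materialized view that keeps only the
// instances whose tags contain filterTag.
type filteredView struct {
	filterTag string
	gcOnMiss  bool // true models the fixed behavior
	entries   map[string][]string
}

func newView(tag string, gcOnMiss bool) *filteredView {
	return &filteredView{filterTag: tag, gcOnMiss: gcOnMiss, entries: map[string][]string{}}
}

func hasTag(tags []string, t string) bool {
	for _, v := range tags {
		if v == t {
			return true
		}
	}
	return false
}

func (v *filteredView) apply(e event) {
	passed := hasTag(e.tags, v.filterTag)
	switch {
	case e.deregister:
		delete(v.entries, e.id)
	case passed:
		v.entries[e.id] = e.tags
	case v.gcOnMiss:
		// The updated instance no longer matches the filter: drop the record
		// stored while it did match instead of leaving it in the result.
		delete(v.entries, e.id)
	}
}

// result returns the IDs currently in the view, sorted for stable output.
func (v *filteredView) result() []string {
	ids := make([]string, 0, len(v.entries))
	for id := range v.entries {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	events := []event{
		{id: "ID1", tags: []string{"a"}}, // step 1: matches "a in Service.Tags"
		{id: "ID1", tags: []string{"b"}}, // step 4: re-registered, no longer matches
	}
	for _, gc := range []bool{false, true} {
		v := newView("a", gc)
		for _, e := range events {
			v.apply(e)
		}
		fmt.Printf("gcOnMiss=%v result=%v\n", gc, v.result())
	}
}
```

The key design point in this sketch is that "does not pass the filter" has to be treated like a deregistration for an entry that is already present, not simply ignored.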

kisunji self-requested a review March 29, 2022 14:30
kisunji (Contributor) commented Apr 4, 2022

I'm assuming the test failures mean the previous behavior was codified incorrectly?

rboyer force-pushed the fix-blocking-query-bug branch from 5057681 to 19d53aa April 11, 2022 21:37
…recent results when a filter is used

This fixes two bugs in two subsystems (blocking queries; streaming) that
make the overall endpoint misbehave in the same way from the end user's
point of view.

Simple reproduction (streaming):

1. Register a service with a tag.

    curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \
        --header 'Content-Type: application/json' \
        --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "a" ], "EnableTagOverride": true }'

2. Do an initial filter query that matches on the tag.

    curl -sLi --get 'http://localhost:8500/v1/health/service/test' --data-urlencode 'filter=a in Service.Tags'

3. Note that you get one result. Use the `X-Consul-Index` response header
   value as `$INDEX` to establish a blocking query in another terminal;
   this should not return yet.

    curl -sLi --get "http://localhost:8500/v1/health/service/test?index=$INDEX" --data-urlencode 'filter=a in Service.Tags'

4. Re-register that service with a different tag.

    curl -sL --request PUT 'http://localhost:8500/v1/agent/service/register' \
        --header 'Content-Type: application/json' \
        --data-raw '{ "ID": "ID1", "Name": "test", "Tags":[ "b" ], "EnableTagOverride": true }'

5. If it works correctly, your blocking query from (3) should return with
   the header `X-Consul-Query-Backend: streaming` and an empty result
   (`[]`).

To reproduce for non-streaming, simply add `&near=_agent` to your read
queries and ensure `X-Consul-Query-Backend: blocking-query` shows up in the response headers.
rboyer force-pushed the fix-blocking-query-bug branch from af60ad3 to 5011885 April 12, 2022 18:00
    @@ -679,6 +679,85 @@ func TestHealth_ServiceNodes(t *testing.T) {
    }
    }

    func TestHealth_ServiceNodes_BlockingQuery_withFilter(t *testing.T) {
Member Author:

Note: we can't test streaming here because streaming happens elsewhere.

rboyer changed the title from "health: ensure /v1/health/service/:service endpoint returns the most recent results when a filter is used" to "health: ensure /v1/health/service/:service endpoint returns the most recent results when a filter is used with streaming" Apr 12, 2022
eculver (Contributor) left a comment:

after some live review, LGTM... I have separate sadness about net/rpc.

rboyer merged commit 11213ae into main Apr 27, 2022
rboyer deleted the fix-blocking-query-bug branch April 27, 2022 15:39
hc-github-team-consul-core (Collaborator):

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/653596.

hc-github-team-consul-core (Collaborator):

🍒✅ Cherry pick of commit 11213ae onto release/1.12.x succeeded!

hc-github-team-consul-core (Collaborator):

🍒❌ Cherry pick of commit 11213ae onto release/1.11.x failed! Build Log

hc-github-team-consul-core pushed a commit that referenced this pull request Apr 27, 2022
…recent results when a filter is used with streaming (#12640)

hc-github-team-consul-core (Collaborator):

🍒❌ Cherry pick of commit 11213ae onto release/1.10.x failed! Build Log

rboyer added a commit that referenced this pull request Apr 27, 2022
…the most recent results when a filter is used with streaming

Backport of #12640 to 1.11.x
rboyer added a commit that referenced this pull request Apr 27, 2022
…the most recent results when a filter is used with streaming (#12866)

Backport of #12640 to 1.11.x
rboyer added a commit that referenced this pull request Apr 27, 2022
…the most recent results when a filter is used with streaming

Backport of #12640 to 1.10.x
rboyer added a commit that referenced this pull request Apr 27, 2022
…the most recent results when a filter is used with streaming (#12868)

Backport of #12640 to 1.10.x
4 participants