[BUG] new QueryGroupTask warning in 2.18 #16874

Open
doug-numetric opened this issue Dec 17, 2024 · 11 comments · May be fixed by #16981
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments

@doug-numetric

Describe the bug

The warning:
https://github.com/opensearch-project/OpenSearch/pull/14708/files#diff-4e901163f39dfc072ae4d7f93a43f0958bdfdf2bb82f044b2ec9f3ae13fa66dfR56
Ideally this should never happen, but it is spamming our logs in our use cases.

Related component

Search

To Reproduce

curl -s -X POST "localhost:9200/some_index_name/_search?size=2&scroll=30s" -H "content-type: application/json" -d '{"query":{"match_all":{}}}'

Then, with the _scroll_id from the response:

curl -s "localhost:9200/_search/scroll/${_scroll_id}?scroll=30s"

The OpenSearch logs show:

2024-12-17T17:00:36.571341698Z [2024-12-17T17:00:36,570][WARN ][o.o.w.QueryGroupTask     ] [b6feeca76aab] QueryGroup _id can't be null, It should be set before accessing it. This is abnormal behaviour
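The two reproduction steps above can be chained into one script. This is a sketch, assuming a local cluster at localhost:9200 with the security plugin disabled and an existing index; the helper names are hypothetical, and the grep/sed extraction of `_scroll_id` is used in place of jq only to avoid an extra dependency.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the QueryGroupTask warning through the scroll API.
# Assumptions (not from the report): cluster at $OS_URL (default localhost:9200,
# security plugin disabled); helper names are hypothetical.
OS_URL="${OS_URL:-localhost:9200}"

# Print the _scroll_id value from a search response read on stdin.
# Assumes the id contains no double quotes (scroll ids are base64-style strings).
extract_scroll_id() {
  grep -o '"_scroll_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Open a scroll on the given index, then fetch the next page with that scroll id.
reproduce_scroll_warning() {
  local index="$1" response scroll_id
  response=$(curl -s -X POST "${OS_URL}/${index}/_search?size=2&scroll=30s" \
    -H "content-type: application/json" -d '{"query":{"match_all":{}}}')
  scroll_id=$(printf '%s' "$response" | extract_scroll_id)
  if [ -z "$scroll_id" ]; then
    echo "no _scroll_id in response: $response" >&2
    return 1
  fi
  # Per the report, the WARN line appears after this follow-up scroll request.
  curl -s "${OS_URL}/_search/scroll/${scroll_id}?scroll=30s"
}
```

Usage: `reproduce_scroll_warning some_index_name`, then check the OpenSearch log for the QueryGroupTask line.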

Expected behavior

No warning


@doug-numetric added the bug and untriaged labels Dec 17, 2024
@github-actions (bot) added the Search label Dec 17, 2024
@sandeshkr419
Contributor

@jainankitk Do you know someone who would be interested in looking into this further?

@kaushalmahi12 self-assigned this Dec 18, 2024
@kaushalmahi12
Contributor

Let me look into it.

@kaushalmahi12
Contributor

kaushalmahi12 commented Dec 18, 2024

The diff you mention in the issue is an old one. This has been fixed in all of the branches as part of this PR: 3b4e11d#diff-4e901163f39dfc072ae4d7f93a43f0958bdfdf2bb82f044b2ec9f3ae13fa66df

The issue was that when the header is not set in the thread context, it would return null for queryGroupId, since the threadContext itself is not null.

@andrross
Member

@kaushalmahi12 The reproduction steps shared by @doug-numetric in this issue work for me. I'm able to generate that log statement on a min distribution install of OpenSearch 2.18.

@kaushalmahi12
Contributor

kaushalmahi12 commented Dec 18, 2024

I think the mechanism that sets the queryGroupId for the scroll action is missing. To silence the logging from the QueryGroupTask class in the meantime, we can raise its log level to ERROR:

curl -XPUT "localhost:9200/_cluster/settings" -H "Content-Type: Application/json" -d '{"persistent" : { "logger.org.opensearch.wlm.QueryGroupTask": "ERROR" }}'
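The same mitigation can be wrapped in a small helper so it is easy to undo once a fix ships. This is a sketch, assuming the same local endpoint; the function names are hypothetical. Setting the value to `null` asks the cluster settings API to remove the persistent override, which restores the logger's default level.

```shell
#!/usr/bin/env bash
# Sketch: raise or reset the QueryGroupTask log level via the cluster settings API.
# Assumptions: cluster reachable at $OS_URL (default localhost:9200);
# helper names are hypothetical.
OS_URL="${OS_URL:-localhost:9200}"
WLM_LOGGER="logger.org.opensearch.wlm.QueryGroupTask"

# Build the persistent-settings body. A level of "null" is emitted as a JSON null
# (removes the override); any other value is emitted as a quoted string.
wlm_logger_body() {
  local level="$1" value
  if [ "$level" = "null" ]; then
    value=null
  else
    value="\"${level}\""
  fi
  printf '{"persistent":{"%s":%s}}' "$WLM_LOGGER" "$value"
}

# Send the body to the cluster settings API.
set_wlm_logger_level() {
  curl -s -X PUT "${OS_URL}/_cluster/settings" \
    -H "Content-Type: application/json" \
    -d "$(wlm_logger_body "$1")"
}
```

Usage: `set_wlm_logger_level ERROR` applies the mitigation; `set_wlm_logger_level null` reverts it.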

@kaushalmahi12
Contributor

@doug-numetric I suppose this is only happening for scroll APIs?

@doug-numetric
Author

@kaushalmahi12 Yes, as far as I can find, only the scroll APIs are affected.

@jamesbuddrige

I'm also seeing this message pop up on a fresh cluster when running the reindex API.

@kaushalmahi12 linked a pull request Jan 8, 2025 that will close this issue
@rursprung
Copy link
Contributor

This problem also appears with simple requests against the _search API when you send enough of them at the same time. Tested using a stock 2.18.0 release (I disabled the security plugin for simplicity).

  1. Create a test index:
curl --location --request PUT 'http://localhost:9200/testindex' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
        "properties": {
            "testfield": {
                "type": "text"
            }
        }
    }
}'
  2. Add test data (probably not even needed):
curl --location --request POST 'http://localhost:9200/testindex/_doc' \
--header 'Content-Type: application/json' \
--data-raw '{
    "testfield": "foo"
}'
  3. Run a lot of GET requests against the API. I used oha for this, letting it run for 2 minutes with the default 50 parallel requests:
oha -z 2m http://localhost:9200/testindex/_search

I got the message 14 times in the OpenSearch log during this time.
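To measure the warning rate during a load run like this, one option is to count matching lines in the OpenSearch log. This is a sketch, assuming a plain-text log file whose path the caller supplies; the function name is hypothetical, and it matches the fixed message text from the WARN line quoted earlier in this issue.

```shell
#!/usr/bin/env bash
# Sketch: count QueryGroupTask warnings in an OpenSearch log file.
# Assumption: the caller passes the log file path; the function name is hypothetical.
WARN_TEXT="QueryGroup _id can't be null"

count_querygroup_warnings() {
  local log_file="$1"
  # -F matches the message as a fixed string; -c prints the number of matching lines.
  # grep exits 1 when nothing matches (it still prints 0), so don't treat that as failure.
  grep -F -c -- "$WARN_TEXT" "$log_file" || true
}
```

Running it before and after a load test gives the number of warnings emitted during the run.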

@andrross
Member

andrross commented Jan 9, 2025

FYI @kaushalmahi12: there are a couple more cases here beyond scroll. @rursprung's case suggests that a race condition during search might be triggering this warning.

@kaushalmahi12
Contributor

Let me see what is causing this. Thanks, rursprung, for bringing this to our attention.

Projects
Status: 🆕 New

6 participants