Skip to content

Comments

Avoid internal server error on suggester ngram bad request#132321

Merged
carlosdelest merged 6 commits intoelastic:mainfrom
carlosdelest:bugfix/suggester-all-tokens-unigrams-response
Aug 1, 2025
Merged

Avoid internal server error on suggester ngram bad request#132321
carlosdelest merged 6 commits intoelastic:mainfrom
carlosdelest:bugfix/suggester-all-tokens-unigrams-response

Conversation

@carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Aug 1, 2025

A phrase suggester by default is configured to require unigrams. If an analyzer that only produces n-grams (no unigrams) is used, we get an internal server error as a response (due to NoisyChannelSpellChecker throwing an IllegalStateException).

This is a configuration problem, not a server error. The user used an analyzer that does not provide unigrams, but unigrams are required for the suggestion.

This PR makes NoisyChannelSpellChecker throw an IllegalArgumentException instead, so a shard failure is returned. It also adds an IT to check it works as expected.

Closes #131928

…nnelSpellChecker, so we get a bad request instead of an internal server error
@carlosdelest carlosdelest added >bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0 labels Aug 1, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

@carlosdelest carlosdelest marked this pull request as ready for review August 1, 2025 11:11
@carlosdelest carlosdelest requested a review from a team August 1, 2025 11:11
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@carlosdelest carlosdelest marked this pull request as draft August 1, 2025 12:21
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

If an analyzer that only produces n-grams (no unigrams) is used,

I guess we don't know this before executing the suggested so we can fail early.

@carlosdelest carlosdelest marked this pull request as ready for review August 1, 2025 12:49
@carlosdelest carlosdelest merged commit 079cff5 into elastic:main Aug 1, 2025
33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 1, 2025
…cking

* upstream/main: (166 commits)
  Reduce inactive sink interval in VectorSimilarityFunctionsIT (elastic#132288)
  ESQL: Allow agg tests to process many columns (elastic#132358)
  Update analysis-lowercase-tokenfilter.md (elastic#132359)
  Add Sparse Vector Index Options Settings to Semantic Text Field (elastic#131058)
  Collect node thread pool usage for shard balancing (elastic#131480)
  Add tasks to validate new style transport versions (elastic#131782)
  Mute org.elasticsearch.search.routing.SearchReplicaSelectionIT testNodeSelection elastic#132354
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryIT testBadAsyncId elastic#132353
  Fixes DenseVectorFieldIndexTypeUpdateIT release tests (elastic#132346)
  Fix testCloseOrReallocateDuringPartialSnapshot (elastic#132049)
  (Doc) ILM Force Merge not on HDD and happens on hosting node not current phase tier (elastic#130280)
  Run GeoIp YAML tests in multi-project cluster and fix bug discovered by tests (elastic#131521)
  Unmutes elastic#132111, seems a transient, non reproducible issue (elastic#132253)
  Mute org.elasticsearch.search.suggest.phrase.PhraseSuggesterIT testPhraseSuggestionWithNgramOnlyAnalyzerThrowsException elastic#132347
  Add AI21 support to Inference Plugin (elastic#131238)
  OpenJDK EA builds should use https instead of http (elastic#132297)
  ESQL: Normalize timeseries aggs slightly (elastic#132284)
  Avoid internal server error on suggester ngram bad request (elastic#132321)
  [ES|QL] Rerank operator improvements (elastic#132318)
  Mute org.elasticsearch.xpack.logsdb.qa.LogsDbVersusReindexedLogsDbChallengeRestIT testTermsQuery elastic#132337
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IllegalStateException: At least one unigram is required but all tokens were ngrams

3 participants