[main] Fix an OOM error when creating to many chained synonym graph token filter. (#140026) #11

Open
tomerqodo wants to merge 4 commits into
copilot_only-issues-20260113-qodo-grep-copilot_base_main_fix_an_oom_error_when_creating_to_many_chained_synonym_graph_token_filter_140026_pr68 from
copilot_only-issues-20260113-qodo-grep-copilot_head_main_fix_an_oom_error_when_creating_to_many_chained_synonym_graph_token_filter_140026_pr68

Conversation

@tomerqodo

Benchmark PR from qodo-benchmark#68

afoucret and others added 4 commits December 31, 2025 10:59
…lter. (elastic#140026)

* Create YAML REST test for chained synonym filters.

* Adding some unit tests for chained synonyms graph filters creation.

* Ensure previous synonyms set from the chain aren't applied again.

(cherry picked from commit e6bd70d)
Copilot AI review requested due to automatic review settings January 14, 2026 16:17

Copilot AI left a comment

Pull request overview

This PR fixes an OutOfMemoryError that occurs when creating multiple chained synonym graph token filters. The issue stems from recursive synonym expansion during the building phase, where each filter in the chain would reprocess synonyms through all previous filters, leading to exponential memory growth.

Changes:

  • Override getSynonymFilter() in SynonymGraphTokenFilterFactory to return the current filter instance, preventing recursive synonym expansion
  • Add unit test (testManyChainedSynonymGraphFilters) that creates 10 chained synonym filters with 10,000 synonyms each to verify the fix
  • Add YAML REST test to verify basic functionality of chained synonym graph filters
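The growth described in the overview can be illustrated with a toy model. This is a sketch only, not Elasticsearch code: the class `ChainedSynonymSketch`, its methods, and the assumption that each filter doubles the terms it sees are all hypothetical, chosen to show why re-applying earlier filters while building a later one grows exponentially with chain length.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names): not the real SynonymGraphTokenFilterFactory.
final class ChainedSynonymSketch {

    // Stand-in for one synonym filter: each input term is kept and
    // one tagged synonym is emitted next to it, doubling the list.
    static List<String> expand(List<String> terms, int filterIndex) {
        List<String> out = new ArrayList<>(terms.size() * 2);
        for (String term : terms) {
            out.add(term);
            out.add(term + "_syn" + filterIndex);
        }
        return out;
    }

    // Number of terms produced while building the filter at `position`.
    // With reapplyPrevious = true, the rule is first pushed through every
    // earlier filter in the chain (the behavior the PR removes).
    static int termsMaterialised(int position, boolean reapplyPrevious) {
        List<String> terms = List.of("foo");
        if (reapplyPrevious) {
            for (int k = 0; k < position; k++) {
                terms = expand(terms, k);
            }
        }
        return expand(terms, position).size();
    }

    public static void main(String[] args) {
        for (int position = 0; position < 10; position++) {
            System.out.println("filter " + position
                + ": reapplied=" + termsMaterialised(position, true)
                + ", isolated=" + termsMaterialised(position, false));
        }
    }
}
```

In this model the tenth filter (position 9) materialises 1024 terms for a single rule when earlier filters are re-applied, and 2 when it is built in isolation. Real synonym maps will not double exactly, so the numbers only indicate the shape of the growth.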

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| SynonymGraphTokenFilterFactory.java | Implements the fix by overriding getSynonymFilter() to prevent recursive synonym expansion during filter chain building |
| SynonymsAnalysisTests.java | Adds comprehensive unit tests for chained synonym graph filters, including an OOM regression test with many filters |
| 50_chained_synonym_filters.yml | Adds integration test verifying that chained synonym graph filters work correctly in actual index/search scenarios |

BaseTokenStreamTestCase.assertAnalyzesTo(
indexAnalyzers.get("syn"),
"foo baz hello",
new String[] { "bar", "foo", "qux", "baz", "world", "hello" },

Copilot AI Jan 14, 2026

The expected token order appears to contradict the comment on line 375 which states 'synonym_graph produces synonym first, then original'. The test expects 'bar' before 'foo', but this ordering should be explicitly verified and documented as the actual behavior, especially since it differs from the comment pattern in the YAML test file (lines 76-81) which shows original tokens first in position.

}

sb.append(" => ");
sb.append("syn").append(i * (j + 1)); // Shared ID appears in ALL filters

Copilot AI Jan 14, 2026

The comment 'Shared ID appears in ALL filters' is misleading. The synonym target is actually unique per filter (multiplied by filter index i) and per synonym rule (multiplied by j+1). The IDs are not shared across filters but rather uniquely generated using i * (j + 1) which produces different values for each filter-synonym combination.
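The uniqueness question raised in this comment can be checked by enumerating the expression directly. The sketch below is a standalone check, not code from the PR: the class name `SynonymIdCheck`, the zero-based loops, and the small bounds are assumptions, since the loop headers are not visible in the quoted snippet.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each value of i * (j + 1) occurs.
final class SynonymIdCheck {

    // Key: the numeric suffix of the "syn" target; value: number of (i, j) pairs producing it.
    // Assumes i (filter index) and j (rule index) both start at 0.
    static Map<Integer, Integer> countIds(int filters, int rulesPerFilter) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < filters; i++) {
            for (int j = 0; j < rulesPerFilter; j++) {
                counts.merge(i * (j + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3 filters x 3 rules = 9 pairs; print how the IDs are distributed.
        System.out.println(countIds(3, 3));
    }
}
```

Under these assumptions a 3 x 3 run yields 6 distinct IDs for 9 pairs: filter 0 maps every rule to `syn0`, and `syn2` is produced by both (i=1, j=1) and (i=2, j=0). The IDs would then be neither shared by all filters nor unique per combination, so the loop bounds in the actual test are worth checking before the code comment is reworded.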
