-
Notifications
You must be signed in to change notification settings - Fork 107
Add multi-modal search #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds a multimodal search code sample, an integration test validating multimodal (text/image) hybrid searches with a Voyage embedder, and a movies dataset used by the tests. Tests enable experimental multimodal features, configure the embedder in index settings, index documents, and assert expected top hits. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant T as PHPUnit Test
participant MS as Meilisearch API
participant IDX as movies Index
rect #F0F8FF
Note over T,MS: Setup — enable multimodal & configure embedder
T->>MS: PATCH /experimental-features { multimodal: true }
T->>MS: POST /indexes (create/clear movies)
T->>MS: PATCH /indexes/movies/settings { embedder: { indexingFragments, searchFragments, ... } }
end
rect #F5FFF0
Note over T,IDX: Indexing dataset
T->>IDX: addDocuments(movies.json)
IDX-->>T: tasks completed
end
rect #FFF8F0
Note over T,MS: Searches (text, image, text+image)
T->>MS: POST /indexes/movies/search { q, media: text | image | textAndPoster }
MS-->>T: hits (ranked hybrid results)
end
Note over T: Assertions
T->>T: Assert top hit titles (e.g., "Star Wars", "The Fifth Element")
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
.code-samples.meilisearch.yaml (1)
811-825: Good sample; consider showing explicit semantic weighting to make intent obvious.Add semanticRatio to mirror other hybrid examples and clarify behavior when combining media with semantic ranking.
$client->index('INDEX_NAME')->search('a futuristic movie', [ 'hybrid' => [ - 'embedder' => 'EMBEDDER_NAME' + 'embedder' => 'EMBEDDER_NAME', + 'semanticRatio' => 1 ],tests/datasets/movies.json (1)
1-133: Dataset looks solid; watch out for network flakiness from external poster URLs.Because tests fetch remote images via image_url, CI without outbound internet or transient TMDB outages can fail the suite. Consider adding a local fallback (e.g., base64 assets or a second test using image_base64) to keep tests reliable when the network is restricted.
tests/Endpoints/MultiModalSearchTest.php (6)
120-123: Guard experimental-feature toggle for older servers.If the server doesn’t support multimodal (pre-1.16) this throws and fails the suite. Skip gracefully.
- $http = new Client($this->host, getenv('MEILISEARCH_API_KEY')); - $http->patch('/experimental-features', ['multimodal' => true]); + $http = new Client($this->host, getenv('MEILISEARCH_API_KEY')); + try { + $http->patch('/experimental-features', ['multimodal' => true]); + } catch (\Throwable $e) { + self::markTestSkipped('Server does not support multimodal experimental feature: '.$e->getMessage()); + }
139-147: Prefer robust file IO and JSON checks; current fread/filesize usage is brittle across working directories.Use DIR, file_get_contents, and assert decoding to fail fast with actionable messages.
- // Load the movies.json dataset - $fileJson = fopen('./tests/datasets/movies.json', 'r'); - $documentJson = fread($fileJson, filesize('./tests/datasets/movies.json')); - fclose($fileJson); - $this->documents = json_decode($documentJson, true); + // Load the movies.json dataset + $datasetPath = __DIR__.'/../datasets/movies.json'; + $documentJson = file_get_contents($datasetPath); + self::assertNotFalse($documentJson, 'Failed to read movies dataset at '.$datasetPath); + $this->documents = json_decode($documentJson, true); + self::assertIsArray($this->documents, 'Failed to decode movies dataset JSON'); $addDocumentsPromise = $this->index->addDocuments($this->documents); $this->index->waitForTask($addDocumentsPromise['taskUid']);
157-172: Assert non-empty hits before indexing into the array.Prevents undefined offset when results are empty due to embedder/network issues.
]); - self::assertSame('Star Wars', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No hits returned for text-only multimodal search'); + self::assertSame('Star Wars', $response->getHits()[0]['title']);
178-183: Avoid brittle dataset indexing; locate the poster by title.Hard-coding index 3 will break if the dataset order changes.
- $theFifthElementPoster = $this->documents[3]['poster']; + $idx = array_search('The Fifth Element', array_column($this->documents, 'title'), true); + $this->assertNotFalse($idx, 'The Fifth Element not found in dataset'); + $theFifthElementPoster = $this->documents[$idx]['poster'];Also add a non-empty hits assertion similar to the text-only test:
]); - self::assertSame('The Fifth Element', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No hits returned for image-only multimodal search'); + self::assertSame('The Fifth Element', $response->getHits()[0]['title']);
197-205: Use deterministic asset paths and check presence.Prevents failures when the working directory differs or the asset is missing.
- $masterYodaBase64 = base64_encode(file_get_contents('./tests/assets/master-yoda.jpeg')); + $assetPath = __DIR__.'/../assets/master-yoda.jpeg'; + $this->assertFileExists($assetPath, 'Missing test asset: '.$assetPath); + $masterYodaBase64 = base64_encode((string) file_get_contents($assetPath));Also add a non-empty hits assertion before dereferencing:
]); - self::assertSame('Star Wars', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No hits returned for text+image multimodal search'); + self::assertSame('Star Wars', $response->getHits()[0]['title']);
116-148: Optional: clean up test artifacts to keep the test environment tidy.Delete the ephemeral index and (optionally) revert experimental features in tearDown to avoid polluting shared servers.
Add outside the shown lines:
protected function tearDown(): void { if (isset($this->index)) { try { $this->client->deleteIndex($this->index->getUid()); } catch (\Throwable $e) { // ignore cleanup failures } } parent::tearDown(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
tests/assets/master-yoda.jpegis excluded by!**/*.jpeg
📒 Files selected for processing (3)
.code-samples.meilisearch.yaml(1 hunks)tests/Endpoints/MultiModalSearchTest.php(1 hunks)tests/datasets/movies.json(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
Indexes(24-278)search(186-198)tests/TestCase.php (3)
TestCase(16-180)createEmptyIndex(139-145)safeIndexName(147-150)src/Endpoints/Delegates/HandlesDocuments.php (1)
addDocuments(41-44)
🔇 Additional comments (1)
tests/Endpoints/MultiModalSearchTest.php (1)
132-140: No change—Indexes::waitForTask is available via the HandlesTasks trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/Endpoints/MultiModalSearchTest.php (1)
109-114: Method visibility inconsistency with test class conventions.The
skipIfVoyageApiKeyIsMissingmethod is private, but based on the past review comment, test helper methods should be static when they don't access instance properties.- private function skipIfVoyageApiKeyIsMissing(): void + private function skipIfVoyageApiKeyIsMissing(): void { if (null === $this->voyageApiKey) { self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable'); } }Note: This method accesses
$this->voyageApiKey, so it cannot be static. The current implementation is correct.
🧹 Nitpick comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)
17-47: Consider extracting dataset operations to a separate helper method.The setUp method is handling multiple responsibilities. Consider extracting the dataset loading and document indexing logic into a separate helper method for better organization and readability.
protected function setUp(): void { parent::setUp(); $http = new Client($this->host, getenv('MEILISEARCH_API_KEY')); $http->patch('/experimental-features', ['multimodal' => true]); $apiKey = getenv('VOYAGE_API_KEY'); if (false === $apiKey || '' === $apiKey) { $this->voyageApiKey = null; return; // This test case is skipped if the Voyage API key is not set } else { $this->voyageApiKey = $apiKey; } $this->index = $this->createEmptyIndex($this->safeIndexName()); $updateSettingsPromise = $this->index->updateSettings([ 'searchableAttributes' => ['title', 'overview'], 'embedders' => [ 'multimodal' => self::getEmbedderConfig($this->voyageApiKey), ], ]); $this->index->waitForTask($updateSettingsPromise['taskUid']); - // Load the movies.json dataset - $documentsJson = file_get_contents('./tests/datasets/movies.json'); - $this->documents = json_decode($documentsJson, true); - $addDocumentsPromise = $this->index->addDocuments($this->documents); - $this->index->waitForTask($addDocumentsPromise['taskUid']); + $this->seedTestData(); } + +private function seedTestData(): void +{ + // Load the movies.json dataset + $documentsJson = file_get_contents('./tests/datasets/movies.json'); + $this->documents = json_decode($documentsJson, true); + $addDocumentsPromise = $this->index->addDocuments($this->documents); + $this->index->waitForTask($addDocumentsPromise['taskUid']); +}
90-90: Guard against missing test asset at tests/Endpoints/MultiModalSearchTest.php:90
Add afile_exists()check fortests/assets/master-yoda.jpegbefore callingfile_get_contents()so the test fails with a clear message if the asset is ever removed or renamed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/Endpoints/MultiModalSearchTest.php(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
Indexes(24-278)search(186-198)tests/TestCase.php (3)
TestCase(16-180)createEmptyIndex(139-145)safeIndexName(147-150)src/Endpoints/Delegates/HandlesDocuments.php (1)
addDocuments(41-44)
🔇 Additional comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)
116-213: LGTM! Well-structured embedder configuration.The embedder configuration is comprehensive and properly handles all three modalities (text, poster, and textAndPoster) with appropriate Voyage API integration.
70-70: No changes needed: dataset fixture includes at least four entries, so accessing index 3 is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/Endpoints/MultiModalSearchTest.php (6)
21-23: Avoid enabling experimental features when the test will be skipped; also reset after tests.Move the experimental toggle after the API key check so you don't mutate cluster state unnecessarily when skipping. Consider resetting it in tearDown to avoid leakage across suites.
Example addition outside the selected range:
protected function tearDown(): void { if ($this->voyageApiKey) { $http = new Client($this->host, getenv('MEILISEARCH_API_KEY')); $http->patch('/experimental-features', ['multimodal' => false]); } parent::tearDown(); }
24-31: Skip in setUp and drop per-test guards.Marking the test as skipped in setUp simplifies flow and prevents uninitialized typed property risks. Then remove the helper and calls.
Apply:
- $apiKey = getenv('VOYAGE_API_KEY'); - if (false === $apiKey || '' === $apiKey) { - $this->voyageApiKey = null; - - return; // This test case is skipped if the Voyage API key is not set - } else { - $this->voyageApiKey = $apiKey; - } + $apiKey = getenv('VOYAGE_API_KEY'); + if (!$apiKey) { + self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable'); + } + $this->voyageApiKey = $apiKey;- $this->skipIfVoyageApiKeyIsMissing(); + // running only when VOYAGE_API_KEY is set (skipped in setUp)(repeat the removal in both other tests)
- private function skipIfVoyageApiKeyIsMissing(): void - { - if (null === $this->voyageApiKey) { - self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable'); - } - } + // skip helper no longer neededAlso applies to: 49-52, 66-69, 85-88, 109-114
43-47: Make dataset loading path-agnostic and fail fast on IO errors.Rely on DIR to avoid CWD sensitivity; assert file read success to surface issues early.
- // Load the movies.json dataset - $documentsJson = file_get_contents('./tests/datasets/movies.json'); + // Load the movies.json dataset + $datasetPath = __DIR__ . '/../datasets/movies.json'; + $documentsJson = file_get_contents($datasetPath); + self::assertNotFalse($documentsJson, 'Failed to read dataset: ' . $datasetPath); $this->documents = json_decode($documentsJson, true, 512, JSON_THROW_ON_ERROR);
53-64: Add a guard before indexing into hits.Prevent undefined offset if the search yields zero hits.
- self::assertSame('Star Wars', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No search hits returned'); + self::assertSame('Star Wars', $response->getHits()[0]['title']);
70-75: Avoid brittle dataset indexing by position.Selecting by fixed index ties the test to file order. Pick the poster by title instead.
- $theFifthElementPoster = $this->documents[3]['poster']; + $theFifthElementPoster = null; + foreach ($this->documents as $doc) { + if (($doc['title'] ?? null) === 'The Fifth Element') { + $theFifthElementPoster = $doc['poster'] ?? null; + break; + } + } + self::assertNotEmpty($theFifthElementPoster, 'Poster for "The Fifth Element" not found in dataset');Also guard hits:
- self::assertSame('The Fifth Element', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No search hits returned'); + self::assertSame('The Fifth Element', $response->getHits()[0]['title']);Also applies to: 82-83
89-99: Harden image read and use DIR for path.Avoid CWD issues and TypeError when file_get_contents fails.
- $masterYodaBase64 = base64_encode(file_get_contents('./tests/assets/master-yoda.jpeg')); + $imagePath = __DIR__ . '/../assets/master-yoda.jpeg'; + $imageBytes = file_get_contents($imagePath); + self::assertNotFalse($imageBytes, 'Failed to read image: ' . $imagePath); + $masterYodaBase64 = base64_encode($imageBytes);Guard hits:
- self::assertSame('Star Wars', $response->getHits()[0]['title']); + self::assertNotEmpty($response->getHits(), 'No search hits returned'); + self::assertSame('Star Wars', $response->getHits()[0]['title']);Also applies to: 105-107
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/Endpoints/MultiModalSearchTest.php(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
Indexes(24-278)search(186-198)tests/TestCase.php (3)
TestCase(16-180)createEmptyIndex(139-145)safeIndexName(147-150)src/Endpoints/Delegates/HandlesDocuments.php (1)
addDocuments(41-44)
🔇 Additional comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)
44-44: Nice: strict JSON decoding with exceptions.Using JSON_THROW_ON_ERROR is spot-on for surfacing dataset issues.
116-123: No changes needed: endpoint, embedding dimension, and image_base64 schema are all correct as implemented.
|
Thanks for the review @norkunas 🙌 |
Pull Request
Related issue
Fixes #775
What does this PR do?
VOYAGE_API_KEYfrom VoyageAI)Update settings methods to allow configuring
indexingFragmentsandsearchFragmentswas not required as the settings methods are not typed currently.PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!
Summary by CodeRabbit
Documentation
Tests