Skip to content

Conversation

@Strift
Copy link
Collaborator

@Strift Strift commented Sep 5, 2025

Pull Request

Related issue

Fixes #775

What does this PR do?

  • Add tests for multi-modal search and embeddings (requires VOYAGE_API_KEY from VoyageAI)
  • Add code sample

Update settings methods to allow configuring indexingFragments and searchFragments was not required as the settings methods are not typed currently.

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • Documentation

    • Added a new code sample demonstrating multimodal search with a hybrid embedder using text plus a poster image (JPEG/base64) in the search payload.
  • Tests

    • Added integration tests for text-only, image-only, and combined text-and-image multimodal searches.
    • Included a movies dataset (11 entries) to support the tests.
    • Tests are gated by an external embedder API key and validate expected top search results.

@coderabbitai
Copy link

coderabbitai bot commented Sep 5, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a multimodal search code sample, an integration test validating multimodal (text/image) hybrid searches with a Voyage embedder, and a movies dataset used by the tests. Tests enable experimental multimodal features, configure the embedder in index settings, index documents, and assert expected top hits.

Changes

Cohort / File(s) Summary
Code Samples: Multimodal Search
\.code-samples.meilisearch.yaml
Appends search_parameter_reference_media_1 example demonstrating a hybrid search against INDEX_NAME using a textAndPoster media payload with text and an image object (mime: image/jpeg, data as base64).
Tests: Multimodal Flow
tests/Endpoints/MultiModalSearchTest.php
Adds PHPUnit integration tests that enable experimental multimodal features, configure a Voyage multimodal embedder (indexingFragments/searchFragments, request/response shapes), create/clear the movies index, load and index movies.json, and assert text-only, image-only, and text+image search results.
Test Data
tests/datasets/movies.json
Adds a dataset of 11 movie records (id, title, overview, genres, poster URLs, release_date timestamps) used by the new tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant T as PHPUnit Test
  participant MS as Meilisearch API
  participant IDX as movies Index

  rect #F0F8FF
  Note over T,MS: Setup — enable multimodal & configure embedder
  T->>MS: PATCH /experimental-features { multimodal: true }
  T->>MS: POST /indexes (create/clear movies)
  T->>MS: PATCH /indexes/movies/settings { embedder: { indexingFragments, searchFragments, ... } }
  end

  rect #F5FFF0
  Note over T,IDX: Indexing dataset
  T->>IDX: addDocuments(movies.json)
  IDX-->>T: tasks completed
  end

  rect #FFF8F0
  Note over T,MS: Searches (text, image, text+image)
  T->>MS: POST /indexes/movies/search { q, media: text | image | textAndPoster }
  MS-->>T: hits (ranked hybrid results)
  end

  Note over T: Assertions
  T->>T: Assert top hit titles (e.g., "Star Wars", "The Fifth Element")
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I twitched my nose at pixels bright,
Text and posters danced in light.
I hopped through hits with whiskered cheer,
Star Wars leapt to the top, so near.
Carrots, code, and tests—hippity-hop! 🥕🐇


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d369703 and 36452f5.

📒 Files selected for processing (1)
  • tests/Endpoints/MultiModalSearchTest.php (1 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/add-multi-modal-search

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Strift Strift self-assigned this Sep 5, 2025
@Strift Strift added the enhancement New feature or request label Sep 5, 2025
@Strift Strift marked this pull request as ready for review September 5, 2025 07:12
@Strift
Copy link
Collaborator Author

Strift commented Sep 5, 2025

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Sep 5, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
.code-samples.meilisearch.yaml (1)

811-825: Good sample; consider showing explicit semantic weighting to make intent obvious.

Add semanticRatio to mirror other hybrid examples and clarify behavior when combining media with semantic ranking.

   $client->index('INDEX_NAME')->search('a futuristic movie', [
     'hybrid' => [
-      'embedder' => 'EMBEDDER_NAME'
+      'embedder' => 'EMBEDDER_NAME',
+      'semanticRatio' => 1
     ],
tests/datasets/movies.json (1)

1-133: Dataset looks solid; watch out for network flakiness from external poster URLs.

Because tests fetch remote images via image_url, CI without outbound internet or transient TMDB outages can fail the suite. Consider adding a local fallback (e.g., base64 assets or a second test using image_base64) to keep tests reliable when the network is restricted.

tests/Endpoints/MultiModalSearchTest.php (6)

120-123: Guard experimental-feature toggle for older servers.

If the server doesn’t support multimodal (pre-1.16) this throws and fails the suite. Skip gracefully.

-        $http = new Client($this->host, getenv('MEILISEARCH_API_KEY'));
-        $http->patch('/experimental-features', ['multimodal' => true]);
+        $http = new Client($this->host, getenv('MEILISEARCH_API_KEY'));
+        try {
+            $http->patch('/experimental-features', ['multimodal' => true]);
+        } catch (\Throwable $e) {
+            self::markTestSkipped('Server does not support multimodal experimental feature: '.$e->getMessage());
+        }

139-147: Prefer robust file IO and JSON checks; current fread/filesize usage is brittle across working directories.

Use DIR, file_get_contents, and assert decoding to fail fast with actionable messages.

-        // Load the movies.json dataset
-        $fileJson = fopen('./tests/datasets/movies.json', 'r');
-        $documentJson = fread($fileJson, filesize('./tests/datasets/movies.json'));
-        fclose($fileJson);
-        $this->documents = json_decode($documentJson, true);
+        // Load the movies.json dataset
+        $datasetPath = __DIR__.'/../datasets/movies.json';
+        $documentJson = file_get_contents($datasetPath);
+        self::assertNotFalse($documentJson, 'Failed to read movies dataset at '.$datasetPath);
+        $this->documents = json_decode($documentJson, true);
+        self::assertIsArray($this->documents, 'Failed to decode movies dataset JSON');
         $addDocumentsPromise = $this->index->addDocuments($this->documents);
         $this->index->waitForTask($addDocumentsPromise['taskUid']);

157-172: Assert non-empty hits before indexing into the array.

Prevents undefined offset when results are empty due to embedder/network issues.

         ]);
-        self::assertSame('Star Wars', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No hits returned for text-only multimodal search');
+        self::assertSame('Star Wars', $response->getHits()[0]['title']);

178-183: Avoid brittle dataset indexing; locate the poster by title.

Hard-coding index 3 will break if the dataset order changes.

-        $theFifthElementPoster = $this->documents[3]['poster'];
+        $idx = array_search('The Fifth Element', array_column($this->documents, 'title'), true);
+        $this->assertNotFalse($idx, 'The Fifth Element not found in dataset');
+        $theFifthElementPoster = $this->documents[$idx]['poster'];

Also add a non-empty hits assertion similar to the text-only test:

         ]);
-        self::assertSame('The Fifth Element', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No hits returned for image-only multimodal search');
+        self::assertSame('The Fifth Element', $response->getHits()[0]['title']);

197-205: Use deterministic asset paths and check presence.

Prevents failures when the working directory differs or the asset is missing.

-        $masterYodaBase64 = base64_encode(file_get_contents('./tests/assets/master-yoda.jpeg'));
+        $assetPath = __DIR__.'/../assets/master-yoda.jpeg';
+        $this->assertFileExists($assetPath, 'Missing test asset: '.$assetPath);
+        $masterYodaBase64 = base64_encode((string) file_get_contents($assetPath));

Also add a non-empty hits assertion before dereferencing:

         ]);
-        self::assertSame('Star Wars', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No hits returned for text+image multimodal search');
+        self::assertSame('Star Wars', $response->getHits()[0]['title']);

116-148: Optional: clean up test artifacts to keep the test environment tidy.

Delete the ephemeral index and (optionally) revert experimental features in tearDown to avoid polluting shared servers.

Add outside the shown lines:

protected function tearDown(): void
{
    if (isset($this->index)) {
        try {
            $this->client->deleteIndex($this->index->getUid());
        } catch (\Throwable $e) {
            // ignore cleanup failures
        }
    }
    parent::tearDown();
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 59f3ede and d39da8e.

⛔ Files ignored due to path filters (1)
  • tests/assets/master-yoda.jpeg is excluded by !**/*.jpeg
📒 Files selected for processing (3)
  • .code-samples.meilisearch.yaml (1 hunks)
  • tests/Endpoints/MultiModalSearchTest.php (1 hunks)
  • tests/datasets/movies.json (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
  • Indexes (24-278)
  • search (186-198)
tests/TestCase.php (3)
  • TestCase (16-180)
  • createEmptyIndex (139-145)
  • safeIndexName (147-150)
src/Endpoints/Delegates/HandlesDocuments.php (1)
  • addDocuments (41-44)
🔇 Additional comments (1)
tests/Endpoints/MultiModalSearchTest.php (1)

132-140: No change—Indexes::waitForTask is available via the HandlesTasks trait.

@Strift Strift requested review from a team and norkunas September 5, 2025 07:25
@Strift Strift requested a review from norkunas September 10, 2025 07:35
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/Endpoints/MultiModalSearchTest.php (1)

109-114: Method visibility inconsistency with test class conventions.

The skipIfVoyageApiKeyIsMissing method is private, but based on the past review comment, test helper methods should be static when they don't access instance properties.

-    private function skipIfVoyageApiKeyIsMissing(): void
+    private function skipIfVoyageApiKeyIsMissing(): void
     {
         if (null === $this->voyageApiKey) {
             self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable');
         }
     }

Note: This method accesses $this->voyageApiKey, so it cannot be static. The current implementation is correct.

🧹 Nitpick comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)

17-47: Consider extracting dataset operations to a separate helper method.

The setUp method is handling multiple responsibilities. Consider extracting the dataset loading and document indexing logic into a separate helper method for better organization and readability.

 protected function setUp(): void
 {
     parent::setUp();

     $http = new Client($this->host, getenv('MEILISEARCH_API_KEY'));
     $http->patch('/experimental-features', ['multimodal' => true]);

     $apiKey = getenv('VOYAGE_API_KEY');
     if (false === $apiKey || '' === $apiKey) {
         $this->voyageApiKey = null;

         return; // This test case is skipped if the Voyage API key is not set
     } else {
         $this->voyageApiKey = $apiKey;
     }

     $this->index = $this->createEmptyIndex($this->safeIndexName());
     $updateSettingsPromise = $this->index->updateSettings([
         'searchableAttributes' => ['title', 'overview'],
         'embedders' => [
             'multimodal' => self::getEmbedderConfig($this->voyageApiKey),
         ],
     ]);
     $this->index->waitForTask($updateSettingsPromise['taskUid']);

-    // Load the movies.json dataset
-    $documentsJson = file_get_contents('./tests/datasets/movies.json');
-    $this->documents = json_decode($documentsJson, true);
-    $addDocumentsPromise = $this->index->addDocuments($this->documents);
-    $this->index->waitForTask($addDocumentsPromise['taskUid']);
+    $this->seedTestData();
 }
+
+private function seedTestData(): void
+{
+    // Load the movies.json dataset
+    $documentsJson = file_get_contents('./tests/datasets/movies.json');
+    $this->documents = json_decode($documentsJson, true);
+    $addDocumentsPromise = $this->index->addDocuments($this->documents);
+    $this->index->waitForTask($addDocumentsPromise['taskUid']);
+}

90-90: Guard against missing test asset at tests/Endpoints/MultiModalSearchTest.php:90
Add a file_exists() check for tests/assets/master-yoda.jpeg before calling file_get_contents() so the test fails with a clear message if the asset is ever removed or renamed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d39da8e and 7414150.

📒 Files selected for processing (1)
  • tests/Endpoints/MultiModalSearchTest.php (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
  • Indexes (24-278)
  • search (186-198)
tests/TestCase.php (3)
  • TestCase (16-180)
  • createEmptyIndex (139-145)
  • safeIndexName (147-150)
src/Endpoints/Delegates/HandlesDocuments.php (1)
  • addDocuments (41-44)
🔇 Additional comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)

116-213: LGTM! Well-structured embedder configuration.

The embedder configuration is comprehensive and properly handles all three modalities (text, poster, and textAndPoster) with appropriate Voyage API integration.


70-70: No changes needed: dataset fixture includes at least four entries, so accessing index 3 is safe.

@Strift Strift requested a review from norkunas September 10, 2025 07:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/Endpoints/MultiModalSearchTest.php (6)

21-23: Avoid enabling experimental features when the test will be skipped; also reset after tests.

Move the experimental toggle after the API key check so you don't mutate cluster state unnecessarily when skipping. Consider resetting it in tearDown to avoid leakage across suites.

Example addition outside the selected range:

protected function tearDown(): void
{
    if ($this->voyageApiKey) {
        $http = new Client($this->host, getenv('MEILISEARCH_API_KEY'));
        $http->patch('/experimental-features', ['multimodal' => false]);
    }
    parent::tearDown();
}

24-31: Skip in setUp and drop per-test guards.

Marking the test as skipped in setUp simplifies flow and prevents uninitialized typed property risks. Then remove the helper and calls.

Apply:

-        $apiKey = getenv('VOYAGE_API_KEY');
-        if (false === $apiKey || '' === $apiKey) {
-            $this->voyageApiKey = null;
-
-            return; // This test case is skipped if the Voyage API key is not set
-        } else {
-            $this->voyageApiKey = $apiKey;
-        }
+        $apiKey = getenv('VOYAGE_API_KEY');
+        if (!$apiKey) {
+            self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable');
+        }
+        $this->voyageApiKey = $apiKey;
-        $this->skipIfVoyageApiKeyIsMissing();
+        // running only when VOYAGE_API_KEY is set (skipped in setUp)

(repeat the removal in both other tests)

-    private function skipIfVoyageApiKeyIsMissing(): void
-    {
-        if (null === $this->voyageApiKey) {
-            self::markTestSkipped('Missing `VOYAGE_API_KEY` environment variable');
-        }
-    }
+    // skip helper no longer needed

Also applies to: 49-52, 66-69, 85-88, 109-114


43-47: Make dataset loading path-agnostic and fail fast on IO errors.

Rely on DIR to avoid CWD sensitivity; assert file read success to surface issues early.

-        // Load the movies.json dataset
-        $documentsJson = file_get_contents('./tests/datasets/movies.json');
+        // Load the movies.json dataset
+        $datasetPath = __DIR__ . '/../datasets/movies.json';
+        $documentsJson = file_get_contents($datasetPath);
+        self::assertNotFalse($documentsJson, 'Failed to read dataset: ' . $datasetPath);
         $this->documents = json_decode($documentsJson, true, 512, JSON_THROW_ON_ERROR);

53-64: Add a guard before indexing into hits.

Prevent undefined offset if the search yields zero hits.

-        self::assertSame('Star Wars', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No search hits returned');
+        self::assertSame('Star Wars', $response->getHits()[0]['title']);

70-75: Avoid brittle dataset indexing by position.

Selecting by fixed index ties the test to file order. Pick the poster by title instead.

-        $theFifthElementPoster = $this->documents[3]['poster'];
+        $theFifthElementPoster = null;
+        foreach ($this->documents as $doc) {
+            if (($doc['title'] ?? null) === 'The Fifth Element') {
+                $theFifthElementPoster = $doc['poster'] ?? null;
+                break;
+            }
+        }
+        self::assertNotEmpty($theFifthElementPoster, 'Poster for "The Fifth Element" not found in dataset');

Also guard hits:

-        self::assertSame('The Fifth Element', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No search hits returned');
+        self::assertSame('The Fifth Element', $response->getHits()[0]['title']);

Also applies to: 82-83


89-99: Harden image read and use DIR for path.

Avoid CWD issues and TypeError when file_get_contents fails.

-        $masterYodaBase64 = base64_encode(file_get_contents('./tests/assets/master-yoda.jpeg'));
+        $imagePath = __DIR__ . '/../assets/master-yoda.jpeg';
+        $imageBytes = file_get_contents($imagePath);
+        self::assertNotFalse($imageBytes, 'Failed to read image: ' . $imagePath);
+        $masterYodaBase64 = base64_encode($imageBytes);

Guard hits:

-        self::assertSame('Star Wars', $response->getHits()[0]['title']);
+        self::assertNotEmpty($response->getHits(), 'No search hits returned');
+        self::assertSame('Star Wars', $response->getHits()[0]['title']);

Also applies to: 105-107

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7414150 and d369703.

📒 Files selected for processing (1)
  • tests/Endpoints/MultiModalSearchTest.php (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Endpoints/MultiModalSearchTest.php (3)
src/Endpoints/Indexes.php (2)
  • Indexes (24-278)
  • search (186-198)
tests/TestCase.php (3)
  • TestCase (16-180)
  • createEmptyIndex (139-145)
  • safeIndexName (147-150)
src/Endpoints/Delegates/HandlesDocuments.php (1)
  • addDocuments (41-44)
🔇 Additional comments (2)
tests/Endpoints/MultiModalSearchTest.php (2)

44-44: Nice: strict JSON decoding with exceptions.

Using JSON_THROW_ON_ERROR is spot-on for surfacing dataset issues.


116-123: No changes needed: endpoint, embedding dimension, and image_base64 schema are all correct as implemented.

@Strift Strift requested a review from norkunas September 10, 2025 07:56
@Strift
Copy link
Collaborator Author

Strift commented Sep 10, 2025

Thanks for the review @norkunas 🙌

@Strift Strift merged commit fbc80e1 into main Sep 10, 2025
10 of 11 checks passed
@Strift Strift deleted the feat/add-multi-modal-search branch September 10, 2025 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[v1.16.0] Add support for multi-modal search

3 participants