Implicit index creation #48

@curquiza

In the next release (v0.16.0), MeiliSearch will be able to create the index if it does not already exist during:

  • the documents addition (add and update routes)
  • the settings update

The goal of this issue is to integrate implicit index creation into the SDKs (the integration side) in a way that improves the user experience.

The new function index('indexUid')

Currently, here are the lines to add documents:

index = client.create_index('movies') # If your index does not exist
index = client.get_index('movies')    # If you already created your index
index.add_documents([...])

The goal is to avoid this index creation step and to provide a function that defines the "scope" where the users want to work: index(...). It will return an Index object/struct but without doing any HTTP call.

In all situations, the new way to add documents will be:

index = client.index('movies')
index.add_documents([...])

Prototype:

index(indexUid: String) -> Index object/struct

Limitation: because the index() method returns an Index object/struct without making any HTTP call, this object/struct contains a primary_key attribute that is set to an undefined value, which does not necessarily reflect the real state in MeiliSearch. Indeed, we have to differentiate the Index object in RAM from the index present in the MeiliSearch instance.
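To illustrate the limitation, here is a minimal sketch of an index() method that builds the object locally. FakeTransport, fetch_info and the 'id' primary key returned by the fake server are hypothetical names and values used only for this example; they are not part of any SDK.

```python
from typing import Optional


class FakeTransport:
    """Stand-in for an HTTP layer (hypothetical); it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def get(self, path: str) -> dict:
        self.calls += 1
        # Pretend the index stored in MeiliSearch uses 'id' as primary key.
        return {"uid": path.rsplit("/", 1)[-1], "primaryKey": "id"}


class Index:
    def __init__(self, transport: FakeTransport, uid: str,
                 primary_key: Optional[str] = None) -> None:
        self.transport = transport
        self.uid = uid
        # None means "unknown locally", not "no primary key on the server".
        self.primary_key = primary_key

    def fetch_info(self) -> "Index":
        """Hypothetical helper: refresh local attributes from the server."""
        data = self.transport.get(f"/indexes/{self.uid}")
        self.primary_key = data["primaryKey"]
        return self


class Client:
    def __init__(self, transport: FakeTransport) -> None:
        self.transport = transport

    def index(self, uid: str) -> Index:
        # No HTTP call here: this only defines the scope to work in.
        return Index(self.transport, uid)


transport = FakeTransport()
client = Client(transport)
index = client.index("movies")
```

In this sketch, index.primary_key stays None until something explicitly asks the server, which is exactly the gap between the object in RAM and the index in the MeiliSearch instance.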

Primary Key

Because the users have the possibility to skip the index creation, we should let them add the primary key during the document addition/update.

The prototypes should be:

add_documents(documents: Array, options: optional dict) -> JSON with updateId or an Update object
update_documents(documents: Array, options: optional dict) -> JSON with updateId or an Update object

The options parameter will be used as follows: { primaryKey: String }.
Ex: client.index('movies').add_documents([...], { primaryKey: 'name' })

Limitation: writing client.index('movies').add_documents([...], { primaryKey: 'name' }) could imply that the primary key applies only to this specific batch of documents, which is not the case. The primary key is only taken into account during the first document addition. This ambiguous behavior is also present when using cURL, but we should keep it in mind so we can answer our users if needed.
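As a rough sketch of how an SDK might forward the option, the helper below turns the options dict into a query parameter on the documents route. The function name documents_path is hypothetical, and passing primaryKey as a query parameter is an assumption based on how the route is used with cURL.

```python
from typing import Optional
from urllib.parse import urlencode


def documents_path(index_uid: str, options: Optional[dict] = None) -> str:
    """Build the documents route (hypothetical helper).

    Assumption: primaryKey travels as a query parameter, so the SDK only
    appends it when the caller provided it in options.
    """
    path = f"/indexes/{index_uid}/documents"
    primary_key = (options or {}).get("primaryKey")
    if primary_key is None:
        return path
    return f"{path}?{urlencode({'primaryKey': primary_key})}"
```

Calling documents_path('movies', {'primaryKey': 'name'}) on every batch would send the parameter each time, even though, as noted above, only the first document addition takes it into account.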

New Getting Started

Here are the modifications to the getting started:

import meilisearch

client = meilisearch.Client('http://127.0.0.1:7700', 'masterKey')

# An index is where the documents are stored.
index = client.index('movies')

documents = […]
# If the index 'movies' does not exist, MeiliSearch creates it when you first add the documents.
index.add_documents(documents)
....

The unmentioned parts of the getting started stay unchanged.

What about getIndex, createIndex and getOrCreateIndex?

createIndex and getOrCreateIndex stay unchanged => ⚠️ Changes are needed after all, see the text at the bottom of the description

getIndex will now do an HTTP call before returning the Index object. In some SDKs, this method already does an HTTP call: in this case, nothing changes for this method.

⚠️ Be careful with all the methods where getIndex is currently used. They should now call .index() instead; otherwise, each of them would do an extra HTTP call.

Code samples

The code samples have to be updated. Because the get_index method is called almost everywhere, almost all the code samples are affected.


  • meilisearch-dart
  • meilisearch-dotnet
  • meilisearch-go (needs a refactor to be done properly)
  • meilisearch-java
  • meilisearch-js
  • meilisearch-php
  • meilisearch-python
  • meilisearch-ruby
  • meilisearch-rust
  • meilisearch-swift

Edit after the python implementation:

We implemented the implicit index creation in the Python SDK as a first try. Here is what we improved/changed compared to the description above.

  • the primary_key attribute should stay as accurate as possible, which means it should be set/updated during index manipulation (get/creation/update)
  • related to the point right above, the get_or_create_index method now does at least one (sometimes two) HTTP calls to return an Index object containing the correct primary_key value. Plus, the name get in get_or_create_index would imply an HTTP call. See the related discussion: Changes due to the implicit index creation meilisearch-python#175 (comment)
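The "one, sometimes two" HTTP calls can be pictured with the sketch below. FakeServer and IndexNotFound are hypothetical stand-ins for the HTTP layer, and trying the get before the create is an assumption made for the example, not necessarily the order used in the Python SDK.

```python
class IndexNotFound(Exception):
    """Hypothetical error raised when the index does not exist."""


class FakeServer:
    """In-memory stand-in for a MeiliSearch instance; counts HTTP calls."""

    def __init__(self):
        self.indexes = {}  # uid -> primary key
        self.calls = 0

    def get_index(self, uid):
        self.calls += 1
        if uid not in self.indexes:
            raise IndexNotFound(uid)
        return {"uid": uid, "primaryKey": self.indexes[uid]}

    def create_index(self, uid, primary_key=None):
        self.calls += 1
        self.indexes[uid] = primary_key
        return {"uid": uid, "primaryKey": primary_key}


def get_or_create_index(server, uid, primary_key=None):
    # First call: ask the server, so primaryKey reflects the real index.
    try:
        return server.get_index(uid)
    except IndexNotFound:
        # Second call, only when the index was missing.
        return server.create_index(uid, primary_key)
```

When the index already exists, a single call is enough and the returned primary key is the one stored on the server, whatever value the caller passed.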
