From e4d5bb1b6b50ebcafd404df84feeec1923e15d7a Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Thu, 10 Jun 2021 09:54:09 +0200 Subject: [PATCH 01/10] Indexing NDJSONs (#29) * initialize a draft for json lines indexation support specification * update filename number to match related pull-request * update specs * update link to CSV spec * update spec name * Apply typos correction from code review Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> * fix typo * update impact on documentation part * replace file by data * add information about giving application/json content-type or not for a json payload * updates error codes, curl instructions * moved behavior about missing content-type in explanation part Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> --- text/0029-indexing-ndjson.md | 136 +++++++++++++++++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 text/0029-indexing-ndjson.md diff --git a/text/0029-indexing-ndjson.md b/text/0029-indexing-ndjson.md new file mode 100644 index 00000000..e4f48516 --- /dev/null +++ b/text/0029-indexing-ndjson.md @@ -0,0 +1,136 @@ +- Title: Indexing NDJSON +- Start Date: 2021-04-12 +- Specification PR: [PR-#29](https://github.com/meilisearch/specifications/pull/29) +- MeiliSearch Tracking-Issues: TBD + +# Indexing NDJSON + +## 1. Feature Description and Interaction + +### I. Summary + +To index documents, the body of the add documents request has to match a specific format. That format is then parsed and tokenized inside MeiliSearch, after which the added documents join the pool of searchable and returnable documents. + +The [NDJSON](http://ndjson.org/) data format is easier to use than CSV because it offers a convenient way to store structured data. + +### II. Motivation + +Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of another simple data format to use.
This gives them more flexibility in choosing a data source for the indexing step. + +Writing performance is also a motivation since JSON Lines data parsing is less CPU and memory-intensive than parsing standard JSON. Because each new line represents a separate entry, NDJSON data is streamable and thus better suited for indexing a large data set. + +While we give MeiliSearch the ability to ingest CSV data for indexing in this [specification](https://github.com/meilisearch/specifications/pull/28), we are aware of the limitations of CSV, so we also want to provide a format that is easy to validate. Handling the validity of a CSV can be frustrating and difficult. Only strings can be managed within a CSV. In addition, there is no official specification except [RFC 4180](https://tools.ietf.org/html/rfc4180), which is not sufficient for all data schemas. + +Representing nested structures in a JSON object is easy and convenient. + +### III. Additional Materials + +TBD + +### IV. Explanation + +Newline-delimited JSON (`ndjson`), line-delimited JSON (`ldjson`), and JSON lines (`jsonl`) are three names for the same format, primarily intended for JSON streaming. + +From here on, we use `ndjson` to refer to a data format that represents JSON entries separated by a newline character. + +- Each entry represents a document for MeiliSearch. +- Each entry should be a valid JSON object. +- The data should be encoded in UTF-8. + +#### Example of a valid NDJSON + +Given the NDJSON payload +``` +{"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]} +{"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]} +``` +the search result should be displayed as +```json +{ + "hits": [ + { + "id": 1, + "label": "t-shirt", + "price": 4.99, + "colors": [ + "red", + "green", + "blue" + ] + }, + { + "id": 499, + "label": "hoodie", + "price": 19.99, + "colors": [ + "purple" + ] + } + ], + ...
+} +``` + +#### API Endpoints + +> Each API endpoint listed below requires `application/x-ndjson` as its `Content-Type` header for the payload to be processed as NDJSON data. +> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. + +#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) + +```curl +curl \ + -X POST 'http://localhost:7700/indexes/movies/documents' \ + -H 'Content-Type: application/x-ndjson' \ + --data-binary '{"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]} +{"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]}' +``` +> Response code: 202 Accepted + +##### Error codes + +> - Sending a payload that does not match the `Content-Type` header should return a `400 bad_request` error. +> - A payload exceeding the size limit should return a `413 payload_too_large` error. +> - Wrong encoding should return a `400 bad_request` error. +> - Invalid NDJSON data should return a `400 bad_request` error. + +#### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) + +```curl +curl \ + -X PUT 'http://localhost:7700/indexes/movies/documents' \ + -H 'Content-Type: application/x-ndjson' \ + --data-binary '{"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]} +{"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]}' +``` +> Response code: 202 Accepted + +##### Error codes + +> - Sending a payload that does not match the `Content-Type` header should return a `400 bad_request` error. +> - A payload exceeding the size limit should return a `413 payload_too_large` error. +> - Wrong encoding should return a `400 bad_request` error. +> - Invalid NDJSON data should return a `400 bad_request` error. + +### V.
Impact on documentation + +This feature should impact the MeiliSearch user documentation by adding `ndjson` as an accepted format in the Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents). The documentation should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. + +The `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) should no longer mention only the JSON format and should add the `ndjson` format. The documentation currently says "Currently, MeiliSearch supports only JSON payloads." + +Documentation should also guide the user in properly formatting and sending ndjson data. Adding a dedicated page for formatting and sending ndjson data should be considered. + +### VI. Impact on SDKs + +This feature should impact MeiliSearch SDKs in the future by adding the possibility to send ndjson data to MeiliSearch on the endpoints described above. + +## 2. Technical Aspects +N/A + +## 3. Future possibilities +- Provide an interface in the future dashboard to upload NDJSON data into an index. +- Set a payload limit directly related to the type of data format. Currently, the payload size is equivalent to [JSON payload size](https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size). Metrics on feature usage and configuration updates should help to choose a better suited value for this type of data.
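As an illustration of the format rules in the Explanation section of this specification (one valid JSON object per line, UTF-8 encoded), the following Python sketch serializes documents to NDJSON and prepares the add-or-replace request. The helper names `to_ndjson` and `build_add_documents_request` are hypothetical, the URL is the local example used in this specification, and the request is only built, not sent.

```python
import json
import urllib.request


def to_ndjson(documents):
    """Serialize an iterable of dicts as UTF-8 NDJSON: one JSON object per line."""
    lines = []
    for document in documents:
        if not isinstance(document, dict):
            raise TypeError("each NDJSON entry must be a JSON object")
        # json.dumps escapes newlines inside strings, so each entry stays on one line.
        lines.append(json.dumps(document, ensure_ascii=False))
    return "\n".join(lines).encode("utf-8")


def build_add_documents_request(
    payload, url="http://localhost:7700/indexes/movies/documents"
):
    """Build (without sending) a POST request that declares the payload as NDJSON."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


documents = [
    {"id": 1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]},
    {"id": 499, "label": "hoodie", "price": 19.99, "colors": ["purple"]},
]
payload = to_ndjson(documents)
request = build_add_documents_request(payload)
# Passing `request` to urllib.request.urlopen would send it to a running instance.
```

Building the payload from `json.dumps` output, rather than by hand, is one way to keep every entry on a single line, which is the property that makes the format streamable.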
From f518075757af2d7e7d1d05e9a3c85671e4e809b2 Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Wed, 16 Jun 2021 15:27:10 +0200 Subject: [PATCH 02/10] Indexing CSVs (#28) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Initiate csv indexation support specification * update spec file name to match pull request id * update csv indexation spec * fix code examples and typos * fix typos * update spec name * Update header part to match MeiliSearch Tracking-Issues * update spec from the equivalent ndjson spec reviews * update --data sample examples * add information about giving application/json content-type or not for a json payload * Apply suggestions from code review Co-authored-by: Clément Renault * Change curl --data param to --binary-data in examples Co-authored-by: Clément Renault * updates error codes * moved behavior about missing content-type in explanation part * Apply suggestions from code review Co-authored-by: Clément Renault Co-authored-by: Clément Renault --- text/0028-indexing-csv.md | 159 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 text/0028-indexing-csv.md diff --git a/text/0028-indexing-csv.md b/text/0028-indexing-csv.md new file mode 100644 index 00000000..8eeb9ab9 --- /dev/null +++ b/text/0028-indexing-csv.md @@ -0,0 +1,159 @@ +- Title: Indexing CSV +- Start Date: 2021-04-09 +- Specification PR: [PR-#28](https://github.com/meilisearch/specifications/pull/28) +- MeiliSearch Tracking-Issues: + +# Indexing CSV + +## 1. Feature Description and Interaction + +### I. Summary + +To index documents, the body of the add documents request has to match a specific format. That format is then parsed and tokenized inside MeiliSearch, after which the added documents join the pool of searchable and returnable documents.
+ +The [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) data format is broadly used to store and exchange data in a simple format. + +Also, to boost write performance, the CSV data format is better suited than JSON for large datasets, as keys are not duplicated for every document. + +### II. Motivation + +We want to keep improving the experience of our users. Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of using another simple, well-known data format. This gives them more flexibility in choosing a data source for the indexing (add and update) step. + +Since most SQL engines or SQL clients can easily dump data as CSV, it will facilitate MeiliSearch adoption by extending the indexing step to a wider range of customer cases than before. + +Writing performance is also a motivation since CSV parsing is less CPU and memory intensive than parsing JSON due to the streamable nature of the CSV format. + +### III. Additional Materials +N/A + +### IV. Explanation + +#### CSV Formatting Rules + +While [RFC 4180](https://tools.ietf.org/html/rfc4180) is an attempt to specify the CSV format, many variations exist in practice. MeiliSearch requires CSV data to be formatted as described below to be parsable by the engine. + +- CSV data needs to contain a first line representing the list of attributes, each with an optional type separated from the attribute name by a `:` character. The type is case insensitive. + +> An attribute can be specified with one of two types: `string` or `number`. A missing type will be interpreted as a `string` by default. +> +> Valid header line example: "id:number","title:string","author","price:number" + +- Each of the following CSV lines represents a document for MeiliSearch. +- A CSV value should be enclosed in double-quotes when it contains a comma character or a newline to escape it.
+- When double-quotes are used to enclose fields, a double-quote appearing inside a field must be escaped by preceding it with another double-quote, as mentioned in [RFC 4180](https://tools.ietf.org/html/rfc4180). +- Float values should be written with a `.` character, like `3.14`. +- CSV text should be encoded in UTF-8. +- The format can't handle array cell values. We are providing the `ndjson` format to deal with these types of attributes in an easier way. + +##### Example with a comma inside a cell + +Given the CSV payload +``` +"id:number","label","price:number","colors","description" +"1","t-shirt","4.99","red","Hey, you will rock at summer time." +``` +the search result should be displayed as +```json +{ + "hits": [ + { + "id": 1, + "label": "t-shirt", + "price": 4.99, + "colors": "red", + "description": "Hey, you will rock at summer time." + } + ], + ... +} +``` + +##### Example with a double quote inside a cell + +Given the CSV payload +``` +"id:number","label","price","colors","description" +"1","t-shirt","4.99","red","Hey, you will ""rock"" at summer time." +``` +the search result should be displayed as +```json +{ + "hits": [ + { + "id": 1, + "label": "t-shirt", + "price": "4.99", + "colors": "red", + "description": "Hey, you will \"rock\" at summer time." + } + ], + ... +} +``` + +> Note that the price attribute was not typed as a number. By default, MeiliSearch types it as a string. + +#### API Endpoints + +> Each API endpoint listed below requires `text/csv` as its `Content-Type` header to process CSV data. +> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior.
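The Content-Type rules above can be sketched as a small dispatch function. This is an illustration only: the function name `payload_format`, the handling of header parameters such as `charset`, and the use of `ValueError` for unknown media types are assumptions, and the `application/x-ndjson` entry comes from the companion NDJSON specification.

```python
SUPPORTED_FORMATS = {
    "application/json": "json",
    "application/x-ndjson": "ndjson",
    "text/csv": "csv",
}


def payload_format(content_type=None):
    """Map a Content-Type header value to the payload format to parse."""
    if content_type is None:
        # A missing Content-Type is interpreted as application/json.
        return "json"
    # Assumption: parameters such as "; charset=utf-8" are ignored and the
    # media type is compared case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return SUPPORTED_FORMATS[media_type]
    except KeyError:
        raise ValueError(f"unsupported media type: {content_type!r}") from None
```

For instance, `payload_format()` and `payload_format("application/json")` both select the JSON parser, which mirrors the warning above.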
+ +#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) + +```bash +curl \ + -X POST 'http://localhost:7700/indexes/movies/documents' \ + -H 'Content-Type: text/csv' \ + --data-binary '"id","label","price:number","colors","description" +"1","hoodie","19.99","purple","Hey, you will rock at summer time."' +``` +> Response code: 202 Accepted + +##### Error codes + +> - Sending a payload that does not match the `Content-Type` header should return a `400 bad_request` error. +> - A payload exceeding the size limit should return a `413 payload_too_large` error. +> - Wrong encoding should return a `400 bad_request` error. +> - Invalid CSV data should return a `400 bad_request` error. + +#### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) + +```bash +curl \ + -X PUT 'http://localhost:7700/indexes/movies/documents' \ + -H 'Content-Type: text/csv' \ + --data-binary '"id","label","price:number","colors","description" +"1","hoodie","19.99","purple","Hey, you will rock at summer time."' +``` +> Response code: 202 Accepted + +##### Error codes + +> - Sending a payload that does not match the `Content-Type` header should return a `400 bad_request` error. +> - A payload exceeding the size limit should return a `413 payload_too_large` error. +> - Wrong encoding should return a `400 bad_request` error. +> - Invalid CSV data should return a `400 bad_request` error. + +### V. Impact on documentation + +This feature should impact the MeiliSearch user documentation by mentioning the CSV capability in the Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents).
The documentation should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. + +The `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) should no longer mention only the JSON format and should add the CSV format. The documentation currently says "Currently, MeiliSearch supports only JSON payloads." + +Documentation should also guide the user in properly formatting and sending CSV data. Adding a dedicated page for formatting and sending CSV data should be considered. + +### VI. Impact on SDKs + +This feature should impact MeiliSearch SDKs in the future by adding the possibility to send a CSV to MeiliSearch on the endpoints described above. Simplifying the typing of the headers could also be handled by the SDKs. + +## 2. Technical Aspects +N/A + +## 3. Future possibilities + +- Provide an interface in the future dashboard to upload CSV data into an index and optionally provide the header types. +- Set a payload limit directly related to the type of data format. Currently, the payload size is equivalent to [JSON payload size](https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size). Metrics on feature usage and configuration updates should help to choose a better suited value for this type of data format.
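The CSV formatting rules of this specification (typed header cells, case-insensitive types, `string` by default, RFC 4180 quote escaping) can be sketched in Python as follows. The function names are hypothetical, and two behaviors are assumptions not stated in the specification: an unknown type raises `ValueError`, and the type is taken after the last `:` of a header cell.

```python
import csv
import io


def parse_header_cell(cell):
    """Split a header cell such as "price:number" into (name, type)."""
    name, separator, type_name = cell.rpartition(":")
    if not separator:
        # A missing type is interpreted as a string by default.
        return cell, "string"
    type_name = type_name.lower()  # the type is case insensitive
    if type_name not in ("string", "number"):
        raise ValueError(f"unknown attribute type: {type_name!r}")
    return name, type_name


def to_number(text):
    """Convert a cell to int when possible, otherwise to float (e.g. "3.14")."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def csv_to_documents(text):
    """Turn a CSV payload with a typed header line into a list of documents."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = [parse_header_cell(cell) for cell in next(reader)]
    documents = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        document = {}
        for (name, type_name), value in zip(header, row):
            document[name] = to_number(value) if type_name == "number" else value
        documents.append(document)
    return documents


sample = (
    '"id:number","label","price:number","colors","description"\n'
    '"1","t-shirt","4.99","red","Hey, you will ""rock"" at summer time."\n'
)
documents = csv_to_documents(sample)
```

Python's `csv` module already follows the double-quote escaping convention of RFC 4180 by default, so the sketch only has to deal with the typed header line.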
From b72d23b8f219c58fe605c41df8fa543d91dcb4b0 Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Thu, 12 Aug 2021 15:37:26 +0200 Subject: [PATCH 03/10] Sort (#55) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * initialize sort specification * wip specification body * fix typos and details * Add OpenAPI changes * Rename API routes * fix typo * add a part about measuring success * specify metrics to send on Amplitude * add lexicographical order for string type * Typos, invalid_sort error definition * Apply suggestions from code review Co-authored-by: Clémentine Urquizar * remove non correct information about mistyped ranking rule that do not raise error * fix typos from reviews * Update text/0055-sort.md Co-authored-by: gui machiavelli Co-authored-by: Clémentine Urquizar Co-authored-by: gui machiavelli --- open-api.yaml | 1 - 1 file changed, 1 deletion(-) diff --git a/open-api.yaml b/open-api.yaml index 4415eb94..c1ed64ac 100644 --- a/open-api.yaml +++ b/open-api.yaml @@ -362,7 +362,6 @@ components: - sort - exactness - release_date:asc - examples: [] filterableAttributes: type: array description: | From eefce292c3279b44e946bb165cb3f9e7c41b538e Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Wed, 25 Aug 2021 15:28:32 +0200 Subject: [PATCH 04/10] Geosearch (#59) * Initialize draft specification for geo-search feature * add future possibilities * Update specification * mention errors and aspects about filterableAttributes and sortableAttributes * Add measure and finalized key changes * Add description in OpenApi * remove old falsy sentence * Add definition and explanation for error * fix rebase on develop * Specify missing edge cases (#63) * Initialize draft specification for geo-search feature * add future possibilities * Update specification * mention errors and aspects about filterableAttributes and sortableAttributes * Add measure and finalized key changes * Add description in OpenApi * remove old falsy
sentence * Add definition and explanation for error * fix rebase on develop * update open-api.yml with description on _geoPoint built-in sort rule and _geo field * Apply suggestions from code review Co-authored-by: gui machiavelli * remove - char in geo-search * update invalid_geo_field error definition Co-authored-by: gui machiavelli --- open-api.yaml | 35 +++++- text/0059-geo-search.md | 243 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 276 insertions(+), 2 deletions(-) create mode 100644 text/0059-geo-search.md diff --git a/open-api.yaml b/open-api.yaml index c1ed64ac..d7f6e7e6 100644 --- a/open-api.yaml +++ b/open-api.yaml @@ -124,6 +124,7 @@ components: length: 5 - start: 155 length: 5 + description: '' properties: _formatted: type: object @@ -140,7 +141,9 @@ components: - string - number description: Retrieve attributes of the document. `attributesToRetrieve` controls these fields. - description: '' + _geoDistance: + type: number + description: 'Using _geoPoint({lat}, {lng}) built-in sort rule at search leads the engine to return a _geoDistance within the search results. This field represents the distance in meters of the document from the specified _geoPoint.' documentId: oneOf: - type: number @@ -156,6 +159,9 @@ components: - String: `"something > 1 AND genres=comedy AND (genres=horror OR title=batman)"` - Mixed: `["something > 1 AND genres=comedy", "genres=horror OR title=batman"]` + > info + > _geoRadius({lat}, {lng}, {distance_in_meters}) built-in filter rule can be used to filter documents within a geo circle. + > warn > Attribute(s) used in `filter` should be declared as filterable attributes. See [Filtering and Faceted Search](https://docs.meilisearch.com/reference/features/filtering_and_faceted_search.html). example: @@ -616,6 +622,8 @@ components: > warn > Attribute(s) used in `sort` should be declared as sortable attributes. See [Sorting](https://docs.meilisearch.com/reference/features/sorting.html). 
+ > info + > _geoPoint({lat}, {long}) built-in sort rule can be used to sort documents around a geo point. filter: name: filter in: query @@ -631,6 +639,9 @@ components: - String: `something > 1 AND genres=comedy AND (genres=horror OR title=batman)` - Mixed: `["something > 1 AND genres=comedy", "genres=horror OR title=batman"]` + > info + > _geoRadius({lat}, {lng}, {distance_in_meters}) built-in filter rule can be used to filter documents within a geo circle. + > warn > Attribute(s) used in `filter` should be declared as filterable attributes. See [Filtering and Faceted Search](https://docs.meilisearch.com/reference/features/filtering_and_faceted_search.html). responses: @@ -684,6 +695,13 @@ components: type: apiKey in: header name: X-Meili-API-Key + description: |- + An API key is a token that you provide when making API calls. Include the token in a header parameter called `X-Meili-API-Key`. + + Example: `X-Meili-API-Key: 123` + + > info + > test examples: {} tags: - name: Indexes @@ -1047,6 +1065,9 @@ paths: > info > If the provided index does not exist, it will be created. + + > info + > Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` field. tags: - Documents security: @@ -1058,6 +1079,7 @@ paths: schema: type: array items: null + examples: {} responses: '202': $ref: '#/components/responses/202' @@ -1068,7 +1090,7 @@ paths: put: operationId: indexes.documents.upsert summary: Add or update documents - description: | + description: |- Add a list of documents or update them if they already exist. If you send an already existing document (same [id](https://docs.meilisearch.com/learn/core_concepts/documents.html#primary-key)) the old document will be only partially updated according to the fields of the new document. Thus, any fields not present in the new document are kept and remained unchanged. @@ -1077,6 +1099,9 @@ paths: > info > If the provided index does not exist, it will be created. 
+ + > info + > Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` field. tags: - Documents security: @@ -1674,6 +1699,9 @@ paths: summary: Update sortable attributes description: | Update the list of [sortableAttributes](https://docs.meilisearch.com//reference/features/settings.html#sortable-attributes) of an index. + + > info + > In order to enable sorting capabilities on geographic data, the `_geo` field must be added as a sortableAttribute. tags: - Settings security: @@ -1887,6 +1915,9 @@ paths: description: | Update the [filterable attributes](https://docs.meilisearch.com/reference/features/settings.html#filterable-attributes) of an index. + > info + > In order to enable filtering capabilities on geographic data, the `_geo` field must be added as a filterableAttribute. + > info > If the provided index does not exist, it will be created. tags: diff --git a/text/0059-geo-search.md b/text/0059-geo-search.md new file mode 100644 index 00000000..c7958f69 --- /dev/null +++ b/text/0059-geo-search.md @@ -0,0 +1,243 @@ +- Title: Geosearch +- Start Date: 2021-08-02 +- Specification PR: [#59](https://github.com/meilisearch/specifications/pull/59) +- Discovery Issue: [#42](https://github.com/meilisearch/product/issues/42) + +# Geosearch + +## 1. Functional Specification + +### I. Summary + +The purpose of this specification is to add a first iteration of the **geosearch** feature to give geo-filtering and geosorting capabilities at search time. + +#### Summary Key points + +- Documents MUST have a `_geo` reserved object to be geosearchable. +- Filter documents by a given geo radius using the built-in filter `_geoRadius({lat}, {lng}, {distance_in_meters})`. It is possible to cumulate several geosearch filters within the `filter` field. +- Sort documents in ascending/descending order around a geo point. e.g. `_geoPoint({lat}, {lng}):asc`. 
+- It is possible to filter and/or sort by geographical criteria of the user's choice. +- `_geo` must be set as a filterable attribute to use geo filtering capabilities. +- `_geo` must be set as a sortable attribute to use geo sort capabilities. +- There is no `geo` ranking rule that can be manipulated by the user. Geo ranking is automatically integrated in the `sort` ranking rule by default and activated by sorting with the `_geoPoint({lat}, {lng})` built-in sort rule. +- Using `_geoPoint({lat}, {lng})` in the `sort` parameter at search leads the engine to return a `_geoDistance` within the search results. This field represents the distance in meters of the document from the specified `_geoPoint`. +- Add an `invalid_geo_field` error. + +### II. Motivation + +According to our user feedback, the lack of a geosearch feature is mentioned as one of the biggest deal-breakers for choosing MeiliSearch as a search engine. A search engine must offer this feature. Some use cases specifically require integrated geosearch capabilities. Moreover, a lot of direct competitors offer it. Users today must find workarounds like using geohash to be able to geosearch documents. We hope to better serve the needs of users by implementing this feature. It broadens the range of use cases that MeiliSearch can address. + +### III. Technical Explanations + +#### **As a developer, I want to add geospatial coordinates to a document so that the document can be geosearchable.** + +- Introduce a reserved field `_geo` for documents to store geospatial data as an **object** made of `lat` and `lng` fields for the **JSON format**. +- Introduce a reserved column `_geo` for documents to store geospatial data as a **string** made of `lat,lng` for the **CSV format**. + +##### **JSON Format** + +**`_geo` field definition** + +- Name: `_geo` +- Type: Object +- Format: `{lat:float, lng:float}` +- Not required + +> 💡 If `_geo` is found in the document payload, `lat` and `lng` are required.
+> 💡 `lat` and `lng` must be float values. + +##### **CSV Format** + +Following the format already defined in the https://github.com/meilisearch/specifications/pull/28/files specification for document indexing from a CSV format, a reserved column `_geo` can be added to specify the geographical coordinates of a document. + +CSV format example +``` +"id:number","label","brand","_geo" +"1","F40","Ferrari","48.862725,2.287592" +``` + +**`_geo` column definition** + +- Name: `_geo` +- Type: String +- Format: `"lat:float,lng:float"` +- Not required + +#### POST Add or replace documents `/indexes/{indexUid}/documents` + +##### Request body +``` +[ + { + "id": 1, + "label": "F40", + "brand": "Ferrari", + "_geo": { + "lat": 48.862725, + "lng": 2.287592 + } + } +] +``` + +##### 202 Accepted - Response body + +``` +{ + "updateId": 1 +} +``` + +#### PUT Add or update documents `/indexes/{indexUid}/documents` + +##### Request body +``` +[ + { + "id": 1, + "label": "F40 LM", + "brand": "Ferrari", + "_geo": { + "lat": 48.862725, + "lng": 2.287592 + } + } +] +``` + +##### 202 Accepted - Response body + +``` +{ + "updateId": 2 +} +``` + +> 🔴 Providing a malformed `_geo` field that does not conform to the format causes the `update` to fail. A new `invalid_geo_field` error is given in the `update` object. + +##### Errors Definition + +## invalid_geo_field + +### Context + +This error occurs when the `_geo` field of a document payload is not valid. + +### Error Definition + +```json +{ + "message": "The _geo field is invalid. :syntaxErrorHelper.", + "code": "invalid_geo_field", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_geo_field" +} +``` + +- The `:syntaxErrorHelper` is inferred when a syntax error is encountered. + +--- + +### **As an end-user, I want to filter documents within a geo radius.** + +- Introduce a `_geoRadius({lat}, {lng}, {distance_in_meters})` built-in filter rule to `filter` documents in a geo radius.
+ +**`_geoRadius` built-in filter rule definition** + +- Name: `_geoRadius` +- Signature: ({lat:float}:required, {lng:float}:required, {distance_in_meters:int}:required) +- Not required +- `distance_in_meters` only accepts positive values. + +> The `_geo` field has to be set in the `filterableAttributes` setting by the developer to activate geo filtering capabilities at search. + +#### GET Search `/indexes/{indexUid}/search` + +``` +...&filter="brand=Mercedes AND _geoRadius(48.862725, 2.287592, 2000)" +``` + +#### POST Search `/indexes/{indexUid}/search` + +``` +{ + "filter": ["brand = Ferrari", "_geoRadius(48.862725, 2.287592, 2000)"] +} +``` + +> 🔴 Specifying parameters that do not conform to the `_geoRadius` signature causes the API to return an `invalid_filter` error. The error message should indicate how `_geoRadius` should be used. See `_geoRadius` built-in filter rule definition part. + +--- + +### **As an end-user, I want to sort documents around a geo point.** + +- Introduce a `_geoPoint({lat}, {lng})` built-in sort rule to `sort` documents around a central point. + +**`_geoPoint` built-in sort rule definition** + +- Name: `_geoPoint` +- Signature: ({lat:float}:required, {lng:float}:required) +- Not required + +Following the [`sort` specification feature](https://github.com/meilisearch/specifications/pull/55): +> The `_geo` field has to be set in the `sortableAttributes` setting by the developer to activate geo sorting capabilities at search. +> +>There is no `geo` ranking rule as such. It is in fact within the `sort` ranking rule in an obfuscated way. +> +>`_geoPoint` built-in sort rule can sort documents in ascending or descending order. See Technical Aspects part.
+ +#### GET Search `/indexes/{indexUid}/search` + +``` + ...&sort=_geoPoint({lat}, {lng}):asc,price:desc +``` + +#### POST Search `/indexes/{indexUid}/search` + +``` +{ + "sort": "_geoPoint({lat}, {lng}):asc,price:desc" +} +``` +> 🔴 Specifying parameters that do not conform to the `_geoPoint` signature causes the API to return an `invalid_sort` error. The error message should indicate how `_geoPoint` should be used. See `_geoPoint` built-in sort rule definition part. + +--- + +### **As an end-user, I want to know the document distance when I am sorting around a geo point.** + +- Introduce a `_geoDistance` field in the search result `hit` object. + +**`_geoDistance` field definition** + +- Name: `_geoDistance` +- Description: Distance in meters between the document and the `_geoPoint` the end-user sorts around. +- Type: int +- Not required + +> 💡 The `_geoDistance` response field is only computed and shown when the end-user has sorted documents around a `_geoPoint`. So if the end-user filters documents using a `_geoRadius` built-in filter without sorting them around a `_geoPoint`, the `_geoDistance` field will not appear in the search response. + +### IV. Finalized Key Changes + +- Add a `_geo` reserved field to the JSON and CSV formats to index the geo point coordinates of a document. +- Add a `_geoPoint(lat, lng)` built-in sort rule. +- Add a `_geoRadius(lat, lng, distance_in_meters)` built-in filter rule. +- Return a `_geoDistance` in `hits` objects representing the distance in meters computed from the `_geoPoint` built-in sort rule. + +## 2. Technical Aspects + +### I. :desc case - Sorting documents around a geo point + +We may encounter technical difficulties to implement a descending order capability for the geo sorting. This first iteration will allow us to identify whether this is a real technical problem. If the problem is confirmed, we will then decide on the best solution.
+ +> 💡 In a first step, we could not allow `:desc` on a geoPoint if it's a complex technical issue. + +### II. Measuring + +- `filterableAttribute` setting definition to evaluate `_geo` presence. +- `sortableAttribute` setting definition to evaluate `_geo` presence. + +## 3. Future Possibilities + +- Add built-in filter to filter documents within `polygon` and `bounding-box`. +- Handling array of geo points in the document object. +- Handling multiple geo formats for the `_geo` field. e.g. "{lat},{lng}", a geohash etc.. From 3141ac9b5b8daf461487cda82e88047a1ecdb2ea Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Tue, 31 Aug 2021 16:39:18 +0200 Subject: [PATCH 05/10] Patch GeoSearch specification to mention technical limit on `desc` ordering around a _geoPoint (#66) * mention decision and expected behavior for a desc ordering around a geoPoint * add desc ordering around a geoPoint as a future possibility --- text/0059-geo-search.md | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/text/0059-geo-search.md b/text/0059-geo-search.md index c7958f69..6c7d6371 100644 --- a/text/0059-geo-search.md +++ b/text/0059-geo-search.md @@ -15,7 +15,7 @@ The purpose of this specification is to add a first iteration of the **geosearch - Documents MUST have a `_geo` reserved object to be geosearchable. - Filter documents by a given geo radius using the built-in filter `_geoRadius({lat}, {lng}, {distance_in_meters})`. It is possible to cumulate several geosearch filters within the `filter` field. -- Sort documents in ascending/descending order around a geo point. e.g. `_geoPoint({lat}, {lng}):asc`. +- Sort documents in ascending order around a geo point. e.g. `_geoPoint({lat}, {lng}):asc`. Descending order will not be supported for this first iteration. - It is possible to filter and/or sort by geographical criteria of the user's choice. - `_geo` must be set as a filterable attribute to use geo filtering capabilities.
- `_geo` must be set as a sortable attribute to use geo sort capabilities. @@ -185,6 +185,8 @@ Following the [`sort` specification feature](https://github.com/meilisearch/spec >There is no `geo` ranking rule as such. It is in fact within the `sort` ranking rule in an obfuscated way. > >`_geoPoint` built-in sort rule can sort documents in ascending or descending order. See Technical Aspects part. +> +> The `:desc` order is not supported due to a technical limit. See Technical Aspects part for more details. #### GET Search `/indexes/{indexUid}/search` @@ -200,6 +202,8 @@ Following the [`sort` specification feature](https://github.com/meilisearch/spec } ``` > 🔴 Specifying parameters that do not conform to the `_geoPoint` signature causes the API to return an `invalid_sort` error. The error message should indicate how `_geoPoint` should be used. See `_geoPoint` built-in sort rule definition part. +> +> 🔴 Specifying `:desc` for a `_geoPoint` sort rule will raise an `invalid_sort` error with a message explaining that `_geoPoint` can only be used with `:asc` order. --- @@ -231,6 +235,24 @@ We may encounter technical difficulties to implement a descending order capabili > 💡 In a first step, we could not allow `:desc` on a geoPoint if it's a complex technical issue. +--- + +Edit-date: 2021-08-31 + +As imagined during the first phases of discovery we encounter a technical difficulty to compute a descending order around a `_geoPoint`. + +The technical team will have to do some preparatory work to overcome this limitation. For the time being, it is not planned to spend more time on this as it is not a primary use case. The impact may be very small, so it's not worth the effort at this point. + +### Decision taken + +It was decided after a discussion with @irevoire and @Kerollmops that when the user specifies the sort rule `_geoPoint(lat, lng):desc` an error `invalid_sort` will be returned with a message explaining that `:desc` is not available.
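The decision above can be sketched as a small validation step on a single sort rule. This is only an illustration: the regular expression, the `InvalidSort` class, the `parse_sort_rule` helper, and the exact message wording are hypothetical and are not taken from the engine.

```python
import re

# Hypothetical pattern for `_geoPoint(lat, lng)` with two decimal numbers.
GEO_POINT_RE = re.compile(
    r"^_geoPoint\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)$"
)


class InvalidSort(ValueError):
    """Stand-in for the API's `invalid_sort` error."""

    code = "invalid_sort"


def parse_sort_rule(rule):
    """Parse one sort rule such as `price:desc` or `_geoPoint(48.8, 2.3):asc`.

    Returns a (target, geo_point, order) tuple; geo_point is None for
    ordinary attributes.
    """
    target, separator, order = rule.rpartition(":")
    if not separator or order not in ("asc", "desc"):
        raise InvalidSort(f"`{rule}` must end with `:asc` or `:desc`.")
    if target.startswith("_geoPoint"):
        match = GEO_POINT_RE.match(target)
        if match is None:
            raise InvalidSort("`_geoPoint` must be used as `_geoPoint(lat, lng)`.")
        if order == "desc":
            # First iteration: descending order around a geo point is rejected.
            raise InvalidSort("`_geoPoint` can only be used with `:asc` order.")
        return ("_geoPoint", (float(match.group(1)), float(match.group(2))), order)
    return (target, None, order)
```

Keeping the `:asc` suffix mandatory in this sketch matches the stated goal of not introducing a different syntax for geo sort rules.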
+ +#### Why keep the :asc if :desc is not valid? + +To keep consistency and not to introduce a different syntax among the `sort` search parameter. This is something we'll re-evaluate later. + +--- + ### II. Measuring - `filterableAttribute` setting definition to evaluate `_geo` presence. @@ -239,5 +261,6 @@ We may encounter technical difficulties to implement a descending order capabili ## 3. Future Possibilities - Add built-in filter to filter documents within `polygon` and `bounding-box`. +- Handling `:desc` order around a geoPoint - Handling array of geo points in the document object. - Handling multiple geo formats for the `_geo` field. e.g. "{lat},{lng}", a geohash etc.. From 1580697d1460715f88c64c030d7bbf1b0863d82e Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Tue, 31 Aug 2021 17:07:58 +0200 Subject: [PATCH 06/10] Patch error codes for csv and ndjson formats specs (#64) * specify error codes dedicated to payload format for post/put documents endpoints * Udpdate error codes naming * Add errors definition * update errors and cURL examples --- text/0028-indexing-csv.md | 152 +++++++++++++++++++++++++++-------- text/0029-indexing-ndjson.md | 152 ++++++++++++++++++++++++++--------- 2 files changed, 233 insertions(+), 71 deletions(-) diff --git a/text/0028-indexing-csv.md b/text/0028-indexing-csv.md index 8eeb9ab9..b4dc3e99 100644 --- a/text/0028-indexing-csv.md +++ b/text/0028-indexing-csv.md @@ -1,11 +1,11 @@ - Title: Indexing CSV - Start Date: 2021-04-9 - Specification PR: [PR-#28](https://github.com/meilisearch/specifications/pull/28) -- MeiliSearch Tracking-Issues: +- Discovery Issue: n/a # Indexing CSV -## 1. Feature Description and Interaction +## 1. Functional Specification ### I. Summary @@ -15,6 +15,12 @@ A [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) data format is bro Also, in order to boost write performance CSV data format is more suited than JSON for consequent datasets, as keys are not duplicated for every document. 
+#### Summary Key Points + +- The header of the csv payload allows to name the attributes and type them. +- `text/csv` Content-Type header is now supported. +- The error cases have been strengthened and completed. See Errors part. + ### II. Motivation We want to provide our users with an always improved usage experience. Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of another simple data format, well known, to use. Thus, give them more versatility at the data source choices for the indexing (add and update) step. @@ -23,12 +29,9 @@ Since most SQL engines or SQL clients can easily dump data as CSV, it will facil Writing performance is also considered as a motivation since CSV parsing is less CPU and memory intensive than parsing Json due to the streamable capability of the CSV format. -### III. Additional Materials -N/A - -### IV.Explanation +### III.Explanation -#### Csv Formatting Rules +#### CSV Formatting Rules While there's [RFC 4180](https://tools.ietf.org/html/rfc4180) as a try to add a specification for CSV format, we will find a lot of variations from that. MeiliSearch features capabilities requires CSV data to be formatted the proper way to be parsable by the engine. @@ -95,63 +98,142 @@ the search result should be displayed as #### API Endpoints -> Each API endpoints mentioned above will now require a `text/csv` as `Content-Type` header to process CSV data. -> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +> Each API endpoints mentioned above will now require a `text/csv` as `Content-Type` header to be processed as CSV data.
-#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) +**As a developer, I want to upload a CSV payload of documents so that end-user can search them** + +**POST documents** `/indexes/:indexUid/documents` ```bash curl \ -X POST 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: text/csv' \ - --binary-data ' + --data-binary ' "id","label","price:number","colors","description"\n "1","hoodie","19.99","purple","Hey, you will rock at summer time." ' ``` -> Response code: 202 Accepted - -##### Error codes - -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error. -> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid CSV data should return a `400 bad_request` error +> 202 Accepted - Response -### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) +**PUT documents** `/indexes/:indexUid/documents` ```bash curl \ -X PUT 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: text/csv' \ - --binary-data ' + --data-binary ' "id","label","price:number","colors","description"\n "1","hoodie","19.99","purple","Hey, you will rock at summer time." ' ``` -> Response code: 202 Accepted +> 202 Accepted - Response + +##### Errors + +- 🔴 Omitted `Content-Type` header will lead to a 415 Unsupported Media Type - **missing_content_type** error code. +- 🔴 Sending an empty `Content-Type` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending a different `Content-Type` than `application/json`, `application/x-ndjson` or `text/csv` will lead to 415 Unsupported Media Type **invalid_content_type** error code. +- 🔴 Sending an empty payload will lead to a 400 Bad Request - **missing_payload** error code.
+- 🔴 Sending a different payload type than the `Content-Type` header should return a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a payload exceeding the limit will lead to a 413 Payload Too Large - **payload_too_large** error code. +- 🔴 Sending an invalid CSV format will lead to a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a CSV header that does not conform to the specification will lead to a 400 Bad Request - **malformed_payload** error code. + +##### Errors Definition + +## missing_content_type + +### Context + +This error occurs when the Content-Type header is missing. + +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "A Content-Type header is missing. Accepted values for Content-Type are: :contentTypeList", + "code": "missing_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_content_type" +} +``` + +- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`. + +--- -##### Errors handling +## invalid_content_type -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error. -> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid CSV data should return a `400 bad_request` error +### Context -### V. Impact on documentation +This error occurs when the provided content-type is not handled by the API method. -This feature should impact MeiliSearch users documentation by adding mention of csv capability inside Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents).
It should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +### Error Definition -We should also not only mention JSON format in `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) and add CSV format. The documentation says "Currently, MeiliSearch supports only JSON payloads." +HTTP Code: `415 Unsupported Media Type` -Documentation should also guide the user in the correct way to properly format and send csv data. Adding a dedicated page for the purpose of formatting and sending CSV data should be considered. +```json +{ + "message": "The Content-Type :contentType is invalid. Accepted values for Content-Type are: :contentTypeList", + "code": "invalid_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_content_type" +} +``` + +- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`. + +--- + +## missing_payload + +### Context + +This error occurs when the client does not provide a mandatory payload to the request. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": "A :payloadType payload is missing.", + "code": "missing_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_payload" +} +``` + +- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv` + +--- + +## malformed_payload + +### Context + +This error occurs when the format sent in the payload is malformed. The payload contains a syntax error. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json + "message": ":syntaxErrorHelper. 
The :payloadType payload provided is malformed.", + "code": "malformed_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#malformed_payload" +``` -### VI. Impact on SDKs +- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv` +- The `:syntaxErrorHelper` is inferred when the message is generated. -This feature should impact MeiliSearch SDKs in the future by adding the possibility to send a CSV to MeiliSearch on the previous explicit endpoints. Simplifying the typing of the headers could also be handled by the SDKs. +--- -## 2. Technical Aspects -N/A +## 2. Technical details +n/a ## 3. Future possibilities diff --git a/text/0029-indexing-ndjson.md b/text/0029-indexing-ndjson.md index e4f48516..bf2dcdd5 100644 --- a/text/0029-indexing-ndjson.md +++ b/text/0029-indexing-ndjson.md @@ -1,11 +1,11 @@ - Title: Indexing NDJSON - Start Date: 2021-04-12 - Specification PR: [PR-#29](https://github.com/meilisearch/specifications/pull/29) -- MeiliSearch Tracking-Issues: TBD +- Discovery Issue: n/a # Indexing NDJSON -## 1. Feature Description and Interaction +## 1. Functional Specification ### I. Summary @@ -13,6 +13,11 @@ To index documents, the body of the add documents request has to match a specifi An [NDJSON](http://ndjson.org/) data format is easier to use than a CSV format because it propose a convenient format for storing structured data. +#### Summary Key Points + +- `application/x-ndjson` Content-Type header is now supported. +- The error cases have been strengthened and completed. See Errors part. + ### II. Motivation Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of another simple data format to use. Thus, give them more versatility at the data source choices for the indexing step. 
@@ -23,11 +28,7 @@ While we give the ability to Meilisearch to ingest CSV data for indexing in this Representing nested structures in a JSON object is easy and convenient. -### III. Additional Materials - -TBD - -### IV. Explanation +### III. Explanation Newline-delimited JSON (`ndjson`), line-delimited JSON (`ldjson`), JSON lines (`jsonl`) are three terms expressing the same formats primarily intended for JSON streaming. @@ -74,63 +75,142 @@ the search result should be displayed as #### API Endpoints > Each API endpoints mentioned above will now require a `application/x-ndjson` as `Content-Type` header to be processed as NDJSON data. -> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. -#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) +**As a developer, I want to upload a NDJSON payload of documents so that end-user can search them** -```curl +**POST documents** `/indexes/:indexUid/documents` + +```bash curl \ -X POST 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: application/x-ndjson' \ - --binary-data ' + --data-binary ' {"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]}\n {"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]} ' ``` -> Response code: 202 Accepted - -##### Error codes - -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error.
-> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid NDJSON data should return a `400 bad_request` error +> 202 Accepted - Response -### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) +**PUT documents** `/indexes/:indexUid/documents` -```curl +```bash curl \ -X PUT 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: application/x-ndjson' \ - --binary-data ' + --data-binary ' {"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]}\n {"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]} ' ``` -> Response code: 202 Accepted +> 202 Accepted - Response -##### Errors handling +##### Errors -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error. -> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid NDJSON data should return a `400 bad_request` error +- 🔴 Omitted `Content-Type` header will lead to a 415 Unsupported Media Type - **missing_content_type** error code. +- 🔴 Sending an empty `Content-Type` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending a different `Content-Type` than `application/json`, `application/x-ndjson` or `text/csv` will lead to 415 Unsupported Media Type **invalid_content_type** error code. +- 🔴 Sending an empty payload will lead to a 400 Bad Request - **missing_payload** error code. +- 🔴 Sending a different payload type than the `Content-Type` header should return a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a payload exceeding the limit will lead to a 413 Payload Too Large - **payload_too_large** error code.
+- 🔴 Sending an invalid ndjson format will lead to a 400 Bad Request - **malformed_payload** error code. -### V. Impact on documentation +##### Errors Definition -This feature should impact MeiliSearch users documentation by adding the possibility to use `ndjson` as an accepted format in the Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents). It should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +## missing_content_type -We should also not only mention JSON format in `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) and add `ndjson` format. The documentation says "Currently, MeiliSearch supports only JSON payloads." +### Context -Documentation should also guide the user in the correct way to properly format and send ndjson data. Adding a dedicated page for the purpose of formatting and sending ndjson data should be considered. +This error occurs when the Content-Type header is missing. -### VI. Impact on SDKs +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "A Content-Type header is missing. Accepted values for the Content-Type header are: :contentTypeList", + "code": "missing_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_content_type" +} +``` + +- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`. + +--- + +## invalid_content_type + +### Context + +This error occurs when the provided content-type is not handled by the API method.
+ +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "The Content-Type :contentType is invalid. Accepted values for the Content-Type header are: :contentTypeList", + "code": "invalid_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_content_type" +} +``` -This feature should impact MeiliSearch SDK's in the future by adding the possibility to send ndjson data to MeiliSearch on the previous explicited endpoints. +- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`. -## 2. Technical Aspects -N/A +--- + +## missing_payload + +### Context + +This error occurs when the client does not provide a mandatory payload to the request. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": "A :payloadType payload is missing.", + "code": "missing_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_payload" +} +``` + +- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv` + +--- + +## malformed_payload + +### Context + +This error occurs when the format sent in the payload is malformed. The payload contains a syntax error. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json + "message": ":syntaxErrorHelper. The :payloadType payload provided is malformed.", + "code": "malformed_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#malformed_payload" +``` + +- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv` +- The `:syntaxErrorHelper` is inferred when the message is generated. + +--- + +## 2. Technical details +n/a ## 3. Future possibilities + - Provide an interface in the future dashboard to upload NDJSON data into an index. - Set a payload limit directly related to the type of data format. 
Currently, the payload size is equivalent to [JSON payload size](https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size). Metrics on feature usage and configuration update should help to choose a better suited value for this type of data. From ca0e165f1ea6e81b59b25a7cf41691cc2cb4609b Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Mon, 6 Sep 2021 18:19:02 +0200 Subject: [PATCH 07/10] Add alternative message for reserved keyword and update invalid_criterion error definition (#67) * add alternative message for reserved keyword and update invalid_criterion error * update error name in link field for invalid_ranking_rule error * update invalid_geo_field error message * fix typo --- text/0059-geo-search.md | 61 +++++++++++++++++++++++++++++++---------- 1 file changed, 46 insertions(+), 15 deletions(-) diff --git a/text/0059-geo-search.md b/text/0059-geo-search.md index 6c7d6371..5cf3dae5 100644 --- a/text/0059-geo-search.md +++ b/text/0059-geo-search.md @@ -22,6 +22,8 @@ The purpose of this specification is to add a first iteration of the **geosearch - There is no `geo` ranking rule that can be manipulated by the user. This one is automatically integrated in the ranking rule `sort` by default and activated by sorting using the `_geoPoint({lat}, {lng})` built-in sort rule. - Using `_geoPoint({lat}, {lng})` in the `sort` parameter at search leads the engine to return a `_geoDistance` within the search results. This field represents the distance in meters of the document from the specified `_geoPoint`. - Add an `invalid_geo_field` error. +- Add an alternative message for `invalid_sort` and `invalid_filter` error to handle reserved keywords. +- `invalid_criterion` is renamed to `invalid_ranking_rule` and add an alternative message to handle reserved keywords. ### II. 
Motivation @@ -66,7 +68,7 @@ csv format example #### POST Add or replace documents `/indexes/{indexUid}/documents` ##### Request body -``` +```json [ { "id": 1, @@ -82,7 +84,7 @@ csv format example ##### 202 Accepted - Response body -``` +```json { "updateId": 1 } @@ -91,7 +93,8 @@ csv format example #### PUT Add or replace documents `/indexes/{indexUid}/documents` ##### Request body -``` + +```json [ { "id": 1, @@ -107,13 +110,13 @@ csv format example ##### 202 Accepted - Response body -``` +```json { "updateId": 2 } ``` -> 🔴 Giving a bad formed `_geo` that do not conform to the format causes the `update` payload to fail. A new `invalid_geo_field` error is given in the `update` object. +- 🔴 Giving a bad formed `_geo` that do not conform to the format causes the `update` payload to fail. A new `invalid_geo_field` error is given in the `update` object. ##### Errors Definition @@ -127,14 +130,15 @@ This error occurs when the `_geo` field of a document payload is not valid. ```json { - "message": "The _geo field is invalid. :syntaxErrorHelper.", + "message": "The document with the id: `:documentId` contains an invalid _geo field: :syntaxErrorHelper.", "code": "invalid_geo_field", "type": "invalid_request", "link": "https://docs.meilisearch.com/errors#invalid_geo_field" } ``` -- The `:syntaxErrorHelper` is inferred when a syntax error is encountered. +- The `:documentId` is inferred when the message is generated. +- The `:syntaxErrorHelper` is inferred when the message is generated. --- @@ -159,13 +163,14 @@ This error occurs when the `_geo` field of a document payload is not valid. #### POST Search `/indexes/{indexUid}/search` -``` +```json { "filter": ["brand = Ferrari", "_geoRadius(48.862725, 2.287592, 2000)"] } ``` -> 🔴 Specifying parameters that do not conform to the `_geoRadius` signature causes the API to return an `invalid_filter` error. The error message should indicate how `_geoRadius` should be used.
See `_geoRadius` built-in filter rule definition part. +- 🔴 Specifying parameters that do not conform to the `_geoRadius` signature causes the API to return an `invalid_filter` error. The error message should indicate how `_geoRadius` should be used. See `_geoRadius` built-in filter rule definition part. +- 🔴 Using `_geo`, `_geoDistance`, `_geoPoint` in a filter expression causes the API to return an `invalid_filter` error. `message` should be `:reservedKeyword is a reserved keyword and thus can't be used as a filter expression.` --- @@ -184,7 +189,7 @@ Following the [`sort` specification feature](https://github.com/meilisearch/spec > >There is no `geo` ranking rule as such. It is in fact within the `sort` ranking rule in an obfuscated way. > ->`_geoPoint` built-in sort rule can sort documents in ascending or descending order. See Technical Aspects part. +>`_geoPoint` built-in sort rule can sort documents in ascending order only. > > The `:desc` order is not supported due to a technical limit. See Technical Aspects part for more details. @@ -196,14 +201,14 @@ Following the [`sort` specification feature](https://github.com/meilisearch/spec #### POST Search `/indexes/{indexUid}/search` -``` +```json { "sort": "_geoPoint({lat, lng}):asc,price:desc" } ``` -> 🔴 Specifying parameters that do not conform to the `_geoPoint` signature causes the API to return an `invalid_sort` error. The error message should indicate how `_geoPoint` should be used. See `_geoPoint` built-in sort rule definition part. -> -> 🔴 Specifying `:desc` for a `_geoPoint` sort rule will raise an `invalid_sort` error with a message explaining that `_geoPoint` can only be used with `:asc` order. +- 🔴 Specifying parameters that do not conform to the `_geoPoint` signature causes the API to return an `invalid_sort` error. The error message should indicate how `_geoPoint` should be used. See `_geoPoint` built-in sort rule definition part.
+- 🔴 Specifying `:desc` for a `_geoPoint` sort rule will raise an `invalid_sort` error with a message explaining that `_geoPoint` can only be used with `:asc` order. +- 🔴 Using `_geo`, `_geoDistance`, `_geoRadius` in a sort expression causes the API to return an `invalid_sort` error. `message` should be `:reservedKeyword is a reserved keyword and thus can't be used as a sort expression.` --- @@ -220,6 +225,32 @@ Following the [`sort` specification feature](https://github.com/meilisearch/spec > 💡 `_geoDistance` response field is only computed and shown when the end-user have sorted documents around a `_geoPoint`. So if the end-user filters documents using a `_geoRadius` built-in filter without sorting them around a `_geoPoint`, this field `_geoDistance` will not appear in the search response. +--- + +### `invalid_criterion` error changes + +The error is currently marked as an internal error thus the name is not explicit and consistent with the term `Ranking Rule` a user can encounter in the documentation and in the API resource name. A new definition of this error is proposed. + +#### invalid_ranking_rule + +#### Context + +This error is raised asynchronously when the user tries to specify an invalid ranking rule in the ranking rules setting. + +#### Error Definition + +```json + "message": ":rankingRule ranking rule is invalid. Valid ranking rules are Words, Typo, Sort, Proximity, Attribute, Exactness and custom ranking rules.", + "code": "invalid_ranking_rule", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_ranking_rule" +``` + +- 🔴 Specifying an invalid ranking rule name raises an `invalid_ranking_rule` error. See `message` defined in the error definition part. +- 🔴 Specifying a custom ranking rule with `_geo` or `_geoDistance` raises an `invalid_ranking_rule` error. The message is `:reservedKeyword is a reserved keyword and thus can't be used as a ranking rule.`. + +--- + ### IV.
Finalized Key Changes - Add a `_geo` reserved field on JSON and CSV format to index a geo point coordinates for a document. @@ -263,4 +294,4 @@ To keep consistency and not to introduce a different syntax among the `sort` sea - Add built-in filter to filter documents within `polygon` and `bounding-box`. - Handling `:desc` order around a geoPoint - Handling array of geo points in the document object. -- Handling multiple geo formats for the `_geo` field. e.g. "{lat},{lng}", a geohash etc.. +- Handling multiple geo formats for the `_geo` field. e.g. "{lat},{lng}", a geohash etc. From 3bcc07a5e7cc564250c91c999836bc89ac3d4c8b Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Tue, 14 Sep 2021 15:49:33 +0200 Subject: [PATCH 08/10] Add draft specification --- text/0073-task-resource-lists-filtering.md | 203 +++++++++++++++++++++ 1 file changed, 203 insertions(+) create mode 100644 text/0073-task-resource-lists-filtering.md diff --git a/text/0073-task-resource-lists-filtering.md b/text/0073-task-resource-lists-filtering.md new file mode 100644 index 00000000..4b353ed4 --- /dev/null +++ b/text/0073-task-resource-lists-filtering.md @@ -0,0 +1,203 @@ +- Title: Task Resource Lists Filtering +- Start Date: 2021-09-13 +- Specification PR: [#73](https://github.com/meilisearch/specifications/pull/73) + +# Task Resource Lists Filtering + +## 1. Functional Specification + +### I. Summary + +Add filtering capabilities to the `tasks` endpoints to facilitate the management of an instance and its indexes, or a specific index. + +This first iteration adds filters on the `status` and `type` attributes of the `task` API resource. + +#### Summary Key Points + +- Add filtering capabilities on `type` and `status` for `GET` `task` lists endpoints. + +### II. Motivation + +Following the specification aiming to stabilize the `task` API resource, we want to give users the capability to refine the lists of task according to several criteria to find precise information more quickly. + +### III. 
Technical Explanations + +#### Query parameters definition + +| parameter | type | required | description | +|------|------|----------|----------------------------| +| status | string | No | Possible values are all the statuses of a task. By default, when the `status` query parameter is not set, all task statuses are returned. | +| type | string | No | Possible values are all the types of a task. By default, when the `type` query parameter is not set, all task types are returned. | + +### Usages examples + +This specification demonstrates filetring on `/tasks`, but it should be equivalent for `indexes/:uid/tasks`. + +--- + +**No filtering** + +`GET` - `/tasks` + +```json +{ + "results": [ + { + "uid": 1350, + "indexUid": "movies", + "status": "failed", + "type": "documentsAddition", + ..., + }, + ..., + { + "uid": 1330, + "indexUid": "movies_reviews", + "status": "succeeded", + "type": "documentsDeletion", + ... + } + ], + ... +} +``` + +**Filter `tasks` that have a `failed` `status`** + +`GET` - `/tasks?status=failed` + +```json +{ + "results": [ + { + "uid": 1350, + "indexUid": "movies", + "status": "failed", + "type": "documentsAddition", + ..., + }, + ..., + { + "uid": 1279, + "indexUid": "movies", + "status": "failed", + "type": "settingsUpdate", + ..., + } + ], + ... +} +``` + +**Filter `tasks` that are of `documentsAddition` type** + +`GET` - `/tasks?type=documentsAddition` + +```json +{ + "results": [ + { + "uid": 1350, + "indexUid": "movies", + "status": "failed", + "type": "documentsAddition", + ..., + }, + ..., + { + "uid": 1343, + "indexUid": "movies", + "status": "succeeded", + "type": "documentsAddition", + ..., + } + ], + ... +} +``` + +- 💡 `status` and `type` can be used together. The two parameters are combined and an `AND` operation is performed between the two filters.
+ +**Filter `tasks` that are of `documentsAddition` type and have a `failed` status** + +`GET` - `/tasks?type=documentsAddition&status=failed` + +```json +{ + "results": [ + { + "uid": 1350, + "indexUid": "movies", + "status": "failed", + "type": "documentsAddition", + ..., + }, + ..., + { + "uid": 1346, + "indexUid": "movies", + "status": "failed", + "type": "documentsAddition", + ..., + } + ], + ... +} +``` + +- The `type` and `status` query parameters here can be read as `type=documentsAddition AND status=failed`. + +--- + +### Behaviors for `status` and `type` query parameters. + +#### `status` + +- 🔴 If the `status` value does not match one of the task statuses, an `invalid_task_status` error is returned. + +#### `invalid_task_status` Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": ":status is invalid. Available task statuses are: :taskStatuses.", + "code": "invalid_task_status", + "type": "invalid_request", + "link":"https://docs.meilisearch.com/errors#invalid_task_status" +} +``` + +- The `:status` is inferred when the message is generated. +- The `:taskStatuses` is inferred when the message is generated. + +#### `type` + +- 🔴 If the `type` value does not match one of the task types, an `invalid_task_type` error is returned. + +#### `invalid_task_type` Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": ":type is invalid. Available task types are: :taskTypes.", + "code": "invalid_task_type", + "type": "invalid_request", + "link":"https://docs.meilisearch.com/errors#invalid_task_type" +} +``` + +- The `:type` is inferred when the message is generated. +- The `:taskTypes` is inferred when the message is generated. + +#### Empty `results` + +💡 If no results match the filters, a response is returned with an empty `results` array. + +## 2. Technical Aspects +n/a + +## 3.
Future Possibilities + +- Filter `task` lists according to multiple types or statuses by separating several values with the `,` character. This character would be interpreted as an `OR`. e.g. `?status=documentsAddition,settingsUpdate&type=failed,enqueued` \ No newline at end of file From 774ffb8e64c870b4ac1aa97726f7127774f2dd54 Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Tue, 14 Sep 2021 16:54:13 +0200 Subject: [PATCH 09/10] fix typo --- text/0073-task-resource-lists-filtering.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0073-task-resource-lists-filtering.md b/text/0073-task-resource-lists-filtering.md index 4b353ed4..48c56922 100644 --- a/text/0073-task-resource-lists-filtering.md +++ b/text/0073-task-resource-lists-filtering.md @@ -31,7 +31,7 @@ Following the specification aiming to stabilize the `task` API resource, we want ### Usages examples -This specification demonstrates filetring on `/tasks`, but it should be equivalent for `indexes/:uid/tasks`. +This specification demonstrates filtering on `/tasks`, it should be equivalent for `indexes/:uid/tasks`. --- From 45167ad201c5b2824e458a463045cf7e149d070f Mon Sep 17 00:00:00 2001 From: Guillaume Mourier Date: Tue, 14 Sep 2021 17:01:45 +0200 Subject: [PATCH 10/10] Add a future possibility for date range filter --- text/0073-task-resource-lists-filtering.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/0073-task-resource-lists-filtering.md b/text/0073-task-resource-lists-filtering.md index 48c56922..c79259f8 100644 --- a/text/0073-task-resource-lists-filtering.md +++ b/text/0073-task-resource-lists-filtering.md @@ -200,4 +200,5 @@ n/a ## 3. Future Possibilities -- Filter `task` lists according to multiple types or statuses by separating several values with the `,` character. This character would be interpreted as an `OR`. e.g. 
`?status=documentsAddition,settingsUpdate&type=failed,enqueued` \ No newline at end of file +- Filter `task` lists according to multiple types or statuses by separating several values with the `,` character. This character would be interpreted as an `OR`. e.g. `?status=documentsAddition,settingsUpdate&type=failed,enqueued` +- Add a date range filter for `enqueuedAt`, `startedAt` and `finishedAt` attributes. \ No newline at end of file