diff --git a/text/0028-indexing-csv.md b/text/0028-indexing-csv.md index 8eeb9ab9..b4dc3e99 100644 --- a/text/0028-indexing-csv.md +++ b/text/0028-indexing-csv.md @@ -1,11 +1,11 @@ - Title: Indexing CSV - Start Date: 2021-04-9 - Specification PR: [PR-#28](https://github.com/meilisearch/specifications/pull/28) -- MeiliSearch Tracking-Issues: +- Discovery Issue: n/a # Indexing CSV -## 1. Feature Description and Interaction +## 1. Functional Specification ### I. Summary @@ -15,6 +15,12 @@ A [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) data format is bro Also, in order to boost write performance CSV data format is more suited than JSON for consequent datasets, as keys are not duplicated for every document. +#### Summary Key Points + +- The header of the CSV payload names and types the attributes. +- The `text/csv` Content-Type header is now supported. +- The error cases have been strengthened and completed. See the Errors section. + ### II. Motivation We want to provide our users with an always improved usage experience. Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of another simple data format, well known, to use. Thus, give them more versatility at the data source choices for the indexing (add and update) step. @@ -23,12 +29,9 @@ Since most SQL engines or SQL clients can easily dump data as CSV, it will facil Writing performance is also considered as a motivation since CSV parsing is less CPU and memory intensive than parsing Json due to the streamable capability of the CSV format. -### III. Additional Materials -N/A - -### IV.Explanation +### III. Explanation -#### Csv Formatting Rules +#### CSV Formatting Rules While there's [RFC 4180](https://tools.ietf.org/html/rfc4180) as a try to add a specification for CSV format, we will find a lot of variations from that. MeiliSearch features capabilities requires CSV data to be formatted the proper way to be parsable by the engine.
@@ -95,63 +98,142 @@ the search result should be displayed as #### API Endpoints -> Each API endpoints mentioned above will now require a `text/csv` as `Content-Type` header to process CSV data. -> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +> Each API endpoint mentioned above will now require a `text/csv` `Content-Type` header for the payload to be processed as CSV data. -#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) +**As a developer, I want to upload a CSV payload of documents so that end-users can search them** + +**POST documents** `/indexes/:indexUid/documents` ```bash curl \ -X POST 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: text/csv' \ - --binary-data ' + --data-binary ' "id","label","price:number","colors","description"\n "1","hoodie","19.99","purple","Hey, you will rock at summer time." ' ``` -> Response code: 202 Accepted - -##### Error codes - -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error. -> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid CSV data should return a `400 bad_request` error +> 202 Accepted - Response -### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) +**PUT documents** `/indexes/:indexUid/documents` ```bash curl \ -X PUT 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: text/csv' \ - --binary-data ' + --data-binary ' "id","label","price:number","colors","description"\n "1","hoodie","19.99","purple","Hey, you will rock at summer time."
' ``` -> Response code: 202 Accepted +> 202 Accepted - Response + +##### Errors + +- 🔴 Omitting the `Content-Type` header will lead to a 415 Unsupported Media Type - **missing_content_type** error code. +- 🔴 Sending an empty `Content-Type` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending a `Content-Type` other than `application/json`, `application/x-ndjson` or `text/csv` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending an empty payload will lead to a 400 Bad Request - **missing_payload** error code. +- 🔴 Sending a payload whose format does not match the `Content-Type` header will lead to a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a payload exceeding the size limit will lead to a 413 Payload Too Large - **payload_too_large** error code. +- 🔴 Sending invalid CSV data will lead to a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a CSV header that does not conform to the specification will lead to a 400 Bad Request - **malformed_payload** error code. + +##### Error Definitions + +## missing_content_type + +### Context + +This error occurs when the Content-Type header is missing. + +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "A Content-Type header is missing. Accepted values for Content-Type are: :contentTypeList", + "code": "missing_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_content_type" +} +``` + +- The `:contentTypeList` is inferred when the message is generated. Its values are comma-separated, e.g. `application/json, application/x-ndjson, text/csv`. + +--- -##### Errors handling +## invalid_content_type -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error.
-> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid CSV data should return a `400 bad_request` error +### Context -### V. Impact on documentation +This error occurs when the provided `Content-Type` is empty or is not supported by the endpoint. -This feature should impact MeiliSearch users documentation by adding mention of csv capability inside Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents). It should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +### Error Definition -We should also not only mention JSON format in `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) and add CSV format. The documentation says "Currently, MeiliSearch supports only JSON payloads." +HTTP Code: `415 Unsupported Media Type` -Documentation should also guide the user in the correct way to properly format and send csv data. Adding a dedicated page for the purpose of formatting and sending CSV data should be considered. +```json +{ + "message": "The Content-Type :contentType is invalid. Accepted values for Content-Type are: :contentTypeList", + "code": "invalid_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_content_type" +} +``` + +- The `:contentType` is the value received in the `Content-Type` header. The `:contentTypeList` is inferred when the message is generated; its values are comma-separated, e.g. `application/json, application/x-ndjson, text/csv`. + +--- + +## missing_payload + +### Context + +This error occurs when the request does not include a mandatory payload.
+ +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": "A :payloadType payload is missing.", + "code": "missing_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_payload" +} +``` + +- The `:payloadType` is inferred when the message is generated, e.g. `json`, `ndjson` or `csv`. + +--- + +## malformed_payload + +### Context + +This error occurs when the payload is malformed: it contains a syntax error, or its format does not match the `Content-Type` header. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ "message": ":syntaxErrorHelper. The :payloadType payload provided is malformed.", + "code": "malformed_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#malformed_payload" } +``` -### VI. Impact on SDKs +- The `:payloadType` is inferred when the message is generated, e.g. `json`, `ndjson` or `csv`. +- The `:syntaxErrorHelper` is inferred when the message is generated and describes the syntax error. -This feature should impact MeiliSearch SDKs in the future by adding the possibility to send a CSV to MeiliSearch on the previous explicit endpoints. Simplifying the typing of the headers could also be handled by the SDKs. +--- -## 2. Technical Aspects -N/A +## 2. Technical details +n/a ## 3. Future possibilities diff --git a/text/0029-indexing-ndjson.md b/text/0029-indexing-ndjson.md index e4f48516..bf2dcdd5 100644 --- a/text/0029-indexing-ndjson.md +++ b/text/0029-indexing-ndjson.md @@ -1,11 +1,11 @@ - Title: Indexing NDJSON - Start Date: 2021-04-12 - Specification PR: [PR-#29](https://github.com/meilisearch/specifications/pull/29) -- MeiliSearch Tracking-Issues: TBD +- Discovery Issue: n/a # Indexing NDJSON -## 1. Feature Description and Interaction +## 1. Functional Specification ### I.
Summary @@ -13,6 +13,11 @@ To index documents, the body of the add documents request has to match a specifi An [NDJSON](http://ndjson.org/) data format is easier to use than a CSV format because it propose a convenient format for storing structured data. +#### Summary Key Points + +- The `application/x-ndjson` Content-Type header is now supported. +- The error cases have been strengthened and completed. See the Errors section. + ### II. Motivation Currently, the engine only accepts JSON format as a data source. We want to give users the possibility of another simple data format to use. Thus, give them more versatility at the data source choices for the indexing step. @@ -23,11 +28,7 @@ While we give the ability to Meilisearch to ingest CSV data for indexing in this Representing nested structures in a JSON object is easy and convenient. -### III. Additional Materials - -TBD - -### IV. Explanation +### III. Explanation Newline-delimited JSON (`ndjson`), line-delimited JSON (`ldjson`), JSON lines (`jsonl`) are three terms expressing the same formats primarily intended for JSON streaming. @@ -74,63 +75,142 @@ the search result should be displayed as #### API Endpoints > Each API endpoints mentioned above will now require a `application/x-ndjson` as `Content-Type` header to be processed as NDJSON data. -> ⚠ A missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior.
-#### Add or Replace Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) +**As a developer, I want to upload an NDJSON payload of documents so that end-users can search them** -```curl +**POST documents** `/indexes/:indexUid/documents` + +```bash curl \ -X POST 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: application/x-ndjson' \ - --binary-data ' + --data-binary ' {"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]}\n {"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]} ' ``` -> Response code: 202 Accepted - -##### Error codes - -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error. -> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid NDJSON data should return a `400 bad_request` error +> 202 Accepted - Response -### Add or Update Documents [📎](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents) +**PUT documents** `/indexes/:indexUid/documents` -```curl +```bash curl \ -X PUT 'http://localhost:7700/indexes/movies/documents' \ -H 'Content-Type: application/x-ndjson' \ - --binary-data ' + --data-binary ' {"id":1, "label": "t-shirt", "price": 4.99, "colors": ["red", "green", "blue"]}\n {"id":499, "label": "hoodie", "price": 19.99, "colors": ["purple"]} ' ``` -> Response code: 202 Accepted +> 202 Accepted - Response -##### Errors handling +##### Errors -> - Sending a different payload than the `Content-Type` header should return a `400 bad_request` error.
-> - Too large payload according to the limit should return a `413 payload_too_large` error -> - Wrong encoding should return a `400 bad_request` error -> - Invalid NDJSON data should return a `400 bad_request` error +- 🔴 Omitting the `Content-Type` header will lead to a 415 Unsupported Media Type - **missing_content_type** error code. +- 🔴 Sending an empty `Content-Type` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending a `Content-Type` other than `application/json`, `application/x-ndjson` or `text/csv` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code. +- 🔴 Sending an empty payload will lead to a 400 Bad Request - **missing_payload** error code. +- 🔴 Sending a payload whose format does not match the `Content-Type` header will lead to a 400 Bad Request - **malformed_payload** error code. +- 🔴 Sending a payload exceeding the size limit will lead to a 413 Payload Too Large - **payload_too_large** error code. +- 🔴 Sending invalid NDJSON data will lead to a 400 Bad Request - **malformed_payload** error code. -### V. Impact on documentation +##### Error Definitions -This feature should impact MeiliSearch users documentation by adding the possibility to use `ndjson` as an accepted format in the Documents scope at [Add or replace documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-replace-documents) and [Add or update documents](https://docs.meilisearch.com/reference/api/documents.html#add-or-update-documents). It should also mention that a missing Content-Type will be interpreted as `application/json` since it's the current behavior. Giving an `application/json` Content-Type leads to the same behavior. +## missing_content_type -We should also not only mention JSON format in `unsupported_media_type` section on the [errors page](https://docs.meilisearch.com/errors/#unsupported_media_type) and add `ndjson` format.
The documentation says "Currently, MeiliSearch supports only JSON payloads." +### Context -Documentation should also guide the user in the correct way to properly format and send ndjson data. Adding a dedicated page for the purpose of formatting and sending ndjson data should be considered. +This error occurs when the Content-Type header is missing. -### VI. Impact on SDKs +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "A Content-Type header is missing. Accepted values for the Content-Type header are: :contentTypeList", + "code": "missing_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_content_type" +} +``` + +- The `:contentTypeList` is inferred when the message is generated. Its values are comma-separated, e.g. `application/json, application/x-ndjson, text/csv`. + +--- + +## invalid_content_type + +### Context + +This error occurs when the provided `Content-Type` is empty or is not supported by the endpoint. + +### Error Definition + +HTTP Code: `415 Unsupported Media Type` + +```json +{ + "message": "The Content-Type :contentType is invalid. Accepted values for the Content-Type header are: :contentTypeList", + "code": "invalid_content_type", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#invalid_content_type" +} +``` -This feature should impact MeiliSearch SDK's in the future by adding the possibility to send ndjson data to MeiliSearch on the previous explicited endpoints. +- The `:contentType` is the value received in the `Content-Type` header. The `:contentTypeList` is inferred when the message is generated; its values are comma-separated, e.g. `application/json, application/x-ndjson, text/csv`. -## 2. Technical Aspects -N/A +--- + +## missing_payload + +### Context + +This error occurs when the request does not include a mandatory payload.
+ +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ + "message": "A :payloadType payload is missing.", + "code": "missing_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#missing_payload" +} +``` + +- The `:payloadType` is inferred when the message is generated, e.g. `json`, `ndjson` or `csv`. + +--- + +## malformed_payload + +### Context + +This error occurs when the payload is malformed: it contains a syntax error, or its format does not match the `Content-Type` header. + +### Error Definition + +HTTP Code: `400 Bad Request` + +```json +{ "message": ":syntaxErrorHelper. The :payloadType payload provided is malformed.", + "code": "malformed_payload", + "type": "invalid_request", + "link": "https://docs.meilisearch.com/errors#malformed_payload" } +``` + +- The `:payloadType` is inferred when the message is generated, e.g. `json`, `ndjson` or `csv`. +- The `:syntaxErrorHelper` is inferred when the message is generated and describes the syntax error. + +--- + +## 2. Technical details +n/a ## 3. Future possibilities + - Provide an interface in the future dashboard to upload NDJSON data into an index. - Set a payload limit directly related to the type of data format. Currently, the payload size is equivalent to [JSON payload size](https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size). Metrics on feature usage and configuration update should help to choose a better suited value for this type of data.
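The header and size checks listed in the Errors sections of both specifications can be sketched as a small POSIX shell function. This is an illustrative sketch, not Meilisearch's implementation. Several details are assumptions: the function name `classify_request`, its four arguments, the `202 Accepted` success string, and the order in which the checks run (the specs do not define a precedence between errors). The function matches the header value exactly, so a value carrying parameters such as `; charset=utf-8` would be rejected here. It also leaves `malformed_payload` out, because detecting that error requires parsing the CSV or NDJSON body.

```bash
# Hypothetical sketch (not Meilisearch code): map a POST/PUT documents request
# to the status and error code listed in the Errors sections of the specs.
# Arguments (assumptions made for this illustration):
#   $1  "yes" if a Content-Type header was sent, "no" if it was omitted
#   $2  value of the Content-Type header (may be empty)
#   $3  payload size in bytes
#   $4  configured payload size limit in bytes
classify_request() {
  # Omitted header -> 415 missing_content_type
  if [ "$1" = "no" ]; then
    echo "415 missing_content_type"
    return
  fi
  # Empty or unsupported value -> 415 invalid_content_type
  # (the `*` pattern also matches the empty string)
  case "$2" in
    application/json | application/x-ndjson | text/csv) ;;
    *)
      echo "415 invalid_content_type"
      return
      ;;
  esac
  # Empty body -> 400 missing_payload
  if [ "$3" -eq 0 ]; then
    echo "400 missing_payload"
    return
  fi
  # Body above the configured limit -> 413 payload_too_large
  if [ "$3" -gt "$4" ]; then
    echo "413 payload_too_large"
    return
  fi
  # Parsing the body (and thus malformed_payload) would happen from here on.
  echo "202 Accepted"
}

classify_request no "" 10 100                        # 415 missing_content_type
classify_request yes "" 10 100                       # 415 invalid_content_type
classify_request yes "text/plain" 10 100             # 415 invalid_content_type
classify_request yes "text/csv" 0 100                # 400 missing_payload
classify_request yes "text/csv" 500 100              # 413 payload_too_large
classify_request yes "application/x-ndjson" 10 100   # 202 Accepted
```

Running the header checks before the body checks lets a client-side wrapper reject a request without reading its payload. A server is free to order them differently, since the specifications leave the precedence open.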