meilisearch · gmourier · Jun 10, 2021 · Jun 16, 2021 · Aug 12, 2021 · Aug 25, 2021
diff --git a/open-api.yaml b/open-api.yaml
@@ -124,6 +124,7 @@ components:
                 length: 5
               - start: 155
                 length: 5
+      description: ''
       properties:
         _formatted:
           type: object
@@ -140,7 +141,9 @@ components:
             - string
             - number
           description: Retrieve attributes of the document. `attributesToRetrieve` controls these fields.
-      description: ''
+        _geoDistance:
+          type: number
+          description: 'Using _geoPoint({lat}, {lng}) built-in sort rule at search leads the engine to return a _geoDistance within the search results. This field represents the distance in meters of the document from the specified _geoPoint.'
     documentId:
       oneOf:
         - type: number
@@ -156,6 +159,9 @@ components:
         - String: `"something > 1 AND genres=comedy AND (genres=horror OR title=batman)"`
         - Mixed: `["something > 1 AND genres=comedy", "genres=horror OR title=batman"]`
 
+        > info
+        > _geoRadius({lat}, {lng}, {distance_in_meters}) built-in filter rule can be used to filter documents within a geo circle.
+
         > warn
         > Attribute(s) used in `filter` should be declared as filterable attributes. See [Filtering and Faceted Search](https://docs.meilisearch.com/reference/features/filtering_and_faceted_search.html).
       example:
@@ -362,7 +368,6 @@ components:
         - sort
         - exactness
         - release_date:asc
-      examples: []
     filterableAttributes:
       type: array
       description: |
@@ -617,6 +622,8 @@ components:
 
         > warn
         > Attribute(s) used in `sort` should be declared as sortable attributes. See [Sorting](https://docs.meilisearch.com/reference/features/sorting.html).
+        > info
+        > _geoPoint({lat}, {lng}) built-in sort rule can be used to sort documents around a geo point.
     filter:
       name: filter
       in: query
@@ -632,6 +639,9 @@ components:
         - String: `something > 1 AND genres=comedy AND (genres=horror OR title=batman)`
         - Mixed: `["something > 1 AND genres=comedy", "genres=horror OR title=batman"]`
 
+        > info
+        > _geoRadius({lat}, {lng}, {distance_in_meters}) built-in filter rule can be used to filter documents within a geo circle.
+
         > warn
         > Attribute(s) used in `filter` should be declared as filterable attributes. See [Filtering and Faceted Search](https://docs.meilisearch.com/reference/features/filtering_and_faceted_search.html).
   responses:
@@ -685,6 +695,13 @@ components:
       type: apiKey
       in: header
       name: X-Meili-API-Key
+      description: |-
+        An API key is a token that you provide when making API calls. Include the token in a header parameter called `X-Meili-API-Key`.
+
+        Example: `X-Meili-API-Key: 123`
+
+        > info
+        > test
   examples: {}
 tags:
   - name: Indexes
@@ -1048,6 +1065,9 @@ paths:
 
         > info
         > If the provided index does not exist, it will be created.
+
+        > info
+        > Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` fields.
       tags:
         - Documents
       security:
@@ -1059,6 +1079,7 @@ paths:
             schema:
               type: array
               items: null
+            examples: {}
       responses:
         '202':
           $ref: '#/components/responses/202'
@@ -1069,7 +1090,7 @@ paths:
     put:
       operationId: indexes.documents.upsert
       summary: Add or update documents
-      description: |
+      description: |-
         Add a list of documents or update them if they already exist.
 
         If you send an already existing document (same [id](https://docs.meilisearch.com/learn/core_concepts/documents.html#primary-key)) the old document will be only partially updated according to the fields of the new document. Thus, any fields not present in the new document are kept and remained unchanged.
@@ -1078,6 +1099,9 @@ paths:
 
         > info
         > If the provided index does not exist, it will be created.
+
+        > info
+        > Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` fields.
       tags:
         - Documents
       security:
@@ -1675,6 +1699,9 @@ paths:
       summary: Update sortable attributes
       description: |
         Update the list of [sortableAttributes](https://docs.meilisearch.com//reference/features/settings.html#sortable-attributes) of an index.
+
+        > info
+        > In order to enable sorting capabilities on geographic data, the `_geo` field must be added as a sortableAttribute.
       tags:
         - Settings
       security:
@@ -1888,6 +1915,9 @@ paths:
       description: |
         Update the [filterable attributes](https://docs.meilisearch.com/reference/features/settings.html#filterable-attributes) of an index.
 
+        > info
+        > In order to enable filtering capabilities on geographic data, the `_geo` field must be added as a filterableAttribute.
+
         > info
         > If the provided index does not exist, it will be created.
       tags:
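
Note on the geo rules documented in the `open-api.yaml` hunks above: the sketch below illustrates how a `_geoRadius` filter and a `_geoPoint` sort could behave on documents carrying a `_geo` object. It is an illustration only. The haversine formula and the Earth radius constant are assumptions, since the diff does not state how the engine computes `_geoDistance`, and the function names and sample documents are hypothetical.

```python
import math

# Assumption: spherical Earth with the mean radius in meters.
EARTH_RADIUS_M = 6_371_000


def geo_distance(doc, lat, lng):
    """Haversine distance in meters between doc['_geo'] and the point (lat, lng)."""
    phi1 = math.radians(doc["_geo"]["lat"])
    phi2 = math.radians(lat)
    dphi = phi2 - phi1
    dlmb = math.radians(lng - doc["_geo"]["lng"])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding errors before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def geo_radius(docs, lat, lng, distance_in_meters):
    """Keep the documents located inside the circle (hypothetical _geoRadius)."""
    return [d for d in docs if geo_distance(d, lat, lng) <= distance_in_meters]


def sort_by_geo_point(docs, lat, lng):
    """Return copies of the documents with a _geoDistance field, nearest first."""
    hits = [dict(d, _geoDistance=geo_distance(d, lat, lng)) for d in docs]
    return sorted(hits, key=lambda h: h["_geoDistance"])


# Hypothetical sample documents using the reserved `_geo` object.
docs = [
    {"id": 1, "name": "Lille", "_geo": {"lat": 50.6292, "lng": 3.0573}},
    {"id": 2, "name": "Paris", "_geo": {"lat": 48.8566, "lng": 2.3522}},
    {"id": 3, "name": "Lyon", "_geo": {"lat": 45.7640, "lng": 4.8357}},
]

hits = sort_by_geo_point(docs, 48.8566, 2.3522)
print([h["name"] for h in hits])  # ['Paris', 'Lille', 'Lyon']
```

The sort returns copies so that `_geoDistance` appears only in the search results, matching the description of the field as something the engine returns rather than something stored in the document.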

diff --git a/text/0028-indexing-csv.md b/text/0028-indexing-csv.md
@@ -0,0 +1,241 @@
+- Title: Indexing CSV
+- Start Date: 2021-04-09
+- Specification PR: [PR-#28](https://github.com/meilisearch/specifications/pull/28)
+- Discovery Issue: n/a
+
+# Indexing CSV
+
+## 1. Functional Specification
+
+### I. Summary
+
+To index documents, the body of the add documents request has to match a specific format. MeiliSearch parses and tokenizes that payload, after which the added documents join the pool of searchable and returnable documents.
+
+[CSV](https://en.wikipedia.org/wiki/Comma-separated_values) is a simple data format widely used to store and exchange data.
+
+CSV is also better suited than JSON for large datasets when write performance matters, since keys are not repeated for every document.
+
+#### Summary Key Points
+
+- The header of the CSV payload names the attributes and optionally types them.
+- The `text/csv` Content-Type header is now supported.
+- The error cases have been strengthened and completed. See the Errors section.
+
+### II. Motivation
+
+We want to keep improving the user experience. Currently, the engine only accepts JSON as a data source. We want to let users index data in another simple, well-known format, giving them more flexibility in their choice of data source for the indexing (add and update) step.
+
+Since most SQL engines or SQL clients can easily dump data as CSV, this will facilitate MeiliSearch adoption by extending the indexing step to a wider range of customer use cases.
+
+Write performance is also a motivation, since CSV parsing is less CPU and memory intensive than JSON parsing thanks to the streamable nature of the CSV format.
+
+### III. Explanation
+
+#### CSV Formatting Rules
+
+While [RFC 4180](https://tools.ietf.org/html/rfc4180) attempts to specify the CSV format, many variations exist in practice. MeiliSearch requires CSV data to follow the rules below to be parsable by the engine.
+
+- The first line of the CSV data must list the attributes. Each attribute name can optionally be followed by a type, separated from the name by a `:` character. The type is case-insensitive.
+
+> An attribute can be specified with one of two types: `string` or `number`. A missing type is interpreted as `string` by default.
+>
+> Valid header line example: "id:number","title:string","author","price:number"
+
+- Each subsequent CSV line represents one document for MeiliSearch.
+- A CSV value should be enclosed in double quotes when it contains a comma or a newline character.
+- When a field is enclosed in double quotes, a double quote appearing inside it must be escaped by preceding it with another double quote, as described in [RFC 4180](https://tools.ietf.org/html/rfc4180).
+- Float values should be written with a `.` character as the decimal separator, like `3.14`.
+- CSV text should be encoded in UTF-8.
+- The format can't handle array cell values. The `nd-json` format is provided to handle these types of attributes more easily.
+
+##### Example with a comma inside a cell
+
+Given the CSV payload
+```
+"id:number","label","price:number","colors","description"
+"1","t-shirt","4.99","red","Hey, you will rock at summer time."
+```
+the search result should be displayed as
+```json
+{
+  "hits": [
+    {
+      "id": 1,
+      "label": "t-shirt",
+      "price": 4.99,
+      "colors": "red",
+      "description": "Hey, you will rock at summer time."
+    }
+  ],
+  ...
+}
+```
+
+##### Example with a double quote inside a cell
+
+Given the CSV payload
+```
+"id:number","label","price","colors","description"
+"1","t-shirt","4.99","red","Hey, you will ""rock"" at summer time."
+```
+the search result should be displayed as
+```json
+{
+  "hits": [
+    {
+      "id": 1,
+      "label": "t-shirt",
+      "price": "4.99",
+      "colors": "red",
+      "description": "Hey, you will \"rock\" at summer time."
+    }
+  ],
+  ...
+}
+```
+
+> Note that the price attribute was not typed as a number. By default, MeiliSearch types it as a string.
+
+#### API Endpoints
+
+> Each API endpoint mentioned below requires `text/csv` as the `Content-Type` header for the payload to be processed as CSV data.
+
+**As a developer, I want to upload a CSV payload of documents so that end-user can search them**
+
+**POST documents** `/indexes/:indexUid/documents`
+
+```bash
+curl \
+  -X POST 'http://localhost:7700/indexes/movies/documents' \
+  -H 'Content-Type: text/csv' \
+  --data-binary '"id","label","price:number","colors","description"
+"1","hoodie","19.99","purple","Hey, you will rock at summer time."'
+```
+> 202 Accepted - Response
+
+**PUT documents** `/indexes/:indexUid/documents`
+
+```bash
+curl \
+  -X PUT 'http://localhost:7700/indexes/movies/documents' \
+  -H 'Content-Type: text/csv' \
+  --data-binary '"id","label","price:number","colors","description"
+"1","hoodie","19.99","purple","Hey, you will rock at summer time."'
+```
+> 202 Accepted - Response
+
+##### Errors
+
+- 🔴 Omitting the `Content-Type` header will lead to a 415 Unsupported Media Type - **missing_content_type** error code.
+- 🔴 Sending an empty `Content-Type` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code.
+- 🔴 Sending a `Content-Type` other than `application/json`, `application/x-ndjson` or `text/csv` will lead to a 415 Unsupported Media Type - **invalid_content_type** error code.
+- 🔴 Sending an empty payload will lead to a 400 Bad Request - **missing_payload** error code.
+- 🔴 Sending a payload whose format does not match the `Content-Type` header will lead to a 400 Bad Request - **malformed_payload** error code.
+- 🔴 Sending a payload exceeding the size limit will lead to a 413 Payload Too Large - **payload_too_large** error code.
+- 🔴 Sending an invalid CSV format will lead to a 400 Bad Request - **malformed_payload** error code.
+- 🔴 Sending a CSV header that does not conform to the specification will lead to a 400 Bad Request - **malformed_payload** error code.
+
+##### Errors Definition
+
+## missing_content_type
+
+### Context
+
+This error occurs when the Content-Type header is missing.
+
+### Error Definition
+
+HTTP Code: `415 Unsupported Media Type`
+
+```json
+{
+    "message": "A Content-Type header is missing. Accepted values for Content-Type are: :contentTypeList",
+    "code": "missing_content_type",
+    "type": "invalid_request",
+    "link": "https://docs.meilisearch.com/errors#missing_content_type"
+}
+```
+
+- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`.
+
+---
+
+## invalid_content_type
+
+### Context
+
+This error occurs when the provided content-type is not handled by the API method.
+
+### Error Definition
+
+HTTP Code: `415 Unsupported Media Type`
+
+```json
+{
+    "message": "The Content-Type :contentType is invalid. Accepted values for Content-Type are: :contentTypeList",
+    "code": "invalid_content_type",
+    "type": "invalid_request",
+    "link": "https://docs.meilisearch.com/errors#invalid_content_type"
+}
+```
+
+- The `:contentType` is inferred when the message is generated.
+- The `:contentTypeList` is inferred when the message is generated. The values are separated by a `,` char. e.g. `application/json`, `text/csv`.
+
+---
+
+## missing_payload
+
+### Context
+
+This error occurs when the client does not provide a mandatory payload to the request.
+
+### Error Definition
+
+HTTP Code: `400 Bad Request`
+
+```json
+{
+    "message": "A :payloadType payload is missing.",
+    "code": "missing_payload",
+    "type": "invalid_request",
+    "link": "https://docs.meilisearch.com/errors#missing_payload"
+}
+```
+
+- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv`
+
+---
+
+## malformed_payload
+
+### Context
+
+This error occurs when the format sent in the payload is malformed. The payload contains a syntax error.
+
+### Error Definition
+
+HTTP Code: `400 Bad Request`
+
+```json
+{
+    "message": ":syntaxErrorHelper. The :payloadType payload provided is malformed.",
+    "code": "malformed_payload",
+    "type": "invalid_request",
+    "link": "https://docs.meilisearch.com/errors#malformed_payload"
+}
+```
+
+- The `:payloadType` is inferred when the message is generated. e.g. `json`, `ndjson`, `csv`
+- The `:syntaxErrorHelper` is inferred when the message is generated.
+
+---
+
+## 2. Technical details
+n/a
+
+## 3. Future possibilities
+
+- Provide an interface in the future dashboard to upload CSV data into an index and optionally provide the header types.
+- Set a payload limit specific to each data format. Currently, the CSV payload size limit is the same as the [JSON payload size](https://docs.meilisearch.com/reference/features/configuration.html#payload-limit-size). Metrics on feature usage and configuration updates should help choose a better-suited value for this data format.
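
Note on the CSV formatting rules in `text/0028-indexing-csv.md` above: the sketch below shows one way the typed header and the quoting rules could be applied to turn a CSV payload into documents. It is not the engine's implementation. It relies on Python's standard `csv` module for the double-quote handling, and the helper names (`parse_header`, `to_number`, `csv_to_documents`) are hypothetical. It also assumes that the text after the last `:` in a header cell is always the type, and that a `number` value without a `.` becomes an integer. The specification does not settle either point.

```python
import csv
import io

ALLOWED_TYPES = {"string", "number"}


def parse_header(cells):
    """Split each header cell into (attribute name, type); the type defaults to string."""
    columns = []
    for cell in cells:
        name, sep, type_name = cell.rpartition(":")
        if not sep:  # no ':' in the cell, so the attribute is untyped
            name, type_name = cell, "string"
        type_name = type_name.lower()  # the type is case-insensitive
        if not name or type_name not in ALLOWED_TYPES:
            raise ValueError(f"malformed_payload: invalid header cell {cell!r}")
        columns.append((name, type_name))
    return columns


def to_number(text):
    """Convert a cell to int when possible, else to float (raises ValueError otherwise)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def csv_to_documents(payload):
    """Parse a CSV payload with a typed header line into a list of documents."""
    reader = csv.reader(io.StringIO(payload, newline=""))
    try:
        columns = parse_header(next(reader))
    except StopIteration:
        raise ValueError("missing_payload: empty CSV payload") from None
    documents = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(columns):
            raise ValueError("malformed_payload: row length differs from header length")
        documents.append({
            name: to_number(value) if type_name == "number" else value
            for (name, type_name), value in zip(columns, row)
        })
    return documents


payload = (
    '"id:number","label","price:NUMBER","colors","description"\n'
    '"1","t-shirt","4.99","red","Hey, you will ""rock"" at summer time."\n'
)
documents = csv_to_documents(payload)
print(documents[0]["description"])  # Hey, you will "rock" at summer time.
```

Raising `ValueError` with the error code as a prefix is only a convenience for the sketch; a real handler would map these cases to the 400 responses listed in the Errors section.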