External Index API
FullTextSearch comes with a range of tools to allow an administrator to index the content of Nextcloud in an external search engine, and keep it up-to-date.
A collection is the list of all documents available on your Nextcloud and their current index status.
Each document is identified by the document provider and the id of the document itself.
The status is updated when:
- The document is indexed by your external search engine (marked as indexed)
- The document is modified locally on your Nextcloud (marked as not indexed)
You can create as many collections as you want, for example one per external search engine.
You manage your collections with the `occ` command:
To list all collections:
./occ fulltextsearch:collection:list
To create a new collection:
./occ fulltextsearch:collection:init <collectionName>
To destroy a collection:
./occ fulltextsearch:collection:delete <collectionName>
Important note: Using the OCS API requires authenticating as a Nextcloud account with admin rights.
Once a collection is created, you can start requesting your Nextcloud to:
- get a list of documents which need to be indexed by your external search engine,
- get content of a document,
- mark the document as indexed.
In the following examples, `test` is the name used at the creation of the collection.
**Get the list of documents that need to be indexed (or re-indexed):**
curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/index?format=json&length=50" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": [
{
"url": "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996",
"status": 28
}
]
}
}
- `length` sets a limit on the number of results returned by the request.
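The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library; the function name `build_index_request` and its parameters are illustrative and not part of FullTextSearch. It only builds the request (URL, `OCS-APIRequest` header and HTTP Basic credentials, as in the `curl` example) and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_index_request(base_url, collection, user, password, length=50):
    """Prepare (but do not send) the GET request listing documents to index."""
    query = urllib.parse.urlencode({"format": "json", "length": length})
    url = (
        f"{base_url.rstrip('/')}/ocs/v2.php/apps/fulltextsearch"
        f"/collection/{urllib.parse.quote(collection, safe='')}/index?{query}"
    )
    # Same credentials as `curl -u user:password` (HTTP Basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"OCS-APIRequest": "true", "Authorization": f"Basic {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_index_request("https://cloud.example.net", "test", "admin", "password")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` and parsing the body with `json.load` should give the structure shown in the response above.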
Each entry from `data` returned by this request represents a document that needs to be indexed:
- `url` is the URL to request to get the content and metadata of the document (see next step)
- `status` is a bitflag describing the difference between the current document and the last time it was indexed:
  - `4` means metadata have been modified lately,
  - `8` means content has been modified lately,
  - `16` means sub-parts have been modified lately,
  - `28` means all data should be re-indexed,
  - `32` means the document is not available anymore and its index should be removed from the external search engine.
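Since `status` is a bitflag, a client can test each bit separately; `28` is simply `4 + 8 + 16`. A small sketch of that decoding follows. The flag values come from the list above, while the label names and the `decode_status` helper are only illustrative.

```python
# Flag values taken from the list above; the names are illustrative labels.
STATUS_FLAGS = {
    4: "metadata",
    8: "content",
    16: "parts",
    32: "removed",
}


def decode_status(status):
    """Return the labels of every flag set in `status`, lowest bit first."""
    return [name for bit, name in STATUS_FLAGS.items() if status & bit]


print(decode_status(28))  # ['metadata', 'content', 'parts']
print(decode_status(32))  # ['removed']
```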
Get details about a document
Running a `GET` request on the `url` from the previous step returns the metadata, sub-parts and content of a document.
curl -X GET "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": {
"id": "597996",
"providerId": "files",
"access": {
"ownerId": "cult",
"viewerId": "",
"users": ["test1", "test2"],
"groups": [],
"circles": [],
"links": []
},
"index": {
"ownerId": "cult",
"providerId": "files",
"collection": "test",
"source": "files_local",
"documentId": "597996",
"lastIndex": 0,
"errors": [],
"errorCount": 0,
"status": 28,
"options": []
},
"title": "640-240-max.png",
"link": "http://cloud.example.net/index.php/f/597996",
"parts": {
"comments": "<test1> This is a comment !"
},
"content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
"isContentEncoded": 1
}
}
}
Notes:
- if `isContentEncoded` is `1`, then `content` is encoded with `base64`.
- in case of an Office document, the whole content of the file is sent; it is up to your search engine to extract its text content.
- in case of an image, and if `files_fulltextsearch_tesseract` is installed and configured, OCR is run on the image and the text content is returned.
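As an illustration of the first note, the sketch below decodes the `content` field when `isContentEncoded` is `1`. The helper name `document_content` is hypothetical, and treating non-encoded content as UTF-8 text is an assumption. The sample value is the one from the response above.

```python
import base64


def document_content(data):
    """Return the raw bytes of a document's `content` field."""
    content = data.get("content", "")
    if data.get("isContentEncoded") == 1:
        return base64.b64decode(content)
    # Assumption: non-encoded content is plain text, encoded here as UTF-8.
    return content.encode("utf-8")


sample = {
    "content": "VGhlIHF1aWNrIGJyb3duIGZveApqdW1wcyBvdmVyCnRoZSBsYXp5IGRvZy4=",
    "isContentEncoded": 1,
}
print(document_content(sample).decode("utf-8"))
```

The function returns bytes rather than a string because, for Office documents, the decoded content is the binary file itself and still has to go through text extraction.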
Set document as indexed:
Once the document is indexed by the external search engine, you need to tell FullTextSearch about it.
Run a `POST` request on the `url` from the first step, with `/done` appended:
curl -X POST "https://cloud.example.net/ocs/v2.php/apps/fulltextsearch/collection/test/document/files/597996/done" -H "OCS-APIRequest: true" -u "admin:password"
{
"ocs": {
"meta": {
"status": "ok",
"statuscode": 200,
"message": "OK"
},
"data": []
}
}
After this, the document will no longer be listed when retrieving the list of documents that need to be indexed, unless the document is modified again.
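Put together, the three steps form a simple polling loop. The sketch below shows one pass of such a loop under stated assumptions: `call_api`, `index_document` and `remove_document` are hypothetical callables supplied by the caller, not part of FullTextSearch. Acknowledging removed documents (status `32`) with the same `/done` call is also an assumption.

```python
def sync_once(index_url, call_api, index_document, remove_document):
    """Run one pass: list pending documents, index each, then mark it done.

    `call_api(method, url)` is a hypothetical helper that performs the
    authenticated OCS request and returns the parsed JSON response.
    `index_document(data)` and `remove_document(url)` stand in for the
    external search engine. Returns the number of documents processed.
    """
    pending = call_api("GET", index_url)["ocs"]["data"]
    for entry in pending:
        url, status = entry["url"], entry["status"]
        if status & 32:
            # Document is not available anymore: drop it from the external index.
            remove_document(url)
        else:
            document = call_api("GET", url)["ocs"]["data"]
            index_document(document)
        # Assumption: every handled entry, removals included, is acknowledged
        # with the same `/done` request.
        call_api("POST", url + "/done")
    return len(pending)
```

Keeping the HTTP layer behind `call_api` lets the loop be exercised with canned responses before pointing it at a real Nextcloud instance.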