Tokenizers are pieces of C code that allow sqlite to break a sequence of characters into a series of tokens, which can then be used for more accurate full-text search (via the fts5 extension).
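To make the idea concrete, here is a minimal sketch of the tokenizing step on its own, with no sqlite headers involved. The callback type mirrors the argument list of the `xToken` callback that fts5 hands to a tokenizer (context, flags, token bytes, token length, start and end byte offsets); `simple_tokenize`, `count_token` and the 64-byte token limit are made-up names and choices for illustration, not part of op-sqlite or sqlite.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Callback shaped like the xToken callback fts5 passes to a tokenizer.
   Returning non-zero stops tokenization. */
typedef int (*token_cb)(void *ctx, int flags, const char *token, int n_token,
                        int start, int end);

/* Hypothetical helper: split ASCII text on non-alphanumeric bytes,
   lowercase each token and report it together with its byte offsets.
   Tokens longer than the buffer are truncated to 64 bytes. */
static int simple_tokenize(void *ctx, const char *text, int n_text,
                           token_cb on_token) {
  char buf[64];
  int i = 0;
  assert(text != NULL && on_token != NULL);
  while (i < n_text) {
    /* Skip separators. */
    while (i < n_text && !isalnum((unsigned char)text[i])) i++;
    int start = i;
    int len = 0;
    /* Collect one run of alphanumeric bytes. */
    while (i < n_text && isalnum((unsigned char)text[i])) {
      if (len < (int)sizeof(buf)) {
        buf[len++] = (char)tolower((unsigned char)text[i]);
      }
      i++;
    }
    if (len > 0) {
      int rc = on_token(ctx, 0, buf, len, start, i);
      if (rc != 0) return rc;
    }
  }
  return 0;
}

/* Example callback: prints each token and counts it via ctx. */
static int count_token(void *ctx, int flags, const char *token, int n_token,
                       int start, int end) {
  (void)flags;
  printf("%.*s [%d,%d)\n", n_token, token, start, end);
  (*(int *)ctx)++;
  return 0;
}
```

Calling `simple_tokenize(&count, text, length, count_token)` on a short sentence reports one lowercased token per word. A real fts5 tokenizer would wrap logic like this in the `xTokenize` method of an `fts5_tokenizer` and register it through the `fts5_api` object.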
The basic idea here is to allow people to implement their own custom tokenizers. This is partly achieved right now.
Add a tokenizers key to package.json; it's an array of strings with the name of each tokenizer.
A header file is generated with the names of the functions to be implemented, essentially the entry point used to register each tokenizer.
An empty implementation file is created. The user then has to go into this tokenizer file and implement the corresponding C code to create their tokenizer.
Via macros the function is injected and executed when the db connection is opened.
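The steps above can be sketched roughly as follows. This is not op-sqlite's actual generated code: every name here (`APP_TOKENIZERS`, `fake_db`, `register_word_tokenizer`, `register_ngram_tokenizer`, `open_db`) is hypothetical, and a plain struct stands in for the database connection so the example compiles without sqlite. It only illustrates how a generated list of tokenizer names can be expanded by a macro into declarations and into registration calls that run when the connection is opened.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the generated header: one entry per name in the
   "tokenizers" array of package.json (names are hypothetical). */
#define APP_TOKENIZERS(X) \
  X(word_tokenizer)       \
  X(ngram_tokenizer)

/* Stand-in for a database connection; records what got registered. */
typedef struct {
  const char *registered[8];
  int n_registered;
} fake_db;

/* Generated declarations: one registration entry point per tokenizer. */
#define DECLARE_TOKENIZER(name) int register_##name(fake_db *db);
APP_TOKENIZERS(DECLARE_TOKENIZER)
#undef DECLARE_TOKENIZER

/* User-written implementations (the initially empty file). A real one
   would create and register an fts5 tokenizer here. */
int register_word_tokenizer(fake_db *db) {
  assert(db->n_registered < 8);
  db->registered[db->n_registered++] = "word_tokenizer";
  return 0;
}

int register_ngram_tokenizer(fake_db *db) {
  assert(db->n_registered < 8);
  db->registered[db->n_registered++] = "ngram_tokenizer";
  return 0;
}

/* Library side: when the connection opens, the macro expands to one
   registration call per tokenizer and stops at the first failure. */
int open_db(fake_db *db) {
  db->n_registered = 0;
#define CALL_REGISTER(name)       \
  do {                            \
    int rc = register_##name(db); \
    if (rc != 0) return rc;       \
  } while (0);
  APP_TOKENIZERS(CALL_REGISTER)
#undef CALL_REGISTER
  return 0;
}
```

With this layout the library code never names a specific tokenizer; adding one only changes the generated list and the user's implementation file.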
This approach gives a lot of flexibility for each app to implement their own custom tokenizers without digging into op-sqlite code. The ability to generate code and inject it into the compilation process also opens the possibility for other cool stuff like custom aggregators and functions, all implemented in C and made available through sqlite's SQL queries.
This is currently only partly implemented, and I'm looking for sponsors to finish the work, which still needs:
Android support: figure out how to replicate the codegen and file inclusion in CMakeLists.
Polishing the inclusion of the header files to make sure it is robust.
Tests
Here is an example of a simple tokenizer that is already working on the test branch and inside the sample app.
Upvote & Fund
We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.