Nextcloud Full Text Search - SQL Platform

This is an extension to the Full text search framework.

It allows you to index your content into the usual Nextcloud database.

Warning: This app stores all indexed content in your Nextcloud database roughly twice: once as plain text and once as a searchable index. Your database can therefore easily grow by hundreds of megabytes or even gigabytes if you have a lot of indexable content (e.g. documents).

Compatibility

The extension requires your Nextcloud database to be MySQL or PostgreSQL.

Status

This is currently just a proof of concept. I just wanted to find out why Nextcloud enforces the use of additional components.

What works:

- Indexing of plain text
- Indexing of text in PDF documents
  - The text is extracted via Smalot/PdfParser.
  - This app itself does NOT do optical character recognition (OCR)! If your files don't already contain the extracted text, the files_fulltextsearch_tesseract app may be for you. I haven't tested it together with this app.
- MySQL (tested in the CI pipeline and in real-world usage)
- PostgreSQL (tested in the CI pipeline)
  - Always assumes the "english" text search configuration (which influences stop words and normalization)
- Basic searching
  - If the database is MySQL, it uses Boolean Full-Text Searches, so you can use operators like + and -, as well as a trailing * wildcard.
  - If the database is PostgreSQL, the query is converted using websearch_to_tsquery, so you can use - for exclusions and quote text to search for phrases.
- Passing the occ fulltextsearch:test harness
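
To illustrate the two search syntaxes described above, here is a minimal Python sketch that builds a parameterized query for each platform. It is not this app's actual code: the table name example_index, the columns document_id and content, and the function build_search_query are hypothetical, and the %s placeholder assumes a DB-API driver that uses the "format" parameter style.

```python
from __future__ import annotations


def build_search_query(platform: str, user_query: str) -> tuple[str, tuple[str, ...]]:
    """Return a parameterized SQL statement and its parameters.

    Table and column names are hypothetical and chosen for illustration only.
    """
    if platform == "mysql":
        # Boolean mode understands operators such as +word, -word and word*.
        sql = (
            "SELECT document_id FROM example_index "
            "WHERE MATCH(content) AGAINST (%s IN BOOLEAN MODE)"
        )
    elif platform == "postgresql":
        # websearch_to_tsquery understands -word and "quoted phrases".
        # The 'english' configuration mirrors the assumption noted above.
        sql = (
            "SELECT document_id FROM example_index "
            "WHERE to_tsvector('english', content) "
            "@@ websearch_to_tsquery('english', %s)"
        )
    else:
        raise ValueError(f"unsupported platform: {platform}")
    # The user's query is passed as a parameter, never spliced into the SQL.
    return sql, (user_query,)


# Example search strings for each platform:
mysql_sql, mysql_params = build_search_query("mysql", "+invoice -draft report*")
pgsql_sql, pgsql_params = build_search_query("postgresql", '"annual report" -draft')
```

On MySQL, the first search string would require "invoice", exclude "draft", and match words starting with "report". On PostgreSQL, the second would require the phrase "annual report" and exclude "draft".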

What does NOT work:

- Indexing of Office documents: The upstream fulltextsearch_elasticsearch app simply passes the files on to the Elasticsearch Attachment processor, which in turn uses Apache Tika for processing. Since I want to keep this app lean, I don't want to pull in any Java dependencies.
- "Advanced" features of the full text search framework: There are fields for tags, metatags, subtags, parts and whatnot. I have no idea yet what they are used for. The app just stores them on indexing and returns them in search results, but doesn't search those fields.
- SQLite: Might be implementable, but I haven't spent more time on it than a quick search for "fulltext search sqlite".

jplitza/fulltextsearch_sql
