TF-Query vs Full-text search #66
Replies: 4 comments 7 replies
-
Here is an idea. When we speak of full-text search, we mean searching the full text of a feature by means of an index. Ideally, I would like to leverage existing full-text search tools on feature files. When we get the results back, we are interested not only in the result strings but foremost in the nodes that have those values. We can explode the feature file. Then we index the feature file, and we expect the search results to return line numbers as well.
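A minimal sketch of this idea, assuming an exploded feature file in which every data line has the form `node<TAB>value`, so that a hit on a line leads straight to a node. The sample data and the helper names `parse_exploded` and `build_index` are made up for illustration, and the toy dictionary index stands in for whatever full-text tool would really be used.

```python
from collections import defaultdict

# Hypothetical exploded feature data: one "node<TAB>value" pair per line.
EXPLODED = "1\tin\n2\tthe\n3\tbeginning\n5\tthe\n"


def parse_exploded(text):
    """Yield (line_number, node, value) for every data line."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line or line.startswith("@"):
            continue  # skip blank lines and metadata lines
        node, value = line.split("\t", 1)
        yield line_no, int(node), value


def build_index(text):
    """Map each value to the (line number, node) pairs where it occurs."""
    index = defaultdict(list)
    for line_no, node, value in parse_exploded(text):
        index[value].append((line_no, node))
    return index


index = build_index(EXPLODED)
hits = index["the"]  # every hit carries both the line number and the node
```

Because each hit keeps its line number next to its node, the search result can be handed back to a node-based query without a second pass over the file.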
-
The TF-Query algorithm starts with a stage where it evaluates all conditions on feature values separately. When we have full-text search, we can consult the index instead of walking through all nodes, and I expect a significant gain for many queries.
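The difference between walking and consulting an index can be sketched as follows. This is not TF-Query's actual code: the feature data, the feature names `sp` and `gn`, and the functions `walk`, `invert` and `lookup` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical feature data: feature name -> {node: value}.
FEATURES = {
    "sp": {1: "prep", 2: "art", 3: "subs", 4: "verb", 5: "subs"},
    "gn": {3: "f", 4: "m", 5: "m"},
}


def walk(feature, value):
    """Walker: inspect the value of every node in turn."""
    return {n for n, v in FEATURES[feature].items() if v == value}


def invert(feature):
    """Build a value -> nodes index for one feature, once."""
    index = defaultdict(set)
    for node, value in FEATURES[feature].items():
        index[value].add(node)
    return index


INDEXES = {name: invert(name) for name in FEATURES}


def lookup(feature, value):
    """Index: fetch the matching nodes without visiting the others."""
    return INDEXES[feature].get(value, set())


# Two conditions on the same node: evaluate each separately, then intersect.
result = lookup("sp", "subs") & lookup("gn", "m")
```

Both functions return the same node sets; the index pays its cost once, when it is built, and after that each condition costs a single dictionary lookup.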
-
Related to this is a query where we look for the equality of values of (different) features for (different) nodes. This is currently a rather slow operation. Maybe we can speed this up as well by means of a full-text index.
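One way an index could help here, sketched under assumptions: a naive comparison checks every node of one feature against every node of the other, whereas indexing one feature by value turns the comparison into lookups. The data and the function names `pairs_by_scan` and `pairs_by_index` are hypothetical, and the naive version is not a claim about how TF-Query currently works.

```python
from collections import defaultdict

# Hypothetical data: two features, each mapping node -> value.
LEX = {1: "abc", 2: "def", 3: "abc"}
GLOSS = {10: "def", 11: "xyz", 12: "abc"}


def pairs_by_scan(f, g):
    """Compare every node of f with every node of g (quadratic)."""
    return {(n, m) for n, v in f.items() for m, w in g.items() if v == w}


def pairs_by_index(f, g):
    """Index g by value, then look up each value of f."""
    by_value = defaultdict(list)
    for m, w in g.items():
        by_value[w].append(m)
    return {(n, m) for n, v in f.items() for m in by_value.get(v, [])}


pairs = pairs_by_index(LEX, GLOSS)
```

The indexed version does work proportional to the size of both features plus the number of matching pairs, which is where a gain could come from on large corpora.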
-
Currently I am building a phonological search interface for a modestly sized corpus of Neo-Aramaic texts. The idea is that users can search the full text with regexes, but also have ways to specify CV patterns (consonant-vowel), articulation-place patterns (dental-labial-velar etc.), and more. So I create full texts in different representations, remembering the mapping between character positions in each representation and the original character positions. Then users can write a query for each layer, and the results will be intersected and highlighted. I use Text-Fabric to prepare a big blob of JSON data, and then I write a JavaScript program to do the search. The result will be a single-page web app that needs a big load of static data files (several MB).
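The layered approach can be illustrated with a small sketch. It is written in Python rather than JavaScript, and it is not the actual app: the five-letter vowel set, the sample text, and the helpers `cv_layer` and `hits` are simplifying assumptions that ignore the real Neo-Aramaic transcription.

```python
import re

VOWELS = set("aeiou")  # assumption: a toy vowel inventory


def cv_layer(text):
    """Return the CV string plus, per layer position, the original position."""
    layer, positions = [], []
    for pos, ch in enumerate(text):
        if not ch.isalpha():
            continue  # spaces and punctuation are dropped from this layer
        layer.append("V" if ch.lower() in VOWELS else "C")
        positions.append(pos)
    return "".join(layer), positions


def hits(pattern, layer, positions):
    """Original character positions covered by any match of pattern in layer."""
    covered = set()
    for m in re.finditer(pattern, layer):
        covered.update(positions[m.start():m.end()])
    return covered


text = "bana tab"
identity = list(range(len(text)))  # the plain-text layer maps onto itself
cv, cv_pos = cv_layer(text)

# One query per layer; the results are intersected on original positions.
in_text = hits(r"a\w", text, identity)
in_cv = hits(r"CVC", cv, cv_pos)
both = sorted(in_text & in_cv)
```

Here `both` holds the original character positions that satisfy both queries, which is the set a front end would highlight.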
-
One thing TF-Query does not do particularly well: full-text search.
TF-Query is a walker at heart: it walks over nodes and inspects feature values.
It is not a helicopter: it does not make use of full-text indexes.
How can we do better?
We could switch to other interfaces that are good at full-text search, e.g.
blacklab
It is not that hard to export a corpus to blacklab and do full-text searches there.
But:
can be stated in terms of text topology)
While I intend to go the blacklab route (because blacklab has other nice things), I do not want to give up on full-text search inside TF-Query.