PreTokenizedString is a way to provide pre-tokenized text to tantivy.
I think it can be removed, with its job handled by a user-provided tokenizer instead.
For example, in the pre_tokenized_text.rs example below, the title field could hold the serialized JSON itself, which the tokenizer would then parse to produce the tokens.
let short_man_json = r#"{ "title":[{ "text":"The Old Man", "tokens":[ {"offset_from":0,"offset_to":3,"position":0,"text":"The","position_length":1}, {"offset_from":4,"offset_to":7,"position":1,"text":"Old","position_length":1}, {"offset_from":8,"offset_to":11,"position":2,"text":"Man","position_length":1} ] }]}"#;
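To make the proposal concrete, here is a minimal sketch of the parsing step such a tokenizer would perform. It is an assumption-laden illustration, not tantivy code: `PreToken`, `field`, and `tokens_from_json` are hypothetical names, the hand-rolled parsing uses only the standard library and handles just the flat shape shown above (no escaped quotes, no `}` or `]` inside token text), and a real implementation would more likely use a JSON library and wrap this logic in a type implementing tantivy's tokenizer trait.

```rust
/// Hypothetical token type mirroring the fields in the JSON above.
/// This is NOT tantivy's own `Token` struct.
#[derive(Debug, Clone, PartialEq)]
pub struct PreToken {
    pub offset_from: usize,
    pub offset_to: usize,
    pub position: usize,
    pub text: String,
    pub position_length: usize,
}

/// Returns the raw value that follows `"key":` inside one flat JSON object body.
/// Assumption: values are unescaped strings or unsigned integers.
fn field<'a>(obj: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":", key);
    let start = obj.find(&needle)? + needle.len();
    let rest = obj[start..].trim_start();
    if let Some(stripped) = rest.strip_prefix('"') {
        // String value: take everything up to the closing quote.
        let end = stripped.find('"')?;
        Some(&stripped[..end])
    } else {
        // Integer value: take the leading run of digits.
        let end = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        Some(&rest[..end])
    }
}

/// Parses the serialized field value (`{"text": ..., "tokens": [...]}`)
/// and returns the tokens it carries, or `None` if the input is malformed.
pub fn tokens_from_json(field_value: &str) -> Option<Vec<PreToken>> {
    let key = "\"tokens\":";
    let after_key = &field_value[field_value.find(key)? + key.len()..];
    let open = after_key.find('[')?;
    let body = &after_key[open + 1..];
    let close = body.find(']')?;
    let array = &body[..close];

    let mut tokens = Vec::new();
    // Each token object ends with '}', so splitting on it isolates the objects.
    for chunk in array.split('}') {
        let brace = match chunk.find('{') {
            Some(i) => i,
            None => continue, // trailing whitespace or separators
        };
        let obj = &chunk[brace + 1..];
        tokens.push(PreToken {
            offset_from: field(obj, "offset_from")?.parse().ok()?,
            offset_to: field(obj, "offset_to")?.parse().ok()?,
            position: field(obj, "position")?.parse().ok()?,
            text: field(obj, "text")?.to_string(),
            position_length: field(obj, "position_length")?.parse().ok()?,
        });
    }
    Some(tokens)
}
```

With this approach the document would store the serialized object as the plain string value of title, and the tokenizer registered for that field would call something like `tokens_from_json` and emit the resulting tokens instead of splitting the text itself.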