Rudi edited this page May 4, 2018 · 2 revisions

Quick Start

The basic API of PreDict is simple and consists of three steps:

  • First, instantiate the PreDict class with an object of some PreDictCustomizing implementation. Using the NoopPreDictCustomizing results in a setup without any modifications to the basic PreDict functionality. Alternatively, use the CommunityCustomizing for a ready-made preset of modifications. See the PreDictFactory class for example code.

  • Use the "indexWord(String)" method to add strings that should be searchable through the created index. Adding the same word several times increases its weight relative to other words (i.e. its term frequency, which can be accessed through the SuggestItems in PreDict::adjustFinalResult).

  • To retrieve words in a fuzzy manner, use the "findSimilarWords(String)" method. It returns a list of relevant strings, as determined by the settings and customization in use.
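The contract of these three steps can be illustrated with a hypothetical, self-contained ToyFuzzyIndex class. This is a sketch under stated assumptions, not PreDict's implementation: it compares the query against every indexed word with a plain Levenshtein distance, counts repeated indexWord calls as term frequency, and ranks hits by distance, then by frequency, then alphabetically. The class name, its constructor parameters and the ranking order are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index for illustration only; NOT PreDict's implementation.
// It compares the query against every indexed word (brute force).
class ToyFuzzyIndex {
    private final Map<String, Integer> termFrequency = new HashMap<>();
    private final int editDistanceMax;
    private final int topK;

    ToyFuzzyIndex(int editDistanceMax, int topK) {
        this.editDistanceMax = editDistanceMax;
        this.topK = topK;
    }

    // Indexing a word again only increases its term frequency (its weight).
    void indexWord(String word) {
        termFrequency.merge(word, 1, Integer::sum);
    }

    int frequency(String word) {
        return termFrequency.getOrDefault(word, 0);
    }

    // Words within editDistanceMax, ranked by distance (ascending), then
    // term frequency (descending), then alphabetically; at most topK words.
    List<String> findSimilarWords(String query) {
        Map<String, Integer> distances = new HashMap<>();
        for (String word : termFrequency.keySet()) {
            int d = levenshtein(query, word);
            if (d <= editDistanceMax) {
                distances.put(word, d);
            }
        }
        List<String> hits = new ArrayList<>(distances.keySet());
        hits.sort((a, b) -> {
            int byDistance = Integer.compare(distances.get(a), distances.get(b));
            if (byDistance != 0) return byDistance;
            int byFrequency = Integer.compare(termFrequency.get(b), termFrequency.get(a));
            if (byFrequency != 0) return byFrequency;
            return a.compareTo(b);
        });
        return new ArrayList<>(hits.subList(0, Math.min(topK, hits.size())));
    }

    // Classic two-row Levenshtein distance (insert, delete, replace; cost 1 each).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With the words from the Example section indexed and a maximum distance of 2, a query for "wort" returns word, or and words in this toy; indexing "words" twice more moves it ahead of "or" because of its higher term frequency.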

Example

import java.util.List;
// PreDict, PreDictSettings and CommunityCustomization come from the PreDict library

// configure PreDict
PreDictSettings settings = new PreDictSettings();
settings.setEditDistanceMax(2);
settings.setAccuracyLevel(PreDict.AccuracyLevel.maximum);
settings.setTopK(10);

CommunityCustomization customization = new CommunityCustomization(settings);
PreDict dictIndex = new PreDict(customization);

// index the data you want to search
for (String word : new String[]{"ingest", "words", "from", "some", "file", "or", "word", "database"}) {
    dictIndex.indexWord(word);
}

// then search on it
String queryWord = "wort";
List<String> foundWords = dictIndex.findSimilarWords(queryWord);
System.out.println("found words: " + foundWords);

// the index can also be extended afterwards
dictIndex.indexWord("more");
foundWords = dictIndex.findSimilarWords("mores"); // reuse the variable; redeclaring it would not compile
System.out.println("found words: " + foundWords);

Customizing PreDict

To influence the behaviour of PreDict, use PreDictSettings for basic configuration or a PreDictCustomizing implementation for advanced usage.

Configuration with PreDictSettings

The constructor of PreDict accepts a PreDictCustomizing object, which in turn contains the PreDictSettings. If only the settings should be changed, use the NoopPreDictCustomizing, which acts as a "transport" object for the configuration. Once a PreDict index is constructed, its settings can't be changed. Use separate indexes to compare different settings or different dictionaries.
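One common way to get this behaviour is to copy the configuration into final fields when the index is constructed. The sketch below uses hypothetical ToySettings and ToySnapshotIndex classes, which are not part of PreDict, to show the effect: changing the settings object afterwards only affects indexes that are built later, so comparing settings means building one index per configuration.

```java
// Hypothetical mutable settings holder (not PreDict's PreDictSettings).
class ToySettings {
    private int editDistanceMax = 2;
    private int topK = 10;

    int getEditDistanceMax() { return editDistanceMax; }
    void setEditDistanceMax(int value) { editDistanceMax = value; }
    int getTopK() { return topK; }
    void setTopK(int value) { topK = value; }
}

// Hypothetical index that snapshots the settings into final fields at
// construction time; later changes to the ToySettings object do not reach it.
class ToySnapshotIndex {
    private final int editDistanceMax;
    private final int topK;

    ToySnapshotIndex(ToySettings settings) {
        this.editDistanceMax = settings.getEditDistanceMax();
        this.topK = settings.getTopK();
    }

    int editDistanceMax() { return editDistanceMax; }
    int topK() { return topK; }
}
```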

Options

  • editDistanceMax: the maximum accepted edit distance for found words. Higher values lower search performance and increase the memory consumption of an index.
  • accuracyLevel: defines how precise and how fast the index works.
    • maximum: the algorithm works with maximum recall and precision (maximum accuracy)
    • fast: the algorithm works faster at the cost of lower accuracy
    • topHit: similar to fast, but additionally returns at most one hit
  • topK: limits the maximum number of found words. If a customization is used, its "adjustFinalResult()" method also receives at most that many possible hits.
  • deletionWeight / insertionWeight / replaceWeight / transpositionWeight: these values influence the Damerau-Levenshtein distance calculation by assigning a separate weight to each edit operation. This makes it possible to prefer results with certain edit operations (e.g. favor replacements over deletions).
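The effect of the four weights can be sketched with a dynamic program over the optimal string alignment variant of the Damerau-Levenshtein distance, in which each edit operation adds its own weight instead of a fixed cost of 1. This is an illustration under that assumption only: PreDict's actual calculation and its default weights may differ, and the class name WeightedDamerauLevenshtein is hypothetical.

```java
// Hypothetical sketch: weighted optimal-string-alignment (restricted
// Damerau-Levenshtein) distance. PreDict's own calculation may differ.
final class WeightedDamerauLevenshtein {
    private final double deletionWeight;
    private final double insertionWeight;
    private final double replaceWeight;
    private final double transpositionWeight;

    WeightedDamerauLevenshtein(double deletionWeight, double insertionWeight,
                               double replaceWeight, double transpositionWeight) {
        this.deletionWeight = deletionWeight;
        this.insertionWeight = insertionWeight;
        this.replaceWeight = replaceWeight;
        this.transpositionWeight = transpositionWeight;
    }

    // Weighted cost of turning `source` into `target`.
    double distance(String source, String target) {
        int n = source.length();
        int m = target.length();
        double[][] d = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i * deletionWeight;  // delete all of source
        for (int j = 0; j <= m; j++) d[0][j] = j * insertionWeight; // insert all of target
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                double best = Math.min(
                        d[i - 1][j] + deletionWeight,    // delete source[i-1]
                        d[i][j - 1] + insertionWeight);  // insert target[j-1]
                best = Math.min(best, d[i - 1][j - 1] + (same ? 0 : replaceWeight));
                // swap of two adjacent characters, e.g. "ab" -> "ba"
                if (i > 1 && j > 1
                        && source.charAt(i - 1) == target.charAt(j - 2)
                        && source.charAt(i - 2) == target.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + transpositionWeight);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

With all weights set to 1 this reduces to the unweighted distance. Lowering replaceWeight below deletionWeight makes candidates that differ by a replaced character score closer than candidates that need a deletion.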

PreDictCustomizing

To influence the result in more detail, consider implementing your own version of the PreDictCustomizing interface. Check the JavaDoc of the methods provided by that interface and use the CommunityCustomizing class as a sample implementation.
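The general shape of such a customization can be sketched with a hypothetical hook. The Candidate class and the ResultHook interface below are stand-ins invented for illustration; they are not the real PreDictCustomizing or SuggestItem types, so check the JavaDoc for the actual method signatures. The example policy re-ranks the final candidates by term frequency and breaks ties by edit distance, without modifying the input list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a suggest item: a candidate term with its edit
// distance to the query and its term frequency in the index.
final class Candidate {
    final String term;
    final int distance;
    final long frequency;

    Candidate(String term, int distance, long frequency) {
        this.term = term;
        this.distance = distance;
        this.frequency = frequency;
    }
}

// Hypothetical hook, loosely modelled on the idea of adjustFinalResult();
// this is NOT the real PreDictCustomizing interface.
interface ResultHook {
    List<Candidate> adjustFinalResult(String query, List<Candidate> candidates);
}

// Example policy: rank by term frequency (descending), break ties by distance.
final class FrequencyFirstHook implements ResultHook {
    @Override
    public List<Candidate> adjustFinalResult(String query, List<Candidate> candidates) {
        List<Candidate> ranked = new ArrayList<>(candidates); // leave the input untouched
        ranked.sort(Comparator.comparingLong((Candidate c) -> -c.frequency)
                .thenComparingInt(c -> c.distance));
        return ranked;
    }
}
```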
