
0.22.1 - Influence Weights in Vector Creation

@etiennedi etiennedi released this 04 Feb 09:32

Docker image/tag: semitechnologies/weaviate:0.22.1
See also: example docker compose files in English and Dutch.

Breaking Changes

none

New Features

  • Override weights on vector creation (#1070 and #1074)
    Prior to this release, the weight of each individual word when creating a vector from an object was outside the user's control. The contextionary uses an algorithm based on how often a word occurs in its training data to suggest how each word should be weighted. The underlying assumption is that a rare word should take precedence over a very common word, similar to tf-idf.

    This works well in most cases, but in some domain-specific languages common words take on a new meaning, and their importance should change accordingly. Consider the words "far" and "near". They are quite common in general language, so, especially when mixed with rarer words, they would not receive a high weight. Now assume you work in optometry or in manufacturing glasses. In the terms "far-sighted" and "near-sighted", the words "far" and "near" make a very important distinction, for example when you classify objects based on those terms. With the changes in 0.22.1 you can now influence, or even completely override, the weights of individual words when creating vectors.

    To do so, the field vectorWeights was introduced to the Thing and Action objects. The field is a key-value map where both the keys and the values must be strings. Each key is a word you want to influence, and each value is a mathematical expression that sets the new weight. You can use addition, subtraction, multiplication, and division, or simply overwrite the weight with a fixed number. To reference the original weight, use the single-letter variable w. Some examples:

    • "vectorWeights": {"far": "10 * w"}
      Give the word "far" 10 times its original weight

    • "vectorWeights": {"far": "w + 0.5", "near": "w - 0.5"}
      Give the word "far" an absolute boost of 0.5, while penalizing the word "near" by 0.5.

    • "vectorWeights": {"sighted": "0.7", "glasses": "2 - 4 * w"}
      Let the word "sighted" have a fixed weight of 0.7 whereas the word "glasses" is calculated by subtracting 4 times the original weight from the number 2.

    Some important things to note:

    • For this feature to work you need a contextionary version of at least ...v0.4.7. The example docker-compose files linked above have already been updated to the required version.
    • Spaces in math expressions have no meaning.
    • A word that is not referenced in "vectorWeights" will simply use its original weight as returned by the contextionary.
    • Custom vectorWeights only affect the object on which they are set; there is no option to globally manipulate a specific word. If the same vectorWeights are required for multiple objects, simply attach them to all objects where needed.
    • Whenever the mathematical expression is not a fixed number (such as "17") an operator must be present. It is not valid to use implicit operators, such as "2w" which would mean "two times the original weight". In this case explicitly use the multiplication operator, e.g. "2 * w" or "w*2".
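
    The expression rules above can be illustrated with a small Python sketch. This is not Weaviate's or the contextionary's implementation; the function name apply_weight_expression and the use of Python's ast module are assumptions made purely for illustration. It ignores whitespace, exposes the original weight as w, supports the four operators, and rejects implicit multiplication such as "2w".

```python
import ast
import operator

# Only the four operators named in the release notes are supported.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST, w: float) -> float:
    """Recursively evaluate a parsed expression, with `w` as the original weight."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return float(node.value)
    if isinstance(node, ast.Name) and node.id == "w":
        return w
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate(node.left, w)
        right = _evaluate(node.right, w)
        return _BINARY_OPS[type(node.op)](left, right)
    raise ValueError("unsupported element in weight expression")


def apply_weight_expression(expression: str, w: float) -> float:
    """Hypothetical helper: compute the new weight for one vectorWeights value."""
    # Spaces carry no meaning, so drop all whitespace before parsing.
    compact = "".join(expression.split())
    try:
        tree = ast.parse(compact, mode="eval")
    except SyntaxError as exc:
        # Implicit operators such as "2w" end up here.
        raise ValueError(f"invalid weight expression: {expression!r}") from exc
    return _evaluate(tree.body, w)


if __name__ == "__main__":
    original = 0.5  # assumed original weight, chosen only for the demo
    for expr in ["10 * w", "w + 0.5", "w - 0.5", "0.7", "2 - 4 * w", "w*2"]:
        print(expr, "->", apply_weight_expression(expr, original))
```

    Because this sketch relies on Python's parser, it follows the usual precedence rules (multiplication before subtraction in "2 - 4 * w") and also happens to accept parentheses; the release notes do not say whether Weaviate accepts them.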

    Full example

    Here's a full example for importing a thing object:

    POST /v1/things

    {
     "class": "Glasses",
     "schema": {
       "description": "These glasses are meant for far-sighted people"
     },
     "vectorWeights": {
       "far": "5 * w",
       "near": "5 * w"
     }
    }
    

    The above example boosts the words "far" and "near" by a factor of 5. Note that the object does not contain the word "near", so only the word "far" is actually boosted. All other, unreferenced words keep their original weights.
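
    As a rough client-side check of the point above, the following Python sketch builds the same request body and reports which vectorWeights keys actually occur in the object's text. The helper name referenced_words_present and the tokenization (lowercasing and splitting on anything that is not a letter, so that "far-sighted" yields "far" and "sighted") are assumptions for illustration; the contextionary's real word splitting may differ. No request is sent.

```python
import json
import re

# Same object as in the full example above.
payload = {
    "class": "Glasses",
    "schema": {
        "description": "These glasses are meant for far-sighted people",
    },
    "vectorWeights": {
        "far": "5 * w",
        "near": "5 * w",
    },
}


def referenced_words_present(obj: dict) -> dict:
    """Hypothetical helper: map each vectorWeights key to whether it occurs in the text."""
    text = " ".join(str(value) for value in obj["schema"].values())
    # Assumed tokenization: lowercase, then keep runs of letters only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {word: word in words for word in obj["vectorWeights"]}


if __name__ == "__main__":
    # JSON body that would be sent with POST /v1/things.
    print(json.dumps(payload, indent=2))
    print(referenced_words_present(payload))
```

    A check like this can help spot vectorWeights entries that have no effect on a given object, such as "near" here.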

Fixes

none