Command Quick Reference
This page describes the previously mentioned create, append, and search commands, together with the reconstruct-graph command. It covers only part of the NGT command reference; see the full reference for details.
Create and initialize the specified directory as an index, and insert the specified data into the index.
ngt create -d no_of_dimensions [-o object_type] [-D distance_function] [-E no_of_edges] [-S no_of_edges_at_search_time] index [registration_data]
index
Specify the name of the directory to be created as the index. The directory holds the multiple files that make up the index.
registration_data
Specify the vector data to be registered. These data consist of one object (data item) per line, and each dimensional element is delimited by a space or tab. If omitted, the specified directory is just created and initialized as the index.
-d no_of_dimensions
Specify the number of dimensions of the registration data. This option can be omitted if each row of the registration data file consists only of dimensional elements. However, any attribute information or other types of data following the dimensional elements will be discarded.
-o object_type
Specify the data object type.
Option | Type | Size (byte) |
---|---|---|
f | Float (default) | 4 |
c | Positive integer (binary) | 1 |
The positive integer type handles not only positive integers but also binary data, with each integer value holding 8 bits. The number of dimensions is the number of 8-bit values: for example, if a user's data requires 64 bits, 8 dimensions are specified. If the number of bits is not a multiple of 8, zero bits must be added: for example, a 60-bit binary value has 4 zero bits added to make 64. Hamming and Jaccard distances are available only for binary data.
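The padding rule above can be sketched in POSIX shell. This is only an illustration and not part of NGT: `pack_bits` is a hypothetical helper, and it assumes that the padding zeroes go at the end and that the first bit of each group of 8 is the most significant one. The text above does not fix either choice.

```shell
# pack_bits (hypothetical helper): print a bit string as space-separated
# 8-bit integers, one integer per dimension.
# Assumptions: zero padding is appended at the end; the first bit of each
# 8-bit group is the most significant bit.
pack_bits() {
  bits=$1
  # Pad with zeroes until the length is a multiple of 8.
  while [ $(( ${#bits} % 8 )) -ne 0 ]; do bits="${bits}0"; done
  out=""
  while [ -n "$bits" ]; do
    rest=${bits#????????}      # everything after the first 8 bits
    byte=${bits%"$rest"}       # the first 8 bits
    value=0
    while [ -n "$byte" ]; do
      tail=${byte#?}           # the group without its leading bit
      value=$(( value * 2 + ${byte%"$tail"} ))
      byte=$tail
    done
    out="$out$value "
    bits=$rest
  done
  echo "${out% }"
}

# A 60-bit value: "1010" repeated 15 times.
bits60=$(printf '1010%.0s' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
packed=$(pack_bits "$bits60")
set -- $packed                 # split on spaces to count the dimensions
dims=$#
echo "$packed"                 # prints: 170 170 170 170 170 170 170 160
echo "dimensions: $dims"       # prints: dimensions: 8
```

The 60-bit input yields 8 values, so a line like this would be registered in an index created with `-d 8 -o c`.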
-D distance_function
Specify the distance function as follows.
Option | Distance | Type |
---|---|---|
1 | L1 distance | Float, Positive integer |
2 | L2 distance (default) | Float, Positive integer |
a | Angle distance | Float, Positive integer |
A | Normalized angle distance*1 | Float, Positive integer |
c | Cosine similarity | Float, Positive integer |
C | Normalized cosine similarity*1 | Float, Positive integer |
h | Hamming distance | Positive integer |
j | Jaccard distance | Positive integer |
*1: The specified data are automatically normalized before being appended to the index.
-E no_of_edges (default = 10)
Specify the number of initial edges of each node at graph generation time. Once an index has been generated, the number of edges of each node will be equal to or greater than the value specified here.
-S no_of_edges_at_search_time (default = 40)
Specify the number of edges to use for searches performed during or after index generation. This value is used when the search command does not specify a number of edges. Set it to search with fewer edges than each node actually has in the graph. If -2 is specified, the search command calculates the optimal number of edges.
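As a rough sketch of how the create options fit together, the following writes a small registration file and builds an index from it. The file contents and the temporary directory are made up for illustration, and `run_ngt` is a hypothetical wrapper that only prints the command when `ngt` is not installed.

```shell
# run_ngt (hypothetical wrapper): run ngt if it is on PATH, otherwise just
# print the command that would have been run.
run_ngt() {
  if command -v ngt >/dev/null 2>&1; then
    ngt "$@"
  else
    echo "ngt not found; would run: ngt $*"
  fi
}

# Illustrative data: three 3-dimensional objects, one per line,
# elements delimited by spaces.
workdir=$(mktemp -d)
printf '%s\n' '0.1 0.2 0.3' '0.4 0.5 0.6' '0.7 0.8 0.9' > "$workdir/objects.txt"

# Float objects (-o f) and L2 distance (-D 2), with the documented default
# edge counts (-E 10, -S 40) written out explicitly.
run_ngt create -d 3 -o f -D 2 -E 10 -S 40 "$workdir/index" "$workdir/objects.txt"
```

Omitting `objects.txt` would only create and initialize the index directory, as described for registration_data above.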
Append the specified data to the specified index.
ngt append index registration_data
index
Specify the name of the existing index.
registration_data
Specify the vector data to be registered. These data consist of one object (data item) per line and each dimensional element is delimited by a space or tab.
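Because every appended line must carry one object, it can help to check the file before appending. The sketch below is an assumption-laden example: the index name `index`, the dimension count of 3, and the data values are all illustrative, and the `awk` check is not something NGT requires.

```shell
dims=3                          # assumed dimension of the existing index
index=index                     # assumed name of an existing index directory

# Illustrative data: two objects, elements delimited by tabs.
extra=$(mktemp)
printf '1.0\t1.1\t1.2\n2.0\t2.1\t2.2\n' > "$extra"

# Count rows whose number of elements differs from the index dimension.
bad_rows=$(awk -v d="$dims" 'NF != d { n++ } END { print n + 0 }' "$extra")

if [ "$bad_rows" -ne 0 ]; then
  echo "refusing to append: $bad_rows row(s) do not have $dims elements" >&2
elif command -v ngt >/dev/null 2>&1 && [ -d "$index" ]; then
  ngt append "$index" "$extra"
else
  echo "ngt or index not available; would run: ngt append $index $extra"
fi
```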
Search the index using the specified query data.
ngt search [-n no_of_search_results] [-r search_radius] [-e search_range_coefficient] [-E max_no_of_edges] index query_data
index
Specify the name of the existing index.
query_data
Specify the name of the file containing query data. This file consists of one item of query data per line, and each dimensional element is delimited by a space or tab, like the registration data. Each search is sequentially performed when multiple queries are provided.
-n no_of_search_results (default = 20)
Specify the number of search results.
-e search_range_coefficient (default = 0.1 recommended)
Specify the magnification coefficient (epsilon) of the search range. A larger value means greater accuracy but slower performance, while a smaller value means a drop in accuracy but faster performance. While it is desirable to adjust this value within the range of 0–0.3, a negative value may also be specified.
-E max_no_of_edges (default = value specified by the create command or 40)
Specify the maximum number of edges to be used in the search. This option is specified when conducting a search with fewer edges than the number of edges of each node on the graph.
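To get a feel for the accuracy and speed trade-off of `-e`, one option is to repeat the same search over the suggested 0 to 0.3 range. The sketch below assumes an existing index named `index` and uses made-up 3-dimensional queries; it only prints the commands when `ngt` or the index is missing.

```shell
index=index                     # assumed name of an existing index directory

# Illustrative query file: one query per line, elements delimited by spaces.
queries=$(mktemp)
printf '%s\n' '0.1 0.2 0.3' '0.7 0.8 0.9' > "$queries"

runs=0
for eps in 0.0 0.1 0.2 0.3; do
  if command -v ngt >/dev/null 2>&1 && [ -d "$index" ]; then
    # Return the 5 nearest objects for each query at this epsilon.
    ngt search -n 5 -e "$eps" "$index" "$queries"
  else
    echo "would run: ngt search -n 5 -e $eps $index $queries"
  fi
  runs=$((runs + 1))
done
```

Timing each run, for example with the shell's `time`, and comparing the result lists would show where a larger epsilon stops paying off for a given data set.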
Construct the index with the reconstructed graph from the specified index.
ngt reconstruct-graph [-m mode] [-s mode] [-I graph_type] -o no_of_outgoing_edges -i no_of_incoming_edges input_index reconstructed_index
input_index
Specify the name of the existing index.
reconstructed_index
Specify the name of the reconstructed index.
-o no_of_outgoing_edges
Specify the number of edges for each node to add to the reconstructed graph from the input graph. The specified number is also the lower bound of the outdegrees of the reconstructed graph.
-i no_of_incoming_edges
Specify the number of edges for each node to add to the reconstructed graph from the input graph. Unlike no_of_outgoing_edges, these edges are added to the reconstructed graph after their direction is reversed. The specified number is also the lower bound of the indegrees of the reconstructed graph.
-m mode
Specify the mode of the shortcut reduction.
- S: Shortcut reduction (default)
- s: No shortcut reduction
-s mode
Specify the mode of the search parameter optimization.
- s: Search edge parameter optimization
- p: Prefetch parameter optimization
- a: Accuracy table generation
- -: All of the above (default)
-I graph_type
Specify the graph type of the index given as input_index. If the index is not an ANNG, it is converted to an ANNG before graph reconstruction.
- a: ANNG
- o: The others
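A possible invocation combining the options above is sketched here. The index names and the edge counts (10 outgoing, 100 incoming) are illustrative assumptions, not recommended values, and the command runs only when `ngt` and the input index are both present.

```shell
input_index=anng-index          # assumed name of an existing ANNG index
output_index=onng-index         # assumed name for the reconstructed index
outgoing=10                     # illustrative lower bound on outdegree
incoming=100                    # illustrative lower bound on indegree

if command -v ngt >/dev/null 2>&1 && [ -d "$input_index" ]; then
  # -m S: shortcut reduction (the default); -I a: the input index is an ANNG.
  ngt reconstruct-graph -m S -I a -o "$outgoing" -i "$incoming" \
    "$input_index" "$output_index"
  status=ran
else
  echo "skipped: ngt or $input_index not available"
  status=skipped
fi
```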