-
Notifications
You must be signed in to change notification settings - Fork 1
QueriesAndResults
Here are a few idioms and code snippets that show how to do queries with HGDB and how to work with result sets. Full documentation of the API is within the Javadocs.
Unlike other parts of the HyperGraphDB, the querying system is not yet extensible because we don't have a robust framework for interpreting and optimizing queries yet. On the other, since even low-level APIs (such as indexing) are public it is possible implement any custom query conceivable, but with more work.
The current querying paradigm takes the universe of atoms stored in a HyperGraphDB as the starting result set and lets you restrict it via various conditions constraining the atoms' values, types and linkage structure. The result of a query is always a stream of atoms.
Because there is no query language for HGDB at the time of this writing (though we've outlined some ideas on TowardsHyperGraphQueryLanguage page), queries are build up as query conditions, classes implementing the HGQueryCondition
interface, and submitted via a call to HyperGraph.find
. For example:
HGQueryCondition cond = new And(new AtomTypeCondition(MyLink.class), new IncidentCondition(atom));
HGSearchResult<HGHandle> rs = graph.find(cond);
while (rs.hasNext()) System.out.println(rs.next());
rs.close();
will retrieve all links of type MyLink
that point to atom
(a HGHandle
). A more coding friendly API is provided by the HGQuery.hg
class which, with Java 5 static imports, can be used like this:
import org.hypergraphdb.HGQuery.hg;
HGQueryCondition cond = hg.and(hg.type(MyLink.class), hg.incident(atom));
HGSearchResult<HGHandle> rs = graph.find(cond);
// etc...
In both cases a HGSearchResult
is returned which behaves roughly like a JDBC result set. It implements the standard java.util.Iterator
interface and an extension to it for moving backwards org.hypergraphdb.TwoWayIterator
. Thus, it can be traversed back and forth via calls to its hasNext, next, hasPrev
and prev
methods.
The HGQuery.hg
interface offers a few additional convenience methods to avoid having to deal with HGSearchResult
:
findOne | Retrieve the first item from the query result set |
---|---|
getOne | Same as findOne, but implicitly dereference the item, assuming it is a HGHandle |
findAll | Retrieve all results from a query and put them in a java.util.List |
getAll | Same as findAll, but implicitly dereference each item. |
In general, we strongly advise using those methods when the result set is not very large and when the whole set will have to be traversed. More on the rationale of this recommendation below.
In addition, to every HGQueryCondition
class that one finds in the org.hypergraphdb.query
package, there corresponds a convenience constructor method in HGQuery.hg
interface. To get a full view of this interface, visit its Javadocs. In what follows, we won't be using the "raw" conditions, but this convenience API.
The three standard logical operators and, or and not are supported via calls to the variable argument methods hg.and
, hg.or
and the hg.not
method. It must be noted that the not operator can result in inefficient queries because most of the time it cannot be translated to an index lookup and the negated condition needs to be used as a predicate while scanning a potentially large result set.
Each atom has a type and a value. And values may be composite records. To constrain the type of an atom use the hg.type
or hg.typePlus
methods. The first returns a condition constraining the result set to a specific type and the second returns a condition constraining it to a all sub-types of a given type. Both take either the HGHandle
of the type of interest or a Java Class
that has been mapped to a HyperGraphDB type. The hg.typePlus
condition is equivalent to an "or" between all sub-types. For example, if class (or interface) A has two derived class B and C then:
List<HGHandle> result = hg.findAll(graph, hg.typePlus(A.class));
is equivalent to:
List<HGHandle> result = hg.findAll(graph, hg.or(hg.type(B.class), hg.type(C.class)));
Note that (since a Java can implement mulitple interfaces), it might make sense to have a query like hg.and(hg.typePlus(X), hg.typePlus(Y))
, but an "and" between two exact types like hg.and(hg.type(X), hg.type(Y))
would always return an empty result set.
Atom values are constrained with the following set of operators (i.e. methods in the HGQuery.hg
interface):
eq(Object x) | The value must be equal to the passed in object. |
---|---|
eq(String part, Object x) | The property part of a complex value must be equals to x. |
lt, lte | Less-than and less-than-or-equal operators. Note that this only works when atom values actually have an order relation defined on them. |
gt, gte | Greater-than and greater-than-or-equal operators. |
Note that all the operator in the above table work both on atom values proper and on value parts (e.g. bean properties).
For instance, here is a query that find all atoms of a hypothetical type PathLink
whose label
property has the value "highway" and whose length
property is greater than 1000:
List<PathLink> longHighways = hg.getAll(graph, hg.and(hg.type(PathLink.class), hg.eq("label", "highway"), hg.gt("length", 1000)));
The atoms in the result can be constrained to be links pointing to a set of atoms and/or targets to a set links. The two main operators are hg.target(linkHandle)
and hg.incident(atomHandle)
. The first states that an atom in the results should be a target of (i.e. a member of the outgoing set of, i.e. pointed to by) the link identified by linkHandle
. Conversely, the second states that an atom in the result set should be a link pointing to (i.e. incident to) the atom identified by atomHandle
.
When searching for links that have a known target set, or where a subset of the target set is known, use the hg.link
or hg.orderedLink
operators. Both take an arbitrary number of atom handles as parameters. The hg.link
operator ignores the order of the atoms in a target set while the hg.orderedLink
operator doesn't. When searching for ordered links, which is probably the most common case, one must pay attention to the link's arity in addition to listing its target set in the desired order. An expression like
hg.orderedLink(x, y, z)
while return all links that point to x, y and z in that order regardless of whether their target sets contain other atoms before, after or in-between x, y and z. Thus a link with a target set [a, x, b, y, z, c, d] will match. One could further constrain the link's arity:
hg.and(hg.orderedLink(x, y, z), hg.arity(3))
But it is much more common to have a type constraint in the conjunction:
hg.and(hg.type(linkType), hg.orderedLink(x, y, z))
because in general links of a given type all have the same arity and a type condition is perhaps the most direct and efficient way of reducing the set of possible results (otherwise all atoms in the graph must be scanned).
A very common situation is when you know some of the link's targets at specific positions, but not all. Take the HGSubsumes
link for example. Such a link is added between two types A and B whenever A is a more general type (e.g. a parent class) of which B is a specific case (a derived class). To find all HGSubsumes
links with A is the parent the following conditions wouldn't work:
hg.orderedLink(A)
because it doesn't say whether A must appear in the first or in the second position of the link. To remedy the problem, one should full specify the target set by putting an any indicator at positions where the actual target is unknown. This any indicator is a special HGHandle
constant defined in the HGHandleFactory
class and also available by calling hg.anyHandle()
. Thus the above condition should be rewritten as:
hg.orderedLink(A, hg.anyHandle())
Standard graph traversals are also implemented as search operators in the form of hg.bfs
(for breadth-first search) and hg.dfs
(depth-first search) family of methods. Traversal search conditions are translated to the implementation in the org.hypergraphdb.algorithms
package and use the default adjency list generator to do the traversal.
The stream of atoms resulting from a query can be transformed by applying mapping to each item. This is done with the hg.apply
operator. There are several common predefined mappings in the HGQuery.hg
class such as hg.deref
, hg.linkProjection
and hg.targetAt
. All of those method return an instance of the org.hypergraphdb.utils.Mapping
interface which essentially defines a one argument function. For example, to obtain the actual sub-types from the set of HGSubsumes
link resulting from the condition above, one could write:
hg.apply(hg.linkProjection(1, hg.and(hg.type(HGSubsumes.class), hg.orderedLink(A, hg.anyHandle()), hg.incident(A))))
This condition says: all targets at position 1 (i.e. the second target) of HGSubsumes
links whose first target is A.
While it is not possible to define custom query conditions at the moment, one can easily define new mappings for use in expressions such as above.
It is very important that HGSearchResult
instances be closed properly and promptly (i.e. as soon as possible). They will generally hold an open cursor on the filesystem and must be closed like any other external resource, be it an open file or a socket connection, or an SQL result set. And unlike the example above, as a good coding practice working with a HGSearchResult
should always be enclosed in a try ... finally
block:
HGSearchResult<HGHandle> rs = graph.find(cond);
try { use rs here }
finally { HGUtils.closeNoException(rs); }
Note the use of HGUtils.closeNoException
above. In the HGDB API, we've chosen not to throw any checked exceptions because it's annoying when you don't care about them. However, as with many other HGDB methods, a call the HGSearchResult
may well throw an HGException
. If you don't want that exception to propagate and interrupt your program, then the HGUtils.closeNoException
is a convenient shorthand for code such as try { rs.close(); } catch (Throwable t) { }
.
The most common case of a deadlock in HGDB happens due to a non-closed search result set. Whenever a result set is not closed, it keeps a lock on some part of the database which makes it impossible to write data to that part. This can happen in a multithreaded as well as in a single-threaded application, but it mostly happens within a single thread! In multi-threaded applications, conflicts are resolved by repeatedly retrying transactions in a random order. Thus, if there is a thread scanning a result set while another thread is trying to write some data, the latter transaction will be retried until the result set from the former gets closed and the database unlocked. But in a single thread, an open result set followed by an attempt to write will lock indefinitely because there result set never gets the chance to close. For this reason, it is a good coding practice to first read all results that one cares about (e.g. in a Java collection), close the result set, and then continue with actual processing.