
Commit c1007cf

shirly121, siyuan0322, and longbinlai authored
[GIE Compiler] Introduce cypher service to accept queries from neo4j ecosystem (#2848)
## What do these changes do?

1. Introduce `GraphServer`, which wraps `IrGremlinServer` (Gremlin service) and `CommunityBootstrapper` (Cypher service)
2. Remove the Cypher service from the Gremlin stack
3. Add documentation for the Neo4j ecosystem

## Related issue number

#2598

Co-authored-by: siyuan0322
Co-authored-by: Longbin Lai
Co-authored-by: longbinlai
1 parent 7637aaf commit c1007cf

File tree

52 files changed

+2029
-861
lines changed


charts/gie-standalone/templates/frontend/statefulset.yaml

+1-1
@@ -96,7 +96,7 @@ spec:
 cd /opt/graphscope/interactive_engine/compiler && ./set_properties.sh
 java -cp ".:./target/libs/*:./target/compiler-0.0.1-SNAPSHOT.jar" \
 -Djna.library.path=../executor/ir/target/release \
--Dgraph.schema=/etc/groot/config/$GRAPH_SCHEMA com.alibaba.graphscope.gremlin.service.GraphServiceMain
+-Dgraph.schema=/etc/groot/config/$GRAPH_SCHEMA com.alibaba.graphscope.GraphServer
 {{- end }}
 env:
 - name: GAIA_RPC_PORT

charts/graphscope-store/templates/configmap.yaml

+2
@@ -52,6 +52,8 @@ data:
 
 ## Frontend Config
 gremlin.server.port=12312
+## disable neo4j when launching groot server by default
+neo4j.bolt.server.disabled=true
 
 executor.worker.per.process={{ .Values.executorWorkerPerProcess }}
 executor.query.thread.count={{ .Values.executorQueryThreadCount }}
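
To illustrate what the new flag in this hunk expresses, here is a minimal Python sketch that parses properties-style text and reports whether the Bolt (Cypher) service would be started. It is not GIE's actual configuration loader: `parse_properties` and `bolt_enabled` are hypothetical helper names, and treating a missing key as "enabled" is an assumption drawn only from the comment in the hunk.

```python
def parse_properties(text):
    """Parse simple `key=value` lines, skipping blank lines and `#` comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def bolt_enabled(props):
    # Assumption: when the key is absent, the Bolt service is enabled.
    return props.get("neo4j.bolt.server.disabled", "false").lower() != "true"


CONFIG = """
## Frontend Config
gremlin.server.port=12312
## disable neo4j when launching groot server by default
neo4j.bolt.server.disabled=true
"""

print(bolt_enabled(parse_properties(CONFIG)))
```

With the values from the chart above, the Cypher endpoint stays off unless the property is overridden.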

docs/index.rst

+1
@@ -64,6 +64,7 @@ and the vineyard store that offers efficient in-memory data transfers.
 interactive_engine/getting_started
 interactive_engine/deployment
 interactive_engine/tinkerpop_eco
+interactive_engine/neo4j_eco
 .. interactive_engine/guide_and_examples
 interactive_engine/design_of_gie
 .. interactive_engine/supported_gremlin_steps
@@ -0,0 +1,52 @@
+# GIE for Cypher
+This document provides step-by-step guidance on how to connect your Cypher applications to GIE's
+Frontend service, which offers functionalities similar to the official Tinkerpop service.
+
+Your first step is to obtain the Bolt Connector of the GIE Frontend service:
+- Follow the [instructions](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.
+
+## Connecting via Python Driver
+
+GIE makes it easy to connect to a loaded graph with Neo4j's [Python Driver](https://pypi.org/project/neo4j/).
+
+First, install the dependency:
+```bash
+pip3 install neo4j
+```
+
+Then connect to the service and run queries:
+
+```Python
+from neo4j import GraphDatabase, RoutingControl
+
+URI = "neo4j://localhost:7687"  # the bolt connector you've obtained
+AUTH = ("", "")  # We have not implemented authentication yet
+
+def print_top_10(driver):
+    records, _, _ = driver.execute_query(
+        "MATCH (n) RETURN n Limit 10",
+        routing_=RoutingControl.READ,
+    )
+    for record in records:
+        print(record["n"])
+
+
+with GraphDatabase.driver(URI, auth=AUTH) as driver:
+    print_top_10(driver)
+```
+
+
+## Connecting via Cypher-Shell
+1. Download and extract `cypher-shell`
+```bash
+wget https://dist.neo4j.org/cypher-shell/cypher-shell-4.4.19.zip
+unzip cypher-shell-4.4.19.zip && cd cypher-shell
+```
+2. Connect to the Bolt Connector
+```bash
+./cypher-shell -a neo4j://localhost:7687
+```
+3. Run Queries
+```bash
+@neo4j> Match (n) Return n Limit 10;
+```
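
As a further illustration of the Python driver path, the sketch below wraps one read-only aggregate query in a small helper. `count_vertices` is a hypothetical name introduced here and is not part of this commit; it assumes that `count()` and `RETURN .. AS` behave as listed in the supported-Cypher document added by the same commit. The driver is passed in rather than created inside the helper, so the function can be exercised without a running service.

```python
def count_vertices(driver, routing):
    """Return the number of vertices using a read-only Cypher query.

    `driver` is expected to be a `neo4j.Driver`, and `routing` should be
    `RoutingControl.READ`, as in the documented example.
    """
    records, _, _ = driver.execute_query(
        "MATCH (n) RETURN count(n) AS cnt",
        routing_=routing,
    )
    return records[0]["cnt"]


# Usage against a running service (requires `pip3 install neo4j`):
#   from neo4j import GraphDatabase, RoutingControl
#   with GraphDatabase.driver("neo4j://localhost:7687", auth=("", "")) as driver:
#       print(count_vertices(driver, RoutingControl.READ))
```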
@@ -0,0 +1,123 @@
+# Cypher Support
+This document outlines the current capabilities of GIE in supporting Neo4j's Cypher queries and
+compares them to the [syntax](https://neo4j.com/docs/cypher-manual/current/syntax/) specified in Neo4j.
+While our goal is to comply with Neo4j's syntax, GIE currently has some limitations.
+One major constraint is that we solely support the **read** path in Cypher.
+Therefore, functionalities associated with writing, such as adding vertices/edges or modifying their properties, remain **unaddressed**.
+
+We provide in-depth details regarding GIE's Cypher support, mainly covering data types, operators and clauses.
+We further highlight planned features that we intend to offer in the near future.
+While all terms, including data types, operators, and keywords in clauses, are case-insensitive in this document, we use capital and lowercase letters for the terms of Neo4j and GIE, respectively, to ensure clarity.
+
+## Data Types
+Like [Neo4j](https://neo4j.com/docs/cypher-manual/current/values-and-types), we support
+data values of types in the categories of **property**, **structural** and **constructed**.
+However, the specific data types that we support are slightly modified from those in Cypher to ensure compatibility with our storage system. Details are given below.
+
+### Property Types
+The data types that can be stored in vertices (the equivalent of nodes in Cypher) and edges (the equivalent of relationships in Cypher), known as property types, are divided into several categories: Boolean, Integer, Float, String, Bytes, Placeholder and Temporal. Property types can be used in queries and as parameters, making them the most commonly used data types.
+
+| Category | Cypher Type | GIE Type | Supported | Todo |
+|:---|:---|:---|:---:|:---|
+| Boolean | BOOLEAN | bool | <input type="checkbox" disabled checked /> | |
+| Integer | INTEGER | int32/uint32/int64/uint64 | <input type="checkbox" disabled checked /> | |
+| Float | FLOAT | float/double | <input type="checkbox" disabled checked /> | |
+| String | STRING | string | <input type="checkbox" disabled checked /> | |
+| Bytes| BYTE_ARRAY | bytes | <input type="checkbox" disabled checked /> | |
+| Placeholder | NULL | none | <input type="checkbox" disabled /> | Planned |
+| Temporal | DATE | date | <input type="checkbox" disabled /> | Planned |
+| Temporal | DATETIME (ZONED) | datetime (Zoned) | <input type="checkbox" disabled /> | Planned |
+| Temporal | TIME (ZONED) | time (Zoned) | <input type="checkbox" disabled /> | Planned |
+
+### Structural Types
+In a graph, Structural Types are first-class citizens and comprise the following:
+- Vertex: It encodes the information of a particular vertex in the graph. The information includes the id, label, and a map of properties. Note that multiple labels on a vertex are currently unsupported in GIE.
+- Edge: It encodes the information of a particular edge in the graph. The information comprises the id, edge label, a map of properties, and a pair of vertex ids that refer to source/destination vertices.
+- Path: It encodes the alternating sequence of vertices and, possibly, edges encountered while traversing the graph.
+
+|Category | Cypher Type | GIE Type | Supported | Todo |
+|:---|:---|:---|:---:|:---|
+|Graph | NODE | vertex | <input type="checkbox" disabled checked /> | |
+|Graph | RELATIONSHIP | edge | <input type="checkbox" disabled checked /> | |
+|Graph | PATH | path | <input type="checkbox" disabled checked /> | |
+
+### Constructed Types
+Constructed types mainly include the categories of Array and Map.
+
+| Category | Cypher Type | GIE Type | Supported | Todo |
+|:---|:---|:---|:---:|:---|
+| Array | LIST<INNER_TYPE> | int32/int64/double/string/pair Array | <input type="checkbox" disabled checked /> | |
+| Map | MAP | N/A | <input type="checkbox" disabled />| only used in Vertex/Edge |
+
+## Operators
+We list GIE's support of the operators in the categories of Aggregation, Property, Mathematical,
+Comparison, String and Boolean. Examples and functionalities of these operators are the same
+as in [Neo4j](https://neo4j.com/docs/cypher-manual/current/syntax/operators/).
+Note that some Aggregate operators listed here, such as `max()`, are implemented in Neo4j as
+[functions](https://neo4j.com/docs/cypher-manual/current/functions/). We have not introduced functions at this moment.
+
+
+| Category | Description | Cypher Operation | GIE Operation | Supported | Todo |
+|:---|:----|:---|:----|:---:|:---|
+| Aggregate | Average value | AVG() | avg() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Minimum value | MIN() | min() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Maximum value |MAX() | max() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Count the elements |COUNT() | count() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Count the distinct elements | COUNT(DISTINCT) | count(distinct) | <input type="checkbox" disabled checked /> | |
+| Aggregate | Sum the values | SUM() | sum() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Collect into a list | COLLECT() | collect() | <input type="checkbox" disabled checked /> | |
+| Aggregate | Collect into a set | COLLECT(DISTINCT) | collect(distinct) | <input type="checkbox" disabled checked /> | |
+| Property | Get property of a vertex/edge | [N\|R]."KEY" | [v\|e]."key" | <input type="checkbox" disabled checked /> | |
+| Mathematical | Addition | + | + | <input type="checkbox" disabled checked /> | |
+| Mathematical | Subtraction | - | - | <input type="checkbox" disabled checked /> | |
+| Mathematical | Multiplication | * | * | <input type="checkbox" disabled checked /> | |
+| Mathematical | Division | / | / | <input type="checkbox" disabled checked /> | |
+| Mathematical | Modulo division | % | % | <input type="checkbox" disabled checked /> | |
+| Mathematical | Exponentiation | ^ | ^^ | <input type="checkbox" disabled checked /> | |
+| Comparison | Equality | = | = | <input type="checkbox" disabled checked /> | |
+| Comparison | Inequality| <> | <> | <input type="checkbox" disabled checked /> | |
+| Comparison | Less than | < | < | <input type="checkbox" disabled checked /> | |
+| Comparison | Less than or equal | <= | <= | <input type="checkbox" disabled checked /> | |
+| Comparison | Greater than | > | > | <input type="checkbox" disabled checked /> | |
+| Comparison | Greater than or equal | >= | >= | <input type="checkbox" disabled checked /> | |
+| Comparison | Verify as `NULL`| IS NULL | is null | <input type="checkbox" disabled /> | planned |
+| Comparison | Verify as `NOT NULL`| IS NOT NULL | is not null | <input type="checkbox" disabled /> | planned |
+| Comparison | String starts with | STARTS WITH | starts with | <input type="checkbox" disabled />| planned |
+| Comparison | String ends with | ENDS WITH | ends with | <input type="checkbox" disabled />| planned |
+| Comparison | String contains | CONTAINS | contains | <input type="checkbox" disabled />| planned |
+| Boolean | Conjunction | AND | and | <input type="checkbox" disabled checked /> | |
+| Boolean | Disjunction | OR | or | <input type="checkbox" disabled checked /> | |
+| Boolean | Exclusive Disjunction | XOR | xor | <input type="checkbox" disabled /> | planned |
+| Boolean | Negation | NOT | not | <input type="checkbox" disabled /> | planned |
+| BitOpr | Bit and | via function | & | <input type="checkbox" disabled checked /> | |
+| BitOpr | Bit or | via function | \| | <input type="checkbox" disabled checked /> | |
+| BitOpr | Bit xor | via function | ^ | <input type="checkbox" disabled checked /> | |
+| BitOpr | Bit reverse | via function | ~ | <input type="checkbox" disabled checked /> | |
+| BitOpr | Bit left shift | via function | << | <input type="checkbox" disabled />| planned |
+| BitOpr | Bit right shift | via function | >> | <input type="checkbox" disabled />| planned |
+| Branch | Use with `Project` and `Return` | CASE WHEN | CASE WHEN | <input type="checkbox" disabled />| planned |
+
+
+
+## Clause
+A notable limitation for now is that we do not
+allow specifying multiple `MATCH` clauses in **one** query. For example,
+the following code will not compile:
+```Cypher
+MATCH (a) -[]-> (b)
+WITH a, b
+MATCH (a) -[]-> () -[]-> (b) // second MATCH clause
+RETURN a, b;
+```
+
+| Keyword | Comments | Supported | Todo |
+|:---|---|:---:|:---|
+| MATCH | only one MATCH clause is allowed | <input type="checkbox" disabled checked /> | |
+| OPTIONAL MATCH | implemented as a left outer join | <input type="checkbox" disabled /> | planned |
+| RETURN .. [AS] | | <input type="checkbox" disabled checked /> | |
+| WITH .. [AS] | project, aggregate, distinct | <input type="checkbox" disabled checked /> | |
+| WHERE | | <input type="checkbox" disabled checked /> | |
+| NOT EXIST (an edge/path) | implemented as an anti join | <input type="checkbox" disabled />| |
+| ORDER BY | | <input type="checkbox" disabled checked /> | |
+| LIMIT | | <input type="checkbox" disabled checked /> | |
+
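
The limitations stated in this document (read path only, a single `MATCH` clause, `OPTIONAL MATCH` still planned) can be approximated by a client-side pre-check before a query is sent. The following is a minimal sketch, not GIE's compiler logic: `check_query` and `WRITE_KEYWORDS` are hypothetical names, the list of write keywords is an assumption based on standard Cypher, and the scan is a naive keyword match rather than a parser.

```python
import re

# Assumption: these standard Cypher keywords belong to the write path,
# which the document says GIE does not support yet.
WRITE_KEYWORDS = ("CREATE", "MERGE", "SET", "DELETE", "REMOVE")


def check_query(query):
    """Return a list of problems found by a naive keyword scan."""
    # Drop string literals so that keywords inside them are not counted.
    stripped = re.sub(r"'[^']*'|\"[^\"]*\"", "", query)
    words = re.findall(r"[A-Za-z_]+", stripped.upper())
    problems = []
    for keyword in WRITE_KEYWORDS:
        if keyword in words:
            problems.append(f"write clause not supported: {keyword}")
    if "OPTIONAL" in words:
        problems.append("OPTIONAL MATCH is planned but not yet supported")
    if words.count("MATCH") > 1:
        problems.append("only one MATCH clause is allowed")
    return problems


print(check_query("MATCH (a)-[]->(b) WITH a, b MATCH (a)-[]->()-[]->(b) RETURN a, b"))
```

Because the scan only looks at keywords, it can misreport identifiers that happen to share a keyword's name; a real check would rely on the compiler's own error.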

docs/interactive_engine/neo4j_eco.md

+19
@@ -0,0 +1,19 @@
+# Neo4j Ecosystem
+
+[Neo4j](https://neo4j.com/) is a graph database management system that natively uses graphs to store and process data.
+Unlike traditional relational databases that rely on relational schemas, Neo4j leverages the power of interconnected nodes and relationships,
+forming a highly flexible and expressive data model. GIE implements Neo4j's HTTP and TCP protocols so that the system can
+seamlessly interact with the Neo4j ecosystem, including development tools such as [cypher-shell](https://dist.neo4j.org/cypher-shell/cypher-shell-4.4.19.zip)
+and [drivers](https://neo4j.com/developer/language-guides/).
+
+The following documents will guide you through empowering the Neo4j ecosystem
+with GIE's distributed capability for large-scale graphs.
+
+```{toctree} arguments
+---
+caption: GIE For Neo4j Ecosystem
+maxdepth: 2
+---
+neo4j/cypher_sdk
+neo4j/supported_cypher
+```

docs/interactive_engine/supported_gremlin_steps.md renamed to docs/interactive_engine/tinkerpop/supported_gremlin_steps.md

+5-5
@@ -17,9 +17,9 @@
 3. [Aggregate(Group)](#aggregate-group)
 4. [Limitations](#limitations)
 ## Introduction
-This documentation guides you how to work with the [gremlin](https://tinkerpop.apache.org/docs/current/reference) graph traversal language in GraphScope. On the one hand we retain the original syntax of most steps from the standard gremlin, on the other hand the usages of some steps are further extended to denote more complex situations in real-world scenarios.
+This documentation guides you how to work with the [Gremlin](https://tinkerpop.apache.org/docs/current/reference) graph traversal language in GraphScope. On the one hand we retain the original syntax of most steps from the standard Gremlin, on the other hand the usages of some steps are further extended to denote more complex situations in real-world scenarios.
 ## Standard Steps
-We retain the original syntax of the following steps from the standard gremlin.
+We retain the original syntax of the following steps from the standard Gremlin.
 ### Source
 #### [V()](https://tinkerpop.apache.org/docs/current/reference/#v-step)
 The V()-step is meant to iterate over all vertices from the graph. Moreover, `vertexIds` can be injected into the traversal to select a subset of vertices.
@@ -308,7 +308,7 @@ g.V().valueMap("name")
 g.V().valueMap("name", "age")
 ```
 #### [values()](https://tinkerpop.apache.org/docs/current/reference/#values-step)
-The values()-step is meant to map the graph element to the values of the associated properties given the provide property keys. Here we just allow only one property key as the argument to the `values()` to implement the step as a map instead of a flat-map, which may be a little different from the standard gremlin.
+The values()-step is meant to map the graph element to the values of the associated properties given the provide property keys. Here we just allow only one property key as the argument to the `values()` to implement the step as a map instead of a flat-map, which may be a little different from the standard Gremlin.
 
 Parameters: </br>
 propertyKey - the property to retrieve its value from.
@@ -504,7 +504,7 @@ g.V().union(out(), out().out())
 The match()-step provides a declarative form of graph patterns to match with. With match(), the user provides a collection of "sentences," called patterns, that have variables defined that must hold true throughout the duration of the match(). For most of the complex graph patterns, it is usually much easier to express via match() than with single-path traversals.
 
 Parameters: </br>
-matchSentences - define a collection of patterns. Each pattern consists of a start tag, a serials of gremlin steps (binders) and an end tag.
+matchSentences - define a collection of patterns. Each pattern consists of a start tag, a serials of Gremlin steps (binders) and an end tag.
 
 Supported binders within a pattern: </br>
 * Expand: in()/out()/both(), inE()/outE()/bothE(), inV()/outV()/otherV/bothV
@@ -709,7 +709,7 @@ gremlin> g.V().select(expr("@.name"))
 ==>peter
 ```
 ### Aggregate (Group)
-The group()-step in standard gremlin has limited capabilities (i.e. grouping can only be performed based on a single key, and only one aggregate calculation can be applied in each group), which cannot be applied to the requirements of performing group calculations on multiple keys or values; Therefore, we further extend the capabilities of the group()-step, allowing multiple variables to be set and different aliases to be configured in key by()-step and value by()-step respectively.
+The group()-step in standard Gremlin has limited capabilities (i.e. grouping can only be performed based on a single key, and only one aggregate calculation can be applied in each group), which cannot be applied to the requirements of performing group calculations on multiple keys or values; Therefore, we further extend the capabilities of the group()-step, allowing multiple variables to be set and different aliases to be configured in key by()-step and value by()-step respectively.
 
 Usages of the key by()-step:
 ```bash

docs/interactive_engine/tinkerpop_gremlin.md renamed to docs/interactive_engine/tinkerpop/tinkerpop_gremlin.md

+12-5
@@ -1,15 +1,22 @@
-# GIE For Gremlin
+# GIE for Gremlin
 This document will provide you with step-by-step guidance on how to connect your gremlin applications to the GIE's
 FrontEnd service, which offers functionalities similar to the official Tinkerpop service.
 
 Your first step is to obtain the endpoint of GIE Frontend service:
 - Follow the [instruction](./deployment.md#deploy-your-first-gie-service) while deploying GIE in a K8s cluster,
 - Follow the [instruction](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.
 
-## Connecting Gremlin within Python
+## Connecting via Python SDK
 
 GIE makes it easy to connect to a loaded graph with Tinkerpop's [Gremlin-Python](https://pypi.org/project/gremlinpython/).
 
+You first install the dependency:
+```bash
+pip3 install gremlinpython
+```
+
+Then connect to the service and run queries:
+
 ```Python
 import sys
 from gremlin_python import statics
@@ -61,7 +68,7 @@ resultIterationBatchSize: 64
 
 ```
 
-## Connecting Gremlin within Java
+## Connecting via Java SDK
 See [Gremlin-Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) for connecting Gremlin
 within the Java language.
 
@@ -81,7 +88,7 @@ client.close();
 cluster.close();
 ```
 
-## Gremlin Console
+## Connecting via Gremlin-Console
 1. Download Gremlin console and unpack to your local directory.
 ```bash
 # if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
@@ -91,7 +98,7 @@ cluster.close();
 cd apache-tinkerpop-gremlin-console-3.6.4
 ```
 
-2. In the directory of gremlin console, modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint, as
+2. In the directory of Gremlin console, modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint, as
 ```bash
 hosts: [your_endpoint_address]
 port: [your_endpoint_port]
