# Overview
GAIA (GrAph Interactive Analytics) is a full-fledged system for large-scale interactive graph analytics in distributed settings.
GAIA is based on TinkerPop's Gremlin query language (https://tinkerpop.apache.org/). Given a Gremlin query, GAIA
compiles it into a dataflow with the help of the Scope abstraction, and then schedules the computation in
a distributed runtime.
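
To make the query-to-dataflow idea concrete, here is a toy, single-process Rust sketch of how a query such as `g.V().out().out().count()` could be evaluated as a dataflow over partitioned data. It is not GAIA's compiler output, its Scope abstraction, or its engine API. The function names are hypothetical, vertices are assigned to workers by `id % workers` rather than by a real hash, and the "workers" run one after another in a loop instead of in parallel. The point it shows is the exchange step: after the first `out()`, each traverser is routed to the worker that owns its vertex, because that is where the vertex's out-edges live.

```rust
use std::collections::HashMap;

/// Hypothetical vertex identifier type used only in this sketch.
type VertexId = u64;

/// Worker that owns a vertex (simple modulo placement, an assumption of this sketch).
fn owner(v: VertexId, workers: usize) -> usize {
    (v % workers as u64) as usize
}

/// Splits an edge list so that each worker holds the out-edges of the vertices it owns.
fn build_local_adjacency(
    edges: &[(VertexId, VertexId)],
    workers: usize,
) -> Vec<HashMap<VertexId, Vec<VertexId>>> {
    let mut local: Vec<HashMap<VertexId, Vec<VertexId>>> = vec![HashMap::new(); workers];
    for &(src, dst) in edges {
        local[owner(src, workers)].entry(src).or_insert_with(Vec::new).push(dst);
    }
    local
}

/// Evaluates g.V().out().out().count() as three dataflow stages.
fn two_hop_count(local: &[HashMap<VertexId, Vec<VertexId>>]) -> usize {
    let workers = local.len();

    // Stage 1: V().out() runs locally on each worker; every neighbor reached is
    // sent (exchanged) to the inbox of the worker that owns it.
    let mut inboxes: Vec<Vec<VertexId>> = vec![Vec::new(); workers];
    for adj in local {
        for neighbors in adj.values() {
            for &n in neighbors {
                inboxes[owner(n, workers)].push(n);
            }
        }
    }

    // Stage 2: the second out() runs where each vertex lives; each worker only
    // needs the out-degree of the traversers in its inbox to produce a partial count.
    let partial: Vec<usize> = inboxes
        .iter()
        .zip(local)
        .map(|(inbox, adj)| {
            inbox
                .iter()
                .map(|n| adj.get(n).map_or(0, |ns| ns.len()))
                .sum::<usize>()
        })
        .collect();

    // Stage 3: count() merges the per-worker partial counts.
    partial.iter().sum()
}

fn main() {
    // Toy graph: 1->2, 1->3, 2->3, 3->4, 4->1
    let edges: &[(VertexId, VertexId)] = &[(1, 2), (1, 3), (2, 3), (3, 4), (4, 1)];
    let local = build_local_adjacency(edges, 2);
    println!("g.V().out().out().count() = {}", two_hop_count(&local));
}
```

With the toy graph in `main`, the program prints `g.V().out().out().count() = 6`, one for each path of length two. Changing the number of workers changes where the work happens but not the result.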

GAIA has been deployed at [Alibaba Group](https://www.alibaba.com/) to support a wide range of businesses, from
e-commerce to cybersecurity. This repository contains the three main components of its architecture:
* GAIA compiler: As a main technical contribution of GAIA, we propose a powerful abstraction
  called *Scope* that hides the complex control flow (e.g., conditionals and loops) and fine-grained dependencies of
  a Gremlin query from the dataflow engine. Taking a Gremlin query as input, the GAIA compiler
  compiles it into a dataflow (with the Scope abstraction) to be executed by the dataflow engine. The compiler
  is built on top of the [Gremlin Server](http://tinkerpop.apache.org/docs/3.4.3/reference/#connecting-gremlin-server)
  interface, so the system can interact seamlessly with the TinkerPop ecosystem, including development tools
  such as [Gremlin Console](http://tinkerpop.apache.org/docs/3.4.3/reference/#gremlin-console)
  and language wrappers such as Java and Python.
* Distributed runtime: The GAIA execution runtime provides automatic support for efficient execution of Gremlin
  queries at scale. Each query is compiled by the GAIA compiler into a distributed execution plan that is
  partitioned across multiple compute nodes for parallel execution. Each partition runs on a separate compute node
  and is managed by a local executor that schedules and executes computation on a multi-core server.
* Distributed graph store: The storage layer maintains an input graph that is hash-partitioned across a cluster,
  with each vertex placed together with its adjacent (both incoming and outgoing) edges and their attributes.
  For simplicity, we assume here that the storage is coupled with the execution runtime, that is, each
  local executor holds a separate graph partition. In production, additional storage features have been developed,
  including snapshot isolation, fault tolerance, and extensible APIs for cloud storage services, but they are
  excluded from the open-source stack due to conflicts of interest.
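
The storage layout described above can be sketched in a few lines of Rust. This is a minimal, in-memory illustration under stated assumptions, not the `graph_store` API: vertex IDs are plain `u64`s, partitions are chosen with the standard library's `DefaultHasher`, attributes are omitted, and the type and function names are hypothetical. Each edge `(src, dst)` is stored twice, as an outgoing edge on `src`'s partition and as an incoming edge on `dst`'s partition, so every vertex finds both of its adjacency lists locally.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical vertex identifier type used only in this sketch.
type VertexId = u64;

/// One graph partition: each vertex assigned here keeps both its outgoing
/// and incoming adjacency lists locally (edge attributes are omitted).
#[derive(Default, Debug)]
struct Partition {
    out_edges: HashMap<VertexId, Vec<VertexId>>,
    in_edges: HashMap<VertexId, Vec<VertexId>>,
}

/// Assigns a vertex to a partition by hashing its ID.
fn partition_of(v: VertexId, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    v.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Builds `num_partitions` partitions from an edge list. Each edge (src, dst) is
/// stored as an out-edge on src's partition and as an in-edge on dst's partition.
fn build_partitions(edges: &[(VertexId, VertexId)], num_partitions: usize) -> Vec<Partition> {
    let mut parts: Vec<Partition> = (0..num_partitions).map(|_| Partition::default()).collect();
    for &(src, dst) in edges {
        parts[partition_of(src, num_partitions)]
            .out_edges
            .entry(src)
            .or_default()
            .push(dst);
        parts[partition_of(dst, num_partitions)]
            .in_edges
            .entry(dst)
            .or_default()
            .push(src);
    }
    parts
}

fn main() {
    let edges: &[(VertexId, VertexId)] = &[(1, 2), (1, 3), (4, 2), (6, 3)];
    let parts = build_partitions(edges, 2);
    for (i, p) in parts.iter().enumerate() {
        println!("partition {}: out={:?} in={:?}", i, p.out_edges, p.in_edges);
    }
}
```

Storing each edge at both endpoints doubles the edge storage, but it lets a local executor expand a vertex in either direction without contacting another partition.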

# Preparation
## Dependencies
GAIA builds, runs, and has been tested on GNU/Linux (more specifically, CentOS 7).
Even though GAIA may build on other Linux-like systems, we have not tested its correctness or performance there,
so please beware.

At a minimum, GAIA depends on the following software:
* [Rust](https://www.rust-lang.org/) (>= 1.49): GAIA currently works with Rust 1.49, and we expect it to work with
  later versions as well.
* Java (JDK 8): Due to a known gRPC issue involving an older version of the Java annotation APIs, the project is
  restricted to JDK 8 for now.
* Protobuf (3.0): The Rust code generation is powered by [prost](https://github.com/danburkert/prost).
* gRPC: gRPC is used for communication between Rust (engine) and Java (Gremlin server/client). The Rust
  implementation is powered by [tonic](https://github.com/hyperium/tonic).
* For other Rust and Java dependencies, check:
  * `./gremlin/compiler/pom.xml`
  * `./gremlin/gremlin_core/Cargo.toml`
  * `./graph_store/Cargo.toml`
  * `./pegasus/Cargo.toml`

## Building the Code
TODO

## Generate Graph Data
Please refer to `./graph_store/README.md` for details.

# Deployment
## Deploy GAIA Services
TODO
### Single-machine Deployment
### Distributed Deployment
## Start Gremlin Server
After successfully building the code, you can find `gremlin-server-plugin-1.0-SNAPSHOT-jar-with-dependencies.jar` in
`./gremlin/compiler/gremlin-server-plugin/target`. Copy it, together with the `./gremlin/compiler/conf` directory,
to wherever you want to start the server:
```
cp ./gremlin/compiler/gremlin-server-plugin/target/gremlin-server-plugin-1.0-SNAPSHOT-jar-with-dependencies.jar /path/to/your/dir
cp -r ./gremlin/compiler/conf /path/to/your/dir
cd /path/to/your/dir
```

Set the following configurations in `./conf`:
* Gremlin server address and port: TODO
* The graph storage schema: For reference, we provide the schema file
  `./conf/modern.schema.json` for [TinkerPop's modern graph](https://tinkerpop.apache.org/docs/current/tutorials/getting-started/),
  and `./conf/ldbc.schema.json` for [LDBC generated data](https://github.com/ldbc/ldbc_snb_datagen).
  TODO: How to customize the schema

Then start the Gremlin server:
```
java -cp .:gremlin-server-plugin-1.0-SNAPSHOT-jar-with-dependencies.jar com.compiler.demo.server.GremlinServiceMain
```

## Run Query
- Download TinkerPop's official [Gremlin Console](https://archive.apache.org/dist/tinkerpop/3.4.9/apache-tinkerpop-gremlin-console-3.4.9-bin.zip).
- Go to the console directory (`cd path/to/gremlin/console`) and modify `conf/remote.yaml`:
  ```
  hosts: [localhost] # TODO: should the hosts and port match the server configuration above?
  port: 8182
  serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  ```
- Start the console with `./bin/gremlin.sh`, then connect to the server from inside the console:
  ```
  ./bin/gremlin.sh
  :remote connect tinkerpop.server conf/remote.yaml
  :remote console
  ```
- Submit queries in the console. Have fun!

# Contact
TODO

# Acknowledgements
TODO

# Publications
1. GAIA: A System for Interactive Analysis on Distributed Graphs Using a High-Level Language. Zhengping Qian,
   Chenqiang Min, Longbin Lai, Yong Fang, Gaofeng Li, Youyang Yao, Bingqing Lyu, Xiaoli Zhou, Zhimin Chen, Jingren Zhou.
   18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2021), to appear.