Skip to content

Commit

Permalink
update
Browse files Browse the repository at this point in the history
  • Loading branch information
siyuan0322 committed Mar 31, 2021
1 parent 8cab00d commit 1f06870
Show file tree
Hide file tree
Showing 6 changed files with 288 additions and 1 deletion.
149 changes: 149 additions & 0 deletions docs/graph_transformation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
.. _graph_transformation:

Graph Transformations
=====================

We introduce a series of method that can append more labels to a existed grpah, and
do projection over existed graph. We will also show how to make a complex property graph
compatible with algorithms that can only run on simple graph. Finally, we show how to add
the query result of algorithm back to graph as a property on vertex.

More specically, :class:`Graph` provides two methods for append labels, and one method for
projection.


.. code:: python
def add_vertices(self, vertices, label="_", properties=[], vid_field=0):
pass
def add_edges(self, edges, label="_", properties=[], src_label=None, dst_label=None, src_field=0, dst_field=1):
pass
def project(self, vertices, edges):
pass
We have already seem `add_vertices` and `add_edges` in :ref:`loading graphs<loading_graphs>`, we use them
to build a graph iteratively.

Further, we can use them to attach more vertex labels and edge labels to a existed graph.
But this won't modify the source graph, instead, it will return a new graph, which is based
on the source graph.


Attach new labels
-----------------

Take LDBC-SNB Property Graph as an example,We now load a subset of labels, as the source graph.


.. code:: python
import graphscope
from pathlib import Path
from graphscope.framework.loader import Loader
sess = graphscope.session()
graph = sess.g(directed=directed)
graph = graph.add_vertices(Loader("person_0_0.csv", delimiter="|"), "person")
graph = graph.add_edges(Loader("person_knows_person_0_0.csv", delimiter="|"),
"knows", src_label="person", dst_label="person"
)
# graph has 1 vertex label "person"
print(graph.schema)
Now we have an loaded graph, let's attach some new labels to it.

.. code:: python
graph1 = graph.add_vertices(Loader("comment_0_0.csv", delimiter="|"), "comment")
# Now graph1 has 2 vertex labels "person" and "comment"
print(graph1.schema)
graph2 = graph1.add_edges(Loader("comment_replyOf_comment_0_0.csv", delimiter="|"),
"replyOf", src_label="comment", dst_label="comment"
)
# graph2 has 2 edge labels "knows" and "replyOf"
print(graph2.schema)
We can see each operation of `add` will produce a new graph.
In implementation detail, their common labels will share the common memory, so it won't
copy the source graph.


Projection
----------

In some scenario, we need to extract a subgraph from a complex graph. We do that by `project`.


.. code:: python
def project(
self,
vertices: Mapping[str, Union[List[str], None]],
edges: Union[Mapping[str, Union[List[str], None]], None]
):
pass
The parameter definition means it's a `dict`, the key is the label name, the value is a `list` of `str`, which is the name of properties. Specifically, if the value is `None`, it means select all properties.

A graph that produced by `project` should just like a normal property graph, and can be projected further.

Here's some examples.

.. code:: python
sub_graph = graph2.project(vertices={"person": ["firstName", "lastName"]}, edges={"knows": None})
# contains 1 vertex label "person", and 1 edge label "knows", with selected properties.
print(sub_graph.schema)
sub_graph2 = sub_graph.project(vertices={"person": []}, edges={"knows": ["creationDate"]})
# No properties on the vertex, and 1 property on the edge.
print(sub_graph2.schema)
Transform to simple graph implicitly
------------------------------------

When an algorithm that only works on simple graph query a property graph, the property graph will
be converted to a simple graph implicitly. If such transformation cannot be performed (Graph has more than 1 vertex
label or edge label, or has more than 1 property), an exception will be raised.

.. code:: python
from graphscope import wcc
ret = wcc(sub_graph2)
# wcc(graph2) # Error! More than 1 vertex label / edge label
# wcc(sub_graph) # Error! More than 1 property.
Add results back to graph as a property
---------------------------------------

The result `ret` produced in previous step can be add to a graph as a property of vertex.

Note the result can not only be added to the graph it directly queried on, but also the graph which produced
the queried graph by `project`, as long as the vertex label that will be mutated is the same between the two graphs.

.. code:: python
new_graph = sub_graph2.add_column(ret, selector={'cc': 'r'})
new_graph = sub_graph.add_column(ret, selector={'cc': 'r'})
new_graph = graph.add_column(ret, selector={'cc': 'r'})
3 changes: 2 additions & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
contain the root `toctree` directive.
GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
==================================================
======================================================================

GraphScope is a unified distributed graph computing platform
that provides a one-stop environment for performing diverse graph
Expand All @@ -25,6 +25,7 @@ and the vineyard store that offers efficient in-memory data transfers.
tutorials
deployment
loading_graph
graph_transformation
interactive_engine
analytics_engine
learning_engine
Expand Down
1 change: 1 addition & 0 deletions docs/loading_graph.rst
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,7 @@ If there is only one vertex label in the graph, the label of vertices can be omi
GraphScope will infer the source and destination vertex label is that very label.

.. code:: python
graph = sess.g()
graph = graph.add_vertices("file:///home/admin/student.v", label="student")
graph = graph.add_edges("file:///home/admin/group.e", label="group")
Expand Down
134 changes: 134 additions & 0 deletions docs/zh/graph_transformation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
.. _graph_transformation:


图的变换操作
=========================

我们将介绍一系列可以在图上进行新增/投影的方法,以及如何将一个复杂的图转换为可以适配普通算法应用的方法。
最后,我们展示如何将算法得到的结果加回到图中去。

具体而言,图 :class:`Graph` 提供了两个增加标签的函数, 和一个投影的函数。

.. code:: python
def add_vertices(self, vertices, label="_", properties=[], vid_field=0):
pass
def add_edges(self, edges, label="_", properties=[], src_label=None, dst_label=None, src_field=0, dst_field=1):
pass
def project(self, vertices, edges):
pass
其中,我们已经在 :ref:`载图<loading_graphs>` 一节见到过 `add_vertices` 和 `add_edges` 这两个函数,当时我们用它来构建一张图。
进一步的,当图构建好并载入了 Vineyard 中之后,我们仍然可以用其增加更多的标签。当然这一步并不会在原图上修改,而是会返回基于原图之上,
增加了新的标签的新图。


添加新的标签
----------------

以 LDBC-SNB 属性图为例,我们现在载入其中一部分标签,作为接下来一系列转换操作的起始图。

.. code:: python
import graphscope
from pathlib import Path
from graphscope.framework.loader import Loader
sess = graphscope.session()
graph = sess.g(directed=directed)
graph = graph.add_vertices(Loader("person_0_0.csv", delimiter="|"), "person")
graph = graph.add_edges(Loader("person_knows_person_0_0.csv", delimiter="|"),
"knows", src_label="person", dst_label="person"
)
# graph has 1 vertex label "person"
print(graph.schema)
到这里, 我们已经载入了一张图。接下来我们在这张图上再添加几个标签。

.. code:: python
graph1 = graph.add_vertices(Loader("comment_0_0.csv", delimiter="|"), "comment")
# Now graph1 has 2 vertex labels "person" and "comment"
print(graph1.schema)
graph2 = graph1.add_edges(Loader("comment_replyOf_comment_0_0.csv", delimiter="|"),
"replyOf", src_label="comment", dst_label="comment"
)
# graph2 has 2 edge labels "knows" and "replyOf"
print(graph2.schema)
可以看到每次 `add` 都会产生一张新的图,在底层,他们共有的部分会指向同一块内存,所以并不会将原图的数据复制一份。


投影
-------

在某些场景下,我们需要将从一张复杂的图提取出一个子图。这个操作可以借助 `project` 来完成。

.. code:: python
def project(
self,
vertices: Mapping[str, Union[List[str], None]],
edges: Union[Mapping[str, Union[List[str], None]], None]
):
pass
`project` 包含两个参数 `vertices` 和 `edges`,其值为一个字典,字典的键是标签名,值是要取的属性的列表。值可以为 None,
代表选择所有的属性。

`project` 的返回值也是一个属性图,并且可以被进一步 `project`。
以下是几个例子。

.. code:: python
sub_graph = graph2.project(vertices={"person": ["firstName", "lastName"]}, edges={"knows": None})
# 包含一个点标签 "person" 和一个边标签 "knows", 以及所选择的属性。
print(sub_graph.schema)
sub_graph2 = sub_graph.project(vertices={"person": []}, edges={"knows": ["creationDate"]})
# 现在点上没有属性,边上有一个属性
print(sub_graph2.schema)
自动转换为简单图
--------------------------

当执行一个仅可以跑在简单图上的算法时,其会默认将其参数中的属性图转换为简单图,如果不能进行这种转换(即多于一个点标签和一个边标签,或多于一个属性),
那么就会报错。

.. code:: python
from graphscope import wcc
ret = wcc(sub_graph2)
# wcc(graph2) # 错误! 转换不合法,多于一个点/边标签
# wcc(sub_graph) # 错误!转换不合法,多于一个属性
将计算结果作为新的属性加入图中
------------------------------------------------

上一步算法的运行结果可以被加入一张图中, 作为点的一个属性。

不仅可以加入运算结果到直接被查询的图上,还可以将这个查询结果加到被 `project` 而得到被查询的图上,只要被加入属性的点标签相同。

.. code:: python
new_graph = sub_graph2.add_column(ret, selector={'cc': 'r'})
new_graph = sub_graph.add_column(ret, selector={'cc': 'r'})
new_graph = graph.add_column(ret, selector={'cc': 'r'})
1 change: 1 addition & 0 deletions docs/zh/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ GraphScope 的交互查询引擎的论文已被 NSDI 2021录用。
deployment
tutorials
loading_graph
graph_transformation
interactive_engine
analytics_engine
learning_engine
Expand Down
1 change: 1 addition & 0 deletions docs/zh/loading_graph.rst
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@ GraphScope 以
GraphScope 将会推断起始点标签和终点标签为这一个点标签。

.. code:: python
graph = sess.g()
graph = graph.add_vertices("file:///home/admin/student.v", label="student")
graph = graph.add_edges("file:///home/admin/group.e", label="group")
Expand Down

0 comments on commit 1f06870

Please sign in to comment.