Commit

Redesign the way to initialize a graph and revert remove (#199)
siyuan0322 authored Mar 26, 2021
1 parent 4f5db65 commit c357855
Showing 50 changed files with 152 additions and 402 deletions.
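Nearly every hunk in this commit makes the same substitution: `graphscope.Graph(sess)` (or a bare `Graph(sess)`) becomes `sess.g()`. For code that still uses the old form, the rewrite could be sketched as below. This is a hypothetical helper, not part of GraphScope or of this commit; the name `migrate_line` and the regular expression are assumptions, and it only handles calls whose single argument is a plain identifier.

```python
import re

# Matches `graphscope.Graph(sess)` or a bare `Graph(sess)` whose only argument
# is a plain identifier. The lookbehind skips unrelated names such as
# `nx.Graph(...)` or `DiGraph(...)`.
_OLD_CALL = re.compile(r"(?<![\w.])(?:graphscope\.)?Graph\(\s*(\w+)\s*\)")


def migrate_line(line: str) -> str:
    """Rewrite the old constructor call into the session method call."""
    return _OLD_CALL.sub(r"\1.g()", line)


print(migrate_line("g = graphscope.Graph(sess)"))
```

Calls that pass extra arguments are left untouched and would need manual review.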
2 changes: 1 addition & 1 deletion README-zh.md
@@ -132,7 +132,7 @@ GraphScope models graph data as a property graph. In a property graph, vertices
Please download the data and extract it to the local mount directory (`~/test_data` in this example).

```python
g = graphscope.Graph(sess)
g = sess.g()
g = (
g.add_vertices("/testingdata/ogbn_mag_small/paper.csv", label="paper")
.add_vertices("/testingdata/ogbn_mag_small/author.csv", label="author")
2 changes: 1 addition & 1 deletion README.md
@@ -123,7 +123,7 @@ To load this graph to GraphScope, one may use the code below with the [data file


```python
g = graphscope.Graph(sess)
g = sess.g()
g = (
g.add_vertices("/testingdata/ogbn_mag_small/paper.csv", label="paper")
.add_vertices("/testingdata/ogbn_mag_small/author.csv", label="author")
2 changes: 1 addition & 1 deletion demo/node_classification_on_citation.ipynb
@@ -63,7 +63,7 @@
" Returns:\n",
" :class:`graphscope.Graph`: A Graph object which graph type is ArrowProperty\n",
" \"\"\"\n",
" graph = Graph(sess)\n",
" graph = sess.g()\n",
" graph = (\n",
" graph.add_vertices(os.path.join(prefix, \"paper.csv\"), \"paper\")\n",
" .add_vertices(os.path.join(prefix, \"author.csv\"), \"author\")\n",
2 changes: 1 addition & 1 deletion docs/analytics_engine.rst
@@ -304,7 +304,7 @@ To run your own algorithms, you may trigger it in place where you defined it.
import graphscope
sess = graphscope.session()
g = graphscope.Graph(sess)
g = sess.g()
# load my algorithm
my_app = SSSP_Pregel()
2 changes: 1 addition & 1 deletion docs/getting_started.rst
@@ -112,7 +112,7 @@ To load this graph to GraphScope, one may use the code below.

.. code:: python
g = graphscope.Graph(sess)
g = sess.g()
g = (
g.add_vertices("paper.csv", label="paper")
.add_vertices("author.csv", label="author")
129 changes: 16 additions & 113 deletions docs/loading_graph.rst
@@ -10,32 +10,27 @@ in which the edges/vertices are labeled and each label may have many properties.
Building a Graph
-------------------------

To load a property graph to GraphScope, we provide a class `Graph`, and several methods:
To load a property graph to GraphScope, we provide a method `g()` defined in `Session`.

First, we create a session, then a graph instance inside that session.

.. code:: python
def add_vertices(self, vertices, label="_", properties=[], vid_field=0):
pass
sess = graphscope.session()
graph = sess.g()
def add_edges(self, edges, label="_", properties=[], src_label=None, dst_label=None, src_field=0, dst_field=1):
pass
The class `Graph` has several methods:

def remove_vertices(self, label):
.. code:: python
def add_vertices(self, vertices, label="_", properties=[], vid_field=0):
pass
def remove_edges(self, label, src_label=None, dst_label=None):
def add_edges(self, edges, label="_", properties=[], src_label=None, dst_label=None, src_field=0, dst_field=1):
pass
These methods help users construct the schema of the property graph iteratively.

First, we create a session, then a graph instance inside that session.

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
We can add one kind of vertex (a vertex label) to the graph.

The parameters contain:
@@ -168,7 +163,7 @@ If there is only one vertex label in the graph, the label of vertices can be omi
GraphScope will infer that the source and destination vertex labels are both that label.

.. code:: python
graph = graphscope.Graph(sess)
graph = sess.g()
graph = graph.add_vertices("file:///home/admin/student.v", label="student")
graph = graph.add_edges("file:///home/admin/group.e", label="group")
# GraphScope will assign `src_label` and `dst_label` to `student` automatically.
@@ -183,7 +178,7 @@ It only serves the simplest cases.

.. code:: python
graph = graphscope.Graph(sess)
graph = sess.g()
graph.add_edges("file:///home/admin/group.e", label="group")
# After loading, the graph will have a vertex label called `_` and an edge label called `group`.
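The two defaulting rules in the hunks above (a single vertex label is reused for both endpoints; with no vertex label at all, `_` is used) can be modelled in a few lines. This is a toy sketch, not GraphScope's code: `resolve_edge_endpoints` is a hypothetical name, and raising `ValueError` when several vertex labels exist is an assumption about the ambiguous case, which the text does not describe.

```python
from typing import List, Optional, Tuple

DEFAULT_LABEL = "_"  # default label used in the method signatures above


def resolve_edge_endpoints(
    vertex_labels: List[str],
    src_label: Optional[str] = None,
    dst_label: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the (src_label, dst_label) pair an edge would end up with."""

    def pick(explicit: Optional[str]) -> str:
        if explicit is not None:
            return explicit              # the caller was explicit
        if not vertex_labels:
            return DEFAULT_LABEL         # no vertices were added: fall back to `_`
        if len(vertex_labels) == 1:
            return vertex_labels[0]      # only one candidate: infer it
        # Assumption: with several labels the choice is ambiguous.
        raise ValueError("several vertex labels exist; pass the label explicitly")

    return pick(src_label), pick(dst_label)


print(resolve_edge_endpoints(["student"]))
print(resolve_edge_endpoints([]))
```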
@@ -200,7 +195,7 @@ Let's make the example complete:
.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = sess.g()
graph = graph.add_vertices(
"/home/admin/student.v",
@@ -247,7 +242,7 @@ from pandas dataframes or numpy ndarrays.
# use a dataframe as datasource, properties omitted, col_0/col_1 will be used as src/dst by default.
# (for vertices, col_0 will be used as vertex_id by default)
graph = graphscope.Graph(sess).add_vertices(df_v).add_edges(df_e)
graph = sess.g().add_vertices(df_v).add_edges(df_e)
Or load from numpy ndarrays
@@ -259,7 +254,7 @@ Or load from numpy ndarrays
array_e = [df_e[col].values for col in ['leader_student_id', 'member_student_id', 'member_size']]
array_v = [df_v[col].values for col in ['student_id', 'lesson_nums', 'avg_score']]
graph = graphscope.Graph(sess).add_vertices(array_v).add_edges(array_e)
graph = sess.g().add_vertices(array_v).add_edges(array_e)
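The default column roles mentioned above (column 0 as the vertex id; columns 0 and 1 as source and destination) can be illustrated without GraphScope. The sketch below uses plain Python lists in place of ndarrays; `split_vertex_columns` and `split_edge_columns` are hypothetical helpers, and treating every remaining column as a property is an assumption.

```python
from typing import Any, Dict, List, Sequence


def split_vertex_columns(
    columns: Sequence[Sequence[Any]], vid_field: int = 0
) -> Dict[str, List[Any]]:
    """Pick the vertex-id column by index; the rest are treated as properties."""
    return {
        "vid": list(columns[vid_field]),
        "properties": [list(c) for i, c in enumerate(columns) if i != vid_field],
    }


def split_edge_columns(
    columns: Sequence[Sequence[Any]], src_field: int = 0, dst_field: int = 1
) -> Dict[str, List[Any]]:
    """Pick the source and destination columns by index; the rest are properties."""
    return {
        "src": list(columns[src_field]),
        "dst": list(columns[dst_field]),
        "properties": [
            list(c) for i, c in enumerate(columns) if i not in (src_field, dst_field)
        ],
    }


# Columns in the same order as `array_e` above:
# leader_student_id, member_student_id, member_size (made-up values).
edges = split_edge_columns([[1, 2], [2, 3], [4, 5]])
print(edges["src"], edges["dst"], edges["properties"])
```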
Graphs from Given Location
@@ -288,96 +283,4 @@ directly be passed to corresponding storage class. Like `host` and `port` to `HD
Users can implement a customized driver to support additional data sources. Taking `ossfs <https://github.com/alibaba/libvineyard/blob/main/modules/io/adaptors/ossfs.py>`_ as an example, the user needs to subclass `AbstractFileSystem`, which
resolves a specific protocol scheme, and `AbstractBufferedFile`, which handles reads and writes.
The only methods the user needs to override are ``_upload_chunk``,
``_initiate_upload`` and ``_fetch_range``. Finally, the user needs to call ``fsspec.register_implementation('protocol_name', 'protocol_file_system')`` to register the corresponding resolver.


Understanding the Lazy Evaluation of Graphs
-------------------------------------------

Graphs in GraphScope are not loaded until they are used.
By **used**, we mean that anything related to the remote side is touched, such as
the `key` of the graph, the `vineyard_id`, the complete schema with data types, or
an application querying the graph.

While a graph is being built iteratively, the graph object stores some basic schema, and users are free to
inspect that basic schema with `print(graph)` without triggering the loading process.
Let's see an example:

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_edges( "file:///home/admin/group.e", "group", src_label="student", dst_label="student")
# This will not actually load the graph.
print(graph)
# But these will load the graph, because more detailed information can only be known after loading.
print(graph.key)
print(graph.schema)
graphscope.sssp(graph, src=6)
# Calling `loaded()` will also load the graph automatically.
assert graph.loaded() == True
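The behaviour this documentation text describes (builder calls only record work, `print(graph)` is free, and touching `key` forces a load) can be mimicked with a small stand-in class. This is a toy model under stated assumptions, not GraphScope's implementation: `LazyGraph`, `load_count` and the fake key format are all hypothetical, and the methods take fewer parameters than the real signatures.

```python
from typing import List, Tuple


class LazyGraph:
    """Toy model of a lazily loaded graph; not GraphScope's implementation."""

    def __init__(self) -> None:
        self._pending: List[Tuple[str, str, str]] = []  # recorded, not yet loaded
        self._loaded: List[Tuple[str, str, str]] = []   # already "loaded"
        self.load_count = 0  # how many simulated loads were triggered

    def add_vertices(self, source: str, label: str = "_") -> "LazyGraph":
        self._pending.append(("vertex", label, source))
        return self

    def add_edges(self, source: str, label: str = "_") -> "LazyGraph":
        self._pending.append(("edge", label, source))
        return self

    def __repr__(self) -> str:
        # Inspecting the basic schema never triggers a load.
        labels = [f"{kind}:{label}" for kind, label, _ in self._loaded + self._pending]
        return f"LazyGraph({', '.join(labels)})"

    def _ensure_loaded(self) -> None:
        if self._pending:
            self._loaded.extend(self._pending)
            self._pending.clear()
            self.load_count += 1

    @property
    def key(self) -> str:
        # Anything that needs the remote side forces the pending work to run.
        self._ensure_loaded()
        return f"toy-graph-{len(self._loaded)}"


graph = LazyGraph().add_vertices("/home/admin/student.v", "student")
graph = graph.add_edges("file:///home/admin/group.e", "group")
print(graph)      # schema only, no load
print(graph.key)  # triggers the simulated load
```

Recording operations in a pending list keeps schema inspection cheap and defers the expensive step until a remote-backed attribute is actually needed.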
Thanks to the lazy evaluation of graph loading, we can remove some vertices or edges before the actual loading,
but we cannot remove them after the graph is loaded.

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_vertices( "/home/admin/teacher.v", "teacher")
graph = graph.add_edges("file:///home/admin/group.e", "group", src_label="student", dst_label="student")
graph = graph.add_edges("file:///home/admin/group_for_teacher_student.e", "group", src_label="teacher", dst_label="student")
# inspect the schema without loading
print(graph)
# The related edges must be removed before a vertex is removed.
# graph = graph.remove_vertices("teacher") # Error, because some edges still rely on that vertex.
# src_label and dst_label are used to filter edges. When they are not specified, the edge label is removed entirely.
graph = graph.remove_edges("group", src_label="teacher", dst_label="student")
# Now we can remove the vertex
graph = graph.remove_vertices("teacher")
print(graph)
# Trigger the loading.
print(graph.key)
# Removal is now forbidden.
# graph = graph.remove_edges("group")
However, we can add more vertices and edges to a loaded graph.
Additions are also lazily evaluated, so we can even remove vertices and edges that have not been processed yet.

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_edges("file:///home/admin/group.e", "group", src_label="student", dst_label="student")
print(graph.key) # trigger the loading
# Add more vertices and edges to a loaded graph.
graph = graph.add_vertices("/home/admin/teacher.v", "teacher")
graph = graph.add_edges("file:///home/admin/group_for_teacher_student.e", "group", src_label="teacher", dst_label="student")
print(graph) # does not trigger the loading.
# So we can remove unprocessed vertices or edges
graph = graph.remove_edges("group", src_label="teacher", dst_label="student")
graph = graph.remove_vertices("teacher")
# But we cannot remove labels that are in the loaded graph.
# graph = graph.remove_edges("group", src_label="student", dst_label="student")
``_initiate_upload`` and ``_fetch_range``. Finally, the user needs to call ``fsspec.register_implementation('protocol_name', 'protocol_file_system')`` to register the corresponding resolver.
4 changes: 2 additions & 2 deletions docs/zh/analytics_engine.rst
@@ -280,7 +280,7 @@ The GraphScope analytics engine has many commonly used graph analytics algorithms built in, including connected
import graphscope
sess = graphscope.session()
g = graphscope.Graph(sess)
g = sess.g()
# load your own algorithm
my_app = SSSP_Pregel()
@@ -301,7 +301,7 @@ The GraphScope analytics engine has many commonly used graph analytics algorithms built in, including connected
import graphscope
sess = graphscope.session()
g = graphscope.Graph(sess)
g = sess.g()
# load your own algorithm from a gar package
my_app = load_app('SSSP_Pregel', 'file:///var/graphscope/udf/my_sssp_pregel.gar')
2 changes: 1 addition & 1 deletion docs/zh/getting_started.rst
@@ -98,7 +98,7 @@ GraphScope models graph data as a property graph. In a property graph, vertices

.. code:: python
g = graphscope.Graph(sess)
g = sess.g()
g = (
g.add_vertices("paper.csv", label="paper")
.add_vertices("author.csv", label="author")
96 changes: 4 additions & 92 deletions docs/zh/loading_graph.rst
@@ -32,7 +32,7 @@ GraphScope 以
.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = sess.g()
We can add a vertex label to the graph. The relevant parameters are as follows:

Expand Down Expand Up @@ -165,7 +165,7 @@ GraphScope 以
GraphScope 将会推断起始点标签和终点标签为这一个点标签。

.. code:: python
graph = graphscope.Graph(sess)
graph = sess.g()
graph = graph.add_vertices("file:///home/admin/student.v", label="student")
graph = graph.add_edges("file:///home/admin/group.e", label="group")
# GraphScope will automatically set `src_label` and `dst_label` to `student`.
@@ -176,7 +176,7 @@ GraphScope will infer that the source and destination vertex labels are both that vertex label.

.. code:: python
graph = graphscope.Graph(sess)
graph = sess.g()
graph.add_edges("file:///home/admin/group.e", label="group")
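# After loading, the graph will contain a vertex label named `_` and an edge label named `group`.
@@ -192,7 +192,7 @@ GraphScope will infer that the source and destination vertex labels are both that vertex label.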
.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = sess.g()
graph = graph.add_vertices(
"/home/admin/student.v",
Expand Down Expand Up @@ -274,91 +274,3 @@ GraphScope 将会推断起始点标签和终点标签为这一个点标签。
用户可以方便的实现自己的driver来支持更多的数据源,比如参照 `ossfs <https://github.com/alibaba/libvineyard/blob/main/modules/io/adaptors/ossfs.py>`_ driver的实现方式。
用户需要继承 `AbstractFileSystem` 类用来做scheme对应的resolver, 以及 `AbstractBufferedFile`。用户仅需要实现 ``_upload_chunk``,
``_initiate_upload`` and ``_fetch_range`` 这几个方法就可以实现基本的read,write功能。最后通过 ``fsspec.register_implementation('protocol_name', 'protocol_file_system')`` 注册自定义的resolver。


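Understanding Lazy Graph Loading
--------------------------------

Graphs in GraphScope are not actually loaded until they are used.
**Used** means that anything involving the remote side is touched, such as the graph's `key`, the `vineyard_id`, the complete graph definition with data types, or
an application querying the graph, and so on.

When a graph is built iteratively, it stores some basic vertex label and edge label information internally; viewing this information with `print(graph)` does not trigger the loading process.
Let's look at an example: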

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_edges( "file:///home/admin/group.e", "group", src_label="student", dst_label="student")
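# This does not actually load the graph
print(graph)
# This step triggers the loading, because some information is only available after the graph is loaded
print(graph.key)
print(graph.schema)
graphscope.sssp(graph, src=6)
# Calling `loaded()` also triggers the loading automatically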
assert graph.loaded() == True
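Thanks to lazy loading, we can remove some vertex or edge labels before the graph is actually loaded, which helps recover from occasional mistakes.
Once the graph has been loaded, however, they can no longer be removed.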

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_vertices( "/home/admin/teacher.v", "teacher")
graph = graph.add_edges("file:///home/admin/group.e", "group", src_label="student", dst_label="student")
graph = graph.add_edges("file:///home/admin/group_for_teacher_student.e", "group", src_label="teacher", dst_label="student")
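# Inspect the basic schema without triggering the loading
print(graph)
# A vertex that is still referenced by edges cannot be removed
# graph = graph.remove_vertices("teacher") # Error: some edge has this vertex as its source or destination
# src_label and dst_label can be used to filter edges. If they are not specified, the whole edge label is removed
graph = graph.remove_edges("group", src_label="teacher", dst_label="student")
# Now we can remove the vertex label `teacher`
graph = graph.remove_vertices("teacher")
print(graph)
# Trigger the loading
print(graph.key)
# Edges can no longer be removed now
# graph = graph.remove_edges("group")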
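However, we can still add vertices or edges to a graph that has already been loaded.
This step is still lazy, so we can remove vertices or edges that have not been loaded yet.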

.. code:: python
sess = graphscope.session()
graph = graphscope.Graph(sess)
graph = graph.add_vertices("/home/admin/student.v", "student")
graph = graph.add_edges("file:///home/admin/group.e", "group", src_label="student", dst_label="student")
print(graph.key) # trigger the loading
# Add more vertices and edges to the loaded graph
graph = graph.add_vertices("/home/admin/teacher.v", "teacher")
graph = graph.add_edges("file:///home/admin/group_for_teacher_student.e", "group", src_label="teacher", dst_label="student")
print(graph) # does not trigger the loading
# Vertices or edges that have not been loaded yet can be removed
graph = graph.remove_edges("group", src_label="teacher", dst_label="student")
graph = graph.remove_vertices("teacher")
# But vertices or edges loaded in the original graph cannot be removed
# graph = graph.remove_edges("group", src_label="student", dst_label="student")