
Commit

Merge branch 'main' into 14-contributing-add-release
acezen authored Jan 1, 2023
2 parents 081665f + d512a33 commit 6124896
Showing 28 changed files with 1,283 additions and 41 deletions.
2 changes: 1 addition & 1 deletion docs/api-reference.rst
@@ -174,7 +174,7 @@ Id Type
Data Type
~~~~~~~~~~~~~~~~~~~

.. doxygenstruct:: GraphArchive::DataType
.. doxygenclass:: GraphArchive::DataType
:members:
:undoc-members:

Binary file modified docs/images/edge_physical_table1.png
Binary file modified docs/images/edge_physical_table2.png
Binary file modified docs/images/vertex_physical_table.png
8 changes: 4 additions & 4 deletions docs/user-guide/getting-started.rst
@@ -47,7 +47,7 @@ GAR Data Files
Property data
`````````````
The vertex properties are stored in vertex property chunks with the chunk size specified by the vertex information file. Different property groups correspond to individual groups of data files.
In our example, the property group ("first name", "last name", "gender") for vertex chunk 0 of "person" vertices is stored in `./vertex/person/firstName_lastName_gender/part0`_.
In our example, the property group ("first name", "last name", "gender") for vertex chunk 0 of "person" vertices is stored in `./vertex/person/firstName_lastName_gender/chunk0`_.

In graph processing it is common to query only a subset of the property columns. Column-oriented formats such as Apache ORC and Apache Parquet are therefore more efficient, as they avoid reading columns that are not relevant. We also provide data files in ORC and Parquet for the example graph in the `test data`_.
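
To make the column-pruning point concrete, here is a minimal Spark sketch that reads a single vertex property chunk and touches only two of its columns. The local path and the column names (firstName, lastName) are assumptions based on the layout described above, not values taken from this commit:

    import org.apache.spark.sql.SparkSession

    object ReadPropertyChunkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("gar-read-property-chunk")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical local path to vertex chunk 0 of the ("firstName", "lastName", "gender") group.
        val chunkPath = "ldbc_sample/parquet/vertex/person/firstName_lastName_gender/chunk0"

        // With a column-oriented format, only the selected columns are read from disk.
        val names = spark.read.parquet(chunkPath).select("firstName", "lastName")
        names.show(5, truncate = false)

        spark.stop()
      }
    }

Reading the same chunk from a row-oriented format such as CSV would require scanning every column, which is the difference the paragraph above is pointing at.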

@@ -63,7 +63,7 @@ For example, the file `./edge/person_knows_person/ordered_by_source/adj_list/par

.. note::

If the edges are ordered, there may be offset chunks that serve as an index for accessing the edges of a single vertex. Each offset chunk stores the start offset of each vertex's edges; see `./edge/person_knows_person/ordered_by_source/offset/part0`_ as an example.
If the edges are ordered, there may be offset chunks that serve as an index for accessing the edges of a single vertex. Each offset chunk stores the start offset of each vertex's edges; see `./edge/person_knows_person/ordered_by_source/offset/chunk0`_ as an example.
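
For intuition about how such an offset index is used, here is a self-contained sketch; the offset values are invented for illustration, and loading the actual offset chunk is omitted:

    object OffsetLookupSketch {
      def main(args: Array[String]): Unit = {
        // offsets(i) is the start position of vertex i's edges in the ordered adj list;
        // offsets(i + 1) marks where the next vertex's edges begin (illustrative values only).
        val offsets = Array(0L, 3L, 3L, 7L, 10L)

        def edgeRange(vertexIndex: Int): (Long, Long) =
          (offsets(vertexIndex), offsets(vertexIndex + 1))

        val (begin, end) = edgeRange(2)
        println(s"vertex 2 owns edges [$begin, $end) of the ordered edge chunks")
      }
    }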


How to Use GAR
@@ -184,15 +184,15 @@ Please refer to `more examples <../applications/out-of-core.html>`_ for learning

.. _person_knows_person.edge.yml: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/person_knows_person.edge.yml

.. _./vertex/person/firstName_lastName_gender/part0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/vertex/person/firstName_lastName_gender/part0
.. _./vertex/person/firstName_lastName_gender/chunk0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/vertex/person/firstName_lastName_gender/chunk0

.. _test data: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/

.. _./edge/person_knows_person/ordered_by_source/creationDate/part0/chunk0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/creationDate/part0/chunk0

.. _./edge/person_knows_person/ordered_by_source/adj_list/part0/chunk0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/adj_list/part0/chunk0

.. _./edge/person_knows_person/ordered_by_source/offset/part0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/offset/part0
.. _./edge/person_knows_person/ordered_by_source/offset/chunk0: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/csv/edge/person_knows_person/ordered_by_source/offset/chunk0

.. _example program: https://github.com/alibaba/GraphAr/blob/main/examples/construct_info_example.cc

2 changes: 1 addition & 1 deletion examples/construct_info_example.cc
@@ -67,7 +67,7 @@ int main(int argc, char* argv[]) {
assert(!vertex_info.IsPrimaryKey(gender.name).status().ok());
assert(vertex_info.GetPropertyType(id.name).value() == id.type);
assert(vertex_info.GetFilePath(group1, 0).value() ==
"vertex/person/id/part0/chunk0");
"vertex/person/id/chunk0");

// extend property groups & validate
auto result = vertex_info.Extend(group2);
8 changes: 4 additions & 4 deletions include/gar/graph_info.h
@@ -254,8 +254,8 @@ class VertexInfo {
return Status::KeyError(
"Vertex info does not contain the property group.");
}
return prefix_ + property_group.GetPrefix() + "part" +
std::to_string(chunk_index) + "/" + "chunk0";
return prefix_ + property_group.GetPrefix() + "chunk" +
std::to_string(chunk_index);
}

/// Get the chunk files directory path of property group
@@ -561,8 +561,8 @@ class EdgeInfo {
if (!ContainAdjList(adj_list_type)) {
return Status::KeyError("The adj list type is not found in edge info.");
}
return prefix_ + adj_list2prefix_.at(adj_list_type) + "offset/part" +
std::to_string(vertex_chunk_index) + "/" + "chunk0";
return prefix_ + adj_list2prefix_.at(adj_list_type) + "offset/chunk" +
std::to_string(vertex_chunk_index);
}

/// Get the adj list offset chunk file directory path of adj list type
2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -1,5 +1,5 @@
breathe
docutils==0.16
docutils
furo # sphinx theme
nbsphinx
sphinx>=3.0.2
72 changes: 71 additions & 1 deletion spark/pom.xml
@@ -15,10 +15,11 @@
<scala.binary.version>2.12</scala.binary.version>
<PermGen>512m</PermGen>
<MaxPermGen>1024m</MaxPermGen>
<spark.version>3.1.1</spark.version>
<spark.version>3.2.0</spark.version>
<maven.compiler.release>8</maven.compiler.release>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<cupid.sdk.version>3.3.8-public</cupid.sdk.version>
</properties>
<dependencies>
<dependency>
@@ -68,6 +69,34 @@
<artifactId>snakeyaml</artifactId>
<version>1.26</version>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>hadoop-fs-oss</artifactId>
<version>${cupid.sdk.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-spark-datasource_2.11</artifactId>
<version>${cupid.sdk.version}</version>
<exclusions>
<exclusion>
<groupId>net.jpountz.lz4</groupId>
<artifactId>lz4</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>cupid-sdk</artifactId>
<version>${cupid.sdk.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
@@ -119,6 +148,47 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<minimizeJar>false</minimizeJar>
<shadedArtifactAttached>true</shadedArtifactAttached>
<artifactSet>
<includes>
<!-- Include here the dependencies you
want to be packed in your fat jar -->
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>**/log4j.properties</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<packaging>jar</packaging>
15 changes: 15 additions & 0 deletions spark/src/main/java/com/alibaba/graphar/GeneralParams.java
@@ -1,3 +1,18 @@
/** Copyright 2022 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.alibaba.graphar;

public class GeneralParams {
38 changes: 36 additions & 2 deletions spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala
@@ -1,3 +1,18 @@
/** Copyright 2022 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.alibaba.graphar

import java.io.{File, FileInputStream}
@@ -210,8 +225,8 @@ class EdgeInfo() {
def getAdjListOffsetFilePath(chunk_index: Long, adj_list_type: AdjListType.Value) : String = {
if (containAdjList(adj_list_type) == false)
throw new IllegalArgumentException
val str: String = prefix + getAdjListPrefix(adj_list_type) + "offset/part" +
chunk_index.toString() + "/chunk0"
val str: String = prefix + getAdjListPrefix(adj_list_type) + "offset/chunk" +
chunk_index.toString()
return str
}

@@ -256,6 +271,25 @@ class EdgeInfo() {
return str
}

def getPropertyFilePath(property_group: PropertyGroup, adj_list_type: AdjListType.Value, vertex_chunk_index: Long) : String = {
if (containPropertyGroup(property_group, adj_list_type) == false)
throw new IllegalArgumentException
var str: String = property_group.getPrefix
if (str == "") {
val properties = property_group.getProperties
val num = properties.size
for ( j <- 0 to num - 1 ) {
if (j > 0)
str += GeneralParams.regularSeperator
str += properties.get(j).getName;
}
str += "/"
}
str = prefix + getAdjListPrefix(adj_list_type) + str + "part" +
vertex_chunk_index.toString() + "/"
return str
}

def getPropertyDirPath(property_group: PropertyGroup, adj_list_type: AdjListType.Value) : String = {
if (containPropertyGroup(property_group, adj_list_type) == false)
throw new IllegalArgumentException
15 changes: 15 additions & 0 deletions spark/src/main/scala/com/alibaba/graphar/GraphInfo.scala
@@ -1,3 +1,18 @@
/** Copyright 2022 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.alibaba.graphar

import java.io.{File, FileInputStream}
18 changes: 17 additions & 1 deletion spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala
@@ -1,3 +1,18 @@
/** Copyright 2022 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.alibaba.graphar

import java.io.{File, FileInputStream}
@@ -133,7 +148,7 @@ class VertexInfo() {
} else {
str = property_group.getPrefix
}
return prefix + str + "part" + chunk_index.toString() + "/chunk0"
return prefix + str + "chunk" + chunk_index.toString()
}

def getDirPath(property_group: PropertyGroup): String = {
@@ -148,6 +163,7 @@ class VertexInfo() {
str += GeneralParams.regularSeperator
str += properties.get(j).getName;
}
str += "/"
} else {
str = property_group.getPrefix
}
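
Taken together, the path changes in this commit settle on one file per chunk named chunk<i>: the former part<i>/chunk0 nesting is dropped for vertex property chunks and for adj list offset chunks, while edge property chunks (see getPropertyFilePath above) keep their per-vertex-chunk part<i>/ directories. Below is a standalone sketch of the resulting naming that mirrors the updated methods without calling the library itself; the prefix values are illustrative:

    object ChunkLayoutSketch {
      // Vertex property chunk i of a property group: <prefix><groupPrefix>chunk<i>
      def vertexPropertyChunkPath(prefix: String, groupPrefix: String, i: Long): String =
        prefix + groupPrefix + "chunk" + i.toString

      // Offset chunk of vertex chunk i for an adj list: <prefix><adjListPrefix>offset/chunk<i>
      def offsetChunkPath(prefix: String, adjListPrefix: String, i: Long): String =
        prefix + adjListPrefix + "offset/chunk" + i.toString

      def main(args: Array[String]): Unit = {
        // Matches the assertion updated in construct_info_example.cc: vertex/person/id/chunk0
        println(vertexPropertyChunkPath("vertex/person/", "id/", 0L))
        // Matches the offset chunk link in getting-started.rst:
        // edge/person_knows_person/ordered_by_source/offset/chunk0
        println(offsetChunkPath("edge/person_knows_person/", "ordered_by_source/", 0L))
      }
    }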
