diff --git a/README.rst b/README.rst index da3816357..05b2ea2a4 100644 --- a/README.rst +++ b/README.rst @@ -3,53 +3,41 @@ GraphAr |GraphAr CI| |Docs CI| |GraphAr Docs| -GraphAr (short for "Graph Archive") is an open source, standard data file format with C++ SDK and Spark tools for graph data storage and retrieval. - -The GraphAr project includes such modules as: - -- The design of the standardized file format (GAR) for graph data. -- A C++ Library for reading and writing GAR files. -- Apache Spark tools for generating, loading and transforming GAR files (coming soon). -- Examples of applying GraphAr to graph processing applications or existing systems such as GraphScope. - - +Welcome to GraphAr (short for "Graph Archive"), an open source, standardized file format for graph data storage and retrieval. +What is GraphAr? +----------------- |Overview Pic| +Graph processing serves as the essential building block for a diverse variety of +real-world applications such as social network analytics, data mining, network routing, +and scientific computing. -Motivation ----------- - -Graph processing serves as the essential building block for a diverse variety of real-world applications such as social network analytics, data mining, network routing, and scientific computing. - -GraphAr (GAR) is established to enable diverse graph applications and systems (in-memory and out-of-core storages, databases, graph computing systems and interactive graph query frameworks) to build and access the graph data conveniently and efficiently. It specifies a standardized system-independent file format for graph and provides a set of interfaces to generate and access such formatted files. - -GraphAr (GAR) targets two main scenarios: - -- To serve as the standard file format for importing/exporting and persistent storage of the graph data for diverse existing systems, reducing the overhead when various systems co-work. -- To serve as the direct data source for graph processing applications. 
- - -What's in GraphAr ---------------------- +GraphAr is a project that aims to make it easier for diverse applications and +systems (in-memory and out-of-core storages, databases, graph computing systems, and interactive graph query frameworks) +to build and access graph data conveniently and efficiently. -The **GAR** file format that defines a standard store file format for graph data. +It can be used for importing/exporting and persistent storage of graph data, +thereby reducing the burden on systems when working together. Additionally, it can +serve as a direct data source for graph processing applications. -The **GAR SDK** library that contains a C++ library to provide APIs for accessing and generating the GAR format files. +To achieve this, GraphAr provides: +- The Graph Archive (GAR) file format: a standardized system-independent file format for storing graph data +- Libraries: a set of libraries for reading, writing, or transforming GAR files -GraphAr File Format ---------------------- +By using GraphAr, you can: -GraphAr specifies a standardized system-independent file format (GAR) for storing property graphs. -It uses metadata to record all the necessary information of a graph, and maintains the actual data -in a chunked way. +- Store and persist your graph data in a system-independent way with the GAR file format +- Easily access and generate GAR files using the libraries +- Use the Apache Spark library to quickly manipulate and transform your GAR files -What is Property Graph -^^^^^^^^^^^^^^^^^^^^^^^ - -GraphAr is designed for representing and storing the property graphs. Graph (in discrete mathematics) is a structure made of vertices and edges. Property graph is then a type of graph model where the vertices/edges could carry a name (also called as type or label) and some properties.
Since carrying additional information than non-property graphs, the property graph is able to represent connections among data scattered across diverse data databases and with different schemas. Compared with the relational database schema, the property graph excels at showing data dependencies. Therefore, it is widely-used in modeling modern applications including social network analytics, data mining, network routing, scientific computing and so on. +The GAR File Format +------------------- +The GAR file format is designed for storing property graphs. It uses metadata to +record all the necessary information of a graph, and maintains the actual data in +a chunked way. A property graph includes vertices and edges. Each vertex contains: @@ -70,7 +58,7 @@ The following is an example property graph containing two types of vertices "per |Property Graph| Vertices in GraphAr -^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^ Logical table of vertices """""""""""""""""""""""""" @@ -123,29 +111,31 @@ Take the "person knows person" edges to illustrate, when the vertex chunk size i |Edge Physical Table2| -Building SDK Steps ---------------------- +Building the Libraries +---------------------- + +Libraries are available for C++ and Spark. -Dependencies -^^^^^^^^^^^^^ +Prerequisites +^^^^^^^^^^^^^^ -**GraphAr** is developed and tested on ubuntu 20.04. It should also work on other unix-like distributions. Building GraphAr requires the following softwares installed as dependencies. +Basic dependencies: - A modern C++ compiler compliant with C++17 standard (g++ >= 7.1 or clang++ >= 5). - `CMake `_ (>=2.8) -Here are the dependencies for optional features: +Dependencies for optional features: - `Doxygen `_ (>= 1.8) for generating documentation; - `sphinx `_ for generating documentation. -Extra dependencies are required by examples and unit tests: +Extra dependencies are required by examples: - `BGL `_ (>= 1.58). 
-Building and install GraphAr C++ library -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Building +^^^^^^^^^ Once the required dependencies have been installed, go to the root directory of GraphAr and do an out-of-source build using CMake. @@ -158,9 +148,9 @@ Once the required dependencies have been installed, go to the root directory of **Optional**: Using a Custom Namespace -The `namespace` that `gar` is defined in is configurable. By default, -it is defined in `namespace GraphArchive`; however this can be toggled by -setting `NAMESPACE` option with cmake: +The :code:`namespace` is configurable. By default, +it is defined in :code:`namespace GraphArchive`; however this can be toggled by +setting :code:`NAMESPACE` option with cmake: .. code:: shell @@ -181,7 +171,7 @@ Install the GraphAr library: sudo make install -Build the documentation of GraphAr library: +Optionally, you can build the documentation for GraphAr library: .. code-block:: shell @@ -189,37 +179,17 @@ Build the documentation of GraphAr library: pip3 install -r ../requirements-dev.txt --user make doc -Using GraphAr C++ library in your own project ------------------------------------------------ - -The way we recommend to integrate the GraphAr C++ library in your own C++ project is to use -CMake's `find_package` function for locating and integrating dependencies. - -Here is a minimal `CMakeLists.txt` that compiles a source file `my_example.cc` into an executable -target linked with GraphAr C++ shared library. - -.. code-block:: cmake - project(MyExample) +The Spark Library +----------------- - find_package(gar REQUIRED) - include_directories(${GAR_INCLUDE_DIRS}) +See `GraphAr Spark Library`_ for details about the Spark library. - add_executable(my_example my_example.cc) - target_compile_features(my_example PRIVATE cxx_std_17) - target_link_libraries(my_example PRIVATE ${GAR_LIBRARIES}) - -Please refer to `examples/pagerank_example.cc` for details. 
Contributing to GraphAr ------------------------ - -- Read the `Contribution Guide`_. -- Please report bugs by submitting `GitHub Issues`_ or ask me anything in `Github Discussions`_. -- Submit contributions using pull requests. - -Thank you in advance for your contributions to GraphAr! +---------------------------- +See `Contribution Guide`_ for details on submitting patches and the contribution workflow. License ------- @@ -269,6 +239,8 @@ third-party libraries may not have the same license as GraphAr. .. _GraphAr File Format: https://alibaba.github.io/GraphAr/user-guide/file-format.html +.. _GraphAr Spark Library: https://alibaba.github.io/GraphAr/user-guide/spark-lib.html + .. _example files: https://github.com/GraphScope/gar-test/blob/main/ldbc_sample/ .. _Contribution Guide: https://alibaba.github.io/GraphAr/user-guide/contributing.html diff --git a/docs/index.rst b/docs/index.rst index 1ecea9af4..096535500 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -13,7 +13,7 @@ user-guide/overview.rst user-guide/getting-started.rst user-guide/file-format.rst - user-guide/spark-tool.rst + user-guide/spark-lib.rst .. toctree:: :maxdepth: 1 @@ -34,6 +34,7 @@ .. toctree:: :maxdepth: 1 :caption: API Reference + :hidden: reference/api-reference-cpp.rst Spark API Reference diff --git a/docs/user-guide/overview.rst b/docs/user-guide/overview.rst index 1707024c4..27f4dfeb1 100644 --- a/docs/user-guide/overview.rst +++ b/docs/user-guide/overview.rst @@ -13,8 +13,7 @@ GraphAr aims to serve as the standard file format for importing/exporting and pe The GraphAr project includes such topics as: - Design of the standardized file format for graph data. (see `GraphAr File Format `_) -- The C++ SDK library for reading and writing GAR files. (see `API Reference <../api-reference.html>`_) -- A set of Apache Spark tools for generating, loading and transforming GAR files. (see `GraphAr Spark Tools `_) +- A set of libraries for reading and writing or transforming GAR files. 
(now the `C++ library <../reference/api-reference-cpp.html>`_ and `Spark library `_ are available) - How to use GraphAr to write graph algorithms, or to work with existing systems such as GraphScope. (see `Application Cases <../applications/out-of-core.html>`_) .. image:: ../images/overview.png diff --git a/docs/user-guide/spark-tool.rst b/docs/user-guide/spark-lib.rst similarity index 91% rename from docs/user-guide/spark-tool.rst rename to docs/user-guide/spark-lib.rst index cd791bece..ff0423e54 100644 --- a/docs/user-guide/spark-tool.rst +++ b/docs/user-guide/spark-lib.rst @@ -1,20 +1,20 @@ -GraphAr Spark Tools +GraphAr Spark Library ============================ Overview ----------- -GraphAr Spark tools are provided as a library for generating, loading and transforming GAR files with Apache Spark easy. It consists of the following parts: +The GraphAr Spark library is provided for generating, loading and transforming GAR files with Apache Spark easily. It consists of the following parts: -- **Information Classes**: As same with in C++ SDK, the information classes are implemented as a part of Spark tools for constructing and accessing the meta information about the graphs, vertices and edges in GraphAr. +- **Information Classes**: As in the C++ SDK, the information classes are implemented as a part of the Spark library for constructing and accessing the meta information about the graphs, vertices and edges in GraphAr. - **IndexGenerator**: The IndexGenerator helps to generate the indices for vertex/edge DataFrames. In most cases, IndexGenerator is first utilized to generate the indices for a DataFrame (e.g., from primary keys), and then this DataFrame can be written into GAR files through the Writer. - **Writer**: The GraphAr Spark Writer provides a set of interfaces that can be used to write Spark DataFrames into GAR files.
Every time it takes a DataFrame as the logical table for a type of vertices or edges, assembles the data in specified format (e.g., reorganize the edges in the CSR way) and then dumps it to standard GAR files (orc, parquet or CSV files) under the specific directory path. - **Reader**: The GraphAr Spark Reader provides a set of interfaces that can be used to read GAR files. It reads a set of vertices or edges at a time and assembles the result into Spark DataFrames. Similar with the Reader SDK in C++, it supports the users to specify the data they need, e.g., to read a single property group instead of all properties. - + Use Cases ---------- -The GraphAr Spark Tools can be applied to these scenarios: +The GraphAr Spark Library can be applied to these scenarios: - Take GAR as data sources to execute SQL queries or do graph processing (e.g., using GraphX). - Transform data between GAR and other data sources (e.g., Hive, Neo4j, NebulaGraph, ...). @@ -23,10 +23,10 @@ The GraphAr Spark Tools can be applied to these scenarios: - Modify existing GAR data (e.g., add new vertices/edges). -Get GraphAr Spark Tools +Get GraphAr Spark Library ------------------------------ -Make the graphar-spark-tools directory as the current working directory: +Make the graphar-spark-library directory the current working directory: .. code-block:: shell @@ -46,7 +46,7 @@ How to Use Information Classes ````````````````````` -The information classes are included in Spark tools for constructing and accessing the meta information about the graphs, vertices and edges in GraphAr. They are also used as the essential parameters for constructing readers/writers. In common cases, the information can be built from reading and parsing existing meta files (Yaml files). Also, we support to construct them in memory from nothing. +The information classes are included in the Spark library for constructing and accessing the meta information about the graphs, vertices and edges in GraphAr.
They are also used as the essential parameters for constructing readers/writers. In common cases, the information can be built from reading and parsing existing meta files (Yaml files). Also, we support to construct them in memory from nothing. To build information from Yaml files, please refer to the following code. diff --git a/examples/bgl_example.cc b/examples/bgl_example.cc index a1ff634f5..064b46eb4 100644 --- a/examples/bgl_example.cc +++ b/examples/bgl_example.cc @@ -33,8 +33,8 @@ int main(int argc, char* argv[]) { std::string path = TEST_DATA_DIR + "/ldbc_sample/parquet/ldbc_sample.graph.yml"; auto graph_info = GAR_NAMESPACE::GraphInfo::Load(path).value(); - assert(graph_info.GetAllVertexInfo().size() == 1); - assert(graph_info.GetAllEdgeInfo().size() == 1); + assert(graph_info.GetVertexInfos().size() == 1); + assert(graph_info.GetEdgeInfos().size() == 1); // construct vertices collection std::string label = "person"; diff --git a/examples/construct_info_example.cc b/examples/construct_info_example.cc index f24b591a8..88318441c 100644 --- a/examples/construct_info_example.cc +++ b/examples/construct_info_example.cc @@ -24,8 +24,8 @@ int main(int argc, char* argv[]) { // validate assert(graph_info.GetName() == name); assert(graph_info.GetPrefix() == prefix); - const auto& vertex_infos = graph_info.GetAllVertexInfo(); - const auto& edge_infos = graph_info.GetAllEdgeInfo(); + const auto& vertex_infos = graph_info.GetVertexInfos(); + const auto& edge_infos = graph_info.GetEdgeInfos(); assert(vertex_infos.size() == 0); assert(edge_infos.size() == 0); @@ -85,7 +85,7 @@ int main(int argc, char* argv[]) { /*------------------add vertex info to graph------------------*/ graph_info.AddVertex(vertex_info); - assert(graph_info.GetAllVertexInfo().size() == 1); + assert(graph_info.GetVertexInfos().size() == 1); assert(graph_info.GetVertexInfo(vertex_label).status().ok()); assert(graph_info.GetVertexPropertyGroup(vertex_label, id.name).value() == group1); @@ -124,7 
+124,7 @@ int main(int argc, char* argv[]) { GAR_NAMESPACE::FileType::PARQUET) .ok()); assert( - edge_info.GetAdjListFileType(GAR_NAMESPACE::AdjListType::ordered_by_dest) + edge_info.GetFileType(GAR_NAMESPACE::AdjListType::ordered_by_dest) .value() == GAR_NAMESPACE::FileType::PARQUET); assert( edge_info @@ -185,7 +185,7 @@ int main(int argc, char* argv[]) { assert(res1.status().ok()); edge_info = res1.value(); assert(edge_info - .GetAdjListFileType(GAR_NAMESPACE::AdjListType::ordered_by_source) + .GetFileType(GAR_NAMESPACE::AdjListType::ordered_by_source) .value() == GAR_NAMESPACE::FileType::PARQUET); auto res2 = edge_info.ExtendPropertyGroup( group3, GAR_NAMESPACE::AdjListType::ordered_by_source); @@ -198,7 +198,7 @@ int main(int argc, char* argv[]) { /*------------------add edge info to graph------------------*/ graph_info.AddEdge(edge_info); graph_info.AddEdgeInfoPath("person_knows_person.edge.yml"); - assert(graph_info.GetAllEdgeInfo().size() == 1); + assert(graph_info.GetEdgeInfos().size() == 1); assert( graph_info.GetEdgeInfo(src_label, edge_label, dst_label).status().ok()); assert(graph_info diff --git a/include/gar/graph.h b/include/gar/graph.h index 1ec1c0850..8715984c0 100644 --- a/include/gar/graph.h +++ b/include/gar/graph.h @@ -157,11 +157,11 @@ class VertexIter { cur_offset_ = offset; } - /// Copy constructor. + /** Copy constructor. */ VertexIter(const VertexIter& other) : readers_(other.readers_), cur_offset_(other.cur_offset_) {} - /// Construct and return the vertex of the current offset. + /** Construct and return the vertex of the current offset. */ Vertex operator*() noexcept { for (auto& reader : readers_) { reader.seek(cur_offset_); @@ -169,10 +169,10 @@ class VertexIter { return Vertex(cur_offset_, readers_); } - /// Get the vertex id of the current offset. + /** Get the vertex id of the current offset. */ IdType id() { return cur_offset_; } - /// Get the value for a property of the current vertex. 
+ /** Get the value for a property of the current vertex. */ template Result property(const std::string& property) noexcept { std::shared_ptr column(nullptr); @@ -192,25 +192,25 @@ class VertexIter { return Status::KeyError("The property is not exist."); } - /// The prefix increment operator. + /** The prefix increment operator. */ VertexIter& operator++() noexcept { ++cur_offset_; return *this; } - /// The postfix increment operator. + /** The postfix increment operator. */ VertexIter operator++(int) { VertexIter ret(*this); ++cur_offset_; return ret; } - /// The equality operator. + /** The equality operator. */ bool operator==(const VertexIter& rhs) const noexcept { return cur_offset_ == rhs.cur_offset_; } - /// The inequality operator. + /** The inequality operator. */ bool operator!=(const VertexIter& rhs) const noexcept { return cur_offset_ != rhs.cur_offset_; } @@ -246,18 +246,18 @@ class VerticesCollection { fs->ReadFileToValue(vertex_num_path)); } - /// The iterator pointing to the first vertex. + /** The iterator pointing to the first vertex. */ VertexIter begin() noexcept { return VertexIter(vertex_info_, prefix_, 0); } - /// The iterator pointing to the past-the-end element. + /** The iterator pointing to the past-the-end element. */ VertexIter end() noexcept { return VertexIter(vertex_info_, prefix_, vertex_num_); } - /// The iterator pointing to the vertex with specific id. + /** The iterator pointing to the vertex with specific id. */ VertexIter find(IdType id) { return VertexIter(vertex_info_, prefix_, id); } - /// Get the number of vertices in the collection. + /** Get the number of vertices in the collection. */ size_t size() const noexcept { return vertex_num_; } private: @@ -330,7 +330,7 @@ class EdgeIter { } } - /// Copy constructor. + /** Copy constructor. 
*/ EdgeIter(const EdgeIter& other) : adj_list_reader_(other.adj_list_reader_), offset_reader_(other.offset_reader_), @@ -349,7 +349,7 @@ class EdgeIter { adj_list_type_(other.adj_list_type_), index_converter_(other.index_converter_) {} - /// Construct and return the edge of the current offset. + /** Construct and return the edge of the current offset. */ Edge operator*() { adj_list_reader_.seek(cur_offset_); for (auto& reader : property_readers_) { @@ -358,13 +358,13 @@ class EdgeIter { return Edge(adj_list_reader_, property_readers_); } - /// Get the source vertex id for the current edge. + /** Get the source vertex id for the current edge. */ IdType source(); - /// Get the destination vertex id for the current edge. + /** Get the destination vertex id for the current edge. */ IdType destination(); - /// Get the value of a property for the current edge. + /** Get the value of a property for the current edge. */ template Result property(const std::string& property) noexcept { std::shared_ptr column(nullptr); @@ -384,7 +384,7 @@ class EdgeIter { return Status::KeyError("The property is not exist."); } - /// The prefix increment operator. + /** The prefix increment operator. */ EdgeIter& operator++() { if (num_row_of_chunk_ == 0) { adj_list_reader_.seek(cur_offset_); @@ -424,14 +424,14 @@ class EdgeIter { return *this; } - /// The postfix increment operator. + /** The postfix increment operator. */ EdgeIter operator++(int) { EdgeIter ret(*this); this->operator++(); return ret; } - /// The copy assignment operator. + /** The copy assignment operator. */ EdgeIter operator=(const EdgeIter& other) { adj_list_reader_ = other.adj_list_reader_; offset_reader_ = other.offset_reader_; @@ -452,24 +452,24 @@ class EdgeIter { return *this; } - /// The equality operator. + /** The equality operator. 
*/ bool operator==(const EdgeIter& rhs) const noexcept { return global_chunk_index_ == rhs.global_chunk_index_ && cur_offset_ == rhs.cur_offset_ && adj_list_type_ == rhs.adj_list_type_; } - /// The inequality operator. + /** The inequality operator. */ bool operator!=(const EdgeIter& rhs) const noexcept { return global_chunk_index_ != rhs.global_chunk_index_ || cur_offset_ != rhs.cur_offset_ || adj_list_type_ != rhs.adj_list_type_; } - /// Get the global index of the current edge chunk. + /** Get the global index of the current edge chunk. */ IdType global_chunk_index() const { return global_chunk_index_; } - /// Get the current offset in the current chunk. + /** Get the current offset in the current chunk. */ IdType cur_offset() const { return cur_offset_; } /** @@ -492,7 +492,7 @@ class EdgeIter { */ bool first_dst(const EdgeIter& from, IdType id); - /// Let the iterator to point to the begin. + /** Let the iterator to point to the begin. */ void to_begin() { global_chunk_index_ = chunk_begin_; cur_offset_ = offset_of_chunk_begin_; @@ -502,13 +502,13 @@ class EdgeIter { refresh(); } - /// Check if the current position is the end. + /** Check if the current position is the end. */ bool is_end() const { return global_chunk_index_ == chunk_end_ && cur_offset_ == offset_of_chunk_end_; } - /// Point to the next edge with the same source, return false if not found. + /** Point to the next edge with the same source, return false if not found. */ bool next_src() { if (is_end()) return false; @@ -535,8 +535,10 @@ class EdgeIter { return false; } - /// Point to the next edge with the same destination, return false if not - /// found. + /** + * Point to the next edge with the same destination, return false if not + * found. + */ bool next_dst() { if (is_end()) return false; @@ -563,8 +565,10 @@ class EdgeIter { return false; } - /// Point to the next edge with the specific source, return false if not - /// found. 
+ /** + * Point to the next edge with the specific source, return false if not + * found. + */ bool next_src(IdType id) { if (is_end()) return false; @@ -572,8 +576,10 @@ class EdgeIter { return this->first_src(*this, id); } - /// Point to the next edge with the specific destination, return false if - /// not found. + /** + * Point to the next edge with the specific destination, return false if + * not found. + */ bool next_dst(IdType id) { if (is_end()) return false; @@ -582,7 +588,7 @@ class EdgeIter { } private: - /// Refresh the readers to point to the current position. + // Refresh the readers to point to the current position. void refresh() { adj_list_reader_.seek_chunk_index(vertex_chunk_index_); adj_list_reader_.seek(cur_offset_); @@ -623,7 +629,7 @@ class EdgeIter { template <> class EdgesCollection { public: - static const AdjListType adj_list_type_ = AdjListType::ordered_by_source; + static const AdjListType adj_list_type_; /** * @brief Initialize the EdgesCollection. @@ -636,9 +642,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -672,9 +678,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, 
fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -706,9 +712,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; IdType vertex_chunk_num = 0; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -731,7 +737,7 @@ class EdgesCollection { offset_of_chunk_end_ = 0; } - /// The iterator pointing to the first edge. + /** The iterator pointing to the first edge. */ EdgeIter begin() { if (begin_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_begin_, @@ -743,7 +749,7 @@ class EdgesCollection { return *begin_; } - /// The iterator pointing to the past-the-end element. + /** The iterator pointing to the past-the-end element. */ EdgeIter end() { if (end_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_end_, @@ -844,7 +850,7 @@ class EdgesCollection { template <> class EdgesCollection { public: - static const AdjListType adj_list_type_ = AdjListType::ordered_by_dest; + static const AdjListType adj_list_type_; /** * @brief Initialize the EdgesCollection. 
@@ -857,9 +863,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -893,9 +899,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -927,9 +933,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; IdType vertex_chunk_num = 0; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -952,7 +958,7 @@ class EdgesCollection { offset_of_chunk_end_ = 0; } - /// The iterator pointing to the first edge. + /** The iterator pointing to the first edge. 
*/ EdgeIter begin() { if (begin_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_begin_, @@ -964,7 +970,7 @@ class EdgesCollection { return *begin_; } - /// The iterator pointing to the past-the-end element. + /** The iterator pointing to the past-the-end element. */ EdgeIter end() { if (end_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_end_, @@ -1065,7 +1071,7 @@ class EdgesCollection { template <> class EdgesCollection { public: - static const AdjListType adj_list_type_ = AdjListType::unordered_by_source; + static const AdjListType adj_list_type_; /** * @brief Initialize the EdgesCollection. @@ -1078,9 +1084,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -1114,9 +1120,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -1148,9 +1154,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + 
GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; IdType vertex_chunk_num = 0; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -1173,7 +1179,7 @@ class EdgesCollection { offset_of_chunk_end_ = 0; } - /// The iterator pointing to the first edge. + /** The iterator pointing to the first edge. */ EdgeIter begin() { if (begin_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_begin_, @@ -1185,7 +1191,7 @@ class EdgesCollection { return *begin_; } - /// The iterator pointing to the past-the-end element. + /** The iterator pointing to the past-the-end element. */ EdgeIter end() { if (end_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_end_, @@ -1256,7 +1262,7 @@ class EdgesCollection { template <> class EdgesCollection { public: - static const AdjListType adj_list_type_ = AdjListType::unordered_by_dest; + static const AdjListType adj_list_type_; /** * @brief Initialize the EdgesCollection. 
@@ -1269,9 +1275,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; IdType vertex_chunk_num = 0; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -1305,9 +1311,9 @@ class EdgesCollection { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type_)); + base_dir += adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(auto vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); std::vector edge_chunk_nums(vertex_chunk_num, 0); @@ -1340,7 +1346,7 @@ class EdgesCollection { GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type_)); + edge_info.GetAdjListPathPrefix(adj_list_type_)); base_dir += dir_path; IdType vertex_chunk_num = 0; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num, fs->GetFileNumOfDir(base_dir)); @@ -1364,7 +1370,7 @@ class EdgesCollection { offset_of_chunk_end_ = 0; } - /// The iterator pointing to the first edge. + /** The iterator pointing to the first edge. */ EdgeIter begin() { if (begin_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_begin_, @@ -1376,7 +1382,7 @@ class EdgesCollection { return *begin_; } - /// The iterator pointing to the past-the-end element. + /** The iterator pointing to the past-the-end element. 
*/ EdgeIter end() { if (end_ == nullptr) { EdgeIter iter(edge_info_, prefix_, adj_list_type_, chunk_end_, diff --git a/include/gar/graph_info.h b/include/gar/graph_info.h index 476f98831..c64fb1e06 100644 --- a/include/gar/graph_info.h +++ b/include/gar/graph_info.h @@ -34,7 +34,9 @@ namespace GAR_NAMESPACE_INTERNAL { class Yaml; -/// Property is a struct to store the property information. +/** + * Property is a struct to store the property information for a group. + */ struct Property { std::string name; // property name DataType type; // property data type @@ -46,20 +48,34 @@ static bool operator==(const Property& lhs, const Property& rhs) { (lhs.is_primary == rhs.is_primary); } -/// PropertyGroup is a class to store the property group information. +/** + * PropertyGroup is a class to store the property group information. + * + * A property group is a collection of properties with a file type and prefix + * used for chunk files. The prefix is optional and is the concatenation of + * property names with '_' as separator by default. + */ class PropertyGroup { public: - /// Default constructor + /** + * Default constructor. + * + * Creates an empty property group. + */ PropertyGroup() = default; + /** + * Destructor. + */ ~PropertyGroup() {} /** - * Initialize the PropertyGroup + * Initialize the PropertyGroup with a list of properties, file type, and + * optional prefix. * * @param properties Property list of group * @param file_type File type of property group chunk file - * @param prefix prefix of property group chunk file [Option]. the default + * @param prefix prefix of property group chunk file. The default * prefix is the concatenation of property names with '_' as separator */ explicit PropertyGroup(std::vector properties, FileType file_type, @@ -74,25 +90,45 @@ class PropertyGroup { } } - /// Copy constructor + /** + * Copy constructor. + */ PropertyGroup(const PropertyGroup& other) = default; - /// Move constructor + + /** + * Move constructor. 
+ */ PropertyGroup(PropertyGroup&& other) = default; - /// Copy assignment operator + /** + * Copy assignment operator. + */ inline PropertyGroup& operator=(const PropertyGroup& other) = default; - /// Move assignment operator + + /** + * Move assignment operator. + */ inline PropertyGroup& operator=(PropertyGroup&& other) = default; - /// Get the property list of group + /** + * Get the property list of group. + * + * @return The property list of group. + */ inline const std::vector& GetProperties() const { return properties_; } - /// Get the file type of property group chunk file + /** Get the file type of property group chunk file. + * + * @return The file type of group. + */ inline FileType GetFileType() const { return file_type_; } - /// Get the prefix of property group chunk file + /** Get the prefix of property group chunk file. + * + * @return The path prefix of group. + */ inline const std::string& GetPrefix() const { return prefix_; } private: @@ -107,21 +143,24 @@ static bool operator==(const PropertyGroup& lhs, const PropertyGroup& rhs) { (lhs.GetProperties() == rhs.GetProperties()); } -/// VertexInfo is a class to store the vertex meta information. +/** + * VertexInfo is a class that stores metadata information about a vertex. + */ class VertexInfo { public: - /// Default constructor + /** + * Default constructor. + */ VertexInfo() = default; - ~VertexInfo() {} - /** - * @brief Initialize the vertex info. + * Construct a VertexInfo object with the given metadata information. * * @param label The label of the vertex. - * @param chunk_size number of vertex in each vertex chunk. - * @param version version of the vertex info. - * @param prefix prefix of the vertex info. + * @param chunk_size The number of vertices in each vertex chunk. + * @param version The version of the vertex info. + * @param prefix The prefix of the vertex info. If left empty, the default + * prefix will be set to the label of the vertex. 
*/ explicit VertexInfo(const std::string& label, IdType chunk_size, const InfoVersion& version, @@ -135,19 +174,37 @@ class VertexInfo { } } - /// Copy constructor + /** + * Destructor. + */ + ~VertexInfo() {} + + /** + * Copy constructor. + */ VertexInfo(const VertexInfo& vertex_info) = default; - /// Move constructor + /** + * Move constructor. + */ explicit VertexInfo(VertexInfo&& vertex_info) = default; - /// Copy assignment operator + /** + * Copy assignment operator. + */ inline VertexInfo& operator=(const VertexInfo& other) = default; - /// Move assignment operator + /** + * Move assignment operator. + */ inline VertexInfo& operator=(VertexInfo&& other) = default; - /// Add a property group to vertex info + /** + * Adds a property group to the vertex info. + * + * @param property_group The PropertyGroup object to add. + * @return A Status object indicating success or failure. + */ inline Status AddPropertyGroup(const PropertyGroup& property_group) { if (ContainPropertyGroup(property_group)) { return Status::InvalidOperation( @@ -174,24 +231,50 @@ class VertexInfo { return Status::OK(); } - /// Get the label of the vertex. + /** + * Get the label of the vertex. + * + * @return The label of the vertex. + */ inline std::string GetLabel() const { return label_; } - /// Get the chunk size of the vertex. + /** + * Get the chunk size of the vertex. + * + * @return The chunk size of the vertex. + */ inline IdType GetChunkSize() const { return chunk_size_; } - /// Get the path prefix of the vertex. + /** + * Get the path prefix of the vertex. + * + * @return The path prefix of the vertex. + */ inline std::string GetPrefix() const { return prefix_; } - /// Get the version info of the vertex. + /** + * Get the version info of the vertex. + * + * @return The version info of the vertex. + */ inline const InfoVersion& GetVersion() const { return version_; } - /// Get the property groups of the vertex. + /** + * Get the property groups of the vertex. 
+ * + *@return A vector of PropertyGroup objects for the vertex. + */ inline const std::vector& GetPropertyGroups() const { return property_groups_; } - /// Get the property group that contains property + /** + * Get the property group that contains the specified property. + * + * @param property_name The name of the property. + * @return A Result object containing the PropertyGroup object, or a KeyError + * Status object if the property is not found. + */ Result GetPropertyGroup( const std::string& property_name) const noexcept { if (!ContainProperty(property_name)) { @@ -200,7 +283,13 @@ class VertexInfo { return property_groups_[p2group_index_.at(property_name)]; } - /// Get the data type of property + /** + * Get the data type of the specified property. + * + * @param property_name The name of the property. + * @return A Result object containing the data type of the property, or a + * KeyError Status object if the property is not found. + */ inline Result GetPropertyType( const std::string& property_name) const noexcept { if (p2type_.find(property_name) == p2type_.end()) { @@ -209,18 +298,39 @@ class VertexInfo { return p2type_.at(property_name); } - /// Check if the vertex info contains certain property. + /** + * Get whether the vertex info contains the specified property. + * + * @param property_name The name of the property. + * @return True if the property exists in the vertex info, False otherwise. + */ bool ContainProperty(const std::string& property_name) const { return p2type_.find(property_name) != p2type_.end(); } - /// Save the vertex info to yaml file - Status Save(const std::string& path) const; + /** + * Saves the vertex info to a YAML file. + * + * @param file_name The name of the file to save to. + * @return A Status object indicating success or failure. + */ + Status Save(const std::string& file_name) const; - /// Dump the vertex info to yaml format string + /** + * Returns the vertex info as a YAML formatted string. 
+ * + * @return A Result object containing the YAML string, or a Status object + * indicating an error. + */ Result Dump() const noexcept; - /// Check if the property is primary key or not + /** + * Returns whether the specified property is a primary key. + * + * @param property_name The name of the property. + * @return A Result object containing a bool indicating whether the property + * is a primary key, or a KeyError Status object if the property is not found. + */ inline Result IsPrimaryKey(const std::string& property_name) const noexcept { if (p2primary_.find(property_name) == p2primary_.end()) { @@ -229,7 +339,13 @@ class VertexInfo { return p2primary_.at(property_name); } - /// Check if the vertex info contains the property group. + /** + * Returns whether the vertex info contains the specified property group. + * + * @param property_group The PropertyGroup object to check for. + * @return True if the property group exists in the vertex info, False + * otherwise. + */ bool ContainPropertyGroup(const PropertyGroup& property_group) const { for (const auto& pg : property_groups_) { if (pg == property_group) { @@ -239,7 +355,14 @@ class VertexInfo { return false; } - /// Extending the property groups of vertex and return a new vertex info + /** + * Returns a new VertexInfo object with the specified property group added to + * it. + * + * @param property_group The PropertyGroup object to add. + * @return A Result object containing the new VertexInfo object, or a Status + * object indicating an error. + */ const Result Extend(const PropertyGroup& property_group) const noexcept { VertexInfo new_info(*this); @@ -247,7 +370,14 @@ class VertexInfo { return new_info; } - /// Get the chunk file path of property group of vertex chunk + /** + * Get the file path for the specified property group and chunk index. + * + * @param property_group The PropertyGroup object to get the file path for. + * @param chunk_index The chunk index. 
+ * @return A Result object containing the file path, or a KeyError Status + * object if the property group is not found in the vertex info. + */ inline Result GetFilePath(const PropertyGroup& property_group, IdType chunk_index) const noexcept { if (!ContainPropertyGroup(property_group)) { @@ -258,23 +388,36 @@ class VertexInfo { std::to_string(chunk_index); } - /// Get the chunk files directory path of property group - inline Result GetDirPath( + /** + * Get the path prefix for the specified property group. + * + * @param property_group The PropertyGroup object to get the path prefix for. + * @return A Result object containing the path prefix, or a KeyError Status + * object if the property group is not found in the vertex info. + */ + inline Result GetPathPrefix( const PropertyGroup& property_group) const noexcept { if (!ContainPropertyGroup(property_group)) { return Status::KeyError( "Vertex info does not contain the property group."); } - return prefix_ + property_group.GetPrefix(); } - /// Get the chunk file path of the number of vertices + /** + * Get the file path for the number of vertices. + * + * @return The file path for the number of vertices. + */ inline Result GetVerticesNumFilePath() const noexcept { return prefix_ + "vertex_count"; } - /// Check if the vertex info is validated + /** + * Returns whether the vertex info is validated. + * + * @return True if the vertex info is valid, False otherwise. + */ bool IsValidated() const noexcept { if (label_.empty() || chunk_size_ <= 0 || prefix_.empty()) { return false; @@ -292,7 +435,13 @@ class VertexInfo { return true; } - /// Load the input yaml as a VertexInfo instance. + /** + * Loads vertex info from a YAML object. + * + * @param yaml A shared pointer to a Yaml object containing the YAML string. + * @return A Result object containing the VertexInfo object, or a Status + * object indicating an error. 
+ */ static Result Load(std::shared_ptr yaml); private: @@ -306,26 +455,34 @@ class VertexInfo { std::map p2group_index_; }; -/// Edge info is a class to store the edge meta information. +/** + * EdgeInfo is a class that stores metadata information about an edge. + */ class EdgeInfo { public: - /// Default constructor + /** + * Default constructor. + */ EdgeInfo() = default; + /** + * Destructor + */ ~EdgeInfo() {} /** - * @brief Initialize the EdgeInfo. + * @brief Construct an EdgeInfo object with the given metadata information. * - * @param src_label source vertex label - * @param edge_label edge label - * @param dst_label destination vertex label - * @param chunk_size number of edges in each edge chunk - * @param src_chunk_size number of source vertices in each vertex chunk - * @param dst_chunk_size number of destination vertices in each vertex chunk - * @param directed whether the edge is directed - * @param version version of the edge info - * @param prefix prefix of the edge info + * @param src_label The label of the source vertex. + * @param edge_label The label of the edge. + * @param dst_label The label of the destination vertex. + * @param chunk_size The number of edges in each edge chunk. + * @param src_chunk_size The number of source vertices in each vertex chunk. + * @param dst_chunk_size The number of destination vertices in each vertex + * chunk. + * @param directed Whether the edge is directed. + * @param version The version of the edge info. + * @param prefix The path prefix of the edge info. */ explicit EdgeInfo(const std::string& src_label, const std::string& edge_label, const std::string& dst_label, IdType chunk_size, @@ -346,27 +503,38 @@ class EdgeInfo { } } - /// Copy constructor + /** + * Copy constructor. + */ EdgeInfo(const EdgeInfo& info) = default; - /// Move constructor + /** + * Move constructor. + */ explicit EdgeInfo(EdgeInfo&& info) = default; - /// Copy assignment operator + /** + * Copy assignment operator. 
+ */ inline EdgeInfo& operator=(const EdgeInfo& other) = default; - /// Move assignment operator + /** + * Move assignment operator. + */ inline EdgeInfo& operator=(EdgeInfo&& other) = default; /** - * @brief Add adj list information to edge info - * The adj list information is used to store the edge list by CSR, CSC - * or COO format. + * Add adjacency list information to the edge info. + * The adjacency list information describes an adjacency list stored in + * CSR, CSC, or COO format. * - * @param adj_list_type adj list type to add - * @param file_type the file type of adj list topology and offset chunk file - * @param prefix prefix of adj list topology chunk, optional, default is empty - * @return InvalidOperation if the adj list type is already added + * @param adj_list_type The type of the adjacency list to add. + * @param file_type The file type of the adjacency list topology and offset + * chunk file. + * @param prefix The prefix of the adjacency list topology chunk (optional, + * default is empty). + * @return A Status object indicating success or an error if the adjacency + * list type has already been added. */ Status AddAdjList(const AdjListType& adj_list_type, FileType file_type, const std::string& prefix = "") { @@ -387,13 +555,13 @@ class EdgeInfo { } /** - * @brief Add a property group to edge info - * Each adj list type has its own property groups. + * Add a property group to edge info for the given adjacency list type. * - * @param property_group property group to add - * @param adj_list_type adj list type to add property group to - * @return InvalidOperation if adj_list_type not support or - * the property group is already added to the adj list type + * @param property_group Property group to add. + * @param adj_list_type Adjacency list type to add property group to. 
+ * @return A Status object indicating success or an error if adj_list_type is + * not supported by edge info or if the property group is already added to the + * adjacency list type. */ Status AddPropertyGroup(const PropertyGroup& property_group, AdjListType adj_list_type) noexcept { @@ -419,52 +587,78 @@ class EdgeInfo { return Status::OK(); } - /// Get source vertex label of edge. + /** + * Get the label of the source vertex. + * @return The label of the source vertex. + */ inline std::string GetSrcLabel() const { return src_label_; } - /// Get edge label of edge. + /** + * Get the label of the edge. + * @return The label of the edge. + */ inline std::string GetEdgeLabel() const { return edge_label_; } - /// Get destination vertex label of edge. + /** + * Get the label of the destination vertex. + * @return The label of the destination vertex. + */ inline std::string GetDstLabel() const { return dst_label_; } - /// Get chunk size of edge. + /** + * Get the number of edges in each edge chunk. + * @return The number of edges in each edge chunk. + */ inline IdType GetChunkSize() const { return chunk_size_; } - /// Get chunk size of source vertex. + /** + * Get the number of source vertices in each vertex chunk. + * @return The number of source vertices in each vertex chunk. + */ inline IdType GetSrcChunkSize() const { return src_chunk_size_; } - /// Get chunk size of destination vertex. + /** + * Get the number of destination vertices in each vertex chunk. + * @return The number of destination vertices in each vertex chunk. + */ inline IdType GetDstChunkSize() const { return dst_chunk_size_; } - /// Get path prefix of edge. + /** + * Get the path prefix of the edge. + * @return The path prefix of the edge. + */ inline std::string GetPrefix() const { return prefix_; } - /// Check if edge is directed. + /** + * Returns whether the edge is directed. + * @return True if the edge is directed, false otherwise. 
+ */ inline bool IsDirected() const noexcept { return directed_; } - /// Get the version info of the edge. + /** + * Get the version info of the edge. + * @return The version info of the edge. + */ inline const InfoVersion& GetVersion() const { return version_; } - /// Get path prefix of adj list type. - inline Result GetAdjListPrefix(AdjListType adj_list_type) const { - if (!ContainAdjList(adj_list_type)) { - return Status::KeyError("The adj list type is not found in edge info."); - } - return adj_list2prefix_.at(adj_list_type); - } - - /// Check if the edge info contains the adj list type + /** + * Return whether the edge info contains the adjacency list information. + * + * @param adj_list_type The adjacency list type. + * @return True if the edge info contains the adjacency list information, + * false otherwise. + */ inline bool ContainAdjList(AdjListType adj_list_type) const noexcept { return adj_list2prefix_.find(adj_list_type) != adj_list2prefix_.end(); } /** - * @brief Check if the edge info contains the property group - * if adj_list_type is not supported by edge info, return false + * Returns whether the edge info contains the given property group for the + * specified adjacency list type. * - * @param property_group property group to check - * @param adj_list_type the adj list type property group belongs to + * @param property_group Property group to check. + * @param adj_list_type Adjacency list type the property group belongs to. + * @return True if the edge info contains the property group, false otherwise. */ inline bool ContainPropertyGroup(const PropertyGroup& property_group, AdjListType adj_list_type) const { @@ -479,13 +673,27 @@ class EdgeInfo { return false; } - /// Check if the edge info contains the property + /** + * @brief Returns whether the edge info contains the given property for any + * adjacency list type. + * + * @param property Property name to check. + * @return True if the edge info contains the property, false otherwise. 
+ */ bool ContainProperty(const std::string& property) const { return p2type_.find(property) != p2type_.end(); } - /// Get the adj list topology chunk file type of adj list type - inline Result GetAdjListFileType(AdjListType adj_list_type) const + /** + * Get the file type of the adjacency list topology and offset chunk file for + * the given adjacency list type. + * + * @param adj_list_type The adjacency list type. + * @return A Result object containing the file type, or a Status object + * indicating a KeyError if the adjacency list type is not found in the edge + * info. + */ + inline Result GetFileType(AdjListType adj_list_type) const noexcept { if (!ContainAdjList(adj_list_type)) { return Status::KeyError("The adj list type is not found in edge info."); @@ -493,8 +701,14 @@ class EdgeInfo { return adj_list2file_type_.at(adj_list_type); } - /// Get the property groups of adj list type - /// if adj_list_type is not supported by edge info, return error. + /** + * @brief Get the property groups for the given adjacency list type. + * + * @param adj_list_type Adjacency list type. + * @return A Result object containing a reference to the property groups for the + * given adjacency list type, or a Status object indicating a KeyError if the + * adjacency list type is not found in the edge info. + */ inline Result&> GetPropertyGroups( AdjListType adj_list_type) const noexcept { if (!ContainAdjList(adj_list_type)) { @@ -504,11 +718,14 @@ class EdgeInfo { } /** - * @brief Return property group that contains property and with adj - * list type + * @brief Get the property group containing the given property for the + * specified adjacency list type. * - * @param property property name - * @param adj_list_type adj list type of the property group + * @param property Property name. + * @param adj_list_type Adjacency list type. 
+ * @return A Result object containing a reference to the property group, or a + * Status object indicating a KeyError if the adjacency list type is not + * found in the edge info. */ inline Result GetPropertyGroup( const std::string& property, AdjListType adj_list_type) const noexcept { @@ -541,9 +758,15 @@ class EdgeInfo { std::to_string(edge_chunk_index); } - /// Get the adj list topology chunk file directory path of adj list type - inline Result GetAdjListDirPath(AdjListType adj_list_type) const - noexcept { + /** + * Get the path prefix of the adjacency list topology chunk for the given + * adjacency list type. + * @param adj_list_type The adjacency list type. + * @return A Result object containing the path prefix, or a Status object + * indicating an error. + */ + inline Result GetAdjListPathPrefix( + const AdjListType& adj_list_type) const noexcept { if (!ContainAdjList(adj_list_type)) { return Status::KeyError("The adj list type is not found in edge info."); } @@ -551,7 +774,7 @@ class EdgeInfo { } /** - * @brief Get the adj list offset chunk file path of vertex chunk + * @brief Get the adjacency list offset chunk file path of vertex chunk * the offset chunks is aligned with the vertex chunks * * @param vertex_chunk_index index of vertex chunk @@ -565,8 +788,14 @@ class EdgeInfo { std::to_string(vertex_chunk_index); } - /// Get the adj list offset chunk file directory path of adj list type - inline Result GetAdjListOffsetDirPath( + /** + * Get the path prefix of the adjacency list offset chunk for the given + * adjacency list type. + * @param adj_list_type The adjacency list type. + * @return A Result object containing the path prefix, or a Status object + * indicating an error. 
+ */ + inline Result GetOffsetPathPrefix( AdjListType adj_list_type) const noexcept { if (!ContainAdjList(adj_list_type)) { return Status::KeyError("The adj list type is not found in edge info."); @@ -597,8 +826,15 @@ class EdgeInfo { std::to_string(edge_chunk_index); } - /// Get the property group chunk file directory path of adj list type - inline Result GetPropertyDirPath( + /** + * Get the path prefix of the property group chunk for the given + * adjacency list type. + * @param property_group The property group. + * @param adj_list_type The adjacency list type. + * @return A Result object containing the path prefix, or a Status object + * indicating an error. + */ + inline Result GetPropertyGroupPathPrefix( const PropertyGroup& property_group, AdjListType adj_list_type) const noexcept { if (!ContainPropertyGroup(property_group, adj_list_type)) { @@ -609,7 +845,13 @@ class EdgeInfo { property_group.GetPrefix(); } - /// Get the data type of property + /** + * Get the data type of the specified property. + * + * @param property The name of the property. + * @return A Result object containing the data type of the property, or a + * KeyError Status object if the property is not found. + */ Result GetPropertyType(const std::string& property) const noexcept { if (p2type_.find(property) == p2type_.end()) { return Status::KeyError("The property is not found."); @@ -617,7 +859,13 @@ class EdgeInfo { return p2type_.at(property); } - /// Check if the property is primary key + /** + * Returns whether the specified property is a primary key. + * + * @param property The name of the property. + * @return A Result object containing a bool indicating whether the property + * is a primary key, or a KeyError Status object if the property is not found. 
+ */ Result IsPrimaryKey(const std::string& property) const noexcept { if (p2primary_.find(property) == p2primary_.end()) { return Status::KeyError("The property is not found."); @@ -625,20 +873,33 @@ class EdgeInfo { return p2primary_.at(property); } - /// Save the edge info to yaml file - Status Save(const std::string& path) const; + /** + * Saves the edge info to a YAML file. + * + * @param file_name The name of the file to save to. + * @return A Status object indicating success or failure. + */ + Status Save(const std::string& file_name) const; - /// Dump the vertex info to yaml format string + /** + * Returns the edge info as a YAML formatted string. + * + * @return A Result object containing the YAML string, or a Status object + * indicating an error. + */ Result Dump() const noexcept; /** - * @brief Extend the adj list type of edge info and return a new edge info - * return error if the adj list type is already contained + * Returns a new EdgeInfo object with the specified adjacency list type + * added with the given metadata. * - * @param adj_list_type adj list type to extend - * @param prefix path prefix of adj list type - * @param file_type file type of adj list topology and offset chunks - * @return new edge info + * @param adj_list_type The type of the adjacency list to add. + * @param file_type The file type of the adjacency list topology and offset + * chunk file. + * @param prefix The prefix of the adjacency list topology chunk (optional, + * default is empty). + * @return A Result object containing the new EdgeInfo object, or a Status + * object indicating an error. */ const Result ExtendAdjList(AdjListType adj_list_type, FileType file_type, @@ -650,12 +911,13 @@ class EdgeInfo { } /** - * @brief Extend the property groups of adj list type and return a new edge - * info return error if the property group is already contained or the adj - * list type is not contained - * @param property_group property group to extend. 
- * @param adj_list_type the adj list type of property group - * @return new edge info + * Returns a new EdgeInfo object with the specified property group added to + * the given adjacency list type. + * + * @param property_group The PropertyGroup object to add. + * @param adj_list_type The adjacency list type to add the property group to. + * @return A Result object containing the new EdgeInfo object, or a Status + * object indicating an error. */ const Result ExtendPropertyGroup( const PropertyGroup& property_group, AdjListType adj_list_type) const @@ -665,7 +927,11 @@ class EdgeInfo { return new_info; } - /// Check if the edge info is validated + /** + * Returns whether the edge info is validated. + * + * @return True if the edge info is valid, False otherwise. + */ bool IsValidated() const noexcept { if (src_label_.empty() || edge_label_.empty() || dst_label_.empty()) { return false; @@ -700,7 +966,7 @@ class EdgeInfo { return true; } - /// Loads the yaml as a EdgeInfo instance. + /** Loads the YAML as an EdgeInfo instance. */ static Result Load(std::shared_ptr yaml); private: @@ -719,38 +985,47 @@ class EdgeInfo { std::map> adj_list2property_groups_; }; -/// GraphInfo is is a class to store the graph meta information. +/** + * GraphInfo is a class to store the graph meta information. + */ class GraphInfo { public: /** - * @brief Initialize the GraphInfo. - * the prefix of graph would be ./ by default. - * - * @param[in] graph_name name of graph - * @param[in] version version of graph info - * @param[in] prefix absolute path prefix to store chunk files of graph. + * @brief Constructs a GraphInfo instance. + * @param graph_name The name of the graph. + * @param version The version of the graph info. + * @param prefix The absolute path prefix to store chunk files of the graph. + * Defaults to "./". 
*/ explicit GraphInfo(const std::string& graph_name, const InfoVersion& version, const std::string& prefix = "./") : name_(graph_name), version_(version), prefix_(prefix) {} /** - * @brief Loads the input file as a GraphInfo instance. - * - * @param[in] path path of yaml file. + * @brief Loads the input file as a `GraphInfo` instance. + * @param path The path of the YAML file. + * @return A Result object containing the GraphInfo instance, or a Status + * object indicating an error. */ static Result Load(const std::string& path); /** - * @brief Loads the input string as a GraphInfo instance. - * - * @param[in] content yaml content string. - * @param[in] relative_path relative path to access vertex/edge yaml. + * @brief Loads the input string as a `GraphInfo` instance. + * @param content The YAML content string. + * @param relative_path The relative path to access vertex/edge YAML. + * @return A Result object containing the GraphInfo instance, or a `Status` + * object indicating an error. */ static Result Load(const std::string& input, const std::string& relative_path); - /// Add a vertex info to graph info + /** + * @brief Adds a vertex info to the GraphInfo instance. + * @param vertex_info The vertex info to add. + * @return A Status object indicating the success or failure of the + * operation. Returns InvalidOperation if the vertex info is already + * contained. + */ Status AddVertex(const VertexInfo& vertex_info) noexcept { std::string label = vertex_info.GetLabel(); if (vertex2info_.find(label) != vertex2info_.end()) { @@ -760,7 +1035,13 @@ class GraphInfo { return Status::OK(); } - /// Add an edge info to graph info + /** + * @brief Adds an edge info to the GraphInfo instance. + * @param edge_info The edge info to add. + * @return A Status object indicating the success or failure of the + * operation. Returns `InvalidOperation` if the edge info is already + * contained. 
+ */ Status AddEdge(const EdgeInfo& edge_info) noexcept { std::string key = edge_info.GetSrcLabel() + REGULAR_SEPERATOR + edge_info.GetEdgeLabel() + REGULAR_SEPERATOR + @@ -773,33 +1054,48 @@ class GraphInfo { } /** - *@brief Add a vertex info path to graph info + * @brief Add a vertex info path to the graph info instance. * - *@param path vertex info path to add + * @param path The vertex info path to add. */ void AddVertexInfoPath(const std::string& path) noexcept { vertex_paths_.push_back(path); } /** - *@brief Add a edge info path to graph info + * @brief Add an edge info path to the graph info instance. * - *@param path edge info path to add + * @param path The edge info path to add. */ void AddEdgeInfoPath(const std::string& path) noexcept { edge_paths_.push_back(path); } - /// Get the name of graph + /** + * @brief Get the name of the graph. + * @return The name of the graph. + */ inline std::string GetName() const noexcept { return name_; } - /// Get the absolute path prefix of chunk files. + /** + * @brief Get the absolute path prefix of the chunk files. + * @return The absolute path prefix of the chunk files. + */ inline std::string GetPrefix() const noexcept { return prefix_; } - /// Get the version info of the edge. + /** + * Get the version info of the graph info object. + * + * @return The version info of the graph info object. + */ inline const InfoVersion& GetVersion() const { return version_; } - /// Get the vertex info by vertex label + /** + * Get the vertex info with the given label. + * @param label The label of the vertex. + * @return A Result object containing the vertex info, or a Status object + * indicating an error. 
+ */ inline Result GetVertexInfo(const std::string& label) const noexcept { if (vertex2info_.find(label) == vertex2info_.end()) { @@ -809,11 +1105,13 @@ class GraphInfo { } /** - *@brief Get the edge info by src label, edge label and dst label - * - *@param src_label source vertex label - *@param edge_label edge label - *@param dst_label destination vertex label + * Get the edge info with the given source vertex label, edge label, and + * destination vertex label. + * @param src_label The label of the source vertex. + * @param edge_label The label of the edge. + * @param dst_label The label of the destination vertex. + * @return A Result object containing the edge info, or a Status object + * indicating an error. */ inline Result GetEdgeInfo(const std::string& src_label, const std::string& edge_label, @@ -862,25 +1160,46 @@ class GraphInfo { return edge2info_.at(key).GetPropertyGroup(property, adj_list_type); } - /// Get all vertex info of graph. - inline const std::map& GetAllVertexInfo() const + /** + * @brief Get the vertex infos of the graph info. + * + * @return The vertex infos of the graph info. + */ + inline const std::map& GetVertexInfos() const noexcept { return vertex2info_; } - /// Get all edge info of graph. - inline const std::map& GetAllEdgeInfo() const - noexcept { + /** + * @brief Get the edge infos of the graph info. + * + * @return The edge infos of the graph info. + */ + inline const std::map& GetEdgeInfos() const noexcept { return edge2info_; } - /// Save the graph info to yaml file + /** + * Saves the graph info to a YAML file. + * + * @param path The path of the file to save to. + * @return A Status object indicating success or failure. + */ Status Save(const std::string& path) const; - /// Dump the graph info to yaml string + /** + * Returns the graph info as a YAML formatted string. + * + * @return A Result object containing the YAML string, or a Status object + * indicating an error.
+ */ Result Dump() const noexcept; - /// Check if the graph info is validated + /** + * Returns whether the graph info is validated. + * + * @return True if the graph info is valid, False otherwise. + */ inline bool IsValidated() const noexcept { if (name_.empty() || prefix_.empty()) { return false; diff --git a/include/gar/reader/arrow_chunk_reader.h b/include/gar/reader/arrow_chunk_reader.h index 914e65d54..54cb8d7e6 100644 --- a/include/gar/reader/arrow_chunk_reader.h +++ b/include/gar/reader/arrow_chunk_reader.h @@ -59,9 +59,9 @@ class VertexPropertyArrowChunkReader { seek_id_(chunk_index * vertex_info.GetChunkSize()), chunk_table_(nullptr) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &prefix_)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - vertex_info.GetDirPath(property_group)); - std::string base_dir = prefix_ + dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto pg_path_prefix, + vertex_info.GetPathPrefix(property_group)); + std::string base_dir = prefix_ + pg_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(chunk_num_, fs_->GetFileNumOfDir(base_dir)); } @@ -156,9 +156,9 @@ class AdjListArrowChunkReader { seek_offset_(0), chunk_table_(nullptr) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &prefix_)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type)); - base_dir_ = prefix_ + dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type)); + base_dir_ = prefix_ + adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num_, fs_->GetFileNumOfDir(base_dir_)); std::string chunk_dir = @@ -312,7 +312,7 @@ class AdjListOffsetArrowChunkReader { chunk_table_(nullptr) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &prefix_)); GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListOffsetDirPath(adj_list_type)); + edge_info.GetOffsetPathPrefix(adj_list_type)); base_dir_ = prefix_ + dir_path; if (adj_list_type == 
AdjListType::ordered_by_source || adj_list_type == AdjListType::ordered_by_dest) { @@ -415,9 +415,9 @@ class AdjListPropertyArrowChunkReader { chunk_table_(nullptr) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &prefix_)); GAR_ASSIGN_OR_RAISE_ERROR( - auto dir_path, - edge_info.GetPropertyDirPath(property_group, adj_list_type)); - base_dir_ = prefix_ + dir_path; + auto pg_path_prefix, + edge_info.GetPropertyGroupPathPrefix(property_group, adj_list_type)); + base_dir_ = prefix_ + pg_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num_, fs_->GetFileNumOfDir(base_dir_)); std::string chunk_dir = diff --git a/include/gar/reader/chunk_info_reader.h b/include/gar/reader/chunk_info_reader.h index 2ac68bdc3..205800803 100644 --- a/include/gar/reader/chunk_info_reader.h +++ b/include/gar/reader/chunk_info_reader.h @@ -30,7 +30,7 @@ limitations under the License. namespace GAR_NAMESPACE_INTERNAL { -/// The chunk info reader for vertex property group. +/** The chunk info reader for vertex property group. */ class VertexPropertyChunkInfoReader { public: ~VertexPropertyChunkInfoReader() {} @@ -53,9 +53,9 @@ class VertexPropertyChunkInfoReader { std::string base_dir; GAR_ASSIGN_OR_RAISE_ERROR(auto fs, FileSystemFromUriOrPath(prefix, &base_dir)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - vertex_info.GetDirPath(property_group)); - base_dir += dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto pg_path_prefix, + vertex_info.GetPathPrefix(property_group)); + base_dir += pg_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(chunk_num_, fs->GetFileNumOfDir(base_dir)); } @@ -86,9 +86,12 @@ class VertexPropertyChunkInfoReader { return prefix_ + chunk_file_path; } - /// Sets chunk position indicator to next chunk. - /// if current chunk is the last chunk, will return Status::OutOfRange - /// error. + /** + * Sets chunk position indicator to next chunk. + * + * if current chunk is the last chunk, will return Status::OutOfRange + * error. 
+ */ Status next_chunk() noexcept { if (++chunk_index_ >= chunk_num_) { return Status::OutOfRange(); @@ -96,7 +99,7 @@ class VertexPropertyChunkInfoReader { return Status::OK(); } - /// Get the chunk number of the current vertex property group. + /** Get the chunk number of the current vertex property group. */ IdType GetChunkNum() noexcept { return chunk_num_; } private: @@ -107,7 +110,7 @@ class VertexPropertyChunkInfoReader { IdType chunk_num_; }; -/// The chunk info reader for adj list topology chunk. +/** The chunk info reader for adj list topology chunk. */ class AdjListChunkInfoReader { public: ~AdjListChunkInfoReader() {} @@ -128,9 +131,9 @@ class AdjListChunkInfoReader { vertex_chunk_index_(0), chunk_index_(0) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &base_dir_)); - GAR_ASSIGN_OR_RAISE_ERROR(auto dir_path, - edge_info.GetAdjListDirPath(adj_list_type)); - base_dir_ = prefix_ + dir_path; + GAR_ASSIGN_OR_RAISE_ERROR(auto adj_list_path_prefix, + edge_info.GetAdjListPathPrefix(adj_list_type)); + base_dir_ = prefix_ + adj_list_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num_, fs_->GetFileNumOfDir(base_dir_)); std::string chunk_dir = @@ -167,7 +170,7 @@ class AdjListChunkInfoReader { return Status::OK(); } - /// Return the current chunk file path of chunk position indicator. + /** Return the current chunk file path of chunk position indicator. */ Result GetChunk() noexcept { GAR_ASSIGN_OR_RAISE(auto chunk_file_path, edge_info_.GetAdjListFilePath( @@ -175,9 +178,12 @@ class AdjListChunkInfoReader { return prefix_ + chunk_file_path; } - /// Sets chunk position indicator to next chunk. - /// if current chunk is the last chunk, will return Status::OutOfRange - /// error. + /** + * Sets chunk position indicator to next chunk. + * + * if current chunk is the last chunk, will return Status::OutOfRange + * error. 
+ */ Status next_chunk() { if (++chunk_index_ >= chunk_num_) { ++vertex_chunk_index_; @@ -202,7 +208,9 @@ class AdjListChunkInfoReader { std::shared_ptr fs_; }; -/// The chunk info reader for edge property group chunk. +/** + * The chunk info reader for edge property group chunk. + */ class AdjListPropertyChunkInfoReader { public: /** @@ -225,9 +233,9 @@ class AdjListPropertyChunkInfoReader { chunk_index_(0) { GAR_ASSIGN_OR_RAISE_ERROR(fs_, FileSystemFromUriOrPath(prefix, &base_dir_)); GAR_ASSIGN_OR_RAISE_ERROR( - auto dir_path, - edge_info.GetPropertyDirPath(property_group, adj_list_type)); - base_dir_ = prefix_ + dir_path; + auto pg_path_prefix, + edge_info.GetPropertyGroupPathPrefix(property_group, adj_list_type)); + base_dir_ = prefix_ + pg_path_prefix; GAR_ASSIGN_OR_RAISE_ERROR(vertex_chunk_num_, fs_->GetFileNumOfDir(base_dir_)); std::string chunk_dir = @@ -264,7 +272,7 @@ class AdjListPropertyChunkInfoReader { return Status::OK(); } - /// Return the current chunk file path of chunk position indicator. + /** Return the current chunk file path of chunk position indicator. */ Result GetChunk() noexcept { GAR_ASSIGN_OR_RAISE( auto chunk_file_path, @@ -273,9 +281,12 @@ class AdjListPropertyChunkInfoReader { return prefix_ + chunk_file_path; } - /// Sets chunk position indicator to next chunk. - /// if current chunk is the last chunk, will return Status::OutOfRange - /// error. + /** + * Sets chunk position indicator to next chunk. + * + * if current chunk is the last chunk, will return Status::OutOfRange + * error. + */ Status next_chunk() { if (++chunk_index_ >= chunk_num_) { ++vertex_chunk_index_; diff --git a/include/gar/utils/adj_list_type.h b/include/gar/utils/adj_list_type.h index 95bb32f36..c77801236 100644 --- a/include/gar/utils/adj_list_type.h +++ b/include/gar/utils/adj_list_type.h @@ -24,7 +24,7 @@ limitations under the License. 
namespace GAR_NAMESPACE_INTERNAL { -/// \brief Adj list type enumeration for adjacency list of graph +/** Adj list type enumeration for adjacency list of graph. */ enum class AdjListType : std::uint8_t { /// collection of edges by source, but unordered, can represent COO format unordered_by_source = 0b00000001, diff --git a/include/gar/utils/convert_to_arrow_type.h b/include/gar/utils/convert_to_arrow_type.h index b4e3a7f24..e4cba9a32 100644 --- a/include/gar/utils/convert_to_arrow_type.h +++ b/include/gar/utils/convert_to_arrow_type.h @@ -26,7 +26,7 @@ limitations under the License. namespace GAR_NAMESPACE_INTERNAL { -/// \brief Struct to convert DataType to arrow::DataType. +/** Struct to convert DataType to arrow::DataType. */ template struct ConvertToArrowType {}; diff --git a/include/gar/utils/data_type.h b/include/gar/utils/data_type.h index 9cf7cb987..d5626870b 100644 --- a/include/gar/utils/data_type.h +++ b/include/gar/utils/data_type.h @@ -30,35 +30,37 @@ class DataType; namespace GAR_NAMESPACE_INTERNAL { -/// \brief Main data type enumeration +/** @brief Main data type enumeration. */ enum class Type { - /// Boolean + /** Boolean */ BOOL = 0, - /// Signed 32-bit integer + /** Signed 32-bit integer */ INT32, - /// Signed 64-bit integer + /** Signed 64-bit integer */ INT64, - /// 4-byte floating point value + /** 4-byte floating point value */ FLOAT, - /// 8-byte floating point value + /** 8-byte floating point value */ DOUBLE, - /// UTF8 variable-length string + /** UTF8 variable-length string */ STRING, - /// User-defined data type + /** User-defined data type */ USER_DEFINED, // Leave this at the end MAX_ID, }; -/// \brief The DataType struct to provide enum type for data type and functions -/// to parse data type. +/** + * @brief The DataType struct to provide enum type for data type and functions + * to parse data type. 
+ */ class DataType { public: DataType() : id_(Type::BOOL) {} @@ -101,7 +103,7 @@ class DataType { return DataType(str2type.at(str.c_str())); } - /// \brief Return the type category of the DataType. + /** Return the type category of the DataType. */ Type id() const { return id_; } std::string ToTypeName() const; diff --git a/include/gar/utils/file_type.h b/include/gar/utils/file_type.h index b2b0c2fdb..a37282efa 100644 --- a/include/gar/utils/file_type.h +++ b/include/gar/utils/file_type.h @@ -23,7 +23,7 @@ limitations under the License. namespace GAR_NAMESPACE_INTERNAL { -/// \brief Type of file format +/** Type of file format */ enum FileType { CSV = 0, PARQUET = 1, ORC = 2 }; static inline FileType StringToFileType(const std::string& str) { diff --git a/include/gar/utils/filesystem.h b/include/gar/utils/filesystem.h index 5a9d2a741..19fa3a583 100644 --- a/include/gar/utils/filesystem.h +++ b/include/gar/utils/filesystem.h @@ -38,50 +38,81 @@ class RandomAccessFile; namespace GAR_NAMESPACE_INTERNAL { -/// A wrapper of arrow::FileSystem to provide read/write arrow::Table -/// from/to file and other necessary file operations. +/** + * This class wraps an arrow::fs::FileSystem and provides methods for + * reading and writing arrow::Table objects from and to files, as well as + * performing other file system operations such as copying and counting files. + */ class FileSystem { public: - /// \brief Create a FileSystem instance. + /** + * @brief Create a FileSystem instance. + * @param arrow_fs The arrow::fs::FileSystem to wrap. + */ explicit FileSystem(std::shared_ptr arrow_fs) : arrow_fs_(arrow_fs) {} ~FileSystem() = default; - /// Read a file as an arrow::Table + /** + * @brief Read a file as an arrow::Table. + * + * @param path The path of the file to read. + * @param file_type The type of the file to read. + * @return A Result containing a std::shared_ptr to an arrow::Table if + * successful, or an error Status if unsuccessful. 
+ */ Result> ReadFileToTable( const std::string& path, FileType file_type) const noexcept; - /// Read a file to value - /// if the file bytes can not be converted to value, return - /// Status::ArrowError + /** + * @brief Read a file and convert its bytes to a value of type T. + * + * @tparam T The type to convert the file bytes to. + * @param path The path of the file to read. + * @return A Result containing the value if successful, or an error Status if + * unsuccessful. + */ template Result ReadFileToValue(const std::string& path) const noexcept; - /// Write a value to a file + /** + * @brief Write a value of type T to a file. + * + * @tparam T The type of the value to be written. + * @param value The value to be written. + * @param path The path of the file to be written + * @return A Status indicating OK if successful, or an error if unsuccessful. + */ template Status WriteValueToFile(const T& value, const std::string& path) const noexcept; - /// \brief Write a table to a file with a specific type. - /// - /// \param input_table The table to write. - /// \param file_type The type of the output file. - /// \param path The path of the output file. + /** + * @brief Write a table to a file with a specific type. + * @param input_table The table to write. + * @param file_type The type of the output file. + * @param path The path of the output file. + * @return A Status indicating OK if successful, or an error if unsuccessful. + */ Status WriteTableToFile(const std::shared_ptr& table, FileType file_type, const std::string& path) const noexcept; - /// Copy a file. - /// - /// If the destination exists and is a directory, an Status::ArrowError is - /// returned. Otherwise, it is replaced. + /** + * Copy a file. + * + * If the destination exists and is a directory, an Status::ArrowError is + * returned. Otherwise, it is replaced. + */ Status CopyFile(const std::string& src_path, const std::string& dst_path) const noexcept; - /// Get the number of file of a directory. 
- /// - /// the file is not pure file, it can be a directory or other type of file. + /** + * Get the number of file of a directory. + * + * the file is not pure file, it can be a directory or other type of file. + */ Result GetFileNumOfDir(const std::string& dir_path, bool recursive = false) const noexcept; @@ -89,15 +120,17 @@ class FileSystem { std::shared_ptr arrow_fs_; }; -/// \brief Create a new FileSystem by URI -/// -/// wrapper of arrow::fs::FileSystemFromUri -/// -/// Recognized schemes are "file", "mock", "hdfs", "viewfs", "s3", -/// "gs" and "gcs". -/// -/// in addition also recognize non-URIs, and treat them as local filesystem -/// paths. Only absolute local filesystem paths are allowed. +/** + * @brief Create a new FileSystem by URI + * + * wrapper of arrow::fs::FileSystemFromUri + * + * Recognized schemes are "file", "mock", "hdfs", "viewfs", "s3", + * "gs" and "gcs". + * + * in addition also recognize non-URIs, and treat them as local filesystem + * paths. Only absolute local filesystem paths are allowed. + */ Result> FileSystemFromUriOrPath( const std::string& uri, std::string* out_path = nullptr); diff --git a/include/gar/utils/result.h b/include/gar/utils/result.h index 9c93f3b88..37efe0ff2 100644 --- a/include/gar/utils/result.h +++ b/include/gar/utils/result.h @@ -36,40 +36,44 @@ limitations under the License. #define GAR_ASSIGN_OR_RAISE_NAME(x, y) GAR_CONCAT(x, y) -/// \brief Execute an expression that returns a Result, extracting its value -/// into the variable defined by `lhs` (or returning a Status on error). -/// -/// Example: Assigning to a new value: -/// GAR_ASSIGN_OR_RAISE(auto value, MaybeGetValue(arg)); -/// -/// Example: Assigning to an existing value: -/// ValueType value; -/// GAR_ASSIGN_OR_RAISE(value, MaybeGetValue(arg)); -/// -/// WARNING: GAR_ASSIGN_OR_RAISE expands into multiple statements; -/// it cannot be used in a single statement (e.g. as the body of an if -/// statement without {})! 
-/// -/// WARNING: GAR_ASSIGN_OR_RAISE `std::move`s its right operand. If you have -/// an lvalue Result which you *don't* want to move out of cast appropriately. -/// -/// WARNING: GAR_ASSIGN_OR_RAISE is not a single expression; it will not -/// maintain lifetimes of all temporaries in `rexpr` (e.g. -/// `GAR_ASSIGN_OR_RAISE(auto x, MakeTemp().GetResultRef());` -/// will most likely segfault)! +/** + * @brief Execute an expression that returns a Result, extracting its value + * into the variable defined by `lhs` (or returning a Status on error). + * + * Example: Assigning to a new value: + * GAR_ASSIGN_OR_RAISE(auto value, MaybeGetValue(arg)); + * + * Example: Assigning to an existing value: + * ValueType value; + * GAR_ASSIGN_OR_RAISE(value, MaybeGetValue(arg)); + * + * WARNING: GAR_ASSIGN_OR_RAISE expands into multiple statements; + * it cannot be used in a single statement (e.g. as the body of an if + * statement without {})! + * + * WARNING: GAR_ASSIGN_OR_RAISE `std::move`s its right operand. If you have + * an lvalue Result which you *don't* want to move out of cast appropriately. + * + * WARNING: GAR_ASSIGN_OR_RAISE is not a single expression; it will not + * maintain lifetimes of all temporaries in `rexpr` (e.g. + * `GAR_ASSIGN_OR_RAISE(auto x, MakeTemp().GetResultRef());` + * will most likely segfault)! + */ #define GAR_ASSIGN_OR_RAISE(lhs, rexpr) \ GAR_ASSIGN_OR_RAISE_IMPL( \ GAR_ASSIGN_OR_RAISE_NAME(_error_or_value, __COUNTER__), lhs, rexpr); -/// \brief Execute an expression that returns a Result, extracting its value -/// into the variable defined by `lhs` (or throw an runtime error). 
-/// -/// Example: Assigning to a new value: -/// GAR_ASSIGN_OR_RAISE_ERROR(auto value, MaybeGetValue(arg)); -/// -/// Example: Assigning to an existing value: -/// ValueType value; -/// GAR_ASSIGN_OR_RAISE_ERROR(value, MaybeGetValue(arg)); +/** + * @brief Execute an expression that returns a Result, extracting its value + * into the variable defined by `lhs` (or throw a runtime error). + * + * Example: Assigning to a new value: + * GAR_ASSIGN_OR_RAISE_ERROR(auto value, MaybeGetValue(arg)); + * + * Example: Assigning to an existing value: + * ValueType value; + * GAR_ASSIGN_OR_RAISE_ERROR(value, MaybeGetValue(arg)); + */ #define GAR_ASSIGN_OR_RAISE_ERROR(lhs, rexpr) \ GAR_ASSIGN_OR_RAISE_ERROR_IMPL( \ GAR_ASSIGN_OR_RAISE_NAME(_error_or_value, __COUNTER__), lhs, rexpr); @@ -82,47 +86,50 @@ limitations under the License. } \ lhs = std::move(result_name).ValueOrDie(); -/// \brief Execute an expression that returns a Arrow Result, extracting its -/// value into the variable defined by `lhs` (or returning a Status on error). -/// -/// Example: Assigning to a new value: -/// GAR_ASSIGN_OR_RAISE(auto value, MaybeGetValue(arg)); -/// -/// Example: Assigning to an existing value: -/// ValueType value; -/// GAR_ASSIGN_OR_RAISE(value, MaybeGetValue(arg)); -/// +/** + * @brief Execute an expression that returns an Arrow Result, extracting its + * value into the variable defined by `lhs` (or returning a Status on error). + * + * Example: Assigning to a new value: + * GAR_RETURN_ON_ARROW_ERROR_AND_ASSIGN(auto value, MaybeGetValue(arg)); + * + * Example: Assigning to an existing value: + * ValueType value; + * GAR_RETURN_ON_ARROW_ERROR_AND_ASSIGN(value, MaybeGetValue(arg)); + */ #define GAR_RETURN_ON_ARROW_ERROR_AND_ASSIGN(lhs, rexpr) \ GAR_RETURN_ON_ARROW_ERROR_AND_ASSIGN_IMPL( \ GAR_ASSIGN_OR_RAISE_NAME(_error_or_value, __COUNTER__), lhs, rexpr); namespace GAR_NAMESPACE_INTERNAL { -/// A class for representing either a usable value, or an error.
-/// -/// A Result object either contains a value of type `T` or a Status object -/// explaining why such a value is not present. The type `T` must be -/// copy-constructible and/or move-constructible. -/// -/// The state of a Result object may be determined by calling has_error() or -/// status(). The has_error() method returns false if the object contains a -/// valid value. The status() method returns the internal Status object. A -/// Result object that contains a valid value will return an OK Status for a -/// call to status(). -/// -/// A value of type `T` may be extracted from a Result object through a call -/// to value(). This function should only be called if a call to has_error() -/// returns false. Sample usage: -/// -/// ``` -/// gar::Result result = CalculateFoo(); -/// if (!result.has_error()) { -/// Foo foo = result.value(); -/// foo.DoSomethingCool(); -/// } else { -/// std::err << result.status(); -/// } -/// ``` +/** + * @brief A class for representing either a usable value, or an error. + * + * A Result object either contains a value of type `T` or a Status object + * explaining why such a value is not present. The type `T` must be + * copy-constructible and/or move-constructible. + * + * The state of a Result object may be determined by calling has_error() or + * status(). The has_error() method returns false if the object contains a + * valid value. The status() method returns the internal Status object. A + * Result object that contains a valid value will return an OK Status for a + * call to status(). + * + * A value of type `T` may be extracted from a Result object through a call + * to value(). This function should only be called if a call to has_error() + * returns false.
Sample usage: + * + * ``` + * gar::Result result = CalculateFoo(); + * if (!result.has_error()) { + * Foo foo = result.value(); + * foo.DoSomethingCool(); + * } else { + * std::cerr << result.status(); + * } + * ``` + */ template using Result = cpp::result; diff --git a/include/gar/utils/status.h b/include/gar/utils/status.h index adaf832e6..a324dedba 100644 --- a/include/gar/utils/status.h +++ b/include/gar/utils/status.h @@ -28,7 +28,7 @@ limitations under the License. } \ } while (0) -/// \brief Propagate any non-successful Status to the caller +/** @brief Propagate any non-successful Status to the caller. */ #define GAR_RETURN_NOT_OK(status) \ do { \ ::GAR_NAMESPACE_INTERNAL::Status __s = \ @@ -36,7 +36,7 @@ limitations under the License. GAR_RETURN_IF_(!__s.ok(), __s, GAR_STRINGIFY(status)); \ } while (false) -/// \brief Propagate any non-successful Arrow Status to the caller +/** @brief Propagate any non-successful Arrow Status to the caller. */ #define RETURN_NOT_ARROW_OK(status) \ do { \ if (GAR_PREDICT_FALSE(!status.ok())) { \ @@ -51,7 +51,7 @@ limitations under the License. } \ } while (0) -/// \brief Throw runtime error if Status not OK +/** @brief Throw runtime error if Status not OK. */ #define GAR_RAISE_ERROR_NOT_OK(status) \ do { \ ::GAR_NAMESPACE_INTERNAL::Status __s = \ @@ -61,6 +61,9 @@ limitations under the License. namespace GAR_NAMESPACE_INTERNAL { +/** + * An enum class representing the status codes for success or error outcomes. + */ enum class StatusCode : unsigned char { kOK = 0, kKeyError, @@ -79,36 +82,42 @@ enum class StatusCode : unsigned char { kUnknownError, }; -/// \brief Status outcome object (success or error) -/// -/// The Status object is an object holding the outcome of an operation. -/// The outcome is represented as a StatusCode, either success -/// (StatusCode::OK) or an error (any other of the StatusCode enumeration -/// values).
-/// -/// Additionally, if an error occurred, a specific error message is generally -/// attached. +/** @brief Status outcome object (success or error) + * + * The Status object is an object holding the outcome of an operation. + * The outcome is represented as a StatusCode, either success + * (StatusCode::OK) or an error (any other of the StatusCode enumeration + * values). + * + * Additionally, if an error occurred, a specific error message is generally + * attached. + */ class Status { public: - // Create a success status. + /** Create a success status. */ Status() noexcept : state_(nullptr) {} + /** Destructor. */ ~Status() noexcept { if (state_ != nullptr) { deleteState(); } } - /// Create a status with the specified error code and message. + /** + * @brief Constructs a status with the specified error code and message. + * @param code The error code of the status. + * @param msg The error message of the status. + */ Status(StatusCode code, const std::string& msg) { state_ = new State; state_->code = code; state_->msg = msg; } - /// Copy the specified status. + /** Copy the specified status. */ inline Status(const Status& s) : state_((s.state_ == nullptr) ? nullptr : new State(*s.state_)) {} - /// Move the specified status. + /** Move the specified status. */ inline Status(Status&& s) noexcept : state_(s.state_) { s.state_ = nullptr; } - /// Move assignment operator. + /** Move assignment operator. */ inline Status& operator=(Status&& s) noexcept { delete state_; state_ = s.state_; @@ -116,26 +125,28 @@ class Status { return *this; } - /// Return a success status + /** Returns a success status. */ inline static Status OK() { return Status(); } - /// Return an error status when some IO-related operation failed + /** Returns an error status when some IO-related operation failed. 
*/ static Status IOError(const std::string& msg = "") { return Status(StatusCode::kIOError, msg); } - /// Return an error status for failed key lookups + /** Returns an error status for failed key lookups. */ static Status KeyError(const std::string& msg = "") { return Status(StatusCode::kKeyError, msg); } - /// Return an error status for failed type matches + /** Returns an error status for failed type matches. */ static Status TypeError(const std::string& msg = "") { return Status(StatusCode::kTypeError, msg); } - /// Return an error status for invalid data (for example a string that fails - /// parsing) + /** + * Returns an error status for invalid data (for example a string that fails + * parsing). + */ static Status Invalid(const std::string& msg = "") { return Status(StatusCode::kInvalid, msg); } @@ -152,8 +163,10 @@ class Status { return Status(StatusCode::kInvalidOperation, msg); } - /// Return an error status for value is out of range (for example next_chunk - /// is out of range) + /** + * Return an error status for value is out of range (for example next_chunk + * is out of range) + */ static Status OutOfRange(const std::string& msg = "") { return Status(StatusCode::kOutOfRange, msg); } @@ -162,29 +175,29 @@ class Status { return Status(StatusCode::kEndOfChunk, msg); } - /// Return an error status when some yaml-cpp related operation failed + /** Return an error status when some yaml-cpp related operation failed. */ static Status YamlError(const std::string& msg = "") { return Status(StatusCode::kYamlError, msg); } - /// Return an error status when some arrow-related operation failed + /** Return an error status when some arrow-related operation failed. */ static Status ArrowError(const std::string& msg = "") { return Status(StatusCode::kArrowError, msg); } - /// Return an error status for unknown errors + /** Return an error status for unknown errors. 
*/ static Status UnknownError(const std::string& msg = "") { return Status(StatusCode::kArrowError, msg); } - /// Return true iff the status indicates success. + /** Return true iff the status indicates success. */ bool ok() const { return (state_ == nullptr); } - /// Return true iff the status indicates a key lookup error. + /** Return true iff the status indicates a key lookup error. */ bool IsKeyError() const { return code() == StatusCode::kKeyError; } - /// Return true iff the status indicates a type match error. + /** Return true iff the status indicates a type match error. */ bool IsTypeError() const { return code() == StatusCode::kTypeError; } - /// Return true iff the status indicates invalid data. + /** Return true iff the status indicates invalid data. */ bool IsInvalid() const { return code() == StatusCode::kInvalid; } bool IsInvalidValue() const { return code() == StatusCode::kInvalidValue; } bool IsInvalidArgument() const { @@ -195,15 +208,15 @@ class Status { } bool IsOutOfRange() const { return code() == StatusCode::kOutOfRange; } bool IsEndOfChunk() const { return code() == StatusCode::kEndOfChunk; } - /// Return true iff the status indicates an yaml-cpp related failure. + /** Return true iff the status indicates an yaml-cpp related failure. */ bool IsYamlError() const { return code() == StatusCode::kYamlError; } - /// Return true iff the status indicates an arrow-related failure. + /** Return true iff the status indicates an arrow-related failure. */ bool IsArrowError() const { return code() == StatusCode::kArrowError; } - /// Return the StatusCode value attached to this status. + /** Return the StatusCode value attached to this status. */ StatusCode code() const { return ok() ? StatusCode::kOK : state_->code; } - /// Return the specific error message attached to this status. + /** Return the specific error message attached to this status. */ std::string message() const { return ok() ? 
"" : state_->msg; } private: diff --git a/include/gar/utils/utils.h b/include/gar/utils/utils.h index 6f2a0b57b..096eac37d 100644 --- a/include/gar/utils/utils.h +++ b/include/gar/utils/utils.h @@ -35,7 +35,7 @@ class Array; namespace GAR_NAMESPACE_INTERNAL { -/// \brief Type of vertex id or vertex index. +/** Type of vertex id or vertex index. */ using IdType = int64_t; namespace util { diff --git a/include/gar/utils/version_parser.h b/include/gar/utils/version_parser.h index 6e26616db..1e01c9354 100644 --- a/include/gar/utils/version_parser.h +++ b/include/gar/utils/version_parser.h @@ -25,22 +25,22 @@ limitations under the License. namespace GAR_NAMESPACE_INTERNAL { -/// \brief InfoVersion is a class provide version information of info. +/** InfoVersion is a class provide version information of info. */ class InfoVersion { public: - /// \brief Parse version string to InfoVersion. + /** Parse version string to InfoVersion. */ static Result Parse(const std::string& str) noexcept; - /// Default constructor + /** Default constructor */ InfoVersion() : version_(version2types.rbegin()->first) {} - /// Constructor with version + /** Constructor with version */ explicit InfoVersion(int version) : version_(version) { if (version2types.find(version) == version2types.end()) { throw std::invalid_argument("Unsupported version: " + std::to_string(version)); } } - /// Constructor with version and user defined types + /** Constructor with version and user defined types. 
*/ explicit InfoVersion(int version, const std::vector& user_define_types) : version_(version), user_define_types_(user_define_types) { @@ -49,26 +49,26 @@ class InfoVersion { std::to_string(version)); } } - /// Copy constructor + /** Copy constructor */ InfoVersion(const InfoVersion& other) = default; - /// Copy assignment + /** Copy assignment */ inline InfoVersion& operator=(const InfoVersion& other) = default; - /// Check if two InfoVersion are equal + /** Check if two InfoVersion are equal */ bool operator==(const InfoVersion& other) const { return version_ == other.version_ && user_define_types_ == other.user_define_types_; } - /// Get version + /** Get version */ int version() const { return version_; } - /// Get user defined types + /** Get user defined types */ const std::vector& user_define_types() const { return user_define_types_; } - /// Dump version to string + /** Dump version to string. */ std::string ToString() const { std::string str = "gar/v" + std::to_string(version_); if (!user_define_types_.empty()) { @@ -81,7 +81,7 @@ class InfoVersion { return str; } - /// Check if type is supported by version + /** Check if type is supported by version. */ inline bool CheckType(const std::string& type_str) noexcept { auto& types = version2types.at(version_); // check if type_str is in supported types of version diff --git a/include/gar/utils/yaml.h b/include/gar/utils/yaml.h index d67590fa8..8b18b6ae1 100644 --- a/include/gar/utils/yaml.h +++ b/include/gar/utils/yaml.h @@ -29,7 +29,7 @@ class Node; namespace GAR_NAMESPACE_INTERNAL { -/// A wrapper of YAML::Node to provide functions to parse yaml. +/** A wrapper of YAML::Node to provide functions to parse yaml. */ class Yaml { public: explicit Yaml(std::shared_ptr root_node) @@ -39,24 +39,32 @@ class Yaml { const YAML::Node operator[](const std::string& key) const; - /// Loads the input string as Yaml instance. - /// - /// Return Status::YamlError if input string can not be loaded(malformed). 
+ /** + * Loads the input string as Yaml instance. + * + * Return Status::YamlError if input string can not be loaded(malformed). + */ static Result> Load(const std::string& input); - /// Loads the input string as Yaml instance. - /// - /// Return Status::YamlError if input string can not be loaded(malformed). + /** + * Loads the input string as Yaml instance. + * + * Return Status::YamlError if input string can not be loaded(malformed). + */ static Result> Load(const char* input); - /// Loads the input stream as Yaml instance. - /// - /// Return Status::YamlError if input string can not be loaded(malformed). + /** + * Loads the input stream as Yaml instance. + * + * Return Status::YamlError if input string can not be loaded(malformed). + */ static Result> Load(std::istream& input); - /// Loads the input file as a single Yaml instance. - /// - /// Return Status::YamlError if the file can not be loaded(malformed). + /** + * Loads the input file as a single Yaml instance. + * + * Return Status::YamlError if the file can not be loaded(malformed). + */ static Result> LoadFile(const std::string& file_name); private: diff --git a/spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala b/spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala index ea507a198..20c485302 100644 --- a/spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala +++ b/spark/src/main/scala/com/alibaba/graphar/EdgeInfo.scala @@ -296,13 +296,13 @@ class EdgeInfo() { return str } - /** Get the adj list offset chunk file directory path of adj list type. - * + /** Get the path prefix of the adjacency list offset for the given + * adjacency list type. * @param adj_list_type type of adj list structure. - * @return the offset directory. If edge info not support the adj list type, + * @return the path prefix of the offset. If edge info not support the adj list type, * raise an IllegalArgumentException error. 
*/ - def getAdjListOffsetDirPath(adj_list_type: AdjListType.Value) : String = { + def getOffsetPathPrefix(adj_list_type: AdjListType.Value) : String = { if (containAdjList(adj_list_type) == false) throw new IllegalArgumentException return prefix + getAdjListPrefix(adj_list_type) + "offset/" @@ -321,24 +321,24 @@ class EdgeInfo() { return str } - /** Get the path of adj list topology chunk of certain vertex chunk. + /** Get the path prefix of adj list topology chunk of certain vertex chunk. * * @param vertex_chunk_index index of vertex chunk. * @param adj_list_type type of adj list structure. * @return path prefix of the edge chunk of vertices of given vertex chunk. */ - def getAdjListFilePath(vertex_chunk_index: Long, adj_list_type: AdjListType.Value) : String = { + def getAdjListPathPrefix(vertex_chunk_index: Long, adj_list_type: AdjListType.Value) : String = { var str: String = prefix + getAdjListPrefix(adj_list_type) + "adj_list/part" + vertex_chunk_index.toString() + "/" return str } - /** Get the adj list topology chunk file directory path of adj list type. - * + /** Get the path prefix of the adjacency list topology chunk for the given + * adjacency list type. * @param adj_list_type type of adj list structure. - * @return directory path of adj list type. + * @return path prefix of the adjacency list topology. */ - def getAdjListDirPath(adj_list_type: AdjListType.Value) : String = { + def getAdjListPathPrefix(adj_list_type: AdjListType.Value) : String = { return prefix + getAdjListPrefix(adj_list_type) + "adj_list/" } @@ -372,7 +372,7 @@ class EdgeInfo() { return str } - /** Get path of adj list property group of certain vertex chunk. + /** Get path prefix of adj list property group of certain vertex chunk. * * @param property_group property group. * @param adj_list_type type of adj list structure. @@ -381,7 +381,7 @@ class EdgeInfo() { * If edge info not contains the property group, * raise an IllegalArgumentException error.
*/ - def getPropertyFilePath(property_group: PropertyGroup, adj_list_type: AdjListType.Value, vertex_chunk_index: Long) : String = { + def getPropertyGroupPathPrefix(property_group: PropertyGroup, adj_list_type: AdjListType.Value, vertex_chunk_index: Long) : String = { if (containPropertyGroup(property_group, adj_list_type) == false) throw new IllegalArgumentException var str: String = property_group.getPrefix @@ -400,14 +400,14 @@ class EdgeInfo() { return str } - /** Get the property group chunk file directory path of adj list type. - * + /** Get the path prefix of the property group chunk for the given + * adjacency list type * @param property_group property group. * @param adj_list_type type of adj list structure. - * @return directory path of property group chunks. If edge info not contains the property group, + * @return path prefix of property group chunks. If edge info not contains the property group, * raise an IllegalArgumentException error. */ - def getPropertyDirPath(property_group: PropertyGroup, adj_list_type: AdjListType.Value) : String = { + def getPropertyGroupPathPrefix(property_group: PropertyGroup, adj_list_type: AdjListType.Value) : String = { if (containPropertyGroup(property_group, adj_list_type) == false) throw new IllegalArgumentException var str: String = property_group.getPrefix diff --git a/spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala b/spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala index ecf9d80b1..e86208fc4 100644 --- a/spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala +++ b/spark/src/main/scala/com/alibaba/graphar/VertexInfo.scala @@ -193,12 +193,12 @@ class VertexInfo() { return prefix + str + "chunk" + chunk_index.toString() } - /** Get the chunk files directory path of property group. + /** Get the path prefix for the specified property group. * * @param property_group the property group. - * @return the dirctory path that store the chunk files of property group. 
+ * @return the path prefix of the property group chunk files. */ - def getDirPath(property_group: PropertyGroup): String = { + def getPathPrefix(property_group: PropertyGroup): String = { if (containPropertyGroup(property_group) == false) throw new IllegalArgumentException var str: String = "" diff --git a/spark/src/main/scala/com/alibaba/graphar/reader/EdgeReader.scala b/spark/src/main/scala/com/alibaba/graphar/reader/EdgeReader.scala index 0bb56db20..24c3890bd 100644 --- a/spark/src/main/scala/com/alibaba/graphar/reader/EdgeReader.scala +++ b/spark/src/main/scala/com/alibaba/graphar/reader/EdgeReader.scala @@ -75,9 +75,9 @@ class EdgeReader(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V * @return DataFrame of all AdjList chunks of vertices in given vertex chunk. */ def readAdjListForVertexChunk(vertex_chunk_index: Long, addIndex: Boolean = false): DataFrame = { - val file_path = prefix + "/" + edgeInfo.getAdjListFilePath(vertex_chunk_index, adjListType) - val file_system = FileSystem.get(new Path(file_path).toUri(), spark.sparkContext.hadoopConfiguration) - val path_pattern = new Path(file_path + "chunk*") + val part_prefix = prefix + "/" + edgeInfo.getAdjListPathPrefix(vertex_chunk_index, adjListType) + val file_system = FileSystem.get(new Path(part_prefix).toUri(), spark.sparkContext.hadoopConfiguration) + val path_pattern = new Path(part_prefix + "chunk*") val chunk_number = file_system.globStatus(path_pattern).length var df = spark.emptyDataFrame for ( i <- 0 to chunk_number - 1) { @@ -98,7 +98,7 @@ class EdgeReader(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V * @return DataFrame of all AdjList chunks. 
*/ def readAllAdjList(addIndex: Boolean = false): DataFrame = { - val file_path = prefix + "/" + edgeInfo.getAdjListDirPath(adjListType) + val file_path = prefix + "/" + edgeInfo.getAdjListPathPrefix(adjListType) val file_system = FileSystem.get(new Path(file_path).toUri(), spark.sparkContext.hadoopConfiguration) val path_pattern = new Path(file_path + "part*") val vertex_chunk_number = file_system.globStatus(path_pattern).length @@ -143,9 +143,9 @@ class EdgeReader(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V def readEdgePropertiesForVertexChunk(propertyGroup: PropertyGroup, vertex_chunk_index: Long, addIndex: Boolean = false): DataFrame = { if (edgeInfo.containPropertyGroup(propertyGroup, adjListType) == false) throw new IllegalArgumentException - val file_path = prefix + "/" + edgeInfo.getPropertyFilePath(propertyGroup, adjListType, vertex_chunk_index) - val file_system = FileSystem.get(new Path(file_path).toUri(), spark.sparkContext.hadoopConfiguration) - val path_pattern = new Path(file_path + "chunk*") + val path_prefix = prefix + "/" + edgeInfo.getPropertyGroupPathPrefix(propertyGroup, adjListType, vertex_chunk_index) + val file_system = FileSystem.get(new Path(path_prefix).toUri(), spark.sparkContext.hadoopConfiguration) + val path_pattern = new Path(path_prefix + "chunk*") val chunk_number = file_system.globStatus(path_pattern).length var df = spark.emptyDataFrame for ( i <- 0 to chunk_number - 1) { @@ -170,9 +170,9 @@ class EdgeReader(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V def readEdgeProperties(propertyGroup: PropertyGroup, addIndex: Boolean = false): DataFrame = { if (edgeInfo.containPropertyGroup(propertyGroup, adjListType) == false) throw new IllegalArgumentException - val file_path = prefix + "/" + edgeInfo.getPropertyDirPath(propertyGroup, adjListType) - val file_system = FileSystem.get(new Path(file_path).toUri(), spark.sparkContext.hadoopConfiguration) - val path_pattern = new Path(file_path + "part*") + 
val property_group_prefix = prefix + "/" + edgeInfo.getPropertyGroupPathPrefix(propertyGroup, adjListType) + val file_system = FileSystem.get(new Path(property_group_prefix).toUri(), spark.sparkContext.hadoopConfiguration) + val path_pattern = new Path(property_group_prefix + "part*") val vertex_chunk_number = file_system.globStatus(path_pattern).length var df = spark.emptyDataFrame for ( i <- 0 to vertex_chunk_number - 1) { diff --git a/spark/src/main/scala/com/alibaba/graphar/writer/EdgeWriter.scala b/spark/src/main/scala/com/alibaba/graphar/writer/EdgeWriter.scala index 2cc928751..45e83024c 100644 --- a/spark/src/main/scala/com/alibaba/graphar/writer/EdgeWriter.scala +++ b/spark/src/main/scala/com/alibaba/graphar/writer/EdgeWriter.scala @@ -156,7 +156,7 @@ class EdgeWriter(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V val offset_schema = StructType(Seq(StructField(GeneralParams.offsetCol, LongType))) val vertex_chunk_size = if (adjListType == AdjListType.ordered_by_source) edgeInfo.getSrc_chunk_size() else edgeInfo.getDst_chunk_size() val index_column = if (adjListType == AdjListType.ordered_by_source) GeneralParams.srcIndexCol else GeneralParams.dstIndexCol - val output_prefix = prefix + edgeInfo.getAdjListOffsetDirPath(adjListType) + val output_prefix = prefix + edgeInfo.getOffsetPathPrefix(adjListType) for (chunk <- chunks) { val edge_count_df = chunk.select(index_column).groupBy(index_column).count() // init a edge count dataframe of vertex range [begin, end] to include isloated vertex @@ -187,7 +187,7 @@ class EdgeWriter(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V val file_type = edgeInfo.getAdjListFileType(adjListType) var chunk_index: Long = 0 for (chunk <- chunks) { - val output_prefix = prefix + edgeInfo.getAdjListFilePath(chunk_index, adjListType) + val output_prefix = prefix + edgeInfo.getAdjListPathPrefix(chunk_index, adjListType) val adj_list_chunk = chunk.select(GeneralParams.srcIndexCol, 
GeneralParams.dstIndexCol) FileSystem.writeDataFrame(adj_list_chunk, FileType.FileTypeToString(file_type), output_prefix) chunk_index = chunk_index + 1 @@ -215,7 +215,7 @@ class EdgeWriter(prefix: String, edgeInfo: EdgeInfo, adjListType: AdjListType.V } var chunk_index: Long = 0 for (chunk <- chunks) { - val output_prefix = prefix + edgeInfo.getPropertyFilePath(propertyGroup, adjListType, chunk_index) + val output_prefix = prefix + edgeInfo.getPropertyGroupPathPrefix(propertyGroup, adjListType, chunk_index) val property_group_chunk = chunk.select(property_list.map(col): _*) FileSystem.writeDataFrame(property_group_chunk, propertyGroup.getFile_type(), output_prefix) chunk_index = chunk_index + 1 diff --git a/spark/src/main/scala/com/alibaba/graphar/writer/VertexWriter.scala b/spark/src/main/scala/com/alibaba/graphar/writer/VertexWriter.scala index 4fb7b8670..bb7e029e6 100644 --- a/spark/src/main/scala/com/alibaba/graphar/writer/VertexWriter.scala +++ b/spark/src/main/scala/com/alibaba/graphar/writer/VertexWriter.scala @@ -77,7 +77,7 @@ class VertexWriter(prefix: String, vertexInfo: VertexInfo, vertexDf: DataFrame) } // write out the chunks - val output_prefix = prefix + vertexInfo.getDirPath(propertyGroup) + val output_prefix = prefix + vertexInfo.getPathPrefix(propertyGroup) val property_list = ArrayBuffer[String]() val it = propertyGroup.getProperties().iterator while (it.hasNext()) { diff --git a/spark/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala b/spark/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala index 39f125d9c..029a81669 100644 --- a/spark/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala +++ b/spark/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala @@ -78,7 +78,7 @@ class GraphInfoSuite extends AnyFunSuite { assert(vertex_info.isPrimaryKey("id")) assert(vertex_info.getFilePath(property_group, 0) == "vertex/person/id/chunk0") assert(vertex_info.getFilePath(property_group, 4) == "vertex/person/id/chunk4") - 
assert(vertex_info.getDirPath(property_group) == "vertex/person/id/") + assert(vertex_info.getPathPrefix(property_group) == "vertex/person/id/") assert(vertex_info.containProperty("firstName")) val property_group_2 = vertex_info.getPropertyGroup("firstName") @@ -90,7 +90,7 @@ class GraphInfoSuite extends AnyFunSuite { assert(vertex_info.isPrimaryKey("firstName") == false) assert(vertex_info.getFilePath(property_group_2, 0) == "vertex/person/firstName_lastName_gender/chunk0") assert(vertex_info.getFilePath(property_group_2, 4) == "vertex/person/firstName_lastName_gender/chunk4") - assert(vertex_info.getDirPath(property_group_2) == "vertex/person/firstName_lastName_gender/") + assert(vertex_info.getPathPrefix(property_group_2) == "vertex/person/firstName_lastName_gender/") assert(vertex_info.containProperty("not_exist") == false) assertThrows[IllegalArgumentException](vertex_info.getPropertyGroup("not_exist")) @@ -123,13 +123,13 @@ class GraphInfoSuite extends AnyFunSuite { assert(edge_info.getAdjListFileType(AdjListType.ordered_by_source) == FileType.CSV) assert(edge_info.getPropertyGroups(AdjListType.ordered_by_source).size == 1) assert(edge_info.getAdjListFilePath(0, 0, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part0/chunk0") - assert(edge_info.getAdjListFilePath(0, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part0/") + assert(edge_info.getAdjListPathPrefix(0, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part0/") assert(edge_info.getAdjListFilePath(1, 2, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part1/chunk2") - assert(edge_info.getAdjListFilePath(1, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part1/") - assert(edge_info.getAdjListDirPath(AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/") + 
assert(edge_info.getAdjListPathPrefix(1, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/part1/") + assert(edge_info.getAdjListPathPrefix(AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/adj_list/") assert(edge_info.getAdjListOffsetFilePath(0, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/offset/chunk0") assert(edge_info.getAdjListOffsetFilePath(4, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/offset/chunk4") - assert(edge_info.getAdjListOffsetDirPath(AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/offset/") + assert(edge_info.getOffsetPathPrefix(AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/offset/") val property_group = edge_info.getPropertyGroups(AdjListType.ordered_by_source).get(0) assert(edge_info.containPropertyGroup(property_group, AdjListType.ordered_by_source)) val property = property_group.getProperties.get(0) @@ -140,20 +140,20 @@ class GraphInfoSuite extends AnyFunSuite { assert(edge_info.isPrimaryKey(property_name) == property.getIs_primary) assert(edge_info.getPropertyFilePath(property_group, AdjListType.ordered_by_source, 0, 0) == "edge/person_knows_person/ordered_by_source/creationDate/part0/chunk0") assert(edge_info.getPropertyFilePath(property_group, AdjListType.ordered_by_source, 1, 2) == "edge/person_knows_person/ordered_by_source/creationDate/part1/chunk2") - assert(edge_info.getPropertyDirPath(property_group, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/creationDate/") + assert(edge_info.getPropertyGroupPathPrefix(property_group, AdjListType.ordered_by_source) == "edge/person_knows_person/ordered_by_source/creationDate/") assert(edge_info.containAdjList(AdjListType.ordered_by_dest)) assert(edge_info.getAdjListPrefix(AdjListType.ordered_by_dest) == "ordered_by_dest/") 
assert(edge_info.getAdjListFileType(AdjListType.ordered_by_dest) == FileType.CSV) assert(edge_info.getPropertyGroups(AdjListType.ordered_by_dest).size == 1) assert(edge_info.getAdjListFilePath(0, 0, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part0/chunk0") - assert(edge_info.getAdjListFilePath(0, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part0/") + assert(edge_info.getAdjListPathPrefix(0, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part0/") assert(edge_info.getAdjListFilePath(1, 2, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part1/chunk2") - assert(edge_info.getAdjListFilePath(1, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part1/") - assert(edge_info.getAdjListDirPath(AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/") + assert(edge_info.getAdjListPathPrefix(1, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/part1/") + assert(edge_info.getAdjListPathPrefix(AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/adj_list/") assert(edge_info.getAdjListOffsetFilePath(0, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/offset/chunk0") assert(edge_info.getAdjListOffsetFilePath(4, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/offset/chunk4") - assert(edge_info.getAdjListOffsetDirPath(AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/offset/") + assert(edge_info.getOffsetPathPrefix(AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/offset/") val property_group_2 = edge_info.getPropertyGroups(AdjListType.ordered_by_dest).get(0) assert(edge_info.containPropertyGroup(property_group_2, AdjListType.ordered_by_dest)) val property_2 = property_group_2.getProperties.get(0) @@ -164,7 +164,7 @@ class 
GraphInfoSuite extends AnyFunSuite { assert(edge_info.isPrimaryKey(property_name_2) == property_2.getIs_primary) assert(edge_info.getPropertyFilePath(property_group_2, AdjListType.ordered_by_dest, 0, 0) == "edge/person_knows_person/ordered_by_dest/creationDate/part0/chunk0") assert(edge_info.getPropertyFilePath(property_group_2, AdjListType.ordered_by_dest, 1, 2) == "edge/person_knows_person/ordered_by_dest/creationDate/part1/chunk2") - assert(edge_info.getPropertyDirPath(property_group_2, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/creationDate/") + assert(edge_info.getPropertyGroupPathPrefix(property_group_2, AdjListType.ordered_by_dest) == "edge/person_knows_person/ordered_by_dest/creationDate/") assert(edge_info.containAdjList(AdjListType.unordered_by_source) == false) assertThrows[IllegalArgumentException](edge_info.getAdjListPrefix(AdjListType.unordered_by_source)) diff --git a/spark/src/test/scala/com/alibaba/graphar/TestWriter.scala b/spark/src/test/scala/com/alibaba/graphar/TestWriter.scala index 490c78a21..ac8485ffd 100644 --- a/spark/src/test/scala/com/alibaba/graphar/TestWriter.scala +++ b/spark/src/test/scala/com/alibaba/graphar/TestWriter.scala @@ -54,7 +54,7 @@ class WriterSuite extends AnyFunSuite { // write certain property group val property_group = vertex_info.getPropertyGroup("id") writer.writeVertexProperties(property_group) - val id_chunk_path = new Path(prefix + vertex_info.getDirPath(property_group) + "chunk*") + val id_chunk_path = new Path(prefix + vertex_info.getPathPrefix(property_group) + "chunk*") val id_chunk_files = fs.globStatus(id_chunk_path) assert(id_chunk_files.length == 10) writer.writeVertexProperties() @@ -93,17 +93,17 @@ class WriterSuite extends AnyFunSuite { // test write adj list writer.writeAdjList() - val adj_list_path_pattern = new Path(prefix + edge_info.getAdjListDirPath(adj_list_type) + "*/*") + val adj_list_path_pattern = new Path(prefix + edge_info.getAdjListPathPrefix(adj_list_type) 
+ "*/*") val adj_list_chunk_files = fs.globStatus(adj_list_path_pattern) assert(adj_list_chunk_files.length == 9) - val offset_path_pattern = new Path(prefix + edge_info.getAdjListOffsetDirPath(adj_list_type) + "*") + val offset_path_pattern = new Path(prefix + edge_info.getOffsetPathPrefix(adj_list_type) + "*") val offset_chunk_files = fs.globStatus(offset_path_pattern) assert(offset_chunk_files.length == 7) // test write property group val property_group = edge_info.getPropertyGroup("creationDate", adj_list_type) writer.writeEdgeProperties(property_group) - val property_group_path_pattern = new Path(prefix + edge_info.getPropertyDirPath(property_group, adj_list_type) + "*/*") + val property_group_path_pattern = new Path(prefix + edge_info.getPropertyGroupPathPrefix(property_group, adj_list_type) + "*/*") val property_group_chunk_files = fs.globStatus(property_group_path_pattern) assert(property_group_chunk_files.length == 9) @@ -159,10 +159,10 @@ class WriterSuite extends AnyFunSuite { // test write adj list writer.writeAdjList() - val adj_list_path_pattern = new Path(prefix + edge_info.getAdjListDirPath(adj_list_type) + "*/*") + val adj_list_path_pattern = new Path(prefix + edge_info.getAdjListPathPrefix(adj_list_type) + "*/*") val adj_list_chunk_files = fs.globStatus(adj_list_path_pattern) assert(adj_list_chunk_files.length == 11) - val offset_path_pattern = new Path(prefix + edge_info.getAdjListOffsetDirPath(adj_list_type) + "*") + val offset_path_pattern = new Path(prefix + edge_info.getOffsetPathPrefix(adj_list_type) + "*") val offset_chunk_files = fs.globStatus(offset_path_pattern) assert(offset_chunk_files.length == 10) // compare with correct offset chunk value @@ -175,7 +175,7 @@ class WriterSuite extends AnyFunSuite { // test write property group val property_group = edge_info.getPropertyGroup("creationDate", adj_list_type) writer.writeEdgeProperties(property_group) - val property_group_path_pattern = new Path(prefix + 
edge_info.getPropertyDirPath(property_group, adj_list_type) + "*/*") + val property_group_path_pattern = new Path(prefix + edge_info.getPropertyGroupPathPrefix(property_group, adj_list_type) + "*/*") val property_group_chunk_files = fs.globStatus(property_group_path_pattern) assert(property_group_chunk_files.length == 11) diff --git a/spark/src/test/scala/com/alibaba/graphar/TransformExample.scala b/spark/src/test/scala/com/alibaba/graphar/TransformExample.scala index 96eb22e1f..6e87cc079 100644 --- a/spark/src/test/scala/com/alibaba/graphar/TransformExample.scala +++ b/spark/src/test/scala/com/alibaba/graphar/TransformExample.scala @@ -84,11 +84,11 @@ class TransformExampleSuite extends AnyFunSuite { val output_prefix : String = "/tmp/example/" val writer = new EdgeWriter(output_prefix, edge_info, output_adj_list_type, adj_list_df) writer.writeAdjList() - val adj_list_path_pattern = new Path(output_prefix + edge_info.getAdjListDirPath(output_adj_list_type) + "*/*") + val adj_list_path_pattern = new Path(output_prefix + edge_info.getAdjListPathPrefix(output_adj_list_type) + "*/*") val fs = FileSystem.get(adj_list_path_pattern.toUri(), spark.sparkContext.hadoopConfiguration) val adj_list_chunk_files = fs.globStatus(adj_list_path_pattern) assert(adj_list_chunk_files.length == 11) - val offset_path_pattern = new Path(output_prefix + edge_info.getAdjListOffsetDirPath(output_adj_list_type) + "*") + val offset_path_pattern = new Path(output_prefix + edge_info.getOffsetPathPrefix(output_adj_list_type) + "*") val offset_chunk_files = fs.globStatus(offset_path_pattern) assert(offset_chunk_files.length == 10) diff --git a/src/arrow_chunk_reader.cc b/src/arrow_chunk_reader.cc index cb88c0ffa..82ff90990 100644 --- a/src/arrow_chunk_reader.cc +++ b/src/arrow_chunk_reader.cc @@ -112,8 +112,7 @@ AdjListArrowChunkReader::GetChunk() noexcept { edge_info_.GetAdjListFilePath( vertex_chunk_index_, chunk_index_, adj_list_type_)); std::string path = prefix_ + chunk_file_path; - 
GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info_.GetAdjListFileType(adj_list_type_)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info_.GetFileType(adj_list_type_)); GAR_ASSIGN_OR_RAISE(chunk_table_, fs_->ReadFileToTable(path, file_type)); } IdType row_offset = seek_offset_ - chunk_index_ * edge_info_.GetChunkSize(); @@ -126,8 +125,7 @@ Result AdjListArrowChunkReader::GetRowNumOfChunk() noexcept { edge_info_.GetAdjListFilePath( vertex_chunk_index_, chunk_index_, adj_list_type_)); std::string path = prefix_ + chunk_file_path; - GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info_.GetAdjListFileType(adj_list_type_)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info_.GetFileType(adj_list_type_)); GAR_ASSIGN_OR_RAISE(chunk_table_, fs_->ReadFileToTable(path, file_type)); } return chunk_table_->num_rows(); @@ -199,8 +197,7 @@ AdjListOffsetArrowChunkReader::GetChunk() noexcept { auto chunk_file_path, edge_info_.GetAdjListOffsetFilePath(chunk_index_, adj_list_type_)); std::string path = prefix_ + chunk_file_path; - GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info_.GetAdjListFileType(adj_list_type_)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info_.GetFileType(adj_list_type_)); GAR_ASSIGN_OR_RAISE(chunk_table_, fs_->ReadFileToTable(path, file_type)); } IdType row_offset = seek_id_ - chunk_index_ * vertex_chunk_size_; diff --git a/src/arrow_chunk_writer.cc b/src/arrow_chunk_writer.cc index d5bbda2ac..a1363cc6d 100644 --- a/src/arrow_chunk_writer.cc +++ b/src/arrow_chunk_writer.cc @@ -222,8 +222,7 @@ Status EdgeChunkWriter::WriteOffsetChunk( const std::shared_ptr& input_table, IdType vertex_chunk_index) const noexcept { GAR_RETURN_NOT_OK(Validate(input_table, vertex_chunk_index)); - GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info_.GetAdjListFileType(adj_list_type_)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info_.GetFileType(adj_list_type_)); GAR_ASSIGN_OR_RAISE(auto suffix, edge_info_.GetAdjListOffsetFilePath( vertex_chunk_index, adj_list_type_)); std::string path = prefix_ + 
suffix; @@ -234,8 +233,7 @@ Status EdgeChunkWriter::WriteAdjListChunk( const std::shared_ptr& input_table, IdType vertex_chunk_index, IdType chunk_index) const noexcept { GAR_RETURN_NOT_OK(Validate(input_table, vertex_chunk_index)); - GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info_.GetAdjListFileType(adj_list_type_)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info_.GetFileType(adj_list_type_)); std::vector indices; indices.clear(); auto schema = input_table->schema(); diff --git a/src/graph.cc b/src/graph.cc index 2e3523cfd..755586127 100644 --- a/src/graph.cc +++ b/src/graph.cc @@ -360,4 +360,17 @@ bool EdgeIter::first_dst(const EdgeIter& from, IdType id) { } } +const AdjListType + EdgesCollection::adj_list_type_ = + AdjListType::ordered_by_source; +const AdjListType + EdgesCollection::adj_list_type_ = + AdjListType::ordered_by_dest; +const AdjListType + EdgesCollection::adj_list_type_ = + AdjListType::unordered_by_source; +const AdjListType + EdgesCollection::adj_list_type_ = + AdjListType::unordered_by_dest; + } // namespace GAR_NAMESPACE_INTERNAL diff --git a/src/reader_utils.cc b/src/reader_utils.cc index 1ab2efa5b..d09da1a2c 100644 --- a/src/reader_utils.cc +++ b/src/reader_utils.cc @@ -58,8 +58,7 @@ Result> GetAdjListOffsetOfVertex( edge_info.GetAdjListOffsetFilePath(offset_chunk_index, adj_list_type)); std::string out_prefix; GAR_ASSIGN_OR_RAISE(auto fs, FileSystemFromUriOrPath(prefix, &out_prefix)); - GAR_ASSIGN_OR_RAISE(auto file_type, - edge_info.GetAdjListFileType(adj_list_type)); + GAR_ASSIGN_OR_RAISE(auto file_type, edge_info.GetFileType(adj_list_type)); std::string path = out_prefix + offset_file_path; GAR_ASSIGN_OR_RAISE(auto table, fs->ReadFileToTable(path, file_type)); auto array = std::static_pointer_cast( diff --git a/test/test_chunk_info_reader.cc b/test/test_chunk_info_reader.cc index aae200da5..a01373c08 100644 --- a/test/test_chunk_info_reader.cc +++ b/test/test_chunk_info_reader.cc @@ -28,8 +28,8 @@ 
TEST_CASE("test_vertex_property_chunk_info_reader") {
   auto maybe_graph_info = GAR_NAMESPACE::GraphInfo::Load(path);
   REQUIRE(maybe_graph_info.status().ok());
   auto graph_info = maybe_graph_info.value();
-  REQUIRE(graph_info.GetAllVertexInfo().size() == 1);
-  REQUIRE(graph_info.GetAllEdgeInfo().size() == 1);
+  REQUIRE(graph_info.GetVertexInfos().size() == 1);
+  REQUIRE(graph_info.GetEdgeInfos().size() == 1);
 
   // construct vertex property info reader
   std::string label = "person", property_name = "id";
@@ -83,8 +83,8 @@ TEST_CASE("test_adj_list_chunk_info_reader") {
   auto maybe_graph_info = GAR_NAMESPACE::GraphInfo::Load(path);
   REQUIRE(maybe_graph_info.status().ok());
   auto graph_info = maybe_graph_info.value();
-  REQUIRE(graph_info.GetAllVertexInfo().size() == 1);
-  REQUIRE(graph_info.GetAllEdgeInfo().size() == 1);
+  REQUIRE(graph_info.GetVertexInfos().size() == 1);
+  REQUIRE(graph_info.GetEdgeInfos().size() == 1);
 
   // construct adj list info reader
   std::string src_label = "person", edge_label = "knows", dst_label = "person";
diff --git a/test/test_info.cc b/test/test_info.cc
index fa91d6c0a..95ee5dca2 100644
--- a/test/test_info.cc
+++ b/test/test_info.cc
@@ -36,12 +36,12 @@ TEST_CASE("test_graph_info") {
   REQUIRE(graph_info.GetVersion() == version);
 
   // test add vertex and get vertex info
-  REQUIRE(graph_info.GetAllVertexInfo().size() == 0);
+  REQUIRE(graph_info.GetVertexInfos().size() == 0);
   GAR_NAMESPACE::VertexInfo vertex_info("test_vertex", 100, version,
                                         "test_vertex_prefix");
   auto st = graph_info.AddVertex(vertex_info);
   REQUIRE(st.ok());
-  REQUIRE(graph_info.GetAllVertexInfo().size() == 1);
+  REQUIRE(graph_info.GetVertexInfos().size() == 1);
   auto maybe_vertex_info = graph_info.GetVertexInfo("test_vertex");
   REQUIRE(!maybe_vertex_info.has_error());
   REQUIRE(maybe_vertex_info.value().GetLabel() == "test_vertex");
@@ -51,14 +51,14 @@ TEST_CASE("test_graph_info") {
   REQUIRE(graph_info.AddVertex(vertex_info).IsInvalidOperation());
 
   // test add edge and get edge info
-  REQUIRE(graph_info.GetAllEdgeInfo().size() == 0);
+  REQUIRE(graph_info.GetEdgeInfos().size() == 0);
   std::string src_label = "test_vertex", edge_label = "test_edge",
               dst_label = "test_vertex";
   GAR_NAMESPACE::EdgeInfo edge_info(src_label, edge_label, dst_label, 1024, 100,
                                     100, true, version);
   st = graph_info.AddEdge(edge_info);
   REQUIRE(st.ok());
-  REQUIRE(graph_info.GetAllEdgeInfo().size() == 1);
+  REQUIRE(graph_info.GetEdgeInfos().size() == 1);
   auto maybe_edge_info =
       graph_info.GetEdgeInfo(src_label, edge_label, dst_label);
   REQUIRE(!maybe_edge_info.has_error());
@@ -122,11 +122,11 @@ TEST_CASE("test_vertex_info") {
 
   // test get dir path
   std::string expected_dir_path = v_info.GetPrefix() + pg.GetPrefix();
-  auto maybe_dir_path = v_info.GetDirPath(pg);
+  auto maybe_dir_path = v_info.GetPathPrefix(pg);
   REQUIRE(!maybe_dir_path.has_error());
   REQUIRE(maybe_dir_path.value() == expected_dir_path);
   // property group not exist
-  REQUIRE(v_info.GetDirPath(pg2).status().IsKeyError());
+  REQUIRE(v_info.GetPathPrefix(pg2).status().IsKeyError());
   // test get file path
   auto maybe_path = v_info.GetFilePath(pg, 0);
   REQUIRE(!maybe_path.has_error());
@@ -174,51 +174,43 @@ TEST_CASE("test_edge_info") {
   REQUIRE(edge_info.ContainAdjList(adj_list_type));
   // same adj list type can not be added twice
   REQUIRE(edge_info.AddAdjList(adj_list_type, file_type).IsInvalidOperation());
-  auto adj_prefix_result = edge_info.GetAdjListPrefix(adj_list_type);
-  REQUIRE(!adj_prefix_result.has_error());
-  auto adj_prefix = adj_prefix_result.value();
-  REQUIRE(adj_prefix ==
-          "ordered_by_source/");  // default prefix is adj_list_type + "/"
-  auto file_type_result = edge_info.GetAdjListFileType(adj_list_type);
+  auto file_type_result = edge_info.GetFileType(adj_list_type);
   REQUIRE(!file_type_result.has_error());
   REQUIRE(file_type_result.value() == file_type);
+  auto prefix_of_adj_list_type =
+      std::string(GraphArchive::AdjListTypeToString(adj_list_type)) + "/";
+  auto adj_list_path_prefix = edge_info.GetAdjListPathPrefix(adj_list_type);
+  REQUIRE(!adj_list_path_prefix.has_error());
+  REQUIRE(adj_list_path_prefix.value() ==
+          edge_info.GetPrefix() + prefix_of_adj_list_type + "adj_list/");
   auto adj_list_file_path = edge_info.GetAdjListFilePath(0, 0, adj_list_type);
   REQUIRE(!adj_list_file_path.has_error());
   REQUIRE(adj_list_file_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + "adj_list/part0/chunk0");
-  auto adj_list_dir_path = edge_info.GetAdjListDirPath(adj_list_type);
-  REQUIRE(!adj_list_dir_path.has_error());
-  REQUIRE(adj_list_dir_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + "adj_list/");
+          adj_list_path_prefix.value() + "part0/chunk0");
+  auto adj_list_offset_path_prefix =
+      edge_info.GetOffsetPathPrefix(adj_list_type);
+  REQUIRE(!adj_list_offset_path_prefix.has_error());
+  REQUIRE(adj_list_offset_path_prefix.value() ==
+          edge_info.GetPrefix() + prefix_of_adj_list_type + "offset/");
   auto adj_list_offset_file_path =
       edge_info.GetAdjListOffsetFilePath(0, adj_list_type);
   REQUIRE(!adj_list_offset_file_path.has_error());
   REQUIRE(adj_list_offset_file_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + "offset/chunk0");
-  auto adj_list_offset_dir_path =
-      edge_info.GetAdjListOffsetDirPath(adj_list_type);
-  REQUIRE(!adj_list_offset_dir_path.has_error());
-  REQUIRE(adj_list_offset_dir_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + "offset/");
+          adj_list_offset_path_prefix.value() + "chunk0");
 
   // adj list type not exist
   REQUIRE(!edge_info.ContainAdjList(adj_list_type_not_exist));
-  REQUIRE(edge_info.GetAdjListPrefix(adj_list_type_not_exist)
-              .status()
-              .IsKeyError());
-  REQUIRE(edge_info.GetAdjListFileType(adj_list_type_not_exist)
-              .status()
-              .IsKeyError());
+  REQUIRE(edge_info.GetFileType(adj_list_type_not_exist).status().IsKeyError());
   REQUIRE(edge_info.GetAdjListFilePath(0, 0, adj_list_type_not_exist)
               .status()
               .IsKeyError());
-  REQUIRE(edge_info.GetAdjListDirPath(adj_list_type_not_exist)
+  REQUIRE(edge_info.GetAdjListPathPrefix(adj_list_type_not_exist)
               .status()
               .IsKeyError());
   REQUIRE(edge_info.GetAdjListOffsetFilePath(0, adj_list_type_not_exist)
               .status()
               .IsKeyError());
-  REQUIRE(edge_info.GetAdjListOffsetDirPath(adj_list_type_not_exist)
+  REQUIRE(edge_info.GetOffsetPathPrefix(adj_list_type_not_exist)
               .status()
               .IsKeyError());
 
@@ -246,16 +238,16 @@ TEST_CASE("test_edge_info") {
   auto is_primary_result = edge_info.IsPrimaryKey(p.name);
   REQUIRE(!is_primary_result.has_error());
   REQUIRE(is_primary_result.value() == p.is_primary);
+  auto property_path_path_prefix =
+      edge_info.GetPropertyGroupPathPrefix(pg, adj_list_type);
+  REQUIRE(!property_path_path_prefix.has_error());
+  REQUIRE(property_path_path_prefix.value() ==
+          edge_info.GetPrefix() + prefix_of_adj_list_type + pg.GetPrefix());
   auto property_file_path =
       edge_info.GetPropertyFilePath(pg, adj_list_type, 0, 0);
   REQUIRE(!property_file_path.has_error());
   REQUIRE(property_file_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + pg.GetPrefix() + "part0/chunk0");
-  auto property_dir_path = edge_info.GetPropertyDirPath(pg, adj_list_type);
-  REQUIRE(!property_dir_path.has_error());
-  REQUIRE(property_dir_path.value() ==
-          edge_info.GetPrefix() + adj_prefix + pg.GetPrefix());
-
+          property_path_path_prefix.value() + "part0/chunk0");
   // test property not exist
   REQUIRE(edge_info.GetPropertyGroup("p_not_exist", adj_list_type)
               .status()
@@ -268,7 +260,7 @@ TEST_CASE("test_edge_info") {
   REQUIRE(edge_info.GetPropertyFilePath(pg_not_exist, adj_list_type, 0, 0)
               .status()
               .IsKeyError());
-  REQUIRE(edge_info.GetPropertyDirPath(pg_not_exist, adj_list_type)
+  REQUIRE(edge_info.GetPropertyGroupPathPrefix(pg_not_exist, adj_list_type)
               .status()
               .IsKeyError());
 
@@ -282,7 +274,7 @@ TEST_CASE("test_edge_info") {
   REQUIRE(edge_info.GetPropertyFilePath(pg, adj_list_type_not_exist, 0, 0)
               .status()
               .IsKeyError());
-  REQUIRE(edge_info.GetPropertyDirPath(pg, adj_list_type_not_exist)
+  REQUIRE(edge_info.GetPropertyGroupPathPrefix(pg, adj_list_type_not_exist)
               .status()
               .IsKeyError());
 
@@ -332,8 +324,8 @@ TEST_CASE("test_graph_info_load_from_file") {
   auto graph_info = graph_info_result.value();
   REQUIRE(graph_info.GetName() == "ldbc_sample");
   REQUIRE(graph_info.GetPrefix() == TEST_DATA_DIR + "/ldbc_sample/csv/");
-  const auto& vertex_infos = graph_info.GetAllVertexInfo();
-  const auto& edge_infos = graph_info.GetAllEdgeInfo();
+  const auto& vertex_infos = graph_info.GetVertexInfos();
+  const auto& edge_infos = graph_info.GetEdgeInfos();
   REQUIRE(vertex_infos.size() == 1);
   REQUIRE(edge_infos.size() == 1);
 }