Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiple label #1452

Closed
wants to merge 11 commits into from
Closed

Conversation

rafsun42
Copy link
Member

@rafsun42 rafsun42 commented Dec 14, 2023

Add support for multiple labels in CREATE, MERGE and MATCH clause.

Important Notes:

  • Before merging, please rerun the workflow again. Any tests that were added after this branch was last updated will fail because this PR makes minor changes in how vertex and edge objects are output.
  • This PR was rebased. Now, it is based on PG 16 (previously it was based on PG 15).

@thjbdvlt
Copy link

thjbdvlt commented Feb 8, 2024

hello :-)

i have tried this while looking for a way to use OR (|) operator in Cypher queries, as i mentionned here.

but i couldn't compile (probably because i misunderstand something). here is the output of make after i have gh pr checkout 1452 from inside AGE directory:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -I.//src/include -I.//src/include/parser -I. -I./ -I/usr/include/postgresql/15/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/backend/parser/cypher_label_expr.o src/backend/parser/cypher_label_expr.c
src/backend/parser/cypher_label_expr.c: In function ‘get_label_expr_relations’:
src/backend/parser/cypher_label_expr.c:263:9: error: a label can only be part of a statement and a declaration is not a statement
  263 |         List *reloids;
      |         ^~~~
src/backend/parser/cypher_label_expr.c:264:9: error: expected expression before ‘List’
  264 |         List *(*merge_lists)(List *, const List *);
      |         ^~~~
src/backend/parser/cypher_label_expr.c:265:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  265 |         ListCell *lc;
      |         ^~~~~~~~
src/backend/parser/cypher_label_expr.c:281:9: error: ‘merge_lists’ undeclared (first use in this function)
  281 |         merge_lists = label_expr_type == LABEL_EXPR_TYPE_AND ?
      |         ^~~~~~~~~~~
src/backend/parser/cypher_label_expr.c:281:9: note: each undeclared identifier is reported only once for each function it appears in
src/backend/parser/cypher_label_expr.c:321:27: warning: implicit declaration of function ‘merge_lists’ [-Wimplicit-function-declaration]
  321 |                 reloids = merge_lists(reloids, lcd->allrelations);
      |                           ^~~~~~~~~~~
src/backend/parser/cypher_label_expr.c:321:25: warning: assignment to ‘List *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
  321 |                 reloids = merge_lists(reloids, lcd->allrelations);
      |                         ^
src/backend/parser/cypher_label_expr.c: In function ‘label_expr_relname’:
src/backend/parser/cypher_label_expr.c:385:9: error: a label can only be part of a statement and a declaration is not a statement
  385 |         StringInfo relname_strinfo;
      |         ^~~~~~~~~~
src/backend/parser/cypher_label_expr.c:386:9: error: expected expression before ‘ListCell’
  386 |         ListCell *lc;
      |         ^~~~~~~~
src/backend/parser/cypher_label_expr.c:387:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  387 |         char *relname;
      |         ^~~~
In file included from /usr/include/postgresql/15/server/access/tupdesc.h:19,
                 from /usr/include/postgresql/15/server/utils/relcache.h:17,
                 from /usr/include/postgresql/15/server/access/genam.h:21,
                 from src/backend/parser/cypher_label_expr.c:3:
src/backend/parser/cypher_label_expr.c:394:18: error: ‘lc’ undeclared (first use in this function)
  394 |         foreach (lc, label_expr->label_names)
      |                  ^~
/usr/include/postgresql/15/server/nodes/pg_list.h:356:5: note: in definition of macro ‘foreach’
  356 |    (cell = &cell##__state.l->elements[cell##__state.i], true) : \
      |     ^~~~
/usr/include/postgresql/15/server/nodes/pg_list.h:356:55: warning: left-hand operand of comma expression has no effect [-Wunused-value]
  356 |    (cell = &cell##__state.l->elements[cell##__state.i], true) : \
      |                                                       ^
src/backend/parser/cypher_label_expr.c:394:9: note: in expansion of macro ‘foreach’
  394 |         foreach (lc, label_expr->label_names)
      |         ^~~~~~~
/usr/include/postgresql/15/server/nodes/pg_list.h:357:16: warning: left-hand operand of comma expression has no effect [-Wunused-value]
  357 |    (cell = NULL, false); \
      |                ^
src/backend/parser/cypher_label_expr.c:394:9: note: in expansion of macro ‘foreach’
  394 |         foreach (lc, label_expr->label_names)
      |         ^~~~~~~
make: *** [<commande interne> : src/backend/parser/cypher_label_expr.o] Erreur 1

here is the output of psql --version:

psql (PostgreSQL) 15.5 (Debian 15.5-1.pgdg110+1)

i am running postgresql 15 on Debian 11 bullseye, installed with these debian apt packages:

postgresql-client-15 
postgresql-15 
postgresql-server-dev-15 
libpq-dev

i have tried on these PG15 branchs of AGE (for both, in META.JSON file, the version is said to be 1.3.0):

git clone --branch PG15 https://github.com/apache/age
git clone --branch PG15/v1.5.0-rc0 https://github.com/apache/age .

it's probably meaningless but to make AGE compile, i had to install two deps from debian apt packages: bison and flex.

@rafsun42
Copy link
Member Author

rafsun42 commented Feb 8, 2024

@thjbdvlt Did you do make clean before make install?

@thjbdvlt
Copy link

thjbdvlt commented Feb 9, 2024

@rafsun42 i didn't, but with make clean, the same errors occur when i run make install, or also if i use make clean install.
i would like to help more but i really dont understand all of these, (i'll read about make next days, and as i did want to learn some c, it may be a good occasion to start and try understanding these reports).

(plus, as i now have a small partition on my computer dedicated to postgresql 15/age/pr 1452, i can easily try things)

@rafsun42
Copy link
Member Author

@thjbdvlt After some digging, I figured out this branch indeed does not compile with a PostgreSQL copy compiled with the --with-llvm option, which is the case when PostgreSQL is installed via a package manager. This error did not occur in my development environment because I build PostgreSQL from source without that option.

I pushed a commit to address this issue. Let me know if it compiles now.

@thjbdvlt
Copy link

thjbdvlt commented Feb 14, 2024

@rafsun42 it does compile!
i ran some minimal tests, matching with (a)-[:elabel_b|edge_label_c]->(d) works.
absolutely wonderful :)

@mark-win
Copy link

This Feature is huge, not only for openCypher compliance. A merge would be much appreciated.
I would like to provide two real life examples, where this is mandatory for matching.

I am using graph databases mostly as backend for traditional OLTP Applications, as opposed to data analysis. For me the most common use case for union style relation type queries is, to enforce constraints on your relations by checking for forbidden combinations before creating an edge.

Example 1

Imagine a graph with vertices representing persons :PERSON and edges representing their relation to one another :SIBLING, :PARENT, :MARRIED, ....
When trying to add a new edge between two persons, especially when triggered by user input, you would probably want to check, if that would violate the rules of your application. In this case you might consider denying the action, if (a:PERSON)-[:SIBLING|PARENT *]-(b:PERSON) exists, before marrying two people.

Example 2

Imagine a graph for task planning. Tasks :TASK can depend :DEPENDS_ON on other tasks, to form a chain of execution. But they can also be infinitely nested :SUBTASK_OF to divide them into smaller chunks of work.
Before creating any of the two edge types, we need to check, that we don't create a circular dependency. In this graph model, it can happen like this:

// having
(a:TASK)-[:DEPENDS_ON]->(b:TASK)
// each of these would produce a circular dependency for the model:
CREATE (b)-[:DEPENDS_ON]->(a)
CREATE (a)-[:SUBTASK_OF]->(b)
CREATE (b)-[:SUBTASK_OF]->(a)

To add to the complexity, any combination of both edge types in any path length are a problem and can (within reasonable effort) only be detected like this:

// again not allowed to exist:
MATCH p = (a:TASK)-[:DEPENDS_ON|SUBTASK_OF *]-(b:TASK)

Almost feels like a prime example for the power of graph databases. And if you don't want to use this feature in an OLTP context, you sure want to be able to find these conditions in your dataset.

@jrgemignani
Copy link
Contributor

@rafsun42 Could this be rebased with the master and applied to the master branch instead of PG15? I'm not saying not to do PG15, just get it on the master first. Thoughts?

@jrgemignani
Copy link
Contributor

@rafsun42 Additionally, it states that there are conflicts?

image

@rafsun42
Copy link
Member Author

@jrgemignani Sure, John. I will rebase this to latest master.

rafsun42 added 11 commits April 23, 2024 16:25
It represents label expression of different type: empty, single or multiple.
Previously, label field was char* type.

The change affected the type cypher_node, cypher_relationship
and cypher_target_node. As well as, any places where these
types are used.
Supports queries like-
    MATCH (v:A|B|C) RETURN v
    MATCH ()-[e:A|B|C]->() RETURN v
Some examples of supported multiple label queries:
	CREATE (:a:b)
	MERGE  (:a:b)
	MATCH  (:a:b)
	MATCH  (:a|b)

See regress/sql/multiple_label.sql for more details on what kind
of queries are supported.

Change summary:
---------------
* A new column `allrelations` is added to ag_label catalog
* Change in creating AGE relations logic
* Change in MATCH's transformation logic (related to building parse
  namespace item)
The logic for building vertex objects is updated. Agtype vertex objects can be
built from either a single label (as a cstring) or multiple labels (as an
agtype array). The following functions are updated to reflect this-
agtype_typecast_vertex, agtype_in and _agtype_build_vertex. if
_agtype_build_vertex is called from SQL, its label argument must be explicitly
cast to avoid ambiguity in function overload.

The `_label_names` function is added to extract label names from a vertex ID
as a list of string. It is used as a helper function to build vertex objects.
A new cache called `allrelations` is also added. This is used by _label_names
to search for all labels that are related to a given relation.

Multiple helper functions are added to extract label infromation from an entity
ID. For example, entity's relation ID, relation name, label names. These are
used by CREATE, DELETE, MERGE, VLE and SET executors for building a vertex's
object or updating its relation.

All test files are updated to show the label field as an array in the output.
In all test SQLs, _agtype_build_vertex's label argument is explicity cast.
It updates the function filter_vertices_on_label_id().

Additional changes:
-------------------
 - Add internal function _label_ids
Cache issues fixed:
-------------------
  - Use of wrong data type for cache entry in label relation cache (pre-existing)
  - Use of wrong update function for catalog table (related to multiple label)

Other changes:
--------------
  - The function _label_name() is unsupported for vertices
Changes:
--------
 - Update create_label_expr_relations() to return RangeVar. It removes
   redundant call to label_expr_relname() in the code that also calls
   this function.

 - Use deconstruct_array() to convert ArrayType* to List*

 - Update test files after rebase
This fixes some compile-time errors that occur if
PostgreSQL is configured with the --with-llvm option.
Changes:
-------
  * Include missing header files
  * Update newly added tests
  * Other minor changes
Following PRs are reapplied: 1465, 1509, 1514, and 1518.
@rafsun42 rafsun42 changed the base branch from PG15 to master April 24, 2024 19:56
@rafsun42 rafsun42 added master and removed PG15 PostgreSQL15 labels Apr 24, 2024
@github-actions github-actions bot added the PG15 PostgreSQL15 label Apr 24, 2024
@rafsun42 rafsun42 removed the PG15 PostgreSQL15 label Apr 24, 2024
@rafsun42
Copy link
Member Author

After rebase, moved to a new PR #1785.

@rafsun42 rafsun42 closed this Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants