title | author | date |
---|---|---|
rdfpuml - Convert RDF to PlantUML diagrams |
Vladimir Alexiev, Ontotext Corp |
2023-06-02 |
Table of Contents
perl -S rdfpuml.pl file.ttl # makes file.puml
java -jar plantuml.jar -charset UTF-8 file.puml # makes file.png
Converts an RDF Turtle file to a readable diagram using PlantUML.
The best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they have shortcomings, eg
- Focus on large graphs, where the details are not easily visible
- Visualization results are not satisfactory
- Manual tweaking of the diagrams is required
rdfpuml makes true UML diagrams directly from Turtle examples,
with a small amount of tweaking that can be done with puml:
formatting triples
(this follows the approach from
Circles and arrows diagrams using stylesheet rules, Dan Connolly, W3C 2005).
Benefits:
- Guarantees consistency between what you say (triple statements) and what you show (diagram)
- Lets you focus on domain modeling rather than diagram layouting/tweaking
- Enables efficient source control
- Saves you lots of effort
Diagram readability is a prime concern. rdfpuml implements the following features for maximum readability.
- Shorten URLs aggressively. Some prefixed names you see in the diagram are not valid Turtle
- Show inline as much as possible: literal values, types, and nodes declared as
puml:Inline
- Sort properties by name (rdf:type comes first)
- Show literal datatypes, use Turtle shortcuts (eg 1 instead of "1"^^xsd:integer)
- Collect property values together; collect "parallel" relations together
- Show reified properties as UML Associations
- Allow node decorations such as stereotypes, colored circles (and in the future: background color and icons)
- Allow arrow decorations such as head, dashed, colored
- Allow control of arrow directions
- Allow hidden arrows to tweak the layout
rdfpuml prepends prefixes.ttl
if it finds such a fule,
so when you make a set of examples, you can keep all your prefixes in one file.
It also predefines the following prefixes:
puml => 'http://plantuml.com/ontology#'
rdf => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
rdfs => 'http://www.w3.org/2000/01/rdf-schema#'
skos => 'http://www.w3.org/2004/02/skos/core#'
crm => 'http://www.cidoc-crm.org/cidoc-crm/'
crmx => 'http://purl.org/NET/cidoc-crm/ext#'
frbroo => 'http://example.com/frbroo/'
crmdig => 'http://www.ics.forth.gr/isl/CRMdig/'
crmsci => 'http://www.ics.forth.gr/isl/crmsci/'
leak => 'http://data.ontotext.com/resource/leak/'
puml
is used for PlantUML formatting triples, see below.
rdfs
and skos
are used to display node labels.
The rest are used for reification (see below).
Multiple property instances between nodes are collected in one arrow and shown as several labels. Inverse arrows work fine.
In RDF, Reification is an approach where a statement s p o is represented as a class instance with 3 "addressing" properties and then additional properties are added to elaborate the statement (eg probability, effective date range, who assigned the statement, etc).
This approach has been used in RDF for many years using the RDF Reification vocabulary.
It was developed for CIDOC CRM in the paper Types and Annotations for CIDOC CRM Properties
and is used for British Museum data (bmo:EX_Association, bmo:PX_property
),
see Reified Association.
rdfpuml recognizes a number of reification "situations" and renders them as a UML Association, for example
RDF Reification looks like this:
[] a rdf:Statement; rdf:subject s; rdf:predicate p; rdf:object o; <statement metadata>
rdf:Statement
is the reification class,
rdf:subject, rdf:predicate, rdf:object
are the addressing properties,
and it can be applied over any shortcut property p.
The Property Reification Vocabulary (PRV) can be used to describe other reification situations, using terms like in the table below. rdfpuml recognizes the following situations, and in the future you should be able to provide your own PRV descriptions.
REIFICATION CLASS SUBJECT PROP SHORTCUT PROP OBJECT PROP SHORTCUT
_____________________________ ______________________________ ________________________________ ___________________________ __________________________________________
rdf:Statement rdf:subject rdf:predicate rdf:object <any>
crm:E13_Attribute_Assignment crm:P140_assigned_attribute_to crmx:property crm:P141_assigned <any CRM prop>
crm:E14_Condition_Assessment crm:P34_concerned crmx:property crm:P35_has_identified crm:P44_has_condition
crm:E15_Identifier_Assignment crm:P140_assigned_attribute_to crmx:property crm:P37_assigned crm:P1_is_identified_by, crm:P102_has_title
crm:E15_Identifier_Assignment crm:P140_assigned_attribute_to crmx:property crm:P38_deassigned crm:P1_is_identified_by, crm:P102_has_title
crm:E16_Measurement crm:P39_measured crmx:property crm:P40_observed_dimension crm:P43_has_dimension
crm:E17_Type_Assignment crm:P41_classified crmx:property crm:P42_assigned crm:P2_has_type or subprop
frbroo:F52_Name_Use_Activity frbroo:R63_named crmx:property frbroo:R64_used_name crm:P1_is_identified_by, crm:P102_has_title
crmsci:S4_Observation crmsci:O8_observed crmsci:O9_observed_property_type crmsci:O16_observed_value
leak:Edge leak:hasSource <none> leak:hasTarget
For CIDOC CRM we need a new extension crmx:property
to point to the property being reified (the shortcut), similar to how rdf:predicate
is used.
Even for a specific CRM reification class like E17_Type_Assignment,
the shortcut property is not fixed to crm:P2_has_type
:
we may need to reify a sub-property thereof, e.g. crm:P72_has_language
.
Visuals: the shortcut is shown as a normal relation. The reification node is attached to the relation usign a dashed line. It is automatically positioned below or to the right of the relation, depending on the relation's direction. The 3 "addressing" properties are shown inside the reification class, and there are little characters in front of them to point to the subject ("←" or "↑"), property (".." or ":") and object ("→" or "↓").
Limitation: you can show as reified a maximum of 2 relations between the same nodes, and even that is ugly.
If you don't want to show a relation as reified (either because it's the third one between the same nodes or for other reasons,
use the puml:NoReify
class to tell rdfpuml not to reify it, e.g.
cona_term:1000000718-contrib-10000016 a puml:NoReify.
In order to save space, rdfpuml inlines various resources in the subject node, rather than as a separate node. All literals and types are inlined automatically. In addition, you can inline other nodes by using the following.
-
puml:Inline
Shows a single node inline. This is used quite often for lookup values, e.g.
<cona/event/competition> a puml:Inline. cona_contrib:10000000 a puml:Inline; rdfs:label "Getty Vocabulary Program".
-
puml:InlineProperty
Declares a property to be inlined, i.e. all its objects are shown inlined, e.g.
fn:annotationSetFrame a puml:InlineProperty. fn:annotationSetLU a puml:InlineProperty.
E.g. this is used in this complex diagram showing FrameNet nodes.
Nodes can have labels: puml:label, rdfs:label, skos:prefLabel
.
- If a node is inlined, its label is added as a comment. If several nodes are inlined at the same property, all their comments are also gathered together.
- If a node is shown normally, its label is shown amongst the other node attributes.
puml:label
is used to give a label without any predicate (attribute) name. It's printed in red (but only if single-line). It's used with completely different meanings in rdf2rml (to specify a SQL table or join) and in rdf2sparql (to specify a row filter): I know this is a hack.
rdfpuml allows you to customize arrows by using properties like puml:DIR-HEAD-LINE-COLOR-LEN
.
You can combine the different parts freely (each is optional) and even write them in different order.
-
DIR
Arrow direction:
left, right, up
ordown
(default) -
HEAD
Arrowhead (end):
none
(use for symmetric properties),tri
(hollow triangle),star
(filled dot),o
(empty dot).WARNING:
o
has a bug: it sets the arrow direction to the opposite of what was specified.TODO: allow customizing the arrowtail (beginning).
-
LINE
Line style:
dotted, dashed, bold, hidden
.dotted, dashed
are exclusive of each other;bold
can be used alone or with them.hidden
can be used to adjust the layout by constraining node positions, and is exclusive of the other line attributes.This example shows using the parameters
DIR-HEAD-LINE-COLOR
. We emit the same relations in thepuml:
namespace (to customize the arrow) and in the empty namespace (to show an arrow label).<x> puml:none-right <y1>. <x> :none-right <y1>. <x> puml:dashed <y2>. <x> :dashed <y2>. <x> puml:dotted-bold <y3>. <x> :dotted-bold <y3>. <x> puml:up-black <y4>. <x> :up-black <y4>. <x> puml:tri-up <y5>. <x> :tri-up <y5>. <x> puml:left-blue <y6>. <x> :left-blue <y6>.
-
COLOR
Line color: name (e.g.
red
) or hex-code (e.g.FF0000
) To see the full list of color names supported by PlantUML, use this command and search for;color
java -jar plantuml.jar -language
For example, 4 of the arrows on this diagram are colored (1 green, 3 red):
-
LEN
Length: a number of 1 or 2 digits. This applies only to vertical arrows (
up,down
). You can use this to adjust the layout and in some cases avoid the need for parasitichidden
arrows.See this example and its remake in rdfpuml below.
<x1> puml:down <y1>. <x2> puml:up-2 <y2>. <x3> puml:down-3 <y3>. <x4> puml:up-4 <y4>. <x5> puml:down-5 <y5>. <x6> puml:up-6 <y6>. <x7> puml:down-7 <y7>. <x2> puml:right-9 <y3>. <x4> puml:right-7 <y2>.
To customize the arrow of one relation connecting two nodes, use:
<node1> <prop> <node2>
<node1> puml:DIR-HEAD-LINE-COLOR <node2>
The arrow will show the label prop
but the style specified with puml:DIR-HEAD-LINE-COLOR
To customize the arrow for all relations with the same property, use:
<prop> puml:arrow puml:DIR-HEAD-LINE-COLOR.
E.g. on the diagram "Inlines", the following declaration was used to point all nif:oliaLink
arrows upward:
nif:oliaLink puml:arrow puml:up.
Stereotype is UML lingo for "role". In PlantUML these include a «guillemetted italic label» and colored circle.
puml:stereotype "(LETTER,COLOR)LABEL"
where LETTER
is a single uppercase letter,
COLOR
is a color name or hex-code (see </COLOR>),
LABEL
is a label, and all the parts are optional.
You can set stereotype on an individual node or a whole class, e.g. (referring to the previous diagram):
<#char=32,34_annoSet> puml:stereotype "(F)Frame"
fn:AnnotationSet puml:stereotype "(F)Frame"
Here is an example that also sets stereotype labels:
gvp:GuideTerm puml:stereotype "(G,green) Concept".
gvp:Concept puml:stereotype "(C,lightblue) ThesaurusArray, OrderedCollection".
iso:ThesaurusArray puml:stereotype "(A,red) ThesaurusArray, OrderedCollection".
Here is a bigger example that also shows how arrow directions are handled. It's a diagram for the Duraspace Portland Common Data Model for digital library metadata (Fedora, Islandora, etc): a remake of one of the Reference Diagrams. (A proposal to make PCDM diagrams with rdfpuml is tracked as duraspace/pcdm#46)
NOTE: Always specify color, else the stereotype text will include the parenthesized letter, eg (C)
.
This is bug plantuml#1854
Although the use of blank nodes is not recommended in semantic modeling, they are supported by rdfpuml. No special triples are needed for this.
E.g. below is a diagram of EXAMPLE 41: Complete Example from the Web Annotation Data Model.
As you can see, 10 nodes on the right side are blank nodes (have no URL).
The tiny one in the middle has no attributes whatsoever, only the rdf:first, rdf:next
outgoing links.
It should have had a type rdf:List
, this is an omission in the example.
If you want to visualize not only instances (A-Box) but also class statements and expressions (T-Box), see test/complex-types with its own README.
Here is an example closely mirroring the style of the Industrial Ontology Foundry (IOF):
You can pass options and pragmas to PlantUML using the puml:options
property (attached to an empty node).
The default options are:
[] puml:options """
hide empty members
hide circle
skinparam classAttributeIconSize 0
""";
Option descriptions:
hide empty members
removes the 2 blank compartments that appear for nodes with no attributes.hide circle
hides the parasitic circled letter(C)
. Only nodes withpuml:stereotype
are shown a circled letter.skinparam classAttributeIconSize 0
removes UML attribute visibility flags (public, private, protected)
You can use left to right direction
to fit diagrams with large nodes (see two examples below).
Don't forget to add the hide
options, else you'll get unwanted compartments and circles in nodes:
[] puml:options """
hide empty members
hide circle
skinparam classAttributeIconSize 0
left to right direction
""".
You can also try your luck with smetana, which uses an internal Java implementation of GraphViz instead of an external C program.
One way to invoke smetana
is by adding a pragma to puml:options
.
See text/saref4city
for another way, and a trial.
!pragma layout smetana
First example (test/saref4city
):
Second example (test/permid
):
You can pass additional options to PlantUML to select a skin, set colors, etc by using a global config file (eg plantuml.cfg
). For example:
perl -S rdfpuml.pl file.ttl
plantuml -Iplantuml.cfg file.puml
By default, PlantUML uses a drawing canvas of 4096 pixels.
This causes really big diagrams (eg ones that need left to right direction
as described above) to be cut off.
To avoid this, add the following command-line option:
-DPLANTUML_LIMIT_SIZE=8192
For example in a batch file:
java -jar ".../plantuml.jar" -charset UTF-8 -DPLANTUML_LIMIT_SIZE=8192 "$@"
The input Turtle can include Unicode chars (accented chars, Cyrillic, etc).
See test/unicode
for some examples.
It used to be that you needed to invoke the script with option -C
, now that is not necessary:
perl -C -S rdfpuml.pl file.ttl
I include two simple files to invoke the script: bin/rdfpuml.bat
(CMD batch) and bin/rdfpuml
(shell script).
- For some reason, the shell script doesn't work ok with Unicode in Cygwin Bash (so use the BAT file).
- If you have problems with it on Linux, please post an issue.
- GraphViz
- Java (JDK or JRE)
- PlantUML, see in particular plantuml class diagrams.
- Perl. Tested with version 5.22 on Windows (cygwin and Strawberry).
- Perl modules:
RDF::Trine
,RDF::Query
,FindBin
. Install them withcpan
,cpanm
(works best on Strawberry) orcpanp
.RDF::Prefixes::Curie
. This is my own module located in../lib
, and rdfpuml needsFindBin
to locate it.
Until rdfpuml is published as a proper perl package, use the following procedure:
- Install the prerequisites.
- Add
rdfpuml/bin
to your path. - Use
bin/rdfpuml.bat
orbin/rdfpuml
to run it (but see "Unicode" above) - You may want to create a batch file to invoke plantuml (call it
plantuml
orpuml
for short):- Linux/Cygwin:
#!/bin/sh
java -jar c:/prog/plantuml/plantuml.jar -charset UTF-8 $*
- Windows:
@echo off
java -jar c:\prog\plantuml\plantuml.jar -charset UTF-8 %*
- (If you use the scoop package manager,
scoop install plantuml
already makes such batch files calledplantuml
) - See
test/*/Makefile
for examples how to set up make.
I now use the pp script (part of PAR-Packer) to make a self-contained binary for Windows: rdfpuml.exe.
- It includes Perl, all required modules and the script.
- On first run it's slower since it unzips the distribution (over 4k files)
- It caches the unzipped distribution to a folder, so on sunbsequent runs it's much faster
Big thanks to @rschupp
who helped me fix this issue: rschupp/PAR-Packer#88 !
- rdf2rml: a tool to generate R2RML transformations from RDF examples.
- If you use rdfpuml or rdf2rml, please cite them as follows. See this presentation for numerous examples.
- RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video
- https://twitter.com/hashtag/rdfpuml for news, screenshots and announcements