The SPARQL endpoint authorization service (SEAS) is a layer that is placed in front of a SPARQL endpoint and that rewrites queries on this endpoint based on the session information of the user and the access rights on the data.
The idea is that data is organized into graphs and the access to these graphs is restricted to a certain set of roles. Each set of roles is associated with a group. When an INSERT query is sent to the SPARQL endpoint it is intercepted by SEAS. SEAS then loops through its different group specifications and, per specification, distributes the triples across different graphs when they match the graph's constraint. When a service later tries to read data through SEAS, the session is examined and the access criterion of every group is evaluated. If the user has access to the group, the service is allowed to read from the group's graph.
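The read side of this idea can be modelled in a few lines of Python. This is only a sketch of the principle, not the service's implementation: the `Group` class, the `readable_graphs` function, the `role` session key and the admin graph URI are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Group:
    """Hypothetical stand-in for a group specification."""
    name: str
    graph: str
    # Access criterion evaluated against the session information.
    has_access: Callable[[Dict[str, str]], bool]


def readable_graphs(groups: List[Group], session: Dict[str, str]) -> List[str]:
    """Return the graph of every group whose access criterion holds for the session."""
    return [group.graph for group in groups if group.has_access(session)]


groups = [
    # Everyone can read the public graph.
    Group("public", "http://mu.semte.ch/graphs/public", lambda session: True),
    # Assumed example: only sessions carrying an admin role can read this graph.
    Group("admin", "http://mu.semte.ch/graphs/admin",
          lambda session: session.get("role") == "admin"),
]

print(readable_graphs(groups, {}))
print(readable_graphs(groups, {"role": "admin"}))
```

An anonymous session only ends up with the public graph, while the admin session gets both.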
Your current stack contains a triplestore (Virtuoso, Blazegraph, ...). Because we will put mu-authorization in front of this triplestore, look up the name of the triplestore service in your docker-compose.yml file and rename it to triplestore.
E.g.
services:
  database:
    image: tenforce/virtuoso
    ...
should be updated to
services:
  triplestore:
    image: tenforce/virtuoso
    ...
Next, add the mu-authorization service to the docker-compose.yml file by adding the following snippet:
services:
  database:
    image: semtech/mu-authorization:0.6.0-beta.5
    environment:
      MU_SPARQL_ENDPOINT: "http://triplestore:8890/sparql"
    volumes:
      - ./config/authorization:/config
This snippet is typically added just above the triplestore service.
Create a file ./config/authorization/config.ex with the following contents:
alias Acl.Accessibility.Always, as: AlwaysAccessible
alias Acl.GraphSpec.Constraint.Resource, as: ResourceConstraint
alias Acl.GraphSpec, as: GraphSpec
alias Acl.GroupSpec, as: GroupSpec
alias Acl.GroupSpec.GraphCleanup, as: GraphCleanup
defmodule Acl.UserGroups.Config do
  def user_groups do
    # These elements are walked from top to bottom. Each of them may
    # alter the quads to which the current query applies. Quads are
    # represented in three sections: current_source_quads,
    # removed_source_quads, new_quads. The quads may be calculated in
    # many ways. The usage of a GroupSpec and GraphCleanup is
    # common.
    [
      # // PUBLIC
      %GroupSpec{
        name: "public",
        useage: [:read],
        access: %AlwaysAccessible{},
        graphs: [ %GraphSpec{
                    graph: "http://mu.semte.ch/graphs/public",
                    constraint: %ResourceConstraint{
                      resource_types: [
                      ]
                    } } ] },

      # // CLEANUP
      #
      %GraphCleanup{
        originating_graph: "http://mu.semte.ch/application",
        useage: [:write],
        name: "clean"
      }
    ]
  end
end
Start your stack using docker-compose up -d. The authorization service will be created.
Note that you will probably not see any data in your application until you've correctly configured the authorization rules in ./config/authorization/config.ex.
This guide describes how to make data publicly available for unauthenticated users, i.e. users without any information attached to their session.
This guide assumes you've already added mu-authorization to your application as described in "Tutorial: Add mu-authorization to a stack" and that data is currently stored in the graph http://mu.semte.ch/application.
First, open ./config/authorization/config.ex and make sure there is a %GroupSpec named public in the user_groups function like:
%GroupSpec{
  name: "public",
  useage: [:read],
  access: %AlwaysAccessible{},
  graphs: [ %GraphSpec{
              graph: "http://mu.semte.ch/graphs/public",
              constraint: %ResourceConstraint{
                resource_types: [
                ]
              } } ] },
Next, add all resource types (rdf:Class) that you want to make publicly available to the resource_types array in %ResourceConstraint.
E.g.
%GroupSpec{
  name: "public",
  useage: [:read],
  access: %AlwaysAccessible{},
  graphs: [ %GraphSpec{
              graph: "http://mu.semte.ch/graphs/public",
              constraint: %ResourceConstraint{
                resource_types: [
                  "http://www.w3.org/ns/dcat#Catalog",
                  "http://www.w3.org/ns/dcat#Dataset",
                  "http://www.w3.org/ns/dcat#Distribution"
                ]
              } } ] },
Restart the authorization service to pick up the updated config:
docker-compose restart database
Next, move the data in your database from the graph http://mu.semte.ch/application to the graph http://mu.semte.ch/graphs/public. If your stack contains the mu-migrations-service, you can generate a migration like:
DELETE {
  GRAPH <http://mu.semte.ch/application> {
    ?s ?p ?o .
  }
} INSERT {
  GRAPH <http://mu.semte.ch/graphs/public> {
    ?s ?p ?o .
  }
} WHERE {
  GRAPH <http://mu.semte.ch/application> {
    ?s a ?type ; ?p ?o .
    VALUES ?type {
      <http://www.w3.org/ns/dcat#Catalog>
      <http://www.w3.org/ns/dcat#Dataset>
      <http://www.w3.org/ns/dcat#Distribution>
    }
  }
}
Otherwise, you can execute the SPARQL query above directly on the SPARQL endpoint of your triplestore (e.g. http://localhost:8890/sparql).
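If the list of public resource types changes later, the migration query above can be regenerated instead of edited by hand. The following Python helper is a minimal sketch of that; `build_move_query` is a hypothetical name and not part of mu-authorization or mu-migrations-service. It only builds the query text, so how you execute it is up to you.

```python
def build_move_query(types, source_graph, target_graph):
    """Build a DELETE/INSERT query that moves all triples of resources
    with one of the given types from source_graph to target_graph."""
    values = "\n".join(f"      <{type_uri}>" for type_uri in types)
    return f"""DELETE {{
  GRAPH <{source_graph}> {{
    ?s ?p ?o .
  }}
}} INSERT {{
  GRAPH <{target_graph}> {{
    ?s ?p ?o .
  }}
}} WHERE {{
  GRAPH <{source_graph}> {{
    ?s a ?type ; ?p ?o .
    VALUES ?type {{
{values}
    }}
  }}
}}"""


query = build_move_query(
    [
        "http://www.w3.org/ns/dcat#Catalog",
        "http://www.w3.org/ns/dcat#Dataset",
        "http://www.w3.org/ns/dcat#Distribution",
    ],
    source_graph="http://mu.semte.ch/application",
    target_graph="http://mu.semte.ch/graphs/public",
)
print(query)
```

Keeping the type list in one place makes it easier to keep the migration and the resource_types array of the group specification in sync.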
Now that the data has been moved in the triplestore, we just need to make sure that anonymous users landing in our application are associated with the public group. The default groups for a user are configured through an environment variable on mu-identifier.
Open docker-compose.yml and add the following environment variable to the identifier service:
services:
  identifier:
    image: semtech/mu-identifier
    environment:
      DEFAULT_MU_AUTH_ALLOWED_GROUPS_HEADER: "[{\"variables\":[],\"name\":\"public\"}]"
Restart your stack using docker-compose up -d.
You should now be able to retrieve resources of the specified resource types in your application.
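The escaped JSON in DEFAULT_MU_AUTH_ALLOWED_GROUPS_HEADER is easy to get wrong when typed by hand. As a small aid, the value can be generated with Python's standard json module. The helper name `allowed_groups_header` is hypothetical. The JSON shape (a list of objects with `variables` and `name`) is taken from the example above.

```python
import json


def allowed_groups_header(group_names):
    """Serialize group names into the compact JSON used in the example above."""
    groups = [{"variables": [], "name": name} for name in group_names]
    return json.dumps(groups, separators=(",", ":"))


header = allowed_groups_header(["public"])
print(header)              # [{"variables":[],"name":"public"}]

# Dumping the string once more adds the surrounding quotes and the
# backslash escapes needed for a double-quoted YAML value.
print(json.dumps(header))  # "[{\"variables\":[],\"name\":\"public\"}]"
```

The second printed line can be pasted as-is after `DEFAULT_MU_AUTH_ALLOWED_GROUPS_HEADER:` in docker-compose.yml.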
mu-authorization can receive a configuration file for the user groups as well as for the delta system.
A standard mu-semtech stack ensures the configuration files are stored in ./config/authorization/. They can be mounted in the mu-authorization service by adding the following volume in your docker-compose.yml:
services:
  database:
    image: semtech/mu-authorization
    volumes:
      - ./config/authorization:/config
The contents of the configuration files are explained in depth in the sections on authorization configuration and delta system.
Authorization in SEAS is based on group access rules. Groups have read and/or write access to one or multiple graphs based on constraints applied on the data. The group access rules that apply to a user are determined from their session information.
The user groups should be accessible either on /config/user_groups.ex or in /config/config.ex. In your standard mu-semtech stack you'll probably save this in ./config/authorization/user_groups.ex.
The most important function in the configuration file for SEAS is the user_groups function. This function returns the entire configuration regarding groups known to SEAS. The structure of its return type is explained in detail below.
To make the configuration more concise and readable, the following aliases can be configured at the top of the configuration file:
alias Acl.Accessibility.Always, as: AlwaysAccessible
alias Acl.Accessibility.ByQuery, as: AccessByQuery
alias Acl.GraphSpec.Constraint.Resource.AllPredicates, as: AllPredicates
alias Acl.GraphSpec.Constraint.Resource.NoPredicates, as: NoPredicates
alias Acl.GraphSpec.Constraint.ResourceFormat, as: ResourceFormatConstraint
alias Acl.GraphSpec.Constraint.Resource, as: ResourceConstraint
alias Acl.GraphSpec, as: GraphSpec
alias Acl.GroupSpec, as: GroupSpec
alias Acl.GroupSpec.GraphCleanup, as: GraphCleanup
The result of the user_groups function is a list of GroupSpec objects. Every such object specifies a group access rule.
The properties of a GroupSpec are:
- name: the name of the group specification rule. Must be unique.
- useage: a list of usage restrictions, from the set :read, :write, :read_for_write. :read and :write are obvious; :read_for_write means the data can be read while doing update queries only.
- access: defines which users belong to the group. See Access rules.
- graphs: array of graph specifications describing the graph(s) in which the data of the group is stored.
It's worth noting that a group access rule doesn't map one-to-one to traditional user groups. On the contrary, a group access rule may be applicable to multiple user groups. For example, a group access rule may specify the access rule for administrators of a company, but there is an administrator user group for each company (e.g. a user group "administrators of company X", a user group "administrators of company Y", etc.).
An access rule determines whether a user complies with a group specification and to which exact user group(s) they belong.
There are two kinds of access rules: AlwaysAccessible and AccessByQuery.
AlwaysAccessible: a rule that simply gives access to all users regardless of their session information.
AccessByQuery: a rule that gives access according to a certain SPARQL query.
An AccessByQuery object has two properties:
- query: a SPARQL query string that computes the access for the current user. This query is typically based on the information attached to the user's session. <SESSION_ID> can be used as a placeholder in the SPARQL query and is replaced with the URI of the current session at runtime.
- vars: array of strings specifying the relevant variables exported/returned by the SPARQL query. The names should exactly match the variable names as returned in the SELECT block of the query. The variables will be consumed by the corresponding GraphSpecs of the group access rule.
An access rule granting access to all users with the role SuperMegaAdmin looks as follows:
%AccessByQuery{
  vars: ["session_group_id","session_role"],
  query: "PREFIX ext: <http://mu.semte.ch/vocabularies/ext/>
          PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
          SELECT ?session_group_id ?session_role WHERE {
            <SESSION_ID> ext:sessionGroup/mu:uuid ?session_group_id;
                         ext:sessionRole ?session_role.
            FILTER( ?session_role = \"SuperMegaAdmin\" )
          }
          " }
Make sure to correctly escape special characters in the query.
If the query returns any results, the user has access. For every set of returned variables, a graph URI is resolved to use in the GraphSpecs part of the group access rule.
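To make the placeholder and the "any results means access" rule concrete, here is a rough Python model. It is an illustration of the behaviour described above, not the service's code: `bind_session`, `access_bindings`, the shortened query, the session URI and the result rows are all made up for the example.

```python
def bind_session(query, session_uri):
    """Replace the <SESSION_ID> placeholder with the URI of the current session."""
    return query.replace("<SESSION_ID>", f"<{session_uri}>")


def access_bindings(rows, variables):
    """Keep only the declared vars of every result row.
    An empty list means the user has no access through this rule."""
    return [tuple(row[name] for name in variables) for row in rows]


query = "SELECT ?session_role WHERE { <SESSION_ID> ext:sessionRole ?session_role. }"
bound = bind_session(query, "http://example.org/sessions/1")  # hypothetical session URI
print(bound)

# Hypothetical result rows, as a SPARQL client might return them.
rows = [{"session_group_id": "9b2a5053", "session_role": "SuperMegaAdmin"}]
bindings = access_bindings(rows, ["session_group_id", "session_role"])
print(bool(bindings), bindings)
```

Each tuple in `bindings` corresponds to one set of returned variables, and therefore to one resolved graph URI.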
A GraphSpec object defines which graph is created and which triples are added to this graph based on constraints on the data.
Each GraphSpec object holds the following properties:
- graph: the (base) URI of the graph that should be created. The URI is appended with the results of the vars property of the GroupSpec's access rule. For instance, if the base graph URI is http://myorganization.org/group/ and the vars array of the access rule is ["group_id", "group_name"], the graph being created will be something like http://myorganization.org/group/9b2a5053-0967-425c-a1ee-c9b9bfe38b81/awesome_admins.
- constraint: the constraint that determines which triples should be sent to/read from the graph.
The graphs property of a GroupSpec object contains an array of GraphSpec objects, i.e. data accessible to a group may be spread across multiple graphs. The constraint property defines in which graph a triple is stored.
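The way the graph URI is assembled in the example above can be sketched as follows. The joining with `/` is inferred from that example rather than taken from the service's source, and `resolve_graph` is a hypothetical name.

```python
def resolve_graph(base_uri, var_values):
    """Append the values returned for the access rule's vars to the base graph URI."""
    return base_uri + "/".join(var_values)


graph = resolve_graph(
    "http://myorganization.org/group/",
    ["9b2a5053-0967-425c-a1ee-c9b9bfe38b81", "awesome_admins"],
)
print(graph)
```

For an AlwaysAccessible rule there are no variables, so in this model the base URI is used unchanged.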
A constraint defines which triples will be sent to or read from a graph. This section does not distinguish between reading triples from and writing triples to the graph, as the principle is the same. Both actions will be described as 'sent to the graph'.
There are two kinds of constraints, a ResourceFormatConstraint and a ResourceConstraint.
A ResourceFormatConstraint defines a constraint on the format of the subject URI of a triple. It has just one property, resource_prefix, which contains a URI. All triples having a subject URI starting with this prefix will be sent to the graph.
In the example below all triples having a subject that starts with http://data.myorganization.org/ will be sent to the graph http://mu.semte.ch/graphs/public.
%GraphSpec{
  graph: "http://mu.semte.ch/graphs/public",
  constraint: %ResourceFormatConstraint{
    resource_prefix: "http://data.myorganization.org/"
  } }
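In other words, the check is a plain prefix match on the subject. A minimal Python model of it, with a hypothetical function name and made-up triples, could look like this:

```python
def matches_resource_format(triple, resource_prefix):
    """True when the subject URI of the triple starts with the configured prefix."""
    subject, _predicate, _object = triple
    return subject.startswith(resource_prefix)


triples = [
    ("http://data.myorganization.org/people/1", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://elsewhere.example.org/people/2", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

# Only the first triple would be sent to the graph of the GraphSpec above.
public = [t for t in triples if matches_resource_format(t, "http://data.myorganization.org/")]
print(public)
```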
A ResourceConstraint allows you to put a constraint on the type (rdf:Class) of a resource and/or the predicate of a triple.
A ResourceConstraint has the following properties:
- resource_types: a list of type URIs. Resources with this type will be sent to the graph.
- predicates: an additional constraint put on the predicates of such resources that will be sent to the graph. This property is optional. This constraint is either an AllPredicates (default) or a NoPredicates object with an optional except array of predicate URIs.
In the first example below no properties of a foaf:Person will be written to the graph http://mu.semte.ch/graphs/public, except for the person's foaf:name and foaf:accountName. In the second example all properties of a person will be written to the graph http://mu.semte.ch/graphs/public except for the person's foaf:birthday.
%GraphSpec{
  graph: "http://mu.semte.ch/graphs/public",
  constraint: %ResourceConstraint{
    resource_types: [
      "http://xmlns.com/foaf/0.1/Person" ],
    predicates: %NoPredicates{
      except: [
        "http://xmlns.com/foaf/0.1/name",
        "http://xmlns.com/foaf/0.1/accountName" ] }
  } }

%GraphSpec{
  graph: "http://mu.semte.ch/graphs/public",
  constraint: %ResourceConstraint{
    resource_types: [
      "http://xmlns.com/foaf/0.1/Person" ],
    predicates: %AllPredicates{
      except: [
        "http://xmlns.com/foaf/0.1/birthday" ] }
  } }
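The interplay between AllPredicates, NoPredicates and their except list can be summarized in a small Python model. This is a sketch of the semantics described above, with hypothetical function names and the two modes encoded as the strings "all" and "none". It is not how the service itself is written.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def predicate_allowed(predicate, mode, exceptions=()):
    """mode "all" models AllPredicates, mode "none" models NoPredicates.
    Predicates listed in exceptions get the opposite of the default."""
    listed = predicate in exceptions
    if mode == "all":
        return not listed
    if mode == "none":
        return listed
    raise ValueError(f"unknown mode: {mode}")


def matches_resource_constraint(subject_types, predicate, resource_types,
                                mode="all", exceptions=()):
    """A triple matches when its subject has a listed type and its predicate is allowed."""
    has_type = any(t in resource_types for t in subject_types)
    return has_type and predicate_allowed(predicate, mode, exceptions)


person = [FOAF + "Person"]

# First example: nothing except name and accountName.
print(matches_resource_constraint(person, FOAF + "name", person,
                                  "none", [FOAF + "name", FOAF + "accountName"]))
# Second example: everything except birthday.
print(matches_resource_constraint(person, FOAF + "birthday", person,
                                  "all", [FOAF + "birthday"]))
```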
For each update query mu-authorization executes, it calculates the delta, i.e. which triples have been added and which triples have been removed. These delta messages can be sent to interested clients.
The service allows you to configure which clients will receive those delta messages. This configuration file needs to be accessible on /config/delta.ex inside the docker container.
The delta config file specifies a list of target services to send the delta messages to. The docker service names - as they are known to the mu-authorization service - can be used in the URLs.
defmodule Delta.Config do
  def targets do
    [ "http://delta-notifier" ]
  end
end
Currently the most used client for delta messages is the delta-notifier, which can forward delta requests to other services based on pattern matching.
The format of the delta messages is explained in the README of the delta-notifier.
The mu-authorization service is a layer that is placed in front of a SPARQL endpoint. The service makes some assumptions regarding the SPARQL endpoint which can be altered. The location of the triplestore as well as alterations to support specific triplestores can be configured.
The default SPARQL endpoint mu-authorization sends SPARQL queries to can be configured with the MU_SPARQL_ENDPOINT environment variable. The default is configured on http://database:8890/sparql when running inside a standard container but you can override it in the docker-compose.yml file. In case this variable is not set (as would be the case in a standard development setup), http://localhost:8890/sparql will be used as a default setting.
Compatibility layers can rewrite SPARQL queries to be in line with the expectations of a triplestore. Use the DATABASE_COMPATIBILITY environment variable.
Sometimes triplestores can be a bit cranky. mu-authorization tries to create sensible and valid SPARQL queries that express what it intends to achieve. Although it's still a work in progress, it's the goal to generate clean SPARQL queries. A triplestore could barf on a query but work with a rewritten version. The compatibility layer solves that.
Supported values for DATABASE_COMPATIBILITY are:
- Raw: (default) Don't alter queries.
- Virtuoso: Support for Virtuoso. Currently rewrites DELETE DATA to DELETE WHERE.
When writing SPARQL queries, you should not query graphs directly (except when in 'sudo'-mode). Mu-authorization will rewrite the query and handle which graphs the user has access to, regardless of the graphs specified in the query.
Mu-authorization implements most of the SPARQL 1.1 specification but there are some limitations. The following items are not supported by mu-authorization at the moment:
- inline comments (e.g. # this is a comment)
- DESCRIBE queries (e.g. DESCRIBE <http://example.org/>)
- Graph operations (e.g. DROP GRAPH <http://my.graph>)
Further in "sudo" mode (with direct access to graphs) the following is not supported:
Logging can be configured using environment variables. These properties can be set in the environment block in the docker-compose.yml.
Flags which can be either on or off translate the environment variable string to an understood configuration. The following are considered true: ["true", "yes", "1", "on"]; all other strings are considered to be false.
- LOG_ERRORS: Logs the errors, turned on by default
- LOG_OUTGOING_SPARQL_QUERIES: Logs outgoing SPARQL queries by printing them on the console
- INSPECT_OUTGOING_SPARQL_QUERIES: Logs outgoing SPARQL queries by inspecting them (native string format)
- LOG_INCOMING_SPARQL_QUERIES: Logs incoming SPARQL queries by printing them on the console
- INSPECT_INCOMING_SPARQL_QUERIES: Logs incoming SPARQL queries by inspecting them (native string format)
- LOG_SERVER_CONFIGURATION: Logs some information about the server configuration on boot
- LOG_ACCESS_RIGHTS: Logs the incoming access rights
- INSPECT_ACCESS_RIGHTS_PROCESSING: Logs the processing of the access rights (verbose debugging)
- LOG_DELTA_MESSAGES: Allows logging of delta messages as they're sent to other consumers
- LOG_DELTA_CLIENT_COMMUNICATION: Allows logging of the communication caused by the delta system
- LOG_TEMPLATE_MATCHER_PERFORMANCE: Emits information regarding use of precalculated templates when analysing queries
- LOG_OUTGOING_SPARQL_QUERY_RESPONSES: Logs the responses coming back from the backing triplestore
- INSPECT_OUTGOING_SPARQL_QUERY_RESPONSES: Inspects the responses coming back from the backing triplestore
- LOG_OUTGOING_SPARQL_QUERY_ROUNDTRIP: Logs both the request and the response to/from the backing triplestore closely together in the logs
- LOG_WORKLOAD_INFO_REQUESTS: Logs workload information to the console when it is requested through an http call
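The on/off rule for these flags can be expressed as a one-line membership test. The Python sketch below takes the list of true values literally. Whether the service normalizes case or whitespace is not stated here, so this model does not. `flag_enabled` and its `default` parameter are hypothetical and only serve the illustration.

```python
TRUTHY = {"true", "yes", "1", "on"}


def flag_enabled(name, environ, default=False):
    """Interpret an on/off flag: unset falls back to the default,
    any value outside TRUTHY counts as false."""
    value = environ.get(name)
    if value is None:
        return default
    return value in TRUTHY


env = {"LOG_INCOMING_SPARQL_QUERIES": "on", "LOG_DELTA_MESSAGES": "enabled"}
print(flag_enabled("LOG_INCOMING_SPARQL_QUERIES", env))  # a listed value
print(flag_enabled("LOG_DELTA_MESSAGES", env))           # "enabled" is not in the list
print(flag_enabled("LOG_ERRORS", env, default=True))     # unset, on by default
```

A practical consequence is that a value such as "enabled" silently turns a flag off, so it is safest to stick to one of the four listed strings.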
Complex SPARQL queries can take a long time to process and execute. The time mu-authorization is allowed to spend on this processing and execution before timing out can be configured through the following environment variables:
- QUERY_MAX_PROCESSING_TIME: Maximum amount of time in milliseconds in which the query should have been processed. Beyond this time, a 503 response may be sent. Take into account that, in practice, the actual consumed time might be substantially larger than the configured time.
- QUERY_MAX_EXECUTION_TIME: Maximum amount of time in milliseconds in which a single query to the triplestore should have been processed. If the query takes longer the connection will be closed and the query failure mechanism will be executed, potentially executing the same query again. In case of Virtuoso, a similar setting exists in the virtuoso.ini.
It is possible that too many queries are sent to mu-authorization, overloading it. An experimental system exists to limit this overload.
- DATABASE_OVERLOAD_RECOVERY: Enables the overload recovery system. Note that this has not been tested and may not yet help in case of failure.
- LOG_DATABASE_OVERLOAD_TICK: Logs a message whenever the database overload system executes a recalculation. Helps to see if it's still alive.
The service also provides a /recovery-status endpoint to get some info on the status of the recovery system.
Some configuration doesn't fit in previous topics. These settings are described in this section.
- ERROR_ON_UNWRITTEN_DATA: It may be that you request to write manipulations which will not be written to the triplestore because you do not have the necessary rights. Turning this flag on will make the full manipulation fail in that case.
- TESTING_AUTH_QUERY_ERROR_RATE: Chance a query should fail to execute when trying to test fault-tolerance of consuming services, float ranging [0,1]
- Authorization examines the graphs the user has access to when writing triples and only writes to graphs a triple belongs to. If no such graph exists, nothing is written to the endpoint. A 201 status code is returned nonetheless.
- Services should always strive to use SEAS to access the database. If session information is not necessary or should not be applied because the service validates access rights in its own way, the header mu-auth-sudo should be set to true in the SPARQL request sent to the service.
- Not all services can always use SEAS because some triple patterns may not be understood by the service's rewrite rules. Note that a service should strive to be compliant with SEAS and I have yet to see a case where this is not possible. In a case where it is not possible to use SEAS, the service needs to write its data to all graphs SEAS would normally write to. This is tough, hence the advice to always use SEAS.
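Setting the mu-auth-sudo header from a service could look roughly like the Python sketch below, which uses only the standard library. The endpoint URL (the mu-authorization service reachable as `database` on port 8890), the form-encoded `query` parameter and the function name `build_sudo_request` are assumptions for illustration. Adapt them to how your service reaches mu-authorization. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sudo_request(query, endpoint="http://database:8890/sparql"):
    """Build a POST request for a SPARQL query with the mu-auth-sudo header set.
    The endpoint default is an assumed address of the mu-authorization service."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
        "mu-auth-sudo": "true",  # bypass session-based access rules
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


request = build_sudo_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.get_method(), request.full_url)
print(request.header_items())
```

Passing the request to `urllib.request.urlopen` would send it. Because sudo requests skip the access rules entirely, they are best limited to services that validate access rights themselves.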