Technical Design
This page documents the technical design implemented in Stepup. This Technical Design has been written in retrospect to help future developers and development. Therefore only the important parts are detailed below; it is not as exhaustive a document as a technical design written up front.
The Stepup Platform has been split into 4 functional components, each is implemented as a separate application:
- Self Service, the application where users can register their own second factor.
- Registration Authority, the application where Registration Authorities can vet the registered second factors as well as where Registration Authority Administrators can manage the registration authorities themselves.
- Gateway, the application that contains the SAML gateway. In the gateway the required LoA can be verified with the vetted Second Factors to grant additional confidence about the logged in Identity.
- Middleware, the authoritative application with respect to all the data handled within the platform.
Each of the applications is detailed in the following sections.
The Self Service, Registration Authority and Gateway applications are publicly accessible; the Middleware is not. Furthermore, the Self Service and Registration Authority applications do not have a data-store of their own: all their data is governed by the Middleware application, which allows reading and (limited) modification of data through an API. The Gateway has its own data-store, but that data is still governed by the Middleware and is read-only for the Gateway. The rationale behind this is that even if the Self Service, Registration Authority and Middleware applications fail, the Gateway must be able to keep functioning, as it provides the core functionality of the platform.
The following image illustrates the application topology within the Stepup platform.
The (non-)publicly accessible boundary is illustrated with the horizontal dotted line; the dependency boundary between applications is marked by the vertical dotted line. Data flows are illustrated by arrows, pointing in the direction the data flows.
The dependency boundary is not a hard one; although the Gateway may not depend on any other application, the Self Service and Registration Authority applications do have a dependency on the Gateway. The Gateway exposes APIs to verify Yubikey OTPs and to send OTPs via SMS. Due to requirements on usage statistics and various other logs this functionality was initially designed to live in the Middleware, however that would have meant that the Gateway would have had a dependency on the Middleware. Introducing a new application specifically for this purpose was undesirable (and it would still have given the Gateway a dependency), therefore the Gateway implements the required functionality and exposes an API for the other applications.
In order to optimize reuse of code, shared functionality was developed as self-contained packages, each with its own repository. Composer is used to include the packages in the various applications.
The Stepup-Bundle is a collection of functionality that is shared between the applications. For instance, all applications should use monolog in a specific way in order to be able to log according to the requirements. This bundle exposes monolog functionality that the applications can use to implement the logging strategy. Another example is exception handling; the functionality required for proper and consistent exception handling in the user-facing applications is supplied by this bundle. Having this bundle allows creating many shared services without duplicating the required code across the applications, making a change to the logging approach as simple as updating one library and running a few composer updates. Furthermore it allows us to leverage standardisation even further, as we expose a set of value objects for commonly used values.
This is a non-exhaustive list of the functionality supplied by the Stepup-Bundle:
- LoA definition, configuration and Resolution service
- Extended logging functionality
- Custom formatters, handlers and processors
- Support for RequestID logging
- SMS OTP sending API client
- Yubikey OTP verification API client
- RequestID injection service for Guzzle (allows for a single end-user request to be tracked across the various applications)
- A cryptographically secure OTP generator (used to generate SMS OTPs and registration codes)
- Locale extension, providing functionality to switch locales in an application
- JsonConvertible interface and handler, allowing to easily decode JSON request bodies to specific objects
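The cryptographically secure OTP generator from the list above can be sketched as follows. This is a standalone illustration, not the bundle's actual code; the function name and alphabet are assumptions.

```php
<?php
// Hypothetical sketch of a cryptographically secure OTP generator,
// similar in spirit to the one the Stepup-Bundle provides. The
// function name and alphabet are illustrative, not the real API.
function generateOtp(int $length = 8): string
{
    // Unambiguous alphabet: no 0/O or 1/I/l look-alikes.
    $alphabet = '23456789abcdefghjkmnpqrstuvwxyz';
    $otp = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is backed by PHP's CSPRNG.
        $otp .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $otp;
}
```

The key design point is using a CSPRNG (`random_int`) rather than `rand()`/`mt_rand()`, since OTPs and registration codes are security-sensitive values.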
Within the Stepup Middleware Client Bundle two distinct components can be found: the bundle itself and the library the bundle integrates in the Symfony2 framework.
The Middleware Client provides an API-client to communicate with the Middleware API. Through the use of specific Query objects as well as Commands the full API can be consulted through the client. The client itself is able to use the provided Queries and Commands to send GET-parameter or JSON based requests to the Middleware API. It then translates the received JSON back to pre-defined datastructures (read: arrays). Furthermore it takes care of gracefully handling the various non-OK responses and logging.
The bundle provides an integration layer on top of the client. It exposes this integration layer by registering various services within the Symfony Dependency Injection Container. At the same time, for read requests, it provides functionality that takes the datastructures the client returns and translates those to distinct collections and DTOs. This creates a good separation of concerns between the client and the bundle whilst being flexible enough on both accounts to be able to adapt to changes.
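The translation from the client's array structures to distinct DTOs can be sketched as follows; the class and field names below are hypothetical, chosen only to illustrate the pattern.

```php
<?php
// Hypothetical sketch of how the bundle could turn the array
// structures returned by the Middleware client into a typed DTO.
// Class and field names are illustrative only.
class IdentityDto
{
    private $id;
    private $commonName;
    private $email;

    public function __construct($id, $commonName, $email)
    {
        $this->id = $id;
        $this->commonName = $commonName;
        $this->email = $email;
    }

    // Translate the raw datastructure (read: array) into a DTO.
    public static function fromArray(array $data)
    {
        return new self($data['id'], $data['common_name'], $data['email']);
    }

    public function getCommonName()
    {
        return $this->commonName;
    }
}
```

The client stays ignorant of these types; only the bundle knows how to hydrate them, which keeps the separation of concerns described above.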
This bundle provides a thin client for communicating effectively with the Messagebird API, including adhering to the various logging requirements as well as handling the documented responses from Messagebird. The bundle itself integrates the client with Symfony2, exposing a set of services for ease of use of the client.
Akin to the Messagebird API client bundle, this bundle provides a thin client for communicating with the Yubikey OTP verification API. This also includes adhering to the various logging requirements as well as handling the documented responses from Yubikey, relaying the information if required. The bundle itself integrates the client with Symfony2, exposing a set of services for ease of use of the client.
Logging is an important component of the platform. Since the platform controls access to potentially very sensitive data that is managed by the services that rely on it, logging is a key part of the auditability of the system. In addition, the platform itself will contain a growing amount of so-called Personally Identifiable Information (PII). At any given time it should be possible to derive from the logs what was going on at point X in time on server Y or within application Z. In order to ensure this, a logging strategy has been designed. This strategy is used to ensure we log sensible messages without conveying any sensitive data. At the same time there are strict requirements on the logging system: it should be both locally persistent and aggregated. The rationale is that, should a node be compromised or detached from the network, the local logs will always provide insight into what happened. The log aggregator helps with auditing and monitoring of the platform.
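The request-tracing part of this strategy can be sketched as a Monolog-style processor that stamps every record with a per-request ID. The class below is a standalone illustration of the pattern, not the bundle's actual implementation.

```php
<?php
// Standalone sketch of a Monolog-style processor that stamps every
// log record with a per-request ID, so a single end-user request can
// be traced across the various applications. Illustrative only; not
// the Stepup-Bundle's actual code.
class RequestIdProcessor
{
    private $requestId;

    public function __construct($requestId)
    {
        $this->requestId = $requestId;
    }

    // Monolog invokes processors with the record array and uses the
    // return value as the (possibly enriched) record.
    public function __invoke(array $record)
    {
        $record['extra']['request_id'] = $this->requestId;
        return $record;
    }
}
```

Registered on every channel, such a processor makes the aggregated logs searchable by request ID, which is exactly what the "RequestID logging" support in the bundle enables.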
Initially the platform was designed to use Graylog2 as log aggregator, however in the end it was decided to use rsyslog with an ELK stack (elastic search, logstash and kibana) as log aggregator.
The whole platform is developed as open source and can be freely downloaded through Github. This means that every repository must include the correct LICENSE file, as well as the correct copyright header in each file. The whole platform has been developed under the OSI-approved Apache 2.0 license. The copyright and license header has been further documented on the License and copyright page.
For developers this has some small implications, mainly in that they should realize that the whole project history can be read by everyone; therefore clear and concise commit messages should be used.
Stepup-SelfService is the application in which end-users are allowed to register a new second factor for them to use. The application has been designed to work without a data-store, instead it requests all the data it needs from the Middleware and issues any commands for data to be written to the Middleware. The application design therefore focuses on providing a thin UI layer on top of the various APIs. In order to allow users to prove the possession of a specific second factor it makes use of the Gateway API.
For the Self Service a custom Symfony Firewall has been created. This firewall covers the whole application with the exception of static assets (css, js, images). Users are authenticated using SAML2, relying on the IdP to provide the credentials.
The firewall implements the following logic (pseudo-code):
```php
if ($loginTokenStorage->hasLoginToken()) {
    // user already logged in
    return;
}

// first visit: no SAML AuthnRequest has been made yet, so we do that
if (!$samlProvider->transactionIsStarted()) {
    // save the current URL to be able to send the user back
    $stateHandler->saveCurrentUrl($symfonyRequest);

    // return an AuthnRequest
    return $samlProvider->initiateSamlAuthnRequest();
}

// attempt to process the response, any errors should be caught
$assertionData = $samlProvider->processResponse($symfonyRequest);

// check with the middleware if there is an identity
$identity = $middlewareClient->findByNameIdAndInstitution(
    $assertionData->nameId,
    $assertionData->schacHomeOrganisation
);

if (!$identity) {
    // no identity yet: create a new one and save it
    $identity = new Identity($assertionData);
    $middlewareClient->issueCommand(new CreateIdentityCommand($identity));
} elseif ($identity->name !== $assertionData->commonName || $identity->email !== $assertionData->email) {
    // the common name and email may change (e.g. when getting married)
    $command = new ChangeNameOrEmailCommand($assertionData->commonName, $assertionData->email);
    $middlewareClient->issueCommand($command);
}

// user identified, log in the user...
$loginTokenStorage->storeToken(new SamlToken($identity));

// ... and redirect the user back to the URL he or she visited originally
return new RedirectResponse($stateHandler->getOriginalUrl());
```
The Registration Authority (RA) application provides various RA and Second Factor management features. Like Stepup-SelfService this application has been built as a thin UI client over the required APIs. It has no internal data store, instead it requests all the data it needs from the Middleware and issues any commands for data to be written to the Middleware. Second Factor Verifications all go via the Gateway API.
For the Registration Authority application a custom Symfony Firewall has been created. This firewall covers the whole application with the exception of static assets (css, js, images). Users are authenticated using SAML2, relying on the IdP to provide the credentials. Furthermore the role(s) of the user are determined based on the credentials stored in the Middleware. The Registration Authority is aware of three different roles:
- RA The Registration Authority. This is the "standard" user of the Registration Authority. These users are allowed to vet and revoke second factors for registrants within their institution. They can also view the audit log for registrants within their institution.
- RAA Registration Authority Administrator. In essence this is the application administrator for an institution. These users are allowed to promote Identities to RA or RAA, and manage the accompanying data (location, contact information). Furthermore they can change the current roles of RA(A)s.
- SRAA Super Registration Authority Administrator. The SRAA is a special role that can only be granted through configuration. SRAAs have the ability to switch institutions within the RA application and are furthermore allowed to do anything that an RAA can do, within any institution. This role has been introduced to allow the operators of the Stepup system to:
- Create new RAAs in an institution, required for bootstrapping the first RAA in an institution
- Support users
The firewall implements the following logic (pseudo-code):
```php
if ($loginTokenStorage->hasLoginToken()) {
    // user already logged in
    return;
}

// first visit: no SAML AuthnRequest has been made yet, so we do that
if (!$samlProvider->transactionIsStarted()) {
    // save the current URL to be able to send the user back
    $stateHandler->saveCurrentUrl($symfonyRequest);

    // return an AuthnRequest
    return $samlProvider->initiateSamlAuthnRequest();
}

// attempt to process the response, any errors should be caught
$assertionData = $samlProvider->processResponse($symfonyRequest);

// check with the middleware if there is an identity
$identity = $middlewareClient->findByNameIdAndInstitution(
    $assertionData->nameId,
    $assertionData->schacHomeOrganisation
);

// if there is no identity, deny access
if (!$identity) {
    throw new AuthenticationException();
}

// if the granted LoA is not high enough, deny access;
// this can only occur due to a configuration error
if (!$loaResolutionService->isLoaHigherThanOrEqualTo($assertionData->loa, $requiredLoa)) {
    throw new AuthenticationException();
}

// fetch the RegistrationAuthorityCredentials; no credentials -> deny access
$credentials = $middlewareClient->getRegistrationAuthorityCredentialsFor($identity);
if (!$credentials) {
    throw new AuthenticationException();
}

// retrieve the roles (RA, RAA, SRAA) from the credentials
$roles = $credentials->getRolesAsArray();

// (S)RA(A) identified, log in the user with the correct roles...
$loginTokenStorage->storeToken(new SamlToken($identity, $roles));

// ... and redirect the user back to the URL he or she visited originally
return new RedirectResponse($stateHandler->getOriginalUrl());
```
The Gateway is the most used part of the platform: it is the application that enables the vetted second factors to be used. It acts as a SAML Gateway, proxying AuthnRequests to SURFconext. It only supports SP-initiated login, not IdP-initiated.
For the Gateway no specific firewall has been built, as it does not have the concept of users; instead it acts like one big firewall. The exception to this is the SAML metadata it publishes, which can always be freely retrieved. The Gateway is publicly accessible insofar as an AuthnRequest can be sent to the SSO URL or a SAML Response to the Assertion Consumer Service URL.
Furthermore there are two APIs that can be reached by configured API-clients. Lastly there are Generic SAML Stepup Provider URLs which are available for the configured Service Providers. More on this can be found in their respective sections.
After the SAML response has been parsed and validated according to the rules of SAML, it is checked whether there are LoA requirements. The required LoAs are retrieved from the following sources:
- The possible required AuthnContextClassRef in the AuthnRequest
- The default required LoA configured for the SP
- The default required LoA configured for the IdP
- A possible required LoA configured for the SP when logging in at that IdP
- A possible required LoA configured for the IdP when coming from that SP
Once all LoAs have been gathered, the highest required LoA is determined. If the highest required LoA is the configured intrinsic LoA, an immediate response is given, adding the LoA to the AuthnContext.
If the LoA is higher than the configured intrinsic LoA, the Vetted Second Factors for the Identity are loaded. Second Factors that cannot grant the required LoA are filtered out, leaving only the Second Factors that can grant the required LoA (or higher). For each of these Second Factors it is checked whether or not the Institution is on the whitelist. If the institution is not on the whitelist, the Second Factor is rejected. If no Second Factors are left, the Gateway cannot grant the required LoA, thus a NoAuthnContext response is sent. If a Second Factor is left, the user is challenged to complete the Second Factor verification. Upon successful Second Factor verification, the user is sent back to the SP with a response to the original AuthnRequest containing an assertion with the granted LoA in the AuthnContext.
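The resolution step described above boils down to taking the maximum of the candidate LoAs gathered from the various sources. A minimal sketch, where representing LoAs as integers (1 being the intrinsic LoA) is an assumption for illustration:

```php
<?php
// Minimal sketch of the LoA resolution step: gather the candidate
// required LoAs from all sources (AuthnRequest, SP config, IdP config,
// SP-at-IdP config) and keep the highest. Representing LoAs as plain
// integers (1 = intrinsic LoA) is an assumption for illustration.
function resolveRequiredLoa(array $candidateLoas, $intrinsicLoa = 1)
{
    // With no explicit requirements, the intrinsic LoA suffices and
    // the Gateway can respond immediately.
    if (empty($candidateLoas)) {
        return $intrinsicLoa;
    }

    return max(max($candidateLoas), $intrinsicLoa);
}
```

Only when the resolved LoA exceeds the intrinsic LoA does the Gateway need to load and filter the Vetted Second Factors.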
Ways in which the Gateway can respond with a Failed response to the SP are:
- Status: Requester; Substatus: Request Unsupported; used when the SP requests an unknown LoA
- Status: Responder; used when we cannot process the Response
- Status: NoAuthnContext; used when the required LoA cannot be granted
Lastly, there are certain cases in which the Gateway does not know how to respond or it simply cannot respond, e.g. when the AuthnRequest cannot be processed. In that case the user is presented with an error page detailing that an error has occurred and that (s)he should return to the service (s)he was originally trying to visit.
The simpleSAMLphp/saml2 library already provided support for SAML AuthnRequests. During the development of the Stepup Platform, SAML Response parsing and validation has been added, as well as support for `<saml:subject>` within an AuthnRequest.
The bindings of the SAML2 library were not yet developed enough to be used independently of simpleSAMLphp itself. Since a bundle was required to integrate the SAML2 library, the required Redirect and POST bindings have been implemented in the OpenConext/saml-bundle. These bindings are developed specifically for AuthnRequest (Redirect) and Response (POST) handling.
The Gateway exposes two APIs: one for sending SMS messages through Messagebird and one for Yubikey OTP verification through the Yubikey API. Clients for these APIs have been built into the Stepup-Bundle as documented above. These APIs are secured with HTTP Basic authentication, as they may only be accessed through the trusted network.
Furthermore a configurable SAML endpoint has been created to allow Generic SAML Stepup Providers to be attached to the Gateway, each with their own distinct URL.
The Middleware application contains the domain model. It exposes command and query APIs for the Self-Service and Registration Authority applications. Also, it projects second factors, the institution whitelist and SAML entity configuration into the Gateway database.
The HTTP API is stateless in that it never sets cookies. Authentication is performed using HTTP Basic Authentication.
In essence, the Command Query Responsibility Segregation (CQRS) paradigm segregates (separates) commands (writes) and queries (reads). Commands mutate the domain, whereas queries only read and may not cause side effects. This separation gives the developer the confidence that a query doesn't mutate the domain, whereas a command definitely does. Also, the write and read sides of the domain can be modelled altogether differently, allowing for commands and queries that specifically reflect a certain task within the problem domain. To accommodate scaling requirements, the write and read sides can be deployed to different infrastructures, optimised for certain I/O, security, availability and other requirements. Downsides to CQRS are added complexity and boilerplate code.
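The separation can be illustrated with a minimal sketch. All names below are hypothetical: a command is a plain object describing one mutation, and a command bus dispatches it to exactly one handler, while queries read from a separate read model.

```php
<?php
// Minimal CQRS illustration with hypothetical names: the command side
// mutates state through exactly one handler; the query side (not
// shown) would only read from a separate read model.
class RegisterSecondFactorCommand
{
    public $identityId;
    public $secondFactorType;
}

class CommandBus
{
    private $handlers = array();

    public function register($commandClass, callable $handler)
    {
        $this->handlers[$commandClass] = $handler;
    }

    public function handle($command)
    {
        // Dispatch to exactly one handler; only commands cause writes.
        $handler = $this->handlers[get_class($command)];
        $handler($command);
    }
}
```

Because every mutation passes through a named command, the write side maps directly onto the task-based UIs mentioned below.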
CQRS was chosen for OpenConext Stepup because it has a relatively complex domain which is managed through task-based UIs. Mutations through actions and forms can be reflected in commands, and task-specific views can be based on specialised read models. As the load on an OpenConext Stepup deployment increases, its write and read sides can be optimised separately.
CQRS also happens to be a good match for Event Sourcing. Event Sourcing is a paradigm in which the current state of the domain is derived from a history of events. As actions are performed on the domain, events are emitted that describe what happened; these are appended to a log. By replaying the log, the current state of the domain can be derived. This log also enables the developer to construct specialised read models or derive statistics. These read models can even be created post-factum, since all of the domain's history is available. This derivation of read models from events is called projection; within the context of Event Sourcing, read models are more commonly called projections.
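Deriving state by replaying an event log can be sketched in a few lines. The event names and the shape of the state below are hypothetical, chosen only to show the mechanism:

```php
<?php
// Sketch of Event Sourcing: current state is derived by replaying an
// append-only event log. Event names and state shape are hypothetical.
function replay(array $events)
{
    $state = array('secondFactors' => array());

    foreach ($events as $event) {
        switch ($event['type']) {
            case 'SecondFactorVetted':
                $state['secondFactors'][] = $event['secondFactorId'];
                break;
            case 'SecondFactorRevoked':
                $state['secondFactors'] = array_values(array_diff(
                    $state['secondFactors'],
                    array($event['secondFactorId'])
                ));
                break;
        }
    }

    return $state;
}
```

A projector is essentially such a replay function kept running continuously, writing its derived state to a database table instead of returning it.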
At first, OpenConext-Stepup was to be backed by a classic Doctrine ORM setup with CQRS. Event Sourcing was first considered when the requirement of full auditability became apparent. Since the full history of the domain is available, every action ever made by all actors is recorded. Additionally, SURFnet is not yet sure what they want statistics and reports of. The ability to replay new projections accommodates these changing reporting requirements.
Currently, all projections are managed using Doctrine ORM. Doctrine ORM is a familiar tool to many PHP developers and facilitates creation and maintenance of projections. Should the read side become a performance bottleneck, this is the first candidate for a refactoring to Doctrine DBAL.
Added 2017-05-04, DvR: For event replay, both full and partial, the ORM is a bottleneck. For functional reasons, projections must be immediately consistent. This means that for an event that causes changes in projections, those projections must be updated in the same transaction as the event. The UnitOfWork that Doctrine uses tries to optimize by batching inserts, updates and deletes, first executing all inserts, then all updates, then all deletes. This can however cause conflicts. As a simple example:

- event 1 causes projection A to insert `<something>`
- event 2 causes projection A to delete that same `<something>`
- event 3 causes projection A to insert that same `<something>`

The UoW then optimizes that to insert, insert, delete, which can cause a duplicate key conflict. Because of this, each event during a full replay must run in its own transaction. This significantly slows down replays, which will become a problem in the future. The problem can be resolved by removing the ORM integrations for projections, see Pivotal#138363619.
Furthermore, the event replay tooling for full replays was aimed at and tooled for development, not full replays of production data. This has been corrected by adding a replay mechanism that allows selection of which events to replay, through which projectors. Those projectors should be one-off projectors: projectors intended for a migration, not for continuous use. This allows for replays that can migrate projections, introduce new projections, etc. End of addition.
The aforementioned projections facilitate the read side of CQRS. The command pipeline facilitates the write side.
Before commands are handled (translated into actions on the domain), it must be ensured that the executing party is authorized to execute the command, and that the command is filled out and contains the proper types of information. After a command is handled, the events emitted from the domain must be dispatched to listeners, like projectors.
These four concerns have been assigned to as many stages, which have been composed into a pipeline. Each stage can discard or transform a command. Should additional logic be concerned with the handling of commands, a stage can be inserted or replaced at the required position in the pipeline through configuration.
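A stage-based pipeline of this shape can be sketched as follows; the class and the convention that a stage discards a command by returning `null` are illustrative assumptions, not the Middleware's actual implementation:

```php
<?php
// Sketch of a command pipeline: each stage may transform the command
// or discard it by returning null. Illustrative only; the real
// Middleware pipeline is configured through the Symfony DIC.
class Pipeline
{
    private $stages;

    public function __construct(array $stages)
    {
        $this->stages = $stages;
    }

    public function process($command)
    {
        foreach ($this->stages as $stage) {
            $command = $stage($command);

            if ($command === null) {
                return null; // command was discarded by a stage
            }
        }

        return $command;
    }
}
```

Inserting or replacing a stage then amounts to changing the array of stages the pipeline is constructed with, which matches the configuration-driven extensibility described above.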
As it stands, an action on the domain must either atomically succeed or fail. This includes the persistence of events and projections, sending of e-mails et cetera. To this end, the entire command handling pipeline has been wrapped in Middleware and Gateway database transactions.
Authentication to the HTTP API is performed using HTTP Basic Authentication. This poses no security risk, because (and as long as) the Middleware application is not publicly accessible. One can authenticate as the Self-Service application, the Registration Authority application, or as management (system operators).
The authorising command pipeline stage authorises handling of a command based on the command's marker interfaces. These interfaces are `Surfnet\StepupMiddleware\CommandHandlingBundle\Command\SelfServiceExecutable`, `RaExecutable` and `ManagementExecutable`, and are compared against the authenticated user's roles.
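The marker-interface check can be sketched as follows. The interface names come from the document; the role names and the mapping logic are illustrative assumptions:

```php
<?php
// Sketch of authorisation via marker interfaces: each marker interface
// is mapped to the role allowed to execute commands carrying it. The
// interface names appear in the document; the role names and mapping
// logic are illustrative assumptions.
interface SelfServiceExecutable {}
interface RaExecutable {}
interface ManagementExecutable {}

function isAuthorised($command, array $roles)
{
    if ($command instanceof SelfServiceExecutable && in_array('ROLE_SS', $roles)) {
        return true;
    }
    if ($command instanceof RaExecutable && in_array('ROLE_RA', $roles)) {
        return true;
    }
    if ($command instanceof ManagementExecutable && in_array('ROLE_MANAGEMENT', $roles)) {
        return true;
    }

    return false;
}
```

Marker interfaces keep the authorisation metadata on the command class itself, so the stage needs no external mapping table.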
Query API controllers authorise access individually by checking the authenticated user's roles.
In order to integrate with Tiqr as a Second Factor, a system has been created that allows adding an arbitrary number of Stepup Providers (Second Factor verification services). This has been done by creating a configurable SAML proxy that allows sending SAML requests containing a `<saml:subject>`. This can be used to trigger the verification of a specific subject (read: second factor) by the Stepup Provider. By using a descriptive and unique name for each provider, multiple providers can be configured; currently there is no limit to the number of providers that can be configured. The open SSO URL can in principle be accessed by anyone, however the requester must be configured as a connected Service Provider. If the requester is not a configured Service Provider, a 403 response will be given.
For sending OTPs via SMS an integration has been built for the Messagebird SMS API. This has been created in the Messagebird API client bundle. An example usage is as follows:
```php
$message = new \Surfnet\MessageBirdApiClient\Messaging\Message(
    'SURFnet',      // name of the sender
    '31612345679',  // number the SMS should be sent to
    'ABCD1234'      // content of the message to be sent
);

// using the Symfony Dependency Injection Container
$messagingService = $container->get('surfnet_message_bird_api_client.messaging');
$result = $messagingService->send($message);
```
The result object can be queried as to what the response of Messagebird was:
```php
// whether or not the message was successfully sent, delivered,
// scheduled to be delivered or buffered for delivery by Messagebird
$result->isSuccess();

// whether or not the message itself was valid as determined by Messagebird
$result->isMessageInvalid();

// whether or not the configured Messagebird access key was valid
$result->isAccessKeyInvalid();

// retrieve the original error(s)
$result->getRawErrors();

// retrieve the errors formatted,
// e.g. '(#9) no (correct) recipients found; (#10) originator is invalid'
$result->getErrorsAsString();
```
The library takes care of all the logging and error handling, ensuring that exceptions are thrown to the consumer of the library only in exceptional cases where the library itself fails.
For the verification of Yubikey OTPs, the Yubikey API client bundle is used. The exposed service is used as follows:
```php
$otp = new \Surfnet\YubikeyApiClient\Otp('cccccccbtbhncdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcd');

// using the Symfony Dependency Injection Container
$verificationService = $container->get('surfnet_yubikey_api_client.verification_service');
$result = $verificationService->verify($otp);
```
The result can be queried as to whether or not the verification was successful:
```php
// whether or not the OTP was verified successfully
$result->isSuccessful();

// get the error that Yubikey returned, matching one of the
// \Surfnet\YubikeyApiClient\Service\OtpVerificationResult constants
$result->getError();
```
This library, too, takes care of all the logging and error handling, ensuring that exceptions are thrown to the consumer of the library only in exceptional cases where the library itself fails.
Name | Description | Targeted Version (if relevant) | Rationale |
---|---|---|---|
Symfony2 | Framework for application development | 2.7 (LTS) | Battle-tested, well supported and widely used framework that allows for effective application development; go-to framework for Ibuildings |
Monolog | PSR compliant logging library | 1.13 (latest stable) | Almost the PHP standard logging library, providing numerous extension points, highly configurable, PSR-3 compliant and fits all the needs with respect to logging |
Doctrine | Database abstraction layer and Object-relational mapper | 2.5 (latest stable) | Widely used, well supported, robust, extensible and fast DBAL and ORM library; tool of choice for Ibuildings |
Doctrine Migrations | Database migration library to be used in combination with Doctrine2 | Flexible and automated database migration generation, execution and tracking for Doctrine2, removes the need to write manual migrations | |
Guzzle | HTTP client library | 4.2 | Very extensible and pluggable, allows all requirements to be fulfilled, popular, well supported library |
Broadway | Event Sourcing library | 0.5 (latest version) | First professional-grade Event Sourcing library for PHP. Although not officially stable yet, it is used in production by a range of applications and backed by a company; removes the need to write Event Sourcing capabilities from scratch |
Twitter Bootstrap | Front-end component and scaffolding library | 3.2 | Allows for fast iteration on design, mobile compatible, easy to set up and integrate |
Mopa Bootstrap Bundle | Integration of Twitter bootstrap with Symfony2 | 3 (dev version) | Fast to set up and easy to integrate twitter bootstrap within Symfony |
JMS Translation Bundle | Extends the functionality of the Symfony2 Translation component | 1.1 | Extensible and flexible translation management, adds translation key extraction capabilities as well as translation management UIs to the application. |
PHPUnit | PHP Testing framework | Default PHP testing framework | |
Mockery | PHP mock object framework | Allows for simple, fast, readable and flexible mocking removing the need to write complex mock, fake or stub objects so we can easily isolate the unit under test from external influence and behaviour | |
Twig | Template engine | Ships with Symfony2, has a custom DSL making it possible to write custom templates without having to known PHP; custom written templates (i.e. mail templates) can be sandboxed for security | |
Swiftmailer | Mail library | Ships with Symfony2, proven and flexible library that allows to send emails in a very configurable way, including spooling, forwarding, queueing, etc. | |
RMT | Release management tool for PHP projects | Easy release management, making releasing new versions a pre-defined, repeatable process. | |
QA-Tools | Set of Ibuildings standard QA-Tools | Set of tools that allow for fast feedback loops, helps with configuring the various code quality and code inspection tools. |
Name | Description | Rationale |
---|---|---|
Composer | PHP Package management system | De facto php package manager |
Github | Web-based git repository hosting service | De facto git repository hosting service, integrates with all other tools. |
Pivotal Tracker | Kanban based project management tool | Publicly available, free project management tool supporting the agile methodology chosen (kanban) |
Scrutinizer-CI | Code quality inspections and static analysis platform | Free for open-source projects. Thorough analysis of code according to configurable rulesets. Enables developers to write higher quality code |
Sensiolabs Insight | Symfony2 specific code quality analysis platform | Symfony2 specific code analysis, granting insight into how to achieve better architecture for Symfony2 based components and applications |
Travis-CI | Continuous integration platform | Free for open-source projects, flexible and highly customizable. Allows for simple continuous integration, allowing for higher quality code to be written |
The performance target is that any request is handled by the application within 2 seconds. To be able to meet this target, the applications have been designed with performance in mind.

The most important measure to achieve the desired level of performance is to minimize the number of API calls made (each call is subject to network latency) as well as the amount of traffic per API call. As a general guideline, no end-user operation should trigger more than 2 API calls (one read, one write) from the Self Service or Registration Authority application to the Middleware.

Secondly, the API has been designed to be very thin and transparent on the read side: as a general rule, no heavy transformations should be done on the data that is exposed through the API. It is better and faster to optimize the projections for the API requirements.
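As a sketch of this "thin read side" principle, the projection row can be kept in the exact shape of the API response, so serving it is a straight `json_encode` with no per-request mapping. All names below are illustrative stand-ins, not the actual Stepup API:

```php
<?php
// Hypothetical sketch: the read projection is already API-shaped, so the
// endpoint does no transformation work at request time. In the real
// application the row would come from a Doctrine2 repository; here we
// return a static array purely for illustration.
function findSecondFactorProjection(string $id): array
{
    return [
        'id' => $id,
        'type' => 'yubikey',
        'second_factor_identifier' => 'ccccccdhgvjk',
        'status' => 'vetted',
    ];
}

function renderApiResponse(string $id): string
{
    // No mapping layer: the projection *is* the response body.
    return json_encode(findSecondFactorProjection($id));
}

echo renderApiResponse('sf-1'), PHP_EOL;
```

The design choice is to pay the shaping cost once, when the projection is written, instead of on every read.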
Should the API become a bottleneck in the future, the first optimization to attempt would be to remove the Doctrine2 ORM integration on the read side. This allows native SQL queries and transforming the data directly from the database-given structure to JSON, eliminating the overhead that the ORM inherently has. The reason this has not been done yet is twofold:
- keeping the ORM allows for better control over the data by using well-known, expected and robust infrastructure (Doctrine2 ORM) that is easier to test and verify
- we simply do not know whether it is a problem; no performance tests have been done yet (DvR 23-06-2015)
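The optimization described above could look roughly like the following: a native query whose associative result row is serialized straight to JSON, with no entity hydration in between. This is a sketch only; an in-memory SQLite database stands in for the production MySQL/Galera cluster, and the table and column names are invented:

```php
<?php
// Hypothetical sketch: bypass the ORM on the read side and map a native
// SQL result directly to JSON. SQLite is used here purely so the example
// is self-contained and runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE second_factor (id TEXT, type TEXT, status TEXT)');
$pdo->exec("INSERT INTO second_factor VALUES ('sf-1', 'sms', 'vetted')");

// Native query: no entity objects are built, the fetched row is the payload.
$stmt = $pdo->prepare('SELECT id, type, status FROM second_factor WHERE id = :id');
$stmt->execute([':id' => 'sf-1']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

echo json_encode($row), PHP_EOL; // {"id":"sf-1","type":"sms","status":"vetted"}
```

The trade-off is exactly the one named above: this is faster, but it gives up the testability and safety net that the ORM layer provides.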
As much as we have attempted to ensure good performance, there are a few critical areas outside of the application's control that can influence it:
- **Database**: for the database, SURFnet has chosen a Galera Cluster, as it allows relatively fast and easy scaling, multi-master replication and geographical distribution. Geographical distribution (across two different datacenters) does mean that the replication causes some overhead. The applications try to minimize this impact by always attempting to use a single transaction, but the latency caused by the replication cannot be controlled.
- **SAML signature verification**: as the size of the signed XML increases, signature verification takes longer. Sending large assertions can therefore slow down the application, and the amount of XML is not under the application's control. This is a lesser risk, though, as the XML messages the Gateway receives come from another application under SURFnet's control, which is susceptible to the same problem.
- **Logging to NAS**: as documented above, all logs first go to a local file, governed by rsyslog. The disks the logs are written to are, however, mounted from a NAS, which may incur a performance penalty. This will be monitored, and the log setup may change if needed.
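The "single transaction" guideline from the database point above can be sketched as follows: all writes belonging to one end-user operation are committed together, so the cluster replicates one write set instead of several. The table and payloads are invented, and SQLite again stands in for the real Galera cluster:

```php
<?php
// Hypothetical sketch: group all writes of one operation into a single
// transaction so replication overhead is paid once per operation.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE event_stream (id INTEGER PRIMARY KEY, payload TEXT)');

$pdo->beginTransaction();
try {
    // Multiple writes belonging to one command...
    $pdo->exec("INSERT INTO event_stream (payload) VALUES ('SecondFactorRegistered')");
    $pdo->exec("INSERT INTO event_stream (payload) VALUES ('EmailVerificationRequested')");
    // ...committed (and thus replicated) as a single unit.
    $pdo->commit();
} catch (Exception $e) {
    // On failure nothing is replicated at all: the operation is atomic.
    $pdo->rollBack();
    throw $e;
}

$count = $pdo->query('SELECT COUNT(*) FROM event_stream')->fetchColumn();
echo $count, PHP_EOL; // 2
```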
The applications themselves do not have any specific requirements with regard to reliability. Reliability is guaranteed by the infrastructure through a doubly redundant setup across two datacenters. The infrastructure design is explicitly out of scope for the platform software development; however, feedback and advice are appreciated.
The applications do, however, have to handle all possible errors gracefully: they should never fail without logging why, and should always attempt to give a sensible response. It is critical that the user is never presented with a blank screen without any communication. Apart from graceful error handling, the system has been designed in such a way that there is no single point of failure: there is no single service that, when down, blocks all other applications from working. A single service might still influence other applications: for instance, if the Middleware is down, the Self Service and Registration Authority applications will no longer work correctly, but the Gateway will keep working. When designing new services and/or applications within the platform, care must be taken to ensure they do not become a single point of failure.
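The "log first, then respond sensibly" rule can be sketched as a top-level request guard. In the real applications this role is played by a Symfony2 exception listener; the plain-PHP handler below, with invented names and messages, only illustrates the principle:

```php
<?php
// Hypothetical sketch of graceful error handling: every uncaught exception
// is logged with its reason, and the user still receives a sensible
// response instead of a blank screen.
function handleRequest(callable $action): array
{
    try {
        return ['status' => 200, 'body' => $action()];
    } catch (Exception $e) {
        // Log *why* the request failed before responding...
        error_log('Unhandled exception: ' . $e->getMessage());
        // ...and still communicate something useful to the user.
        return ['status' => 500, 'body' => 'Something went wrong, please try again later.'];
    }
}

// Example: the Middleware being unreachable must not produce a blank page.
$response = handleRequest(function () {
    throw new RuntimeException('middleware unreachable');
});

echo $response['status'], ' ', $response['body'], PHP_EOL;
```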
Deployment and provisioning are done using various custom Ansible scripts. For development, a specific branch of the Stepup-Deploy repository must be used in order to get a working development environment; the requirements as well as the steps to execute are outlined in its README file. We are currently working on unifying the 'dev' tree of Stepup-Deploy with the Stepup-Deploy that is used to deploy the production infrastructure. This work takes place in the 'feature/dev-vm' branches of the Stepup-Deploy and Stepup-VM repositories:
- https://github.com/OpenConext/Stepup-Deploy/tree/feature/dev-vm
- https://github.com/OpenConext/Stepup-VM/tree/feature/dev-vm
Provisioning and deployment of other environments are also scripted using Ansible and can be found in the Stepup-Deploy repository. In order to deploy a new version, new builds must be made. To ease the creation of new builds, a dedicated build environment has been created that allows for rapid building of the various applications. The installation and workings of the build environment are documented in the README of the Stepup-Build repository. Once an application has been built successfully, it can be deployed with a single command.