The system stores Schemas (currently Avro, Protobuf or JsonSchema).
Schemas, which are immutable, are organized into Subjects, which are named sequences of Schema versions, such as a single table or file format evolving over time.
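To make the data model concrete, here is a minimal in-memory sketch in Java. It is not the server's implementation (which persists state via hibernate). The names `SubjectSketch`, `register`, `versions` and `latest` are hypothetical, and numbering versions from 1 is an assumption made for illustration. It models a Subject as a named, append-only sequence of immutable Schema strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the Subject/Schema relationship; not the server's code.
class SubjectSketch {
    // Subject name -> ordered schema versions (index 0 holds version 1).
    private final Map<String, List<String>> subjects = new LinkedHashMap<>();

    // Appends a schema to the subject and returns its 1-based version number.
    // Existing versions are never modified, only appended to.
    int register(String subject, String schemaText) {
        List<String> versions = subjects.computeIfAbsent(subject, k -> new ArrayList<>());
        versions.add(schemaText);
        return versions.size();
    }

    // Returns a read-only view of all versions of a subject, oldest first.
    List<String> versions(String subject) {
        return Collections.unmodifiableList(
                subjects.getOrDefault(subject, Collections.emptyList()));
    }

    // Returns the most recently registered schema for the subject.
    String latest(String subject) {
        List<String> versions = versions(subject);
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("unknown subject: " + subject);
        }
        return versions.get(versions.size() - 1);
    }
}
```

Registering two schemas under one subject name yields versions 1 and 2, and `latest` returns the second while the first remains retrievable unchanged through `versions`.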
The server component exposes a REST/JSON API. The technology stack is wildfly-swarm (resteasy, hibernate, hibernate-search).
    cd server
    mvn clean compile verify
    java -jar target/perspicuus-server-0.3.0-SNAPSHOT-thorntail.jar
The server uses two forms of storage: a relational database for canonical state (via hibernate) and a filesystem for full-text indexes (lucene via hibernate-search).
By default an in-memory H2 database is used. For production use, rewire the database via edits to pom.xml, project-stages.yml and persistence.xml in accordance with https://howto.wildfly-swarm.io/create-a-datasource/
Alternatively, the database configuration for an existing binary build can be overridden with Java system properties on the command line, e.g.
    java -Dswarm.datasources.data-sources.DataSourcePerspicuus.driver-name=${DB_DRIVER} \
      -Dswarm.datasources.data-sources.DataSourcePerspicuus.connection-url=${DB_URL} \
      -Dswarm.datasources.data-sources.DataSourcePerspicuus.user-name=${DB_USERNAME} \
      -Dswarm.datasources.data-sources.DataSourcePerspicuus.password=${DB_PASSWORD} \
      -jar target/perspicuus-server-0.3.0-SNAPSHOT-thorntail.jar
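Where the server is launched from another JVM process (a test harness, say), the same overrides can be assembled programmatically. The sketch below is only an illustration under stated assumptions: the class `LaunchSketch` and its methods are hypothetical, the jar path is supplied by the caller, and only the four property names shown above are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mirrors the -D overrides shown above and appends -jar.
class LaunchSketch {
    private static final String PREFIX =
            "-Dswarm.datasources.data-sources.DataSourcePerspicuus.";

    // env is expected to hold DB_DRIVER, DB_URL, DB_USERNAME and DB_PASSWORD.
    static List<String> buildCommand(Map<String, String> env, String jarPath) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add(PREFIX + "driver-name=" + require(env, "DB_DRIVER"));
        command.add(PREFIX + "connection-url=" + require(env, "DB_URL"));
        command.add(PREFIX + "user-name=" + require(env, "DB_USERNAME"));
        command.add(PREFIX + "password=" + require(env, "DB_PASSWORD"));
        command.add("-jar");
        command.add(jarPath);
        return command;
    }

    // Fails fast when a setting is missing instead of passing an empty override.
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("missing environment variable: " + key);
        }
        return value;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(command).inheritIO().start()`, with `System.getenv()` as the `env` argument.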
wildfly-swarm also supports overriding configuration parameters by specifying additional configuration files; see https://reference.wildfly-swarm.io/configuration.html for details.
Note these post-build configuration mechanisms are limited to using database drivers which are part of the build (h2 and mysql). For other database types supported by hibernate, build with pom.xml changes to include the appropriate driver dependency.
Configuration for other aspects of the server, e.g. logging and authentication, can likewise be changed at build time or by runtime overrides.
The server build system can construct container images for deployment to Kubernetes or OpenShift. This is done via the fabric8 maven plugin; see https://maven.fabric8.io/ for details.
Run maven with the 'containerization' profile enabled. By default this targets OpenShift, so you should have an appropriate oc login context active in the shell environment from which you invoke it. Change the 'fabric8.mode' property in pom.xml to target kubernetes instead.
    oc login -u someone
    mvn -P containerization clean verify
fabric8 OpenShift integration will create a binary ImageStream for the server. The command 'mvn fabric8:deploy' will then deploy the application to the cluster. However, the default configuration that results may not be desirable, so build time configuration of the image may be required.
The preferred way to deploy the built server image with flexible deployment time configuration is via the OpenShift Service Catalog (https://docs.openshift.org/latest/architecture/service_catalog/index.html) with a configuration template. This requires OpenShift 3.6 or later with the service catalog enabled. The fabric8 build tooling doesn’t yet (as of fall 2017) support Service Catalog integration, so manual deployment of the template is necessary. Note that the image stream should also be in the 'openshift' namespace, so log in and set the context appropriately before running the image build step above.
oc create -f openshift_template.json -n openshift
Clients may use the REST API directly. Where functionality overlaps with the Confluent Schema Registry, the API is compatible, so most of the curl examples at https://github.com/confluentinc/schema-registry will work against the perspicuus server.
Note that, by default, authentication is required. For server builds that don’t override the authentication settings, clients can use 'curl --user testuser:testpass' when invoking the API.
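For clients that call the REST API without curl, the same credentials can be sent as an HTTP Basic `Authorization` header. The following is a minimal sketch, assuming Java 11 or later (for `java.net.http`), a server at `http://localhost:8080`, and a `/subjects` listing endpoint as in the Confluent-compatible part of the API. The class and method names are hypothetical and not part of the perspicuus client module.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; not part of the perspicuus client module.
class BasicAuthSketch {
    // Encodes "user:password" as an HTTP Basic Authorization header value.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a GET request for the given path with the Authorization header set.
    static HttpRequest buildRequest(String baseUrl, String path,
                                    String user, String password) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }

    // Sends the request and returns the response body; requires a running server.
    // The "/subjects" path is an assumption based on the Confluent-compatible API.
    static String listSubjects(String baseUrl, String user, String password)
            throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                buildRequest(baseUrl, "/subjects", user, password),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With the default test credentials, `BasicAuthSketch.listSubjects("http://localhost:8080", "testuser", "testpass")` would issue the equivalent of the curl invocation above against a locally running server.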
There is a Java wrapper library for the API, provided in the 'client' module.
Building the client with integration tests enabled (which they are by default) requires that a local server be running. A bug in the wildfly-swarm tooling prevents this happening automatically, so start a server manually before building the client.
    cd server; java -jar target/perspicuus-server-0.3.0-SNAPSHOT-thorntail.jar
    cd client; mvn clean install
The client binary can be consumed via maven dependency:
    <dependency>
        <groupId>org.jboss.perspicuus</groupId>
        <artifactId>client</artifactId>
        <version>0.1.0-SNAPSHOT</version>
    </dependency>
Java client usage example:
    import org.jboss.perspicuus.client.*;
    ...
    SchemaRegistryClient schemaRegistryClient = new SchemaRegistryClient("http://localhost:8080", "username", "password");

    Schema schema = ...
    long schemaId = schemaRegistryClient.registerSchema(schema.getName(), schema.toString());
    schemaRegistryClient.annotate(schemaId, "key", "value");

    long groupId = schemaRegistryClient.createGroup();
    schemaRegistryClient.addSchemaToGroup(groupId, schemaId);
    schemaRegistryClient.annotate(groupId, "key", "value");
The Java client additionally provides adaptors for working with schema from two sources: Spark and JDBC.
The REST API of the server is self-describing via OpenAPI (https://www.openapis.org/). Using this machine readable description of the API, tooling can generate client libraries for a number of languages, see e.g. https://swagger.io/swagger-codegen/ for details.
The swagger-ui project provides an automatically generated web-based interface for exploring and experimenting with the API. To use it, download and run the pre-built wildfly-swarm server:
    curl -O "https://repo1.maven.org/maven2/org/wildfly/swarm/servers/swagger-ui/2017.10.0/swagger-ui-2017.10.0-swarm.jar"
    java -Dswarm.port.offset=1 -jar swagger-ui-2017.10.0-swarm.jar
Note the port offset is necessary where the perspicuus server is already running locally, as it will use port 8080, obliging the swagger-ui server to move up to port 8081. Access http://localhost:8081/swagger-ui/ in a browser and use http://localhost:8080/swagger.json to reference the perspicuus API description.
TODO make the server play nice with api_key auth, as the UI browser won’t do username/password
http://cidrdb.org/cidr2017/papers/p44-deng-cidr17.pdf The Data Civilizer System
http://dl.acm.org/citation.cfm?id=2903730 Goods: Organizing Google’s Datasets
http://cidrdb.org/cidr2017/papers/p111-hellerstein-cidr17.pdf Ground: A Data Context Service