DataHelix Generator


The generation of representative test and simulation data is a challenging and time-consuming task. Although DataHelix was created to address a specific challenge in the financial services industry, you will find it a useful tool for the generation of realistic data for simulation and testing, regardless of industry sector. All this from a straightforward JSON data profile document.

DataHelix is a proud member of the Fintech Open Source Foundation and operates within the FINOS Data Technologies Program.
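For a flavour of the profile format, the sketch below declares two fields and constrains one of them to a numeric range. The field names, types and constraint keywords here are purely illustrative assumptions, not taken verbatim from the project's schema documentation; the User Guide is the authoritative reference for profile syntax.

```json
{
  "fields": [
    { "name": "tradeId", "type": "string" },
    { "name": "price",   "type": "decimal" }
  ],
  "constraints": [
    { "field": "price", "greaterThan": 0 },
    { "field": "price", "lessThan": 1000000 }
  ]
}
```

Given a profile of this shape, the generator produces rows whose values satisfy every declared constraint, so the dataset's shape is both documented and generated from a single declarative source.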

Key documents

  • For information on how to get started with DataHelix, see our Getting Started guide.

  • For information on the syntax of DataHelix profiles, see the User Guide.

  • For information on how to contribute to the project, and for more technical information about DataHelix, see the Developer Guide.

  • For a high-level road map, see the Road Map.

The Problem

Sample data is a necessity for a wide range of software development tasks - functional or load testing of a system, prototyping an API, or pitching a service to a potential customer - but generating and maintaining it can be difficult. The nature of some industries makes it particularly hard to cleanly manage data schemas and sample datasets:

  • Regulatory and methodological change often forces data schema changes.
  • It is often difficult to completely remove legacy data due to obligations to maintain deprecated products. Because of this, schemas tend to be progressively complicated with special cases and exceptions.
  • Errors can be costly and reputation-damaging.
  • For legal and/or privacy reasons, it is normally impossible to include real data in samples.

For all the above reasons, it is common to handcraft sample datasets. This approach brings several problems:

  • It costs significant time up-front, and thereafter every time the schema changes.
  • It's very easy to introduce errors.
  • The sample data is unlikely to exhaustively cover all test cases.
  • The sample data is not self-documenting, and documentation is likely to become out of date.

For data generation, partial solutions are available in services/libraries such as TSimulus, Mockaroo or GenRocket. However, these have limitations:

  • They handle only relatively simple data schemas, with few dependencies between fields.
  • None of them offer a complete end-to-end solution of profiling existing data to discover trends and constraints, generating from those constraints, and validating against them.
  • Where complex behaviour is available at all, it is modelled in an imperative style: the user must design the process that generates the data using the library's toolbox, rather than declaratively describing the shape of the data and leaving the library to work out how to create it.

The Mission

We aim to solve (at least) the following user needs:

  • "I want to generate test cases for my validation procedures."
  • "I want to generate sample data to document my API, or to use in a non-production version of my API."
  • "I want to validate some data against a known specification or implementation."
  • "I want to measure my existing test data's coverage against the range of possible data."
  • "I want to generate an exhaustive set of data, for testing my API's robustness."

The Product

A suite of tools:

  • To generate data based on a declarative profile, either from the command line or through a RESTful API that can be called manually or via a web front end (a sketch of a command-line invocation follows this list).
  • To create a data profile from a dataset, including identifying constraints and relationships between the dataset's fields, so that similarly-shaped mock data can be generated using the profile.
  • To validate a dataset against a data profile.
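As a rough sketch of the command-line route mentioned above, generating data is a single invocation of the generator jar against a profile. The jar name and flag names below are assumptions based on the project's Getting Started guide and may differ between releases; consult that guide for the exact command.

```
# Hypothetical invocation: the jar name and flags may vary by release.
java -jar datahelix.jar --profile-file=profile.json --output-path=output.csv
```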

Contributing

  1. Fork it (https://github.com/yourname/yourproject/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Read our contribution guidelines and Community Code of Conduct
  4. Commit your changes (git commit -am 'Add some fooBar')
  5. Push to the branch (git push origin feature/fooBar)
  6. Create a new Pull Request

NOTE: Commits and pull requests to FINOS repositories will only be accepted from those contributors with an active, executed Individual Contributor License Agreement (ICLA) with FINOS OR who are covered under an existing and active Corporate Contribution License Agreement (CCLA) executed with FINOS. Commits from individuals not covered under an ICLA or CCLA will be flagged and blocked by the FINOS Clabot tool. Please note that some CCLAs require individuals/employees to be explicitly named on the CCLA.

Need an ICLA? Unsure if you are covered under an existing CCLA? Email [email protected]

License

Copyright 2019 Scott Logic Ltd.

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0.
