Important
This project has been superseded by https://github.com/ggalmazor/lttb_downsampling
These implementations are based on the paper "Downsampling Time Series for Visual Representation" by Sveinn Steinarsson, from the Faculty of Industrial Engineering, Mechanical Engineering and Computer Science at the University of Iceland (2013). You can read the paper here
The goal of Largest-Triangle downsampling algorithms for data visualization is to reduce the number of points in a number series without losing important visual features of the resulting graph. Note, however, that these algorithms are not numerically correct: they keep a subset of the original points chosen for visual fidelity, so statistical properties of the series, such as its mean or sum, are not preserved.
See how this algorithm compares to other algorithms designed to keep local extrema in the input series at ggalmazor.com/blog/evaluating_downsampling_algorithms.html
Latest version: 0.1.0
You can add this library to your Maven/Gradle/SBT/Leiningen project from either of two package repositories: JitPack.io or the GitHub Package Repository.
Please follow the instructions at the JitPack.io page for this project. Gradle example:
allprojects {
    repositories {
        maven { url 'https://jitpack.io' }
    }
}
dependencies {
    implementation 'com.github.ggalmazor:lt_downsampling_java8:0.1.0'
}
Please follow the instructions at the GitHub Package Repository for this project. Gradle example:
repositories {
    maven {
        url = uri("https://maven.pkg.github.com/ggalmazor/lt_downsampling_java")
        credentials {
            username = project.findProperty("gpr.user") ?: System.getenv("USERNAME")
            password = project.findProperty("gpr.key") ?: System.getenv("TOKEN")
        }
    }
}
dependencies {
    implementation 'com.github.ggalmazor:lt_downsampling_java8:0.1.0'
}
This version of the algorithm groups points into buckets of the same size and then selects, from each bucket, the point that forms the triangle with the largest area together with points in the neighboring buckets.
You can produce a downsampled version of an input series with:
List<Point> input = Arrays.asList(...);
int numberOfBuckets = 200;
List<Point> output = LTThreeBuckets.ofSorted(input, numberOfBuckets);
The first and last points of the original series are always in the output. The remaining points are grouped into the requested number of buckets, and the algorithm chooses the best point from each bucket, so the example above yields a list of 202 elements (200 buckets plus the two endpoints).
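The selection step can be sketched in plain Java. This is a minimal, self-contained illustration of the Largest-Triangle-Three-Buckets idea and not the library's actual implementation: the class `LttbSketch`, the method `downsample`, and the use of `double[]{x, y}` pairs instead of `Point` are assumptions made for this example. It follows the paper's formulation, where each candidate forms a triangle with the previously selected point and the average of the next bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class; not part of the library.
final class LttbSketch {

    /** Twice the area of triangle a-b-c; enough for comparing candidates. */
    static double doubleArea(double[] a, double[] b, double[] c) {
        return Math.abs((a[0] - c[0]) * (b[1] - a[1]) - (a[0] - b[0]) * (c[1] - a[1]));
    }

    /** Expects points as {x, y} pairs sorted by x. Returns buckets + 2 points. */
    static List<double[]> downsample(List<double[]> sorted, int buckets) {
        int n = sorted.size();
        // Nothing to reduce: return a copy of the input.
        if (buckets <= 0 || n <= buckets + 2) return new ArrayList<>(sorted);

        int middle = n - 2; // points between the two endpoints
        List<double[]> out = new ArrayList<>(buckets + 2);
        out.add(sorted.get(0)); // the first point is always kept

        for (int i = 0; i < buckets; i++) {
            // Index range [start, end) of the current bucket.
            int start = 1 + (int) ((long) i * middle / buckets);
            int end = 1 + (int) ((long) (i + 1) * middle / buckets);

            // Third vertex: average of the next bucket, or the last point
            // when the current bucket is the final one.
            double[] c;
            if (i == buckets - 1) {
                c = sorted.get(n - 1);
            } else {
                int nextEnd = 1 + (int) ((long) (i + 2) * middle / buckets);
                double sumX = 0, sumY = 0;
                for (int j = end; j < nextEnd; j++) {
                    sumX += sorted.get(j)[0];
                    sumY += sorted.get(j)[1];
                }
                c = new double[] { sumX / (nextEnd - end), sumY / (nextEnd - end) };
            }

            // First vertex: the point selected from the previous bucket.
            double[] a = out.get(out.size() - 1);
            double[] best = sorted.get(start);
            double bestArea = -1;
            for (int j = start; j < end; j++) {
                double area = doubleArea(a, sorted.get(j), c);
                if (area > bestArea) {
                    bestArea = area;
                    best = sorted.get(j);
                }
            }
            out.add(best);
        }

        out.add(sorted.get(n - 1)); // the last point is always kept
        return out;
    }
}
```

In this sketch, a flat series with a single spike keeps the spike, because it forms by far the largest triangle within its bucket.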
- This library must be given lists of instances of the `Point` supertype.
- It also provides, and uses internally, the `DoublePoint` subtype, which can also be used to feed data to the library.
- However, users can create implementations of `Point` that best fit their domain.
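As an illustration of a domain-specific point type, here is a hedged sketch. The `Point` interface declared below is a stand-in written for this example, and its `getX()`/`getY()` methods are an assumption; check the library's actual `Point` type for the methods it requires. `TemperatureReading` is a hypothetical domain class.

```java
import java.time.Instant;

// Stand-in for the library's Point supertype.
// ASSUMPTION: the real interface may declare different methods.
interface Point {
    double getX();
    double getY();
}

// Hypothetical domain type: a temperature sample taken at some instant.
final class TemperatureReading implements Point {
    private final Instant takenAt;
    private final double celsius;

    TemperatureReading(Instant takenAt, double celsius) {
        this.takenAt = takenAt;
        this.celsius = celsius;
    }

    // The x coordinate is the timestamp in epoch milliseconds.
    @Override
    public double getX() {
        return takenAt.toEpochMilli();
    }

    // The y coordinate is the measured value.
    @Override
    public double getY() {
        return celsius;
    }

    Instant takenAt() {
        return takenAt;
    }
}
```

With a type like this, the downsampled output would still carry the domain data (here, the original timestamp) rather than bare coordinate pairs.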
Not yet implemented
This is what a raw time series with ~5000 data points and its downsampled versions (2000, 500, and 250 buckets) look like (graphed by AirTable):
These are close-ups for 250, 500, 1000, and 2000 buckets with the raw data in the background: