Supplementary material for the article "Outcome-Oriented Predictive Process Monitoring: Review and Benchmark" by Irene Teinemaa, Marlon Dumas, Marcello La Rosa, and Fabrizio Maria Maggi.
This repository provides implementations of different techniques for outcome-oriented predictive business process monitoring. The aim of these techniques is to predict a (pre-defined) binary case outcome of a running (partial) trace.
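Because the prediction target is the outcome of a *running* trace, training data is usually built by cutting each completed trace into prefixes that all inherit the trace's final label. The following is a minimal sketch of that step. It assumes that a trace is simply a list of activity names, and that `extract_prefixes`, `min_len` and `max_len` are hypothetical names; it is not the repository's own code.

```python
from typing import List, Tuple

# Hypothetical representation: a trace is a list of activity names.
Trace = List[str]


def extract_prefixes(
    traces: List[Tuple[Trace, int]], min_len: int = 1, max_len: int = 10
) -> List[Tuple[Trace, int]]:
    """Cut every completed (trace, label) pair into labeled prefixes.

    Each prefix of length min_len..max_len inherits the binary outcome of the
    full trace, because that outcome is what must be predicted while the case
    is still running.
    """
    prefixes = []
    for trace, label in traces:
        for k in range(min_len, min(len(trace), max_len) + 1):
            prefixes.append((trace[:k], label))
    return prefixes


if __name__ == "__main__":
    completed = [(["A", "B", "C"], 1), (["A", "D"], 0)]
    for prefix, label in extract_prefixes(completed):
        print(prefix, label)
```

Capping the prefix length (`max_len`) keeps the number of training samples manageable for long traces. The cap value used here is purely illustrative.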
The benchmark includes implementations of four sequence encodings (for further description, refer to the paper):
- Static encoding
- Last state encoding
- Aggregation encoding
- Index-based encoding
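To make the difference between the encodings concrete, here is a simplified sketch that encodes a prefix of activity names over a fixed activity vocabulary. The function names are hypothetical. It also assumes that only the activity attribute is encoded, so the real encodings, which also cover other event and case attributes, are richer than this.

```python
from collections import Counter
from typing import Dict, List


def static(case_attrs: Dict[str, float], keys: List[str]) -> List[float]:
    # Case-level attributes that do not change during the trace, copied as-is.
    return [case_attrs[k] for k in keys]


def last_state(prefix: List[str], vocab: List[str]) -> List[int]:
    # One-hot vector of the most recent activity only; earlier events are ignored.
    return [1 if prefix[-1] == a else 0 for a in vocab]


def aggregation(prefix: List[str], vocab: List[str]) -> List[int]:
    # Frequency of each activity in the prefix; the order of events is lost.
    counts = Counter(prefix)
    return [counts[a] for a in vocab]


def index_based(prefix: List[str], vocab: List[str], length: int) -> List[int]:
    # One one-hot block per position 0..length-1, so order is preserved.
    # Positions beyond the end of the prefix stay all-zero (a padding choice
    # assumed here); positions past `length` are dropped.
    vec: List[int] = []
    for i in range(length):
        act = prefix[i] if i < len(prefix) else None
        vec.extend(1 if act == a else 0 for a in vocab)
    return vec


if __name__ == "__main__":
    vocab = ["A", "B", "C"]
    prefix = ["A", "B", "A"]
    print(last_state(prefix, vocab))      # one-hot of "A"
    print(aggregation(prefix, vocab))     # activity counts
    print(index_based(prefix, vocab, 3))  # one block per position
```

The index-based vector grows with the prefix length, which is one reason this kind of encoding pairs naturally with bucketing prefixes by length.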
Moreover, the repository contains implementations of five bucketing methods (see the paper for more details):
- No bucketing
- KNN
- State-based
- Clustering
- Prefix length based
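A bucketing method assigns each prefix to a bucket, and a separate classifier is then trained on each bucket. The sketch below shows three of the simpler bucketing keys. It assumes that prefixes are lists of activity names and that the state of a prefix can be approximated by its last activity. All function names are hypothetical. Clustering and KNN bucketing need a distance over encoded prefixes and are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Prefix = List[str]


def bucket(prefixes: List[Prefix], key: Callable[[Prefix], Hashable]) -> Dict[Hashable, List[Prefix]]:
    # Group prefixes by a bucket key; one classifier per bucket would follow.
    buckets: Dict[Hashable, List[Prefix]] = defaultdict(list)
    for p in prefixes:
        buckets[key(p)].append(p)
    return dict(buckets)


def no_bucketing(prefix: Prefix) -> int:
    # All prefixes share a single bucket (and hence a single classifier).
    return 0


def prefix_length(prefix: Prefix) -> int:
    # One bucket per prefix length.
    return len(prefix)


def last_activity_state(prefix: Prefix) -> str:
    # State approximated by the last activity (an assumption of this sketch).
    return prefix[-1]


if __name__ == "__main__":
    prefixes = [["A"], ["A", "B"], ["C"], ["A", "B", "C"]]
    print(bucket(prefixes, prefix_length))
    print(bucket(prefixes, last_activity_state))
```

At prediction time, a running case is routed with the same key function to the classifier of its bucket.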
The benchmark experiments have been performed using four classifiers:
- Random forest
- Gradient boosted trees (XGBoost)
- Logistic regression
- SVM
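Once a bucket's prefixes are encoded as fixed-length vectors, any binary classifier can be trained on them. For illustration only, the sketch below implements one of the listed classifiers, logistic regression, from scratch with batch gradient descent rather than through a library such as scikit-learn. The function names, learning rate and epoch count are hypothetical choices.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 500) -> List[float]:
    # Batch gradient descent on the mean log-loss.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy aggregation-style vectors (counts of two activities) with outcomes.
    X = [[1, 0], [2, 0], [0, 1], [0, 2]]
    y = [1, 1, 0, 0]
    w = train_logreg(X, y)
    print(predict_proba(w, [1, 0]), predict_proba(w, [0, 1]))
```

The predicted probability for a running case's encoded prefix serves as the estimated likelihood of the positive outcome.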
Together with the code, we make available 22 of the datasets used in the evaluation section of the paper (the 2 remaining datasets used in the paper are private). These datasets correspond to different prediction tasks formulated on 8 publicly available event logs, namely BPIC 2011, BPIC 2012, BPIC 2015, BPIC 2017, Sepsis Cases, Hospital Billing, Road Traffic Fine Management, and Production. These labeled and preprocessed benchmark datasets can be found at https://drive.google.com/open?id=154hcH-HGThlcZJW5zBvCJMZvjOQDsnPR.
If you use code from this repository, please cite the following paper:
@article{teinemaa2019outcome,
author = {Irene Teinemaa and
Marlon Dumas and
Marcello La Rosa and
Fabrizio Maria Maggi},
title = {Outcome-Oriented Predictive Process Monitoring: Review and Benchmark},
journal = {{TKDD}},
volume = {13},
number = {2},
pages = {17:1--17:57},
year = {2019}
}