Merge pull request #6 from AdaptiveBProcess/integration
Refactory integration
Mcamargo85 authored Oct 1, 2020
2 parents 973ac5d + 534995c commit f91e9af
Showing 55 changed files with 1,733 additions and 277,396 deletions.
39 changes: 13 additions & 26 deletions README.md
@@ -1,9 +1,8 @@
# Learning Accurate Generative Models of Business Processes with LSTM Neural Networks
# DeepGenerator: Learning Accurate Generative Models of Business Processes with LSTM Neural Networks

The code here presented can execute different pre- and post-processing methods and architectures for building and using generative models from event logs in XES format using LSTM neural networks. This code can perform the next tasks:
The code presented here can execute different pre- and post-processing methods and architectures for building and using generative models from event logs in XES format using LSTM and GRU neural networks. It can perform the following tasks:


* Training embedded matrices for the activities and roles contained in an event log.
* Training LSTM neural networks using an event log as input.
* Generating full event logs using a trained LSTM neural network.
* Predicting the remaining time and the continuation (suffix) of an incomplete business process trace.
@@ -15,43 +14,31 @@ These instructions will get you a copy of the project up and running on your loc

### Prerequisites

To execute this code, you need to install Anaconda in your system and create an environment using the *lstm_env.yml* specification provided in the repository.
To execute this code, install Anaconda on your system and create an environment from the *environment.yml* specification provided in the repository.

## Running the script

Once the environment is created, you can perform each of the tasks by specifying the following parameters in the lstm.py module or on the command line, as described below:

*Training embedded matrices:* this task is a pre-requisite to train an LSTM model, to execute it you need to specify the required activity (-a) as 'emb_training' followed by the name of the event log (-f), and define if the event-log has a single timestamp or not (-o):
*Training LSTM neural network:* To perform this task, set the required activity (-a) to 'training', followed by the name of the event log (-f) and all of the following parameters:

```
(lstm_env) C:\sc_lstm>python lstm.py -a emb_training -f Helpdesk.xes.gz -o True
```
*Training LSTM neuronal network:* To perform this task, you need to set the required activity as 'training' followed by the name of the event log, and all the following parameters:

* One timestamp Event-log (-o): define if the event-log has a single timestamp or not
* Implementation (-i): type of Keras LSTM implementation 1 cpu, 2 gpu
* LSTM activation function (-l): LSTM optimization function (see Keras doc), None to set it up as the default value.
* Dense activation function (-d): dense layer activation function (see Keras doc), None to set it up as the default value.
* optimization function (-p): optimization function (see Keras doc).
* Implementation (-i): type of Keras LSTM implementation, 1 for CPU or 2 for GPU.
* LSTM activation function (-l): LSTM activation function (see Keras doc), None to use the default value.
* Dense activation function (-d): dense layer activation function (see Keras doc), None to use the default value.
* Optimization function (-p): optimization function (see Keras doc).
* Scaling method (-n): scaling method for the relative time between events, max or lognorm.
* Model type (-m): type of LSTM model specialized, concatenated, or shared_cat.
* Model type (-m): type of model, one of specialized, concatenated, shared_cat, shared_cat_gru, specialized_gru, or concatenated_gru.
* N-gram size (-z): Size of the n-gram (temporal dimension)
* LSTM layer sizes (-y): Size of the LSTM layers.

```
(lstm_env) C:\sc_lstm>python lstm.py -a training -f Helpdesk.xes -o True -i 1 -l sigmoid -d None -p Nadam -n max -m shared_cat -z 5 -y 50
```

*Generate full event log:* To perform this task, you need to set the required activity as 'pred_log' followed by the folder (-c) and model (-b) names to be used to generate the event logs. These folders and models were generated during the training task and can be found in the output_files folder. Additionally, you need to specify the maximum length of the predicted traces (-t). Finally, to store the results, it's necessary to define if you are executing the task as a single execution or if you are running other prediction instances (-x). If it's a single execution, the detailed results and individual measurements are stored in a subfolder called results. Otherwise, the results of all the running models are store in the output_files folder in a shared file:

```
(lstm_env) C:\sc_lstm>python lstm.py -a pred_log -c 20190228_155935509575 -b "model_rd_150 Nadam_22-0.59.h5" -t 100 -x False
(lstm_env) C:\sc_lstm>python lstm.py -a training -f Helpdesk.xes -i 1 -l None -d linear -p Nadam -n lognorm -m shared_cat -z 5 -y 100
```
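The training flags listed above can be sketched as an `argparse` parser. This is only an illustration of how the options fit together, not the repository's actual parser: the `dest` names, the default values, and the `none_or_str` helper are assumptions introduced here.

```python
import argparse

# Model types named in the README's parameter list.
MODEL_TYPES = [
    "specialized", "concatenated", "shared_cat",
    "shared_cat_gru", "specialized_gru", "concatenated_gru",
]


def none_or_str(value):
    """Map the literal string 'None' to Python None (hypothetical helper)."""
    return None if value == "None" else value


def build_training_parser():
    """Build a parser mirroring the documented flags; defaults are assumed."""
    parser = argparse.ArgumentParser(description="Hypothetical training CLI")
    parser.add_argument("-a", dest="activity", required=True)
    parser.add_argument("-f", dest="file_name", required=True)
    parser.add_argument("-i", dest="imp", type=int, choices=[1, 2], default=1)
    parser.add_argument("-l", dest="lstm_act", type=none_or_str, default=None)
    parser.add_argument("-d", dest="dense_act", type=none_or_str, default=None)
    parser.add_argument("-p", dest="optim", default="Nadam")
    parser.add_argument("-n", dest="norm_method",
                        choices=["max", "lognorm"], default="max")
    parser.add_argument("-m", dest="model_type",
                        choices=MODEL_TYPES, default="shared_cat")
    parser.add_argument("-z", dest="n_size", type=int, default=5)
    parser.add_argument("-y", dest="l_size", type=int, default=100)
    return parser


# Parse the same arguments as the README's training example.
args = build_training_parser().parse_args(
    "-a training -f Helpdesk.xes -i 1 -l None -d linear -p Nadam "
    "-n lognorm -m shared_cat -z 5 -y 100".split()
)
print(args.model_type, args.lstm_act, args.n_size)
```

Restricting `-n` and `-m` with `choices` makes a mistyped scaling method or model type fail at parse time rather than partway through training.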

*Predict the remaining time and suffix:* To perform this task, the only change with respect with the previous one is that you need to set the required activity as 'pred_sfx':
*Predictive task:* DeepGenerator can execute various predictive tasks, such as predicting the next event, the case continuation, and the remaining time of an ongoing case. It can also generate complete event logs starting from a zero prefix size. To perform these tasks, set the activity (-a) to ‘predict_next’ for next event prediction, ‘pred_sfx’ for case continuation and remaining time, or ‘pred_log’ for full event log generation. Additionally, you must indicate the folder where the predictive model is located (-c) and the name of the .h5 model (-b). Finally, specify the method for selecting the next predicted task (-v), either ‘random_choice’ or ‘arg_max’, and the number of repetitions of the experiment (-r). **NB!** The folders and models are generated during the training task and can be found in the output_files folder:

```
(lstm_env) C:\sc_lstm>python lstm.py -a pred_sfx -c 20190228_155935509575 -b "model_rd_150 Nadam_22-0.59.h5" -t 100 -x False
(lstm_env) C:\sc_lstm>python lstm.py -a pred_log -c 20201001_426975C9_FAC6_453A_9F0B_4DD528CB554B -b "model_shared_cat_02-1.10.h5" -v "random_choice" -r 1
```
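The difference between the two selection methods accepted by `-v` can be illustrated with a minimal sketch. It assumes the model outputs a probability for each candidate and that the names match their usual meaning: ‘arg_max’ picks the most probable candidate, while ‘random_choice’ samples one in proportion to its probability. The `select_next` function is hypothetical and is not taken from the repository.

```python
import random


def select_next(probabilities, method, rng=None):
    """Return the index of the next predicted item (hypothetical helper).

    probabilities: sequence of non-negative weights, one per candidate.
    method: 'arg_max' (deterministic) or 'random_choice' (stochastic).
    rng: optional random.Random instance for reproducible sampling.
    """
    indices = range(len(probabilities))
    if method == "arg_max":
        # Always pick the candidate with the highest probability.
        return max(indices, key=probabilities.__getitem__)
    if method == "random_choice":
        # Sample a candidate in proportion to its probability.
        rng = rng if rng is not None else random.Random()
        return rng.choices(indices, weights=probabilities, k=1)[0]
    raise ValueError(f"Unknown selection method: {method}")


probs = [0.1, 0.7, 0.2]
print(select_next(probs, "arg_max"))
print(select_next(probs, "random_choice", rng=random.Random(0)))
```

Under this reading, repeated runs with ‘arg_max’ always give the same result, whereas ‘random_choice’ gives varied results, which would explain why the experiment can be repeated several times with `-r`.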
*Predict the next event and role:* To perform this task, the only changes with respect to the previous ones are that you need to set the required activity to 'predict_next', and it is not necessary to set the maximum trace length:

@@ -65,4 +52,4 @@ Models examples and experimental results can be found at <a href="http://kodu.ut

* **Manuel Camargo**
* **Marlon Dumas**
* **Oscar Gonzalez-Rojas**
* **Oscar Gonzalez-Rojas**
8 changes: 0 additions & 8 deletions compress_job

This file was deleted.

231 changes: 231 additions & 0 deletions environment.yml
@@ -0,0 +1,231 @@
name: lstm_exp_v3
channels:
- defaults
dependencies:
- _tflow_select=2.3.0=mkl
- absl-py=0.9.0=py37_0
- alabaster=0.7.12=py37_0
- argh=0.26.2=py37_0
- astor=0.8.0=py37_0
- astroid=2.4.2=py37_0
- atomicwrites=1.4.0=py_0
- attrs=19.3.0=py_0
- autopep8=1.5.3=py_0
- babel=2.8.0=py_0
- backcall=0.2.0=py_0
- bcrypt=3.1.7=py37he774522_0
- blas=1.0=mkl
- bleach=3.1.5=py_0
- blinker=1.4=py37_0
- brotlipy=0.7.0=py37he774522_1000
- ca-certificates=2020.6.24=0
- cachetools=4.1.0=py_1
- certifi=2020.6.20=py37_0
- cffi=1.14.0=py37h7a1dbc1_0
- chardet=3.0.4=py37_1003
- click=7.1.2=py_0
- cloudpickle=1.5.0=py_0
- colorama=0.4.3=py_0
- cryptography=2.9.2=py37h7a1dbc1_0
- cycler=0.10.0=py37_0
- decorator=4.4.2=py_0
- defusedxml=0.6.0=py_0
- diff-match-patch=20200713=py_0
- dnspython=1.16.0=py37_0
- docutils=0.16=py37_0
- entrypoints=0.3=py37_0
- flake8=3.8.3=py_0
- freetype=2.10.2=hd328e21_0
- future=0.18.2=py37_0
- google-auth=1.17.2=py_0
- google-auth-oauthlib=0.4.1=py_2
- google-pasta=0.2.0=py_0
- grpcio=1.27.2=py37h351948d_0
- h5py=2.10.0=py37h5e291fa_0
- hdf5=1.10.4=h7ebc959_0
- icc_rt=2019.0.0=h0cc432a_1
- icu=58.2=ha925a31_3
- idna=2.10=py_0
- imagesize=1.2.0=py_0
- importlib-metadata=1.7.0=py37_0
- importlib_metadata=1.7.0=0
- intel-openmp=2020.1=216
- intervaltree=3.0.2=py_0
- ipykernel=5.3.3=py37h5ca1d4c_0
- ipython=7.16.1=py37h5ca1d4c_0
- ipython_genutils=0.2.0=py37_0
- isort=4.3.21=py37_0
- jedi=0.17.1=py37_0
- jinja2=2.11.2=py_0
- joblib=0.16.0=py_0
- jpeg=9b=hb83a4c4_2
- jsonschema=3.2.0=py37_0
- jupyter_client=6.1.6=py_0
- jupyter_core=4.6.3=py37_0
- keyring=21.2.1=py37_0
- kiwisolver=1.2.0=py37h74a9793_0
- lazy-object-proxy=1.4.3=py37he774522_0
- libpng=1.6.37=h2a8f88b_0
- libprotobuf=3.12.3=h7bd577a_0
- libsodium=1.0.18=h62dcd97_0
- libspatialindex=1.9.3=h33f27b4_0
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- markdown=3.1.1=py37_0
- markupsafe=1.1.1=py37he774522_0
- matplotlib=3.2.2=0
- matplotlib-base=3.2.2=py37h64f37c6_0
- mccabe=0.6.1=py37_1
- mistune=0.8.4=py37he774522_0
- mkl=2020.1=216
- mkl-service=2.3.0=py37hb782905_0
- mkl_fft=1.1.0=py37h45dec08_0
- mkl_random=1.1.1=py37h47e9c7a_0
- msys2-conda-epoch=20160418=1
- nbconvert=5.6.1=py37_0
- nbformat=5.0.7=py_0
- networkx=2.4=py_0
- nltk=3.5=py_0
- notebook=6.0.3=py37_0
- numpy=1.18.5=py37h6530119_0
- numpy-base=1.18.5=py37hc3f5095_0
- numpydoc=1.1.0=py_0
- oauthlib=3.1.0=py_0
- openssl=1.1.1g=he774522_0
- opt_einsum=3.1.0=py_0
- packaging=20.4=py_0
- pandas=1.0.5=py37h47e9c7a_0
- pandoc=2.10=0
- pandocfilters=1.4.2=py37_1
- paramiko=2.7.1=py_0
- parso=0.7.0=py_0
- pathtools=0.1.2=py_1
- patsy=0.5.1=py37_0
- pexpect=4.8.0=py37_0
- pickleshare=0.7.5=py37_0
- pip=20.1.1=py37_1
- pluggy=0.13.1=py37_0
- prometheus_client=0.8.0=py_0
- prompt-toolkit=3.0.5=py_0
- prompt_toolkit=3.0.5=0
- protobuf=3.12.3=py37h33f27b4_0
- psutil=5.7.0=py37he774522_0
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.7=py_0
- pycodestyle=2.6.0=py_0
- pycparser=2.20=py_0
- pydocstyle=5.0.2=py_0
- pyflakes=2.2.0=py_0
- pygments=2.6.1=py_0
- pyjwt=1.7.1=py37_0
- pylint=2.5.3=py37_0
- pymongo=3.9.0=py37ha925a31_0
- pynacl=1.4.0=py37h62dcd97_1
- pyopenssl=19.1.0=py37_0
- pyparsing=2.4.7=py_0
- pyqt=5.9.2=py37h6538335_2
- pyreadline=2.1=py37_1
- pyrsistent=0.16.0=py37he774522_0
- pysocks=1.7.1=py37_0
- python=3.7.7=h81c818b_4
- python-dateutil=2.8.1=py_0
- python-jsonrpc-server=0.3.4=py_0
- python-language-server=0.34.1=py37_0
- pytz=2020.1=py_0
- pywin32=227=py37he774522_1
- pywin32-ctypes=0.2.0=py37_1000
- pywinpty=0.5.7=py37_0
- pyyaml=5.3.1=py37he774522_0
- pyzmq=19.0.1=py37ha925a31_1
- qdarkstyle=2.8.1=py_0
- qt=5.9.7=vc14h73c81de_0
- qtawesome=0.7.2=py_0
- qtconsole=4.7.5=py_0
- qtpy=1.9.0=py_0
- regex=2020.6.8=py37he774522_0
- requests=2.24.0=py_0
- requests-oauthlib=1.3.0=py_0
- rope=0.17.0=py_0
- rsa=4.0=py_0
- rtree=0.9.4=py37h21ff451_1
- scikit-learn=0.23.1=py37h25d0782_0
- scipy=1.5.0=py37h9439919_0
- seaborn=0.10.1=py_0
- send2trash=1.5.0=py37_0
- setuptools=49.2.0=py37_0
- sip=4.19.8=py37h6538335_0
- six=1.15.0=py_0
- snowballstemmer=2.0.0=py_0
- sortedcontainers=2.2.2=py_0
- sphinx=3.1.2=py_0
- sphinxcontrib-applehelp=1.0.2=py_0
- sphinxcontrib-devhelp=1.0.2=py_0
- sphinxcontrib-htmlhelp=1.0.3=py_0
- sphinxcontrib-jsmath=1.0.1=py_0
- sphinxcontrib-qthelp=1.0.3=py_0
- sphinxcontrib-serializinghtml=1.1.4=py_0
- spyder=4.1.4=py37_0
- spyder-kernels=1.9.2=py37_0
- sqlite=3.32.3=h2a8f88b_0
- statsmodels=0.11.1=py37he774522_0
- termcolor=1.1.0=py37_1
- terminado=0.8.3=py37_0
- testpath=0.4.4=py_0
- threadpoolctl=2.1.0=pyh5ca1d4c_0
- toml=0.10.1=py_0
- tornado=6.0.4=py37he774522_1
- traitlets=4.3.3=py37_0
- typed-ast=1.4.1=py37he774522_0
- ujson=1.35=py37hfa6e2cd_0
- urllib3=1.25.9=py_0
- vc=14.1=h0510ff6_4
- vs2015_runtime=14.16.27012=hf0eaf9b_1
- watchdog=0.10.3=py37_0
- wcwidth=0.2.5=py_0
- webencodings=0.5.1=py37_1
- werkzeug=1.0.1=py_0
- wheel=0.34.2=py37_0
- win_inet_pton=1.1.0=py37_0
- wincertstore=0.2=py37_0
- winpty=0.4.3=4
- wrapt=1.11.2=py37he774522_0
- xlrd=1.2.0=py37_0
- yaml=0.1.7=hc54c509_2
- yapf=0.30.0=py_0
- zeromq=4.3.2=ha925a31_2
- zipp=3.1.0=py_0
- zlib=1.2.11=h62dcd97_4
- pip:
  - astunparse==1.6.3
  - bokeh==2.0.0
  - dask==2.12.0
  - distributed==2.12.0
  - fsspec==0.6.3
  - gast==0.3.3
  - heapdict==1.0.1
  - hyperopt==0.2.4
  - ipywidgets==7.5.1
  - jellyfish==0.7.1
  - keras-preprocessing==1.1.2
  - llvmlite==0.31.0
  - locket==0.2.0
  - msgpack==1.0.0
  - numba==0.48.0
  - partd==1.1.0
  - pillow==7.0.0
  - sklearn==0.0
  - swifter==0.301
  - tblib==1.6.0
  - tensorboard==2.2.2
  - tensorboard-plugin-wit==1.7.0
  - tensorflow-estimator==2.2.0
  - toolz==0.10.0
  - tqdm==4.43.0
  - typing-extensions==3.7.4.1
  - widgetsnbextension==3.5.1
  - zict==2.0.0
prefix: C:\Users\Manuel Camargo\.conda\envs\lstm_exp_v3
