Merge pull request #6 from AdaptiveBProcess/integration

Refactory integration
AdaptiveBProcess · Oct 1, 2020 · f91e9af
2 parents 973ac5d + 534995c
Showing 55 changed files with 1,733 additions and 277,396 deletions.
diff --git a/README.md b/README.md
@@ -1,9 +1,8 @@
-# Learning Accurate Generative Models of Business Processes with LSTM Neural Networks
+# DeepGenerator: Learning Accurate Generative Models of Business Processes with LSTM Neural Networks
 
-The code here presented can execute different pre- and post-processing methods and architectures for building and using generative models from event logs in XES format using LSTM neural networks. This code can perform the next tasks:
+The code presented here can execute different pre- and post-processing methods and architectures for building and using generative models from event logs in XES format using LSTM and GRU neural networks. This code can perform the following tasks:
 
 
-* Training embedded matrices for the activities and roles contained in an event log.
 * Training LSTM neural networks using an event log as input.
 * Generate full event logs using a trained LSTM neural network.
 * Predict the remaining time and the continuation (suffix) of an incomplete business process trace. 
@@ -15,43 +14,31 @@ These instructions will get you a copy of the project up and running on your loc
 
 ### Prerequisites
 
-To execute this code, you need to install Anaconda in your system and create an environment using the *lstm_env.yml* specification provided in the repository.
+To execute this code, you just need to install Anaconda on your system and create an environment using the *environment.yml* specification provided in the repository.
 
 ## Running the script
 
 Once created the environment, you can perform each one of the tasks, specifying the following parameters in the lstm.py module, or by command line as is described below:
 
-*Training embedded matrices:* this task is a pre-requisite to train an LSTM model, to execute it you need to specify the required activity (-a) as 'emb_training' followed by the name of the event log (-f), and define if the event-log has a single timestamp or not (-o):
+*Training LSTM neural network:* To perform this task, you need to set the required activity (-a) as 'training', followed by the name of the event log (-f) and all the following parameters:
 
-```
-(lstm_env) C:\sc_lstm>python lstm.py -a emb_training -f Helpdesk.xes.gz -o True
-```
-*Training LSTM neuronal network:* To perform this task, you need to set the required activity as 'training' followed by the name of the event log, and all the following parameters:
-
-* One timestamp Event-log (-o): define if the event-log has a single timestamp or not
-* Implementation (-i): type of Keras LSTM implementation 1 cpu, 2 gpu
-* LSTM activation function (-l): LSTM optimization function (see Keras doc), None to set it up as the default value.
-* Dense activation function (-d): dense layer activation function (see Keras doc), None to set it up as the default value.
-* optimization function (-p): optimization function (see Keras doc).
+* Implementation (-i): type of Keras LSTM implementation, 1 for CPU or 2 for GPU.
+* LSTM activation function (-l): activation function of the LSTM layers (see the Keras documentation); None sets the default value.
+* Dense activation function (-d): activation function of the dense layer (see the Keras documentation); None sets the default value.
+* Optimization function (-p): optimization function (see the Keras documentation).
 * Scaling method (-n) = relative time between events scaling method max or lognorm.
-* Model type (-m): type of LSTM model specialized, concatenated, or shared_cat.
+* Model type (-m): type of model, one of specialized, concatenated, shared_cat, shared_cat_gru, specialized_gru, or concatenated_gru.
 * N-gram size (-z): Size of the n-gram (temporal dimension)
 * LSTM layer sizes (-y): Size of the LSTM layers.
 
 ```
-(lstm_env) C:\sc_lstm>python lstm.py -a training -f Helpdesk.xes -o True -i 1 -l sigmoid -d None -p Nadam -n max -m shared_cat -z 5 -y 50
-```
-
-*Generate full event log:* To perform this task, you need to set the required activity as 'pred_log' followed by the folder (-c) and model (-b) names to be used to generate the event logs. These folders and models were generated during the training task and can be found in the output_files folder. Additionally, you need to specify the maximum length of the predicted traces (-t). Finally, to store the results, it's necessary to define if you are executing the task as a single execution or if you are running other prediction instances (-x). If it's a single execution, the detailed results and individual measurements are stored in a subfolder called results. Otherwise, the results of all the running models are store in the output_files folder in a shared file:
-
-```
-(lstm_env) C:\sc_lstm>python lstm.py -a pred_log -c 20190228_155935509575 -b "model_rd_150 Nadam_22-0.59.h5" -t 100 -x False
+(lstm_env) C:\sc_lstm>python lstm.py -a training -f Helpdesk.xes -i 1 -l None -d linear -p Nadam -n lognorm -m shared_cat -z 5 -y 100
 ```
 
-*Predict the remaining time and suffix:* To perform this task, the only change with respect with the previous one is that you need to set the required activity as 'pred_sfx':
+*Predictive task:* DeepGenerator can execute several predictive tasks: predicting the next event, the case continuation, and the remaining time of an ongoing case. It can also generate complete event logs starting from a zero prefix size. To perform these tasks, set the activity (-a) to 'predict_next' for next event prediction, 'pred_sfx' for case continuation and remaining time, or 'pred_log' for full event log generation. Additionally, indicate the folder where the predictive model is located (-c) and the name of the .h5 model (-b). Finally, specify the method for selecting the next predicted task (-v), 'random_choice' or 'arg_max', and the number of repetitions of the experiment (-r). **NB!** The folders and models are generated by the training task and can be found in the output_files folder:
 
 ```
-(lstm_env) C:\sc_lstm>python lstm.py -a pred_sfx -c 20190228_155935509575 -b "model_rd_150 Nadam_22-0.59.h5" -t 100 -x False
+(lstm_env) C:\sc_lstm>python lstm.py -a pred_log -c 20201001_426975C9_FAC6_453A_9F0B_4DD528CB554B -b "model_shared_cat_02-1.10.h5" -v "random_choice" -r 1
 ```
 *Predict the next event and role:* To perform this task, the only changes with respect to the previous ones are that you need to set the required activity as 'predict_next', and it is not necessary to set the maximum trace length:
 
@@ -65,4 +52,4 @@ Models examples and experimental results can be found at <a href="http://kodu.ut
 
 * **Manuel Camargo**
 * **Marlon Dumas**
-* **Oscar Gonzalez-Rojas**
+* **Oscar Gonzalez-Rojas**
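
As a reading aid for the training flags documented in the README changes above, the sketch below shows how they could be parsed with `argparse`. It is a hypothetical illustration only: the short flags and allowed values follow the README text, but the `dest` names and the `build_parser` and `parse_training_args` helpers are assumptions and do not claim to reproduce the actual parser in lstm.py.

```python
import argparse

# Model types listed in the updated README for the -m flag.
MODEL_TYPES = [
    "specialized", "concatenated", "shared_cat",
    "shared_cat_gru", "specialized_gru", "concatenated_gru",
]


def build_parser():
    """Hypothetical parser mirroring the flags described in the README."""
    parser = argparse.ArgumentParser(prog="lstm.py")
    parser.add_argument("-a", dest="activity", required=True,
                        choices=["training", "predict_next",
                                 "pred_sfx", "pred_log"])
    parser.add_argument("-f", dest="file_name")     # event log name
    parser.add_argument("-i", dest="imp", type=int, choices=[1, 2])
    parser.add_argument("-l", dest="lstm_act")      # LSTM activation
    parser.add_argument("-d", dest="dense_act")     # dense activation
    parser.add_argument("-p", dest="optim")         # optimizer name
    parser.add_argument("-n", dest="norm_method", choices=["max", "lognorm"])
    parser.add_argument("-m", dest="model_type", choices=MODEL_TYPES)
    parser.add_argument("-z", dest="n_size", type=int)  # n-gram size
    parser.add_argument("-y", dest="l_size", type=int)  # layer size
    return parser


def parse_training_args(argv):
    """Parse argv and map the literal string 'None' to Python None."""
    args = build_parser().parse_args(argv)
    # The README passes the string 'None' to request the default value.
    for name in ("lstm_act", "dense_act"):
        if getattr(args, name) == "None":
            setattr(args, name, None)
    return args
```

With this sketch, the training command from the README would be parsed by passing its arguments as a list, for example `parse_training_args("-a training -f Helpdesk.xes -i 1 -l None -d linear -p Nadam -n lognorm -m shared_cat -z 5 -y 100".split())`.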
diff --git a/compress_job b/compress_job
diff --git a/environment.yml b/environment.yml
@@ -0,0 +1,231 @@
+name: lstm_exp_v3
+channels:
+  - defaults
+dependencies:
+  - _tflow_select=2.3.0=mkl
+  - absl-py=0.9.0=py37_0
+  - alabaster=0.7.12=py37_0
+  - argh=0.26.2=py37_0
+  - astor=0.8.0=py37_0
+  - astroid=2.4.2=py37_0
+  - atomicwrites=1.4.0=py_0
+  - attrs=19.3.0=py_0
+  - autopep8=1.5.3=py_0
+  - babel=2.8.0=py_0
+  - backcall=0.2.0=py_0
+  - bcrypt=3.1.7=py37he774522_0
+  - blas=1.0=mkl
+  - bleach=3.1.5=py_0
+  - blinker=1.4=py37_0
+  - brotlipy=0.7.0=py37he774522_1000
+  - ca-certificates=2020.6.24=0
+  - cachetools=4.1.0=py_1
+  - certifi=2020.6.20=py37_0
+  - cffi=1.14.0=py37h7a1dbc1_0
+  - chardet=3.0.4=py37_1003
+  - click=7.1.2=py_0
+  - cloudpickle=1.5.0=py_0
+  - colorama=0.4.3=py_0
+  - cryptography=2.9.2=py37h7a1dbc1_0
+  - cycler=0.10.0=py37_0
+  - decorator=4.4.2=py_0
+  - defusedxml=0.6.0=py_0
+  - diff-match-patch=20200713=py_0
+  - dnspython=1.16.0=py37_0
+  - docutils=0.16=py37_0
+  - entrypoints=0.3=py37_0
+  - flake8=3.8.3=py_0
+  - freetype=2.10.2=hd328e21_0
+  - future=0.18.2=py37_0
+  - google-auth=1.17.2=py_0
+  - google-auth-oauthlib=0.4.1=py_2
+  - google-pasta=0.2.0=py_0
+  - grpcio=1.27.2=py37h351948d_0
+  - h5py=2.10.0=py37h5e291fa_0
+  - hdf5=1.10.4=h7ebc959_0
+  - icc_rt=2019.0.0=h0cc432a_1
+  - icu=58.2=ha925a31_3
+  - idna=2.10=py_0
+  - imagesize=1.2.0=py_0
+  - importlib-metadata=1.7.0=py37_0
+  - importlib_metadata=1.7.0=0
+  - intel-openmp=2020.1=216
+  - intervaltree=3.0.2=py_0
+  - ipykernel=5.3.3=py37h5ca1d4c_0
+  - ipython=7.16.1=py37h5ca1d4c_0
+  - ipython_genutils=0.2.0=py37_0
+  - isort=4.3.21=py37_0
+  - jedi=0.17.1=py37_0
+  - jinja2=2.11.2=py_0
+  - joblib=0.16.0=py_0
+  - jpeg=9b=hb83a4c4_2
+  - jsonschema=3.2.0=py37_0
+  - jupyter_client=6.1.6=py_0
+  - jupyter_core=4.6.3=py37_0
+  - keyring=21.2.1=py37_0
+  - kiwisolver=1.2.0=py37h74a9793_0
+  - lazy-object-proxy=1.4.3=py37he774522_0
+  - libpng=1.6.37=h2a8f88b_0
+  - libprotobuf=3.12.3=h7bd577a_0
+  - libsodium=1.0.18=h62dcd97_0
+  - libspatialindex=1.9.3=h33f27b4_0
+  - m2w64-gcc-libgfortran=5.3.0=6
+  - m2w64-gcc-libs=5.3.0=7
+  - m2w64-gcc-libs-core=5.3.0=7
+  - m2w64-gmp=6.1.0=2
+  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
+  - markdown=3.1.1=py37_0
+  - markupsafe=1.1.1=py37he774522_0
+  - matplotlib=3.2.2=0
+  - matplotlib-base=3.2.2=py37h64f37c6_0
+  - mccabe=0.6.1=py37_1
+  - mistune=0.8.4=py37he774522_0
+  - mkl=2020.1=216
+  - mkl-service=2.3.0=py37hb782905_0
+  - mkl_fft=1.1.0=py37h45dec08_0
+  - mkl_random=1.1.1=py37h47e9c7a_0
+  - msys2-conda-epoch=20160418=1
+  - nbconvert=5.6.1=py37_0
+  - nbformat=5.0.7=py_0
+  - networkx=2.4=py_0
+  - nltk=3.5=py_0
+  - notebook=6.0.3=py37_0
+  - numpy=1.18.5=py37h6530119_0
+  - numpy-base=1.18.5=py37hc3f5095_0
+  - numpydoc=1.1.0=py_0
+  - oauthlib=3.1.0=py_0
+  - openssl=1.1.1g=he774522_0
+  - opt_einsum=3.1.0=py_0
+  - packaging=20.4=py_0
+  - pandas=1.0.5=py37h47e9c7a_0
+  - pandoc=2.10=0
+  - pandocfilters=1.4.2=py37_1
+  - paramiko=2.7.1=py_0
+  - parso=0.7.0=py_0
+  - pathtools=0.1.2=py_1
+  - patsy=0.5.1=py37_0
+  - pexpect=4.8.0=py37_0
+  - pickleshare=0.7.5=py37_0
+  - pip=20.1.1=py37_1
+  - pluggy=0.13.1=py37_0
+  - prometheus_client=0.8.0=py_0
+  - prompt-toolkit=3.0.5=py_0
+  - prompt_toolkit=3.0.5=0
+  - protobuf=3.12.3=py37h33f27b4_0
+  - psutil=5.7.0=py37he774522_0
+  - pyasn1=0.4.8=py_0
+  - pyasn1-modules=0.2.7=py_0
+  - pycodestyle=2.6.0=py_0
+  - pycparser=2.20=py_0
+  - pydocstyle=5.0.2=py_0
+  - pyflakes=2.2.0=py_0
+  - pygments=2.6.1=py_0
+  - pyjwt=1.7.1=py37_0
+  - pylint=2.5.3=py37_0
+  - pymongo=3.9.0=py37ha925a31_0
+  - pynacl=1.4.0=py37h62dcd97_1
+  - pyopenssl=19.1.0=py37_0
+  - pyparsing=2.4.7=py_0
+  - pyqt=5.9.2=py37h6538335_2
+  - pyreadline=2.1=py37_1
+  - pyrsistent=0.16.0=py37he774522_0
+  - pysocks=1.7.1=py37_0
+  - python=3.7.7=h81c818b_4
+  - python-dateutil=2.8.1=py_0
+  - python-jsonrpc-server=0.3.4=py_0
+  - python-language-server=0.34.1=py37_0
+  - pytz=2020.1=py_0
+  - pywin32=227=py37he774522_1
+  - pywin32-ctypes=0.2.0=py37_1000
+  - pywinpty=0.5.7=py37_0
+  - pyyaml=5.3.1=py37he774522_0
+  - pyzmq=19.0.1=py37ha925a31_1
+  - qdarkstyle=2.8.1=py_0
+  - qt=5.9.7=vc14h73c81de_0
+  - qtawesome=0.7.2=py_0
+  - qtconsole=4.7.5=py_0
+  - qtpy=1.9.0=py_0
+  - regex=2020.6.8=py37he774522_0
+  - requests=2.24.0=py_0
+  - requests-oauthlib=1.3.0=py_0
+  - rope=0.17.0=py_0
+  - rsa=4.0=py_0
+  - rtree=0.9.4=py37h21ff451_1
+  - scikit-learn=0.23.1=py37h25d0782_0
+  - scipy=1.5.0=py37h9439919_0
+  - seaborn=0.10.1=py_0
+  - send2trash=1.5.0=py37_0
+  - setuptools=49.2.0=py37_0
+  - sip=4.19.8=py37h6538335_0
+  - six=1.15.0=py_0
+  - snowballstemmer=2.0.0=py_0
+  - sortedcontainers=2.2.2=py_0
+  - sphinx=3.1.2=py_0
+  - sphinxcontrib-applehelp=1.0.2=py_0
+  - sphinxcontrib-devhelp=1.0.2=py_0
+  - sphinxcontrib-htmlhelp=1.0.3=py_0
+  - sphinxcontrib-jsmath=1.0.1=py_0
+  - sphinxcontrib-qthelp=1.0.3=py_0
+  - sphinxcontrib-serializinghtml=1.1.4=py_0
+  - spyder=4.1.4=py37_0
+  - spyder-kernels=1.9.2=py37_0
+  - sqlite=3.32.3=h2a8f88b_0
+  - statsmodels=0.11.1=py37he774522_0
+  - termcolor=1.1.0=py37_1
+  - terminado=0.8.3=py37_0
+  - testpath=0.4.4=py_0
+  - threadpoolctl=2.1.0=pyh5ca1d4c_0
+  - toml=0.10.1=py_0
+  - tornado=6.0.4=py37he774522_1
+  - traitlets=4.3.3=py37_0
+  - typed-ast=1.4.1=py37he774522_0
+  - ujson=1.35=py37hfa6e2cd_0
+  - urllib3=1.25.9=py_0
+  - vc=14.1=h0510ff6_4
+  - vs2015_runtime=14.16.27012=hf0eaf9b_1
+  - watchdog=0.10.3=py37_0
+  - wcwidth=0.2.5=py_0
+  - webencodings=0.5.1=py37_1
+  - werkzeug=1.0.1=py_0
+  - wheel=0.34.2=py37_0
+  - win_inet_pton=1.1.0=py37_0
+  - wincertstore=0.2=py37_0
+  - winpty=0.4.3=4
+  - wrapt=1.11.2=py37he774522_0
+  - xlrd=1.2.0=py37_0
+  - yaml=0.1.7=hc54c509_2
+  - yapf=0.30.0=py_0
+  - zeromq=4.3.2=ha925a31_2
+  - zipp=3.1.0=py_0
+  - zlib=1.2.11=h62dcd97_4
+  - pip:
+    - astunparse==1.6.3
+    - bokeh==2.0.0
+    - dask==2.12.0
+    - distributed==2.12.0
+    - fsspec==0.6.3
+    - gast==0.3.3
+    - heapdict==1.0.1
+    - hyperopt==0.2.4
+    - ipywidgets==7.5.1
+    - jellyfish==0.7.1
+    - keras-preprocessing==1.1.2
+    - llvmlite==0.31.0
+    - locket==0.2.0
+    - msgpack==1.0.0
+    - numba==0.48.0
+    - partd==1.1.0
+    - pillow==7.0.0
+    - sklearn==0.0
+    - swifter==0.301
+    - tblib==1.6.0
+    - tensorboard==2.2.2
+    - tensorboard-plugin-wit==1.7.0
+    - tensorflow-estimator==2.2.0
+    - toolz==0.10.0
+    - tqdm==4.43.0
+    - typing-extensions==3.7.4.1
+    - widgetsnbextension==3.5.1
+    - zict==2.0.0
+prefix: C:\Users\Manuel Camargo\.conda\envs\lstm_exp_v3
+
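
The conda specs in this environment.yml pin build strings (for example `py37he774522_0`) and the file was exported from a Windows machine, so the exact builds may not resolve on other platforms. A minimal sketch, assuming one only wants the `name=version` part of each conda spec; the `strip_build` helper is hypothetical and not part of the repository.

```python
def strip_build(spec):
    """Return 'name=version' from a conda spec 'name=version=build'.

    pip-style specs ('name==version') are returned unchanged.
    """
    if "==" in spec:
        return spec
    parts = spec.split("=")
    return "=".join(parts[:2])


# A few specs copied from the environment.yml above.
specs = ["numpy=1.18.5=py37h6530119_0", "blas=1.0=mkl", "pip=20.1.1=py37_1"]
portable = [strip_build(s) for s in specs]
```

Platform-specific packages such as `pywin32` or the `m2w64-*` entries would still need to be dropped by hand on non-Windows systems.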