The code to train the agent.
This project currently contains a memory leak: during long training runs it might use up all available memory and make the server slow down or crash!
You need Python 3 (preferably 3.6) installed, as well as the requirements from requirements.txt:

```
$ pip install -r requirements.txt
```
Furthermore, you need to install the text-localization-environment by following its Installation instructions.
Training an agent requires two files:

- A text file where each line contains the path to one image in the training dataset
- A NumPy file (.npy) that contains the bounding boxes associated with each image. For n images, this file contains a list with n entries, where each entry is a list of bounding boxes in the format `((xtopleft, ytopleft), (xbottomright, ybottomright))`
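For illustration, here is a minimal sketch of how such a pair of files could be produced. The image paths and boxes are made up; the file names match the defaults of train_agent.py:

```python
import numpy as np

# Hypothetical training images and their ground-truth boxes
image_paths = ["images/0001.png", "images/0002.png"]
bounding_boxes = [
    # boxes for images/0001.png, each ((xtopleft, ytopleft), (xbottomright, ybottomright))
    [((10, 20), (110, 60)), ((30, 80), (200, 120))],
    # boxes for images/0002.png
    [((5, 5), (95, 40))],
]

# One image path per line
with open("image_locations.txt", "w") as f:
    f.write("\n".join(image_paths))

# One entry (a list of boxes) per image; dtype=object because images
# can have different numbers of boxes
np.save("bounding_boxes.npy", np.array(bounding_boxes, dtype=object))
```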
Datasets generated by the dataset generator fulfill these requirements. With these two files, you can start training by running the train_agent.py script. Here is an overview of the available options:
| Option name | Short name | Explanation | Default value |
|---|---|---|---|
| `--steps` | `-s` | Number of steps to train the agent | 2000 |
| `--gpu` | | ID of the GPU to be used; -1 if the CPU should be used instead | -1 |
| `--imagefile` | `-i` | Path to the file containing the image locations | 'image_locations.txt' |
| `--boxfile` | `-b` | Path to the bounding boxes | 'bounding_boxes.npy' |
| `--tensorboard`/`--no-tensorboard` | | Whether or not to use TensorBoard logging | False |
| `--help` | | Display these options | |
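For example, a 4000-step training run on GPU 0, using the two files described above, could be started like this:

```
$ python train_agent.py --steps 4000 --gpu 0 --imagefile image_locations.txt --boxfile bounding_boxes.npy
```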
If you would like the program to generate log files suitable for visualization in TensorBoard, you need to:

- Install TensorFlow:

  ```
  $ pip install tensorflow
  ```

  (If you use Python 3.7 and the installation fails, use

  ```
  $ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl
  ```

  instead. See here for why.)
- Run the text-localization-agent program with the `--tensorboard` flag:

  ```
  $ python train_agent.py --tensorboard --imagefile … --boxfile …
  ```

- Start TensorBoard pointing to the `tensorboard/` directory inside the text-localization-agent project:

  ```
  $ tensorboard --logdir=<path to text-localization-agent>/tensorboard/
  …
  TensorBoard 1.12.0 at <link to TensorBoard UI> (Press CTRL+C to quit)
  ```

- Open the TensorBoard UI via the link that is provided when the `tensorboard` program is started (usually http://localhost:6006).
To run the training on one of the chair's servers you need to:
- Clone the necessary repositories
- Create a new virtual environment. Note that the Python version needs to be at least 3.6 for everything to run. The default might be a lower version; if that is the case, you must make sure that the correct version is used. You can pass the correct Python version to virtualenv via the `-p` parameter, for example:

  ```
  $ virtualenv -p python3.6 <envname>
  ```

  (If there is no Python 3.6/3.7 installed, you are out of luck because we don't have sudo access.)
- Activate the environment via

  ```
  $ source <envname>/bin/activate
  ```
- Install the required packages (see section "Prerequisites"). Don't forget `cupy`, `tb_chainer`, and `tensorflow`!
- Prepare the training data (either generate it using the dataset-generator or transfer existing data to the server)
- To avoid stopping the training after disconnecting from the server, you might want to use a terminal multiplexer such as tmux or screen
- Set the `CUDA_PATH` and `LD_LIBRARY_PATH` variables if they are not already set. The command should be something like:

  ```
  $ export CUDA_PATH=/usr/local/cuda
  $ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
  ```
- Download the ResNet-152 caffemodel (it isn't downloaded automatically; see link) and save it where necessary. If you try to create a TextLocEnv without it, an error message will tell you where it is expected (a minimal check is sketched below).
- Start training!
These instructions are for starting from scratch; if, for example, there is already a suitable virtual environment, you obviously don't need to create a new one.
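Regarding the ResNet-152 step above: the following sketch assumes that TextLocEnv loads the model through Chainer's pretrained `ResNet152Layers` wrapper (an assumption suggested by the Chainer-based dependencies such as cupy and tb_chainer, not something stated here). Instantiating the wrapper without the caffemodel raises an error naming the expected location:

```python
# Assumption: the environment uses Chainer's ResNet-152 wrapper. If the
# caffemodel is missing, constructing the link raises an IOError whose
# message names the path where ResNet-152-model.caffemodel is expected.
from chainer.links import ResNet152Layers

try:
    ResNet152Layers()
except IOError as err:
    print(err)
```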
- To evaluate a previously trained agent on a dataset, you may use the `evaluate` method available as a click CLI when executing:

  ```
  $ python evaluate_agent.py
  ```

  (Run `python evaluate_agent.py --help` to see the required parameters for the CLI.)
- If you provide the `--save` flag in the CLI above, it creates `.npy` files which can be read by the `evaluate_from_files` CLI afterwards:

  ```
  $ python evaluate_from_files.py
  ```

  (Run `python evaluate_from_files.py --help` to see the required parameters for the CLI.)
- The `evaluate_from_files` CLI allows defining an IoU threshold that is used for the calculation of the evaluation metrics. Furthermore, it outputs not only the mean average precision (mAP) but also precision and recall values.
- To create an image sequence of an already trained agent acting on a specific image, use:

  ```
  $ python generate_image_sequence.py
  ```

  (Run `python generate_image_sequence.py --help` to see the required parameters for the CLI, and have a look into the `generate_image_sequence.py` file for instructions on creating a video out of the generated single frames using ffmpeg.)
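Since the evaluation metrics above depend on an IoU threshold, here is a small reference sketch of how IoU is typically computed for boxes in the `((xtopleft, ytopleft), (xbottomright, ybottomright))` format used by the dataset files. This is a generic illustration, not the repository's implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are given as ((xtopleft, ytopleft), (xbottomright, ybottomright)).
    """
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b

    # Width/height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# A detection usually counts as correct if iou(prediction, ground_truth)
# reaches the chosen threshold, e.g. 0.5:
print(iou(((10, 20), (110, 60)), ((30, 30), (120, 70))))
```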