Main repository for my MSc thesis on procedural content generation via machine learning applied to video game levels (Doom I/II).
Citation: Giacomello, E., Lanzi, P. L., & Loiacono, D. (2018, August). DOOM level generation using generative adversarial networks. In 2018 IEEE Games, Entertainment, Media Conference (GEM) (pp. 316-323). IEEE.
@inproceedings{giacomello2018doom,
title={DOOM level generation using generative adversarial networks},
author={Giacomello, Edoardo and Lanzi, Pier Luca and Loiacono, Daniele},
booktitle={2018 IEEE Games, Entertainment, Media Conference (GEM)},
pages={316--323},
year={2018},
organization={IEEE}
}
Your host should be able to run:
- Docker
- Nvidia-Docker (https://github.com/NVIDIA/nvidia-docker)
- CUDA 8.0
- Tensorboard (optional)
You can find all the requirements in the requirements.txt file. The most important are:
- Python 3.*
- tensorflow_gpu 1.3 with CUDA 8.0 support (more recent versions of TF may work but have not been tested)
- tensorboard (software)
- numpy
- scipy
- scikit_image
- scikit_learn
- matplotlib
- networkx
- pandas
- seaborn
- beautifulsoup4 for the scraper
- dicttoxml for the scraper
- requests for the scraper
- bsp (shell command) for generating WAD files
- Install CUDA 8.0
- Install Docker
- Install nvidia-docker from https://github.com/NVIDIA/nvidia-docker
- Move (cd) to an empty folder where you want to store the results. Make sure both docker.service and nvidia-docker.service are enabled and running.
- Launch the container (be sure to copy all the parameters; you can drop `-p 6006:6006` if you don't want to access TensorBoard):
nvidia-docker run -v $PWD/artifacts:/DoomPCGML/artifacts -p 6006:6006 -it edoardogiacomello/doompcgml:latest /bin/bash
This command will download the Docker image (~2GB), open a shell inside the DoomPCGML container, and create an "artifacts" folder in your current directory that will contain:
- Network checkpoints (your network model, saved every 100 iterations, i.e. roughly every 16 minutes or less)
- Tensorboard cache (training progress data).
- Generated Samples
This folder will be your only persistent storage, so make sure to make backups if you have to experiment with the parameters.
- Go to the "Usage" section for further commands
Running the code directly in your Python environment (without Docker or virtualenv) is simple: you only have to make sure you have a working installation of CUDA 8, TensorFlow, and Python 3.5 or above.
Clone the repository with
git clone --depth=1 https://github.com/edoardogiacomello/DoomPCGML.git
and install the remaining dependencies with
pip install --user -r requirements.txt
then follow the "Usage" section.
Run scripts for the most common operations with default parameters are in the project root. Make sure to call them from the project root itself (you have to "cd" into the DoomPCGML folder if you have just cloned the repository) because some scripts use the current directory for importing the modules. If you are using Docker, the correct working directory should already be set.
Before running the model you should make sure that the datasets are extracted into the "dataset" folder. Zip files containing datasets for 128x128 levels are already included in the repository, you only have to unzip them with:
./extract.sh
This will produce two .TFRecord files and one .meta file for the "128, one floor" dataset.
If you are using Docker, you have to repeat this process every time you launch the container.
###Train the network
For training a network (or resuming a training session) you just need to issue:
./train.sh
This command will start the training and save a new network model into the artifacts/checkpoint
folder every 100 iterations.
If you need to detach from the container shell without stopping the training process, press ctrl+p followed by ctrl+q.
To reattach to the shell, issue nvidia-docker attach <container_name>. You can find the current container name with the docker ps command.
If you need to redo the training from scratch (for example, if you modified the network architecture), you have to manually delete the /artifacts/tensorboard_results and /artifacts/checkpoints folders. This will delete every model you trained and its training progress data, so make sure to do a backup if you are going to need them.
If you want to generate new samples from your current network, just run ./generate.sh and the output will be produced in the artifacts folder.
If you need to change the network structure (layers or input maps/features), just launch ./edit_network.sh (nano required) or edit the file DoomLevelsGAN/network_architecture.py, remembering to reset the artifacts folder before training the new model.
You don't have to stop the training process to launch TensorBoard: just issue tensorboard --logdir=<cache_folder>/artifacts/tensorboard_results, where <cache_folder> is the folder you cloned the repo into, or the container volume mountpoint if you are using Docker. If you are using Docker, you must have run your container with the additional parameter -p 6006:6006; then run the command above in the container shell.
The commands explained above are for basic "black box" usage, but the scripts have many options you can configure (such as the sampling operation). You can edit the "train" script, adding several parameters to test different input sizes, learning hyperparameters, and so on.
###Parameter List (WIP)
###Outputting to WAD files
You can generate working WAD files directly from the network samples. Documentation is still in progress, but you can give it a try by checking the DoomGAN.py script and running the sample function with the 'WAD' save parameter.
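As a rough starting point, something along these lines should be close to what the script expects. This is a hypothetical sketch: the import path and exact signature shown here are assumptions, so check DoomGAN.py for the actual entry point before relying on it.
```python
# Hypothetical sketch: DoomGAN.py exposes a sample function with a 'save'
# parameter, but the import path and signature here are assumptions and
# may differ from the actual script.
from DoomLevelsGAN import DoomGAN

# Asking the sampler to emit playable .WAD files instead of feature-map
# images; this requires the 'bsp' shell command to be installed.
DoomGAN.sample(save='WAD')
```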
The development machine had the following specs:
- Intel i7 940
- Nvidia GTX670 4GB
- 18GB Ram
and it took about 10 seconds per iteration with the default network architecture, with the following resource usage:
- VRAM: 3826MiB / 4035MiB
- RAM: 1.77 GiB / 17.63 GiB
So you are going to need a machine with at least 4GB of VRAM to run the code. I also noticed some overhead when using Docker.
The full dataset is currently unavailable online; it will be uploaded as soon as possible. (See below.)
The full dataset comes from the id Archive, an archive of levels from id Software. Data was scraped from www.doomworld.com on 16/10/2017, obtaining about 9600 levels. Metadata about the levels is stored in corresponding JSON files, but it is also available in Google Sheets for more convenient consultation:
- Levels from DoomI/II taken from @TheVGLC [Provided as a separated dataset]
- Levels from "Final Doom" and "Master levels for DoomII" taken from a retail steam copy of the game. [Provided as a separated dataset]
- About 9600 levels for DoomI and DoomII taken from www.doomworld.com along with their metadata preview here
- There are still more than 2000 levels to download, but since they are for co-op/deathmatch they could present structural differences from standard levels, so I decided to leave them out of the dataset for now.
A dataset preview and a list of the stored features are available at: Google Sheets
The full dataset is structured as follows:
- a root <dataset_root>/
- a .json database <dataset_root>/dataset.json in which all the level features are stored
- a <dataset_root>/Original/ folder, containing the .WAD files for each level
- a <dataset_root>/Processed/ folder, containing:
- <zip_name>_<wad_name>_<slot_name>.json: a file containing the features for a level (one row of dataset.json)
- <zip_name>_<wad_name>_<slot_name>_<feature_map>.png: image(s) containing the feature maps for a level
These files are indexed in dataset.json with paths relative to <dataset_root>.
E.g. a path could be "Processed/myzip_MyLevel_E1M1_floormap.png"
The feature map dimensions are (width/32, height/32), since 32 is the diameter of the smallest thing that exists in Doom.
Each pixel value is a uint8 that directly encodes a value (i.e. the "thing type index" for the thingsmap, the enumeration values 1, 2, 3, 4, ... for the floormap, or the floor_height value for the heightmap).
The dataset is currently provided as .TFRecord files ready to be ingested by DoomGAN.
In the "dataset" folder you can find two datasets:
- 128x128-many-floors.zip: train and validation sets for levels up to 128 pixels with any number of floors
- 128x128-one-floor.zip: train and validation sets for levels up to 128 pixels with just one floor
Each file comes with a .meta file, which is the one to reference in the script parameters.
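If you want to peek inside an extracted .TFRecord before training, a sketch along these lines works with the TensorFlow 1.x API used by this project; the file name below is a placeholder, and the actual feature keys are the ones described by the .meta file.
```python
# A minimal sketch for inspecting an extracted .TFRecord file with the
# TensorFlow 1.x API. The file name is a placeholder; check the "dataset"
# folder for the actual names produced by ./extract.sh.
import tensorflow as tf

path = 'dataset/128x128-one-floor-train.TFRecord'  # hypothetical name
for serialized in tf.python_io.tf_record_iterator(path):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    # Print the feature keys stored for the first level, then stop.
    print(sorted(example.features.feature.keys()))
    break
```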
The project is composed of several modules that can be used separately.
Each module has a dedicated README; if you need further information, you can inspect the code documentation and comments, since I try to keep the code as clear as possible.
- DoomWorldScraper contains a scraper for www.doomworld.com (which is a mirror of the id Software levels archive), from which the community-generated levels have been collected.
- DoomWorldScraper/WADParser: a modified version of the parser found in the VGLC repo, adapted for better integration and automation with the scraper.
- GAN-Tests/DCGAN-slerp-interpolation: experiments and modifications I made to the DCGAN architecture for visualizing and operating on the latent space, in order to understand how to apply these techniques to Doom levels. See the corresponding README.