Example datasets are in `./data_example`. The example font files were downloaded from here (Chinese) and here (Korean).
- This code can handle datasets in both image files (.png, .jpg, ...) and TrueType font files (.ttf).
- To use ttf data, see 1.1.
- To use image data, see 1.2.
Prepare the TrueType font files (.ttf) to use for training and validation.
Put the training font files and the validation font files in separate directories.
In this case, a list of the available characters is needed in txt format (described in 1.1.a).
If you want to split the characters into seen and unseen characters, the split information is needed in json format (described in 2).
- If you already have the list of available characters for a .ttf file, save it to a text file (.txt) with the same name, in the same directory as the ttf file.
- (example)
- TTF file: data/ttfs/train/MaShanZheng-Regular.ttf
- its available characters: data/ttfs/train/MaShanZheng-Regular.txt
- You can also generate the available characters files automatically using the
- How to use:
python --root_dir path/to/ttf/dir
- --root_dir: The root directory to find the .ttf files. All the .ttf files under this directory and its subdirectories will be processed.
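
As a rough illustration of what such a character list contains, the sketch below writes a `.txt` file next to every `.ttf` file under a directory. It is not the repository's script: it assumes the fontTools package is installed, it treats every code point in the font's cmap table as "available" (a cmap entry does not guarantee a non-empty glyph), and it assumes the text file holds the characters as one continuous string. The function names are hypothetical.

```python
from pathlib import Path


def codepoints_to_chars(codepoints):
    """Return the characters for the given code points as one sorted string."""
    return "".join(chr(cp) for cp in sorted(set(codepoints)))


def read_codepoints(ttf_path):
    """Read the code points mapped in a font's cmap table.

    Assumption: fontTools is installed (pip install fonttools).
    """
    # Imported lazily so the other helpers work without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(str(ttf_path))
    try:
        cmap = font.getBestCmap() or {}
    finally:
        font.close()
    return list(cmap.keys())


def write_char_lists(root_dir):
    """Write <name>.txt next to every <name>.ttf found under root_dir."""
    written = []
    for ttf_path in sorted(Path(root_dir).rglob("*.ttf")):
        txt_path = ttf_path.with_suffix(".txt")
        chars = codepoints_to_chars(read_codepoints(ttf_path))
        txt_path.write_text(chars, encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `write_char_lists("data/ttfs/train")` would then produce files such as `data/ttfs/train/MaShanZheng-Regular.txt` alongside the fonts.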
- The images should be placed in this format:
    * data_dir
    |-- font1
        |-- char1.png
        |-- char2.png
        |-- char3.png
    |-- font2
        |-- char1.png
        |-- char2.png
- You can see the example at
- Images with the same style should be grouped in the same directory.
- The name of each image file should be its character.
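
To check that an image dataset follows this layout, a small script can list the characters found in each font directory. This is only a sketch under the layout described above; `index_image_dataset` and `misnamed_files` are hypothetical helpers, and the one-character-per-file-name check assumes each character is a single code point after NFC normalization.

```python
import unicodedata
from pathlib import Path


def index_image_dataset(data_dir, extension="png"):
    """Map each font directory name to the sorted characters it contains."""
    index = {}
    for font_dir in sorted(Path(data_dir).iterdir()):
        if not font_dir.is_dir():
            continue
        # The file stem is the character itself.
        # NFC normalization guards against file systems that decompose names.
        index[font_dir.name] = sorted(
            unicodedata.normalize("NFC", p.stem)
            for p in font_dir.glob(f"*.{extension}")
        )
    return index


def misnamed_files(data_dir, extension="png"):
    """Return image paths whose stem is not exactly one character after NFC."""
    return sorted(
        p
        for p in Path(data_dir).glob(f"*/*.{extension}")
        if len(unicodedata.normalize("NFC", p.stem)) != 1
    )
```

An empty result from `misnamed_files(data_dir)` suggests that every image is named after a single character.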
(not required for DM-Font)
* You can get the Chinese source font which we used from [here](
* Download this file and save it to `data/chn/source.ttf`.
* The Korean source font is included at `data/kor/source.ttf`.
* Put all the source image files in a single directory.
* The name of each image file should be its character.
- Save the list of characters to use as reference characters as a json file.
- This step can be skipped if you want to use all the available characters.
- Characters that are both in this list and available in the dataset will be used as references.
- Our example is in
(Chinese) and `data/kor/ref_chars.json`
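
The rule that only characters present in both the list and the dataset are used can be sketched as follows. The json layout (a flat list of single-character strings) is an assumption, and `save_char_list` and `usable_chars` are hypothetical helpers, not functions from this repository.

```python
import json
from pathlib import Path


def save_char_list(chars, json_path):
    """Save characters as a JSON list (assumed format), keeping non-ASCII readable."""
    Path(json_path).write_text(
        json.dumps(list(chars), ensure_ascii=False), encoding="utf-8"
    )


def usable_chars(json_path, available_chars):
    """Return the listed characters that are also available, in list order."""
    listed = json.loads(Path(json_path).read_text(encoding="utf-8"))
    available = set(available_chars)
    return [c for c in listed if c in available]
```

For example, if the list holds three characters and only two of them exist in the dataset, `usable_chars` returns just those two.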
- Save the list of characters to generate as a json file.
- You can skip this if you want to generate every character in the source font.
- Characters that are both in this list and available from the source font will be used for the validation.
- The characters that we used for the training are in
(Chinese) and `data/kor/gen_chars.json`
(not required for MX-Font and FUNIT)
- The files should be identical to those used to train the weights being evaluated.
Please do not modify the indentation; it is significant in these configuration files.
- The files in
are the examples.
- dset: (leave blank)
  - test: (leave blank)
    - extension: extension of training data.
      - Set this to "ttf" if you are using ttf files.
      - If you are using image files, set this to the image files' extension.
    - data_dir: path to training data.
    - source_path: path to the source font or source directory to use for the validation.
    - source_ext: extension of the source data.
      - If you are using a ttf file, set this to "ttf".
      - If you are using image files, set this to their extension ("png", "jpg", ...).
    - ref_chars: The json file containing the reference characters.
      - If this is blank, all the available characters will be used as the reference.
    - gen_chars: The json file containing the list of characters to generate.
      - If this is blank, all the available characters in the source font will be generated.
- extension: extension of training data.
- test: (leave blank)
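
As a rough sanity check for the keys described above, a loaded configuration can be inspected before running anything. The sketch below assumes the configuration has been loaded into nested dictionaries mirroring `dset` and `test`, and that a blank value loads as `None`. `check_test_cfg` is a hypothetical helper and its checks are illustrative, not rules enforced by this repository.

```python
def check_test_cfg(cfg):
    """Return a list of problems found in cfg["dset"]["test"] (hypothetical helper)."""
    test = cfg["dset"]["test"]
    problems = []
    for key in ("extension", "data_dir", "source_path", "source_ext"):
        if not test.get(key):
            problems.append(f"missing value for {key}")
    # A ttf source should point at a single .ttf file.
    source_path = str(test.get("source_path", ""))
    if test.get("source_ext") == "ttf" and not source_path.endswith(".ttf"):
        problems.append("source_ext is 'ttf' but source_path is not a .ttf file")
    # ref_chars / gen_chars may be blank (None): use all available characters.
    for key in ("ref_chars", "gen_chars"):
        value = test.get(key)
        if value is not None and not str(value).endswith(".json"):
            problems.append(f"{key} should be a .json file or left blank")
    return problems


# Example values reuse paths mentioned in this document; data_dir is a placeholder.
example_cfg = {
    "dset": {
        "test": {
            "extension": "ttf",
            "data_dir": "path/to/ttf/dir",
            "source_path": "data/kor/source.ttf",
            "source_ext": "ttf",
            "ref_chars": "data/kor/ref_chars.json",
            "gen_chars": None,  # blank: generate every available character
        }
    }
}
```

An empty list from `check_test_cfg(example_cfg)` means none of these illustrative checks failed.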