ai2-kit tool dpdata
This toolkit is a command-line wrapper around dpdata that lets users process DeepMD datasets from the command line.
ai2-kit tool dpdata # show all commands
ai2-kit tool dpdata to_ase -h # show doc of specific command
This toolkit includes the following commands:
Command | Description | Example | Reference |
---|---|---|---|
read | Read a dataset into memory. This command does nothing useful on its own; chain other commands after it. | ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy | Supports wildcards; can be called multiple times |
write | Use MultiSystems to merge datasets and write them to a directory | ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - write ./path/to/merged_dataset | |
filter | Use a lambda expression to filter the dataset by system data | See the examples below | |
set_fparam | Add fparam to the dataset; can be a float or a list of floats | See the examples below | |
slice | Use a slice expression to process systems | See the examples below | |
sample | Sample data by different methods; the currently supported methods are even and random | See the examples below | |
eval | Use deepmd DeepPot to (re)label loaded data | See the examples below | |
to_ase | Convert dpdata format to ase format and use ase tools to process the data | See the examples below | |
These commands are chainable and can be used to process trajectories in a pipeline fashion (separated by `-`). For more information, refer to the following examples.
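As a rough mental model of the chaining, the Python sketch below splits an argument list at each standalone `-` and applies the resulting stages to an object whose methods return the object itself. It is an illustration only, not ai2-kit's actual implementation: `split_pipeline`, `Pipeline`, and `run` are hypothetical names, and the methods only record what they were asked to do.

```python
from typing import List


def split_pipeline(argv: List[str]) -> List[List[str]]:
    """Split a flat argument list into stages at each standalone '-' token."""
    stages: List[List[str]] = [[]]
    for token in argv:
        if token == "-":
            stages.append([])  # a lone '-' starts the next stage
        else:
            stages[-1].append(token)  # options such as '--fmt' stay in the stage
    return [stage for stage in stages if stage]


class Pipeline:
    """Hypothetical stand-in: every step returns self so calls can be chained."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def read(self, path: str) -> "Pipeline":
        self.log.append(f"read {path}")
        return self

    def slice(self, expr: str) -> "Pipeline":
        self.log.append(f"slice {expr}")
        return self

    def write(self, path: str) -> "Pipeline":
        self.log.append(f"write {path}")
        return self


def run(argv: List[str]) -> Pipeline:
    """Apply each stage in order; the first token of a stage names the method."""
    pipeline = Pipeline()
    for name, *args in split_pipeline(argv):
        pipeline = getattr(pipeline, name)(*args)
    return pipeline


result = run("read dp-h2o - slice 10: - write out".split())
print(result.log)  # ['read dp-h2o', 'slice 10:', 'write out']
```

Because each step hands back the same object, the order of the stages on the command line is the order in which the data is processed.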
# Read multiple datasets generated by the training workflow using a wildcard and merge them into a single dataset
# You can also call `read` multiple times to read datasets from different directories
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged_dataset --fmt deepmd/npy
# You can also save data in hdf5 format
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged.hdf5 --fmt deepmd/hdf5
# Use lambda expression to filter outlier data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset
# Set fparam when reading data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy --fparam [0,1] - write ./path/to/new_dataset
# (re)label data
ai2-kit tool dpdata read dp-h2o --nolabel - eval dp-frozen.pb - write new-dp-hwo
# Drop the first 10 frames, then sample 10 frames using the random method, and save the result in xyz format
ai2-kit tool dpdata read dp-h2o - slice 10: - sample 10 --method random - to_ase - write h2o.xyz
# Convert cp2k data to a format that can be used by the deepmd dplr module
ai2-kit tool dpdata read ./path-to-cp2k-dir --fmt cp2k/dplr --cp2k_output="output" --wannier_file="wannier.xyz" --type_map="[H,O]" --sys_charge_map="[0,-2]" --model_charge_map="[0,-2]" --ewald_h=0.3 --ewald_beta=0.3 --ext_efield="[0.0, 0.0, 0.0]" --sel_type="[1, 2]" - write ./dataset
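The `slice 10:` and `sample 10 --method random` steps used in the examples above can be pictured with plain Python lists. The sketch below is a minimal illustration under stated assumptions: it treats the slice expression as ordinary Python slice syntax, and the helpers `parse_slice`, `sample_even`, and `sample_random` are hypothetical. The exact frames that ai2-kit selects for the even and random methods may differ.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def parse_slice(expr: str) -> slice:
    """Turn a string such as '10:' or '::2' into a Python slice object."""
    parts = [int(p) if p.strip() else None for p in expr.split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected start:stop or start:stop:step, got {expr!r}")
    return slice(*parts)


def sample_even(frames: Sequence[T], n: int) -> List[T]:
    """Pick n frames at evenly spaced positions (assumed meaning of 'even')."""
    if n <= 0:
        return []
    if n >= len(frames):
        return list(frames)
    step = len(frames) / n
    return [frames[int(i * step)] for i in range(n)]


def sample_random(frames: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Pick n distinct frames at random, keeping their original order."""
    rng = random.Random(seed)
    n = max(0, min(n, len(frames)))
    picked = sorted(rng.sample(range(len(frames)), n))
    return [frames[i] for i in picked]


frames = list(range(30))            # stand-in for 30 trajectory frames
kept = frames[parse_slice("10:")]   # drop the first 10 frames
even = sample_even(kept, 5)
rand = sample_random(kept, 10, seed=0)

print(len(kept))  # 20
print(even)       # [10, 14, 18, 22, 26]
print(len(rand))  # 10
```

Slicing before sampling matters here: the sample is drawn only from the frames that survive the slice, which matches the left-to-right order of the chained commands.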