Dpdata Toolkit

ai2-kit tool dpdata

This toolkit is a command-line wrapper around dpdata for processing DeepMD datasets.

Usage

This toolkit includes the following commands:

| Command | Description | Example | Notes |
| --- | --- | --- | --- |
| `read` | Read datasets into memory. On its own this command has no effect; chain other commands after it to use the loaded data. | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy` | Supports wildcards; can be called multiple times |
| `write` | Merge the loaded datasets with dpdata `MultiSystems` and write the result to the given path. | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - write ./path/to/merged_dataset` | |
| `filter` | Filter the loaded datasets with a Python lambda expression evaluated on the system data. | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset` | |
| `set_fparam` | Add `fparam` to the dataset; the value can be a float or a list of floats. | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - set_fparam "[0,1]" - write ./path/to/output_dataset` | |
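
The `filter` expression is plain Python. The sketch below shows how such a predicate might be evaluated, assuming (as the `x['forces']` example suggests) that the lambda receives one system's data as a dict of NumPy arrays with `forces` shaped `(nframes, natoms, 3)`, and that whole systems are kept or dropped. The `systems` list and its values are made up for illustration and are not ai2-kit internals.

```python
import numpy as np

# Hypothetical stand-ins for per-system data dicts (real data would come from dpdata).
# Assumption: forces have shape (nframes, natoms, 3).
systems = [
    {"name": "ok", "forces": np.array([[[0.5, -1.0, 2.0], [3.0, 0.0, -0.5]]])},
    {"name": "big_positive", "forces": np.array([[[12.0, 0.0, 0.0], [0.1, 0.2, 0.3]]])},
    {"name": "big_negative", "forces": np.array([[[-15.0, 0.0, 0.0], [0.1, 0.2, 0.3]]])},
]

# The expression string exactly as it would be passed on the command line.
predicate = eval("lambda x: x['forces'].max() < 10")
kept = [s["name"] for s in systems if predicate(s)]
print(kept)  # ['ok', 'big_negative']

# .max() only sees the largest signed component, so large negative forces pass.
# Comparing absolute values rejects outliers of either sign.
abs_predicate = eval("lambda x: abs(x['forces']).max() < 10")
kept_abs = [s["name"] for s in systems if abs_predicate(s)]
print(kept_abs)  # ['ok']
```

Because `.max()` looks at the largest signed component, the `x['forces'].max() < 10` example does not catch large negative forces; `abs(x['forces']).max() < 10` checks both signs.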

These commands are chainable: separate them with `-` to process datasets in a pipeline. See the examples below for details.
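
Chaining with `-` means every command operates on the in-memory data left by the previous one. A minimal sketch of that method-chaining pattern, in which each method returns `self`; the `ToyDatasetTool` class and its list-of-dicts storage are hypothetical stand-ins, not ai2-kit's implementation:

```python
from typing import Callable, List


class ToyDatasetTool:
    """Hypothetical chainable tool: each method returns self, like a `-` separated pipeline."""

    def __init__(self) -> None:
        self.systems: List[dict] = []
        self.written: List[dict] = []

    def read(self, *datasets: dict) -> "ToyDatasetTool":
        # Can be called several times; loaded data accumulates in memory.
        self.systems.extend(datasets)
        return self

    def filter(self, predicate: Callable[[dict], bool]) -> "ToyDatasetTool":
        # Keep only the systems for which the predicate returns True.
        self.systems = [s for s in self.systems if predicate(s)]
        return self

    def write(self) -> "ToyDatasetTool":
        # Stand-in for writing to disk: record what would be written.
        self.written = list(self.systems)
        return self


# Similar in spirit to: read A - read B - filter "..." - write OUT
tool = (
    ToyDatasetTool()
    .read({"id": "a", "max_force": 3.0})
    .read({"id": "b", "max_force": 12.0})
    .filter(lambda x: x["max_force"] < 10)
    .write()
)
print([s["id"] for s in tool.written])  # ['a']
```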

Examples

```bash
# Read multiple datasets generated by the training workflow (matched by wildcard) and merge them into a single dataset.
# You can also call `read` multiple times to read datasets from different directories.
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged_dataset --fmt deepmd/npy

# You can also save the data in HDF5 format.
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged.hdf5 --fmt deepmd/hdf5

# Use a lambda expression to filter out outlier data.
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset

# Add fparam to the dataset (quote the list so the shell does not treat it as a glob pattern).
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - set_fparam "[0,1]" - write ./path/to/output_dataset
```
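
DeePMD-kit stores frame parameters per frame, one row per frame. A minimal sketch of how a float or a list of floats passed to `set_fparam` could be expanded to that layout, assuming the same value is applied to every frame; `make_fparam` is a hypothetical helper, not part of ai2-kit or dpdata:

```python
import numpy as np


def make_fparam(value, nframes: int) -> np.ndarray:
    """Hypothetical helper: repeat a float or list of floats for every frame.

    Returns an array of shape (nframes, n), where n = 1 for a scalar value.
    """
    row = np.atleast_1d(np.asarray(value, dtype=float))  # scalar -> shape (1,)
    return np.tile(row, (nframes, 1))  # one identical row per frame


print(make_fparam(0.5, nframes=3).shape)     # (3, 1)
print(make_fparam([0, 1], nframes=3).shape)  # (3, 2)
```

With a list such as `[0,1]`, each frame gets two parameters, so the `numb_fparam` setting of the DeePMD-kit fitting network would need to match that length.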