ænet training set files
The Python class TrnSet in the aenet.trainset module can be used to
interact with data set files produced by ænet’s generate.x tool.
Note
To read training set files in Fortran binary format, TrnSet requires the
ænet executable trnset2ASCII.x to be configured. See also Installation & Set-up.
File formats
Internally, ænet uses unformatted Fortran binary files to store the
featurized atomic structure data for training. Since the
format of such files is compiler-dependent, it is not straightforward to
parse them directly with Python. Instead, the TrnSet class converts
binary data set files to plain text using the trnset2ASCII.x ænet
tool. Text-based files can be further converted to HDF5 format to save
space and to allow for more efficient I/O. These conversions are done
transparently:
from aenet.trainset import TrnSet
ts = TrnSet.from_file('data.train')
This opens the training set file data.train, which can be in any of
the three supported formats (Fortran binary, ASCII text, or HDF5).
For HDF5 inputs, ts.schema reports whether the file uses the classic
TrnSet layout or the HDF5StructureDataset layout.
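TrnSet.from_file() guesses the format on its own (file_format='guess'), and the conversion of binary files is delegated to trnset2ASCII.x. The sketch below is not that implementation. It only illustrates how the three formats can be told apart and why Fortran binary files are awkward to parse directly. It assumes gfortran's default sequential layout, in which each unformatted record is framed by 4-byte little-endian length markers; other compilers or compiler options may use different markers. The names guess_format_from_bytes and read_fortran_records are hypothetical and not part of aenet.

```python
import struct

# Every HDF5 file without a user block starts with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Byte values expected in a plain-text file: printable ASCII plus whitespace.
_TEXT_BYTES = set(range(32, 127)) | {9, 10, 11, 12, 13}


def guess_format_from_bytes(head: bytes) -> str:
    """Hypothetical heuristic: classify the first bytes of a data set file."""
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if all(b in _TEXT_BYTES for b in head):
        return "ascii"
    return "binary"


def read_fortran_records(data: bytes, marker_size: int = 4) -> list:
    """Split sequential unformatted Fortran data into raw record payloads.

    Assumes gfortran-style framing <length><payload><length> with
    little-endian length markers of `marker_size` bytes (4 or 8).
    Subrecords (negative markers, used for records > 2 GiB) are not handled.
    """
    fmt = {4: "<i", 8: "<q"}[marker_size]
    records = []
    pos = 0
    while pos < len(data):
        if pos + marker_size > len(data):
            raise ValueError(f"truncated record marker at byte {pos}")
        (length,) = struct.unpack_from(fmt, data, pos)
        start = pos + marker_size
        end = start + length
        if length < 0 or end + marker_size > len(data):
            raise ValueError(f"unsupported or truncated record at byte {pos}")
        (trailer,) = struct.unpack_from(fmt, data, end)
        if trailer != length:
            raise ValueError(f"leading/trailing markers differ at byte {pos}")
        records.append(data[start:end])
        pos = end + marker_size
    return records


if __name__ == "__main__":
    def frame(payload: bytes) -> bytes:
        # Wrap a payload the way gfortran frames one unformatted WRITE.
        marker = struct.pack("<i", len(payload))
        return marker + payload + marker

    blob = frame(b"water") + frame(struct.pack("<2d", -1.5, 0.25))
    name, energies = read_fortran_records(blob)
    print(name, struct.unpack("<2d", energies))
    print(guess_format_from_bytes(blob))
```

Checking the HDF5 signature is unambiguous, whereas telling ASCII from binary by byte content is only a heuristic: a binary file that happened to contain only printable bytes would be misclassified.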
API Reference
- class aenet.trainset.TrnSet(name: str, normalized: bool, scale: float, shift: float, atom_types: List[str], atomic_energy: List[float], num_atoms_tot: int, num_structures: int, E_min: float, E_max: float, E_av: float, filename: PathLike = None, fileformat: str = 'ascii', schema: str = None, origin: PathLike = None, has_persisted_features: bool = False, **kwargs)
Class for parsing aenet training set files.
- Attention: internally, atom type indices start at zero
(in ænet's Fortran code they start at 1).
- _ascii_skip_header()
Skip over training set file header until first atomic structure.
- static _decode_hdf5_text(value) str
- _initialize_torch_training_hdf5_state()
- static _is_torch_training_hdf5(h5file) bool
- static _is_trnset_hdf5(h5file) bool
- _load_torch_training_structure(idx: int)
- classmethod _load_torch_training_structure_from_handle(h5file, idx: int)
- _read_cell_info_hdf5(structure_idx: int)
Read cell information for a specific structure from HDF5.
- _read_neighbor_info_hdf5(structure_idx: int, num_atoms: int) dict
Read neighbor information for a specific structure from HDF5.
Uses per-structure group format for O(1) lookup.
- Parameters:
structure_idx – Index of the structure
num_atoms – Number of atoms in structure
- Returns:
Dictionary with neighbor information:
- 'neighbor_counts': (n_atoms,) array of neighbor counts
- 'neighbor_lists': list of (nnb,) arrays with neighbor indices
- 'neighbor_vectors': list of (nnb, 3) arrays with displacement vectors
- Return type:
dict
- _read_next_structure_ascii(read_coords, read_forces)
Read next atomic structure from file.
- _read_next_structure_hdf5(read_coords, read_forces)
- _read_structure_hdf5(idx, read_coords, read_forces)
- _read_structure_hdf5_torch_training(idx, read_coords, read_forces)
- _read_structure_hdf5_trnset(idx, read_coords, read_forces)
- classmethod _read_torch_training_atom_types(h5file) List[str]
- classmethod _read_torch_training_atomic_energies(h5file) dict
- _read_torch_training_features(idx: int, n_atoms: int) ndarray
- classmethod _read_torch_training_hdf5_metadata(h5file) dict
- static _torch_training_has_persisted_features(h5file) bool
- _torch_training_structure_path(row, idx: int) str
- close()
- classmethod from_ascii_file(ascii_file: PathLike, **kwargs)
Load training set from aenet ASCII file.
- Parameters:
ascii_file – path to an aenet training set file in ASCII format
- classmethod from_file(filename: PathLike, file_format: str = 'guess', **kwargs)
- classmethod from_fortran_binary_file(binary_file: PathLike, ascii_file: PathLike = None, **kwargs)
Convert a training set file in Fortran binary format to ASCII format, then open it. This requires the tool trnset2ASCII.x.
- classmethod from_hdf5_file(hdf5_file: PathLike, **kwargs)
- has_neighbor_info() bool
Check if the training set file contains neighbor information.
- Returns:
True if neighbor information is available (this is only possible for HDF5 files), False otherwise.
- iter_structures(read_coords=False, read_forces=False)
- property num_types
- open()
Open training set file for reading.
- read_next_structure(read_coords=False, read_forces=False)
- read_structure(idx: int, read_coords=False, read_forces=False)
- rewind()
- to_hdf5(filename: PathLike, complevel: int = 1)
Save data set to HDF5 file.
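Two small helpers, sketched under stated assumptions, for data described in the reference above. type_symbols() applies the Attention note: a one-based type index, as used on the Fortran side, maps to the zero-based position in a list of element symbols such as the atom_types argument of TrnSet. neighbor_distances() consumes a dictionary with the layout documented for _read_neighbor_info_hdf5(); the dictionary is built by hand here, and how it is exposed through the public API is not covered by this reference. Both function names are hypothetical and not part of aenet.

```python
import numpy as np


def type_symbols(fortran_type_ids, atom_types):
    """Map one-based (Fortran) type indices to element symbols.

    Python lists are zero-based, so index i on the Fortran side is
    position i - 1 here. Hypothetical helper, not part of aenet.
    """
    symbols = []
    for i in fortran_type_ids:
        if not 1 <= i <= len(atom_types):
            raise ValueError(f"type index {i} out of range 1..{len(atom_types)}")
        symbols.append(atom_types[i - 1])
    return symbols


def neighbor_distances(neighbor_info):
    """Return one array of neighbor distances per atom.

    Expects the documented layout: 'neighbor_counts' (n_atoms,),
    'neighbor_lists' (list of (nnb,) arrays) and 'neighbor_vectors'
    (list of (nnb, 3) displacement arrays). Only counts and vectors are used.
    """
    counts = np.asarray(neighbor_info["neighbor_counts"])
    distances = []
    for i, vectors in enumerate(neighbor_info["neighbor_vectors"]):
        vectors = np.asarray(vectors, dtype=float).reshape(-1, 3)
        if len(vectors) != counts[i]:
            raise ValueError(
                f"atom {i}: {counts[i]} neighbors but {len(vectors)} vectors")
        # Euclidean length of each displacement vector (row-wise norm).
        distances.append(np.linalg.norm(vectors, axis=1))
    return distances


# Hand-built example in the documented layout (two atoms).
info = {
    "neighbor_counts": np.array([2, 1]),
    "neighbor_lists": [np.array([1, 1]), np.array([0])],
    "neighbor_vectors": [
        np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]),
        np.array([[0.0, 0.0, 2.0]]),
    ],
}
print(type_symbols([1, 2, 2], ["O", "H"]))  # ['O', 'H', 'H']
print(neighbor_distances(info))
```

Validating the per-atom count against the number of vectors catches a mismatched or truncated dictionary early instead of silently producing distances for the wrong atoms.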