ænet training set files

The Python class TrnSet in the aenet.trainset module can be used to interact with data set files produced by ænet’s generate.x tool.

Note

The ænet executable trnset2ASCII.x needs to be configured to read training set files in Fortran binary format. See also Installation & Set-up.

File formats

Internally, ænet uses unformatted Fortran binary files to store the featurized atomic structure data for training. Since the format of such files is compiler-dependent, it is not straightforward to parse them directly with Python. Instead, the TrnSet class converts binary data set files to plain text using the trnset2ASCII.x ænet tool. Text-based files can be further converted to HDF5 format to save space and to allow for more efficient I/O. These conversions are done transparently:

from aenet.trainset import TrnSet
ts = TrnSet.from_file('data.train')

This opens the training set file data.train, which can be in any of the three supported formats (Fortran binary, ASCII text, or HDF5). For HDF5 inputs, ts.schema reports whether the file uses the classic TrnSet layout or the HDF5StructureDataset layout.
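The three formats can usually be told apart from the first bytes of a file. The function below is a hypothetical sketch of such a check; it is not the logic that TrnSet.from_file(..., file_format='guess') actually uses. It assumes that HDF5 files carry the standard 8-byte HDF5 signature at offset 0 (that is, no user block), that ASCII training set files contain only 7-bit text, and that anything else, such as the NUL bytes typically found in Fortran record-length markers, indicates Fortran binary. The name guess_trnset_format is made up for illustration.

```python
# Standard HDF5 format signature (at offset 0 when the file has no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_trnset_format(path, probe_size=4096):
    """Hypothetical heuristic: return 'hdf5', 'ascii', or 'binary' for a file."""
    with open(path, "rb") as f:
        head = f.read(probe_size)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    # NUL bytes do not occur in plain text; Fortran record markers usually contain them.
    if b"\x00" in head:
        return "binary"
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "binary"  # bytes above 0x7F: not 7-bit text
    return "ascii"
```

Only a small prefix of the file is read, so the check stays cheap even for large training sets; a real implementation would likely also handle HDF5 files with a user block.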

API Reference

class aenet.trainset.TrnSet(name: str, normalized: bool, scale: float, shift: float, atom_types: List[str], atomic_energy: List[float], num_atoms_tot: int, num_structures: int, E_min: float, E_max: float, E_av: float, filename: PathLike = None, fileformat: str = 'ascii', schema: str = None, origin: PathLike = None, has_persisted_features: bool = False, **kwargs)[source]

Class for parsing aenet training set files.

Attention: atom type indices here start with zero internally (whereas they start with 1 in Fortran).
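When comparing type indices with values from the Fortran side, the offset is a shift by one. The snippet below illustrates this with made-up data (the type list and per-atom indices are illustrative only, not read from a real file).

```python
# Illustrative data: the type list and per-atom indices are made up.
atom_types = ["Ti", "O"]

# Per-atom type indices as they would appear on the Fortran side (1-based).
fortran_type_ids = [1, 2, 2, 1, 2]

# Shift to the 0-based convention used internally by TrnSet.
type_ids = [i - 1 for i in fortran_type_ids]

# 0-based indices can index Python sequences directly.
symbols = [atom_types[i] for i in type_ids]
print(symbols)  # ['Ti', 'O', 'O', 'Ti', 'O']

# Going back to Fortran numbering is the reverse shift.
assert [i + 1 for i in type_ids] == fortran_type_ids
```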

_ascii_skip_header()[source]

Skip over training set file header until first atomic structure.

static _decode_hdf5_text(value) str[source]
_initialize_torch_training_hdf5_state()[source]
static _is_torch_training_hdf5(h5file) bool[source]
static _is_trnset_hdf5(h5file) bool[source]
_load_torch_training_structure(idx: int)[source]
classmethod _load_torch_training_structure_from_handle(h5file, idx: int)[source]
_read_cell_info_hdf5(structure_idx: int)[source]

Read cell information for a specific structure from HDF5.

Parameters:

structure_idx – Index of the structure

Returns:

Tuple of (cell, pbc) where

  • cell: (3, 3) numpy array of lattice vectors

  • pbc: boolean, True for 3D-periodic

Return type:

tuple
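As an illustration of consuming the (cell, pbc) pair, the sketch below computes a cell volume. The helper cell_volume is hypothetical, and returning None for non-periodic structures is a design choice of this sketch, not documented TrnSet behavior. Because the determinant of a matrix equals that of its transpose, the result does not depend on whether the lattice vectors are stored as rows or columns.

```python
import numpy as np


def cell_volume(cell, pbc):
    """Return the volume of a periodic cell, or None if pbc is False."""
    cell = np.asarray(cell, dtype=float)
    if cell.shape != (3, 3):
        raise ValueError("expected a (3, 3) array of lattice vectors")
    if not pbc:
        # Design choice for this sketch: non-periodic structures have no cell volume.
        return None
    # |det| is the same whether lattice vectors are stored as rows or columns.
    return abs(np.linalg.det(cell))


# Example: a 2 x 2 x 2 cubic cell (illustrative values).
print(cell_volume(np.diag([2.0, 2.0, 2.0]), True))
```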

_read_neighbor_info_hdf5(structure_idx: int, num_atoms: int) dict[source]

Read neighbor information for a specific structure from HDF5.

Uses per-structure group format for O(1) lookup.

Parameters:
  • structure_idx – Index of the structure

  • num_atoms – Number of atoms in structure

Returns:

Dictionary with neighbor information:

  • ‘neighbor_counts’: (n_atoms,) array of neighbor counts

  • ‘neighbor_lists’: List of (nnb,) arrays with neighbor indices

  • ‘neighbor_vectors’: List of (nnb, 3) arrays with displacement vectors

Return type:

dict
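The dictionary layout described above can be processed with plain NumPy. The sketch below is a hypothetical helper (neighbor_distances is not part of aenet) that checks that the three entries are consistent and turns each atom's displacement vectors into neighbor distances. The sign convention of the displacement vectors is not stated here; taking the norm makes the result independent of it.

```python
import numpy as np


def neighbor_distances(info):
    """Return a list with one (nnb,) array of neighbor distances per atom."""
    counts = np.asarray(info["neighbor_counts"])
    lists = info["neighbor_lists"]
    vectors = info["neighbor_vectors"]
    if not (len(counts) == len(lists) == len(vectors)):
        raise ValueError("neighbor entries disagree on the number of atoms")
    distances = []
    for i, (nbrs, vecs) in enumerate(zip(lists, vectors)):
        vecs = np.asarray(vecs, dtype=float).reshape(-1, 3)
        # Sanity check: count, index list, and vectors must agree for each atom.
        if not (len(nbrs) == len(vecs) == counts[i]):
            raise ValueError(f"inconsistent neighbor data for atom {i}")
        # Each row is a displacement vector; its Euclidean norm is the distance.
        distances.append(np.linalg.norm(vecs, axis=1))
    return distances


# Illustrative two-atom example (made-up data).
info = {
    "neighbor_counts": np.array([1, 2]),
    "neighbor_lists": [np.array([1]), np.array([0, 0])],
    "neighbor_vectors": [np.array([[1.0, 0.0, 0.0]]),
                         np.array([[-1.0, 0.0, 0.0], [0.0, 3.0, 4.0]])],
}
print(neighbor_distances(info))
```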

_read_next_structure_ascii(read_coords, read_forces)[source]

Read next atomic structure from file.

_read_next_structure_hdf5(read_coords, read_forces)[source]
_read_structure_hdf5(idx, read_coords, read_forces)[source]
_read_structure_hdf5_torch_training(idx, read_coords, read_forces)[source]
_read_structure_hdf5_trnset(idx, read_coords, read_forces)[source]
classmethod _read_torch_training_atom_types(h5file) List[str][source]
classmethod _read_torch_training_atomic_energies(h5file) dict[source]
_read_torch_training_features(idx: int, n_atoms: int) ndarray[source]
classmethod _read_torch_training_hdf5_metadata(h5file) dict[source]
static _torch_training_has_persisted_features(h5file) bool[source]
_torch_training_structure_path(row, idx: int) str[source]
close()[source]
classmethod from_ascii_file(ascii_file: PathLike, **kwargs)[source]

Load training set from aenet ASCII file.

Parameters:

ascii_file – path to an aenet training set file in ASCII format

classmethod from_file(filename: PathLike, file_format: str = 'guess', **kwargs)[source]
classmethod from_fortran_binary_file(binary_file: PathLike, ascii_file: PathLike = None, **kwargs)[source]

First converts a training set file in Fortran binary format to ASCII format, then opens it. This requires the ænet tool trnset2ASCII.x.

classmethod from_hdf5_file(hdf5_file: PathLike, **kwargs)[source]
has_neighbor_info() bool[source]

Check if the training set file contains neighbor information.

Returns:

True if neighbor information is available (only for HDF5 format), False otherwise.

iter_structures(read_coords=False, read_forces=False)[source]
property num_types
open()[source]

Open training set file for reading.

read_next_structure(read_coords=False, read_forces=False)[source]
read_structure(idx: int, read_coords=False, read_forces=False)[source]
rewind()[source]
to_hdf5(filename: PathLike, complevel: int = 1)[source]

Save data set to HDF5 file.
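Combining from_file and to_hdf5 gives a one-step conversion from any supported input format to HDF5. The helper below is a minimal sketch using only the signatures listed above; the name convert_to_hdf5 is hypothetical, and it assumes that close() releases whatever file handle from_file leaves open. Running it requires aenet (and, for Fortran binary inputs, a configured trnset2ASCII.x).

```python
from pathlib import Path


def convert_to_hdf5(src, dst, complevel=1):
    """Hypothetical helper: convert an ænet training set file to HDF5."""
    # Imported lazily so this helper can be defined without aenet installed.
    from aenet.trainset import TrnSet

    # from_file() guesses the input format (file_format='guess' by default).
    ts = TrnSet.from_file(src)
    try:
        ts.to_hdf5(dst, complevel=complevel)
    finally:
        # Assumption: close() releases the handle opened by from_file().
        ts.close()
    return Path(dst)
```

For example, convert_to_hdf5('data.train', 'data.h5') would write the HDF5 copy next to the original file; subsequent TrnSet.from_file('data.h5') calls can then skip the binary-to-ASCII step.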