Training ANN Potentials (Fortran)

Note

Training as described here makes use of ænet’s compiled train.x tool. Make sure to install ænet and configure the paths as described in Installation & Set-up.

Note

Alternative: For a pure Python/PyTorch implementation that does not require Fortran, see PyTorch-based Training.

aenet-python provides tools to facilitate the training of ænet potentials directly from Python scripts. This workflow is managed primarily by the ANNPotential class.

Example notebooks

Jupyter notebooks with examples of how to train potentials can be found in the notebooks directory within the repository.

Defining the Network Architecture

Before training, you need to define the architecture of the ANN for each atomic species involved. This is done using a Python dictionary where keys are the element symbols (e.g., “Si”, “O”) and values are lists of tuples. Each tuple represents a layer in the network, specifying the number of nodes and the activation function for that layer.

Supported activation functions are: 'tanh', 'linear', and 'sigmoid'.

The final ANN layer is always a linear layer with one node, which outputs the energy for the corresponding atomic species. This layer does not need to be defined.

Example architecture for a silicon potential:

from aenet.mlip import ANNPotential

# Define architecture: Si with two hidden layers,
# each with 10 nodes and tanh activation
arch = {
    "Si": [(10, 'tanh'), (10, 'tanh')]
}

# Create the potential object
potential = ANNPotential(arch)
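
To make the role of the implicit output layer concrete, the following sketch counts the weights and biases implied by an architecture dictionary. It does not use aenet-python: count_parameters is a hypothetical helper, and the input dimension n_inputs=48 (the length of the descriptor vector) is an assumed value chosen only for illustration.

```python
# Illustrative only: count_parameters is a hypothetical helper, not part of
# aenet-python, and n_inputs (the descriptor length) is an assumed value.

def count_parameters(layers, n_inputs):
    """Count weights and biases of a fully connected network.

    `layers` lists the hidden layers as (nodes, activation) tuples. The
    final linear layer with one node is appended here because it is
    implicit in the architecture definition.
    """
    sizes = [n_inputs] + [nodes for nodes, _ in layers] + [1]
    # Each layer has (inputs + 1 bias) * outputs parameters.
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


arch = {"Si": [(10, "tanh"), (10, "tanh")]}
for species, layers in arch.items():
    print(species, count_parameters(layers, n_inputs=48))
```

With 48 inputs, the two hidden layers and the implicit output layer contribute 490 + 110 + 11 = 611 parameters.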

Training Configuration

Training parameters are managed through the TrainingConfig class, which centralizes all configuration options with built-in validation. This ensures type safety and prevents invalid parameter combinations.

The TrainingConfig class includes:

  • iterations (int): Maximum number of training iterations. Default: 0

  • method (TrainingMethod): The optimization algorithm to use. Default: Adam()

  • testpercent (int): Percentage of data for test set (0-100). Default: 0

  • max_energy (float, optional): Exclude structures with referenced cohesive or formation energy per atom above this threshold when the trainer builds datasets from raw structures=... input. If atomic_energies is omitted, the filter falls back to all-zero atomic references and uses the provided per-atom labels as-is. Prebuilt datasets must be filtered when they are constructed. Default: None

  • sampling (str, optional): Sampling method (‘sequential’, ‘random’, ‘weighted’, ‘energy’). Default: None

  • timing (bool): Enable detailed timing output. Default: False

  • save_energies (bool): Save predicted energies for training/test sets. Default: False

The configuration validates parameters at creation time, raising ValueError for invalid inputs (e.g., testpercent outside 0-100 range, invalid sampling method).
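
The validate-at-creation pattern described above can be sketched with a small dataclass. This is a hypothetical stand-in (ConfigSketch), not the TrainingConfig implementation from aenet-python; it only mirrors the two checks named in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in that mimics the validation described above.
# It is not the TrainingConfig implementation from aenet-python.
ALLOWED_SAMPLING = {"sequential", "random", "weighted", "energy"}


@dataclass
class ConfigSketch:
    iterations: int = 0
    testpercent: int = 0
    sampling: Optional[str] = None

    def __post_init__(self):
        # Validate at creation time so mistakes surface before training.
        if not 0 <= self.testpercent <= 100:
            raise ValueError("testpercent must be between 0 and 100")
        if self.sampling is not None and self.sampling not in ALLOWED_SAMPLING:
            raise ValueError(f"unknown sampling method: {self.sampling!r}")


try:
    ConfigSketch(iterations=1000, testpercent=150)
except ValueError as err:
    print("rejected:", err)
```

Failing in the constructor means an invalid value is reported where it was written, rather than later when the training executable is launched.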

Training Methods

The training process uses optimization methods to adjust the neural network weights. Each method has specific parameters with sensible defaults. The available training methods are provided as typed classes:

  • Adam - ADAM optimizer (default)

  • BFGS - L-BFGS-B optimizer (no parameters)

  • EKF - Extended Kalman filter

  • LM - Levenberg-Marquardt

  • OnlineSD - Online steepest descent

Each training method class encodes both the algorithm name and its parameters with appropriate defaults based on the aenet Fortran implementation.

Training the Potential

Once the architecture is defined, you can train the potential using the train() method. This method automates several steps:

  1. Checks if the provided training set file exists and is compatible with the defined architecture.

  2. Creates a temporary working directory (or uses a specified one).

  3. Generates the necessary train.in file based on the architecture and training parameters.

  4. Calls the train.x executable from the configured aenet installation.

  5. Monitors the training progress with a progress bar.

  6. Collects the resulting potential files (.nn files), energy files, and timing information into the current directory upon completion.
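
The first four steps can be outlined with the standard library alone. The sketch below is a hypothetical outline, not the aenet-python implementation: run_in_workdir is an invented name, the input-file contents are a placeholder, and a short Python one-liner stands in for train.x so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical outline of steps 1-4; not the aenet-python implementation.
# The command and the input-file contents are placeholders.

def run_in_workdir(trnset_file, command, input_text, workdir=None):
    trnset = Path(trnset_file)
    if not trnset.exists():  # step 1 (existence check only)
        raise FileNotFoundError(trnset)

    # Step 2: use the given directory, or a temporary one removed afterwards.
    tmp = None
    if workdir is None:
        tmp = tempfile.TemporaryDirectory()
        workdir = tmp.name
    try:
        workdir = Path(workdir)
        workdir.mkdir(parents=True, exist_ok=True)
        (workdir / "train.in").write_text(input_text)  # step 3
        # Step 4: run the executable inside the working directory.
        result = subprocess.run(
            command, cwd=workdir, capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        if tmp is not None:
            tmp.cleanup()


with tempfile.NamedTemporaryFile(suffix=".train") as fake_trnset:
    out = run_in_workdir(
        fake_trnset.name,
        [sys.executable, "-c", "print('stand-in for train.x')"],
        input_text="placeholder\n",
    )
    print(out.strip())
```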

Basic Training Example:

from aenet.mlip import ANNPotential, TrainingConfig

# Assuming 'potential' is an ANNPotential object defined as above
# and 'data.train' is your training set file.

# Simple training with defaults (uses Adam optimizer)
potential.train('data.train')

# Or customize parameters using TrainingConfig
config = TrainingConfig(iterations=1000, testpercent=10)
potential.train('data.train', config=config)

# Inline configuration also works
potential.train('data.train',
                config=TrainingConfig(iterations=1000, testpercent=10))
print("Training completed successfully.")

Using Different Training Methods:

from aenet.mlip import ANNPotential, TrainingConfig
from aenet.mlip import BFGS, Adam, LM, EKF, OnlineSD

# Use BFGS optimizer
config = TrainingConfig(iterations=1000, method=BFGS())
potential.train('data.train', config=config)

# Customize Adam parameters
config = TrainingConfig(
    iterations=1000,
    method=Adam(mu=0.005, batchsize=200),
    testpercent=10
)
potential.train('data.train', config=config)

# Use Levenberg-Marquardt with additional options
config = TrainingConfig(
    iterations=500,
    method=LM(batchsize=128, learnrate=0.05),
    sampling='random',
    max_energy=100.0
)
potential.train('data.train', config=config)

# Use Extended Kalman filter
config = TrainingConfig(
    iterations=500,
    method=EKF(lambda_=0.95, P=150.0),
    timing=True
)
potential.train('data.train', config=config)

# Use Online steepest descent
config = TrainingConfig(
    iterations=10000,
    method=OnlineSD(gamma=1e-6, alpha=0.3),
    save_energies=True
)
potential.train('data.train', config=config)

Key Parameters for train():

  • trnset_file (str or Path, optional): Path to the training set file. Defaults to 'data.train'.

  • config (TrainingConfig, optional): Training configuration object containing all training parameters (iterations, method, testpercent, max_energy, sampling, timing, save_energies). If not provided, a default TrainingConfig() with the Adam optimizer is used. Defaults to None.

  • workdir (str or Path, optional): A directory to store temporary files during training. If not provided, a temporary directory is created and removed afterwards. Defaults to None.

  • output_file (str or Path, optional): File path to save the standard output of the train.x executable. Defaults to 'train.out'.

See the TrainingConfig class documentation above for all available configuration parameters.

Training Method Parameters

Adam (default method)

  • mu (float): Learning rate. Default: 0.001

  • b1 (float): Exponential decay rate for first moment estimates. Default: 0.9

  • b2 (float): Exponential decay rate for second moment estimates. Default: 0.999

  • eps (float): Small constant for numerical stability. Default: 1.0e-8

  • batchsize (int): Number of structures per batch. Default: 16

  • samplesize (int): Number of structures to sample per epoch. Default: 100
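
For orientation, here is a sketch of the standard Adam update rule for a single scalar weight. It assumes that mu, b1, b2, and eps play their usual roles; the Fortran implementation in aenet may differ in detail, and batching (batchsize, samplesize) is not modeled.

```python
import math

# Sketch of the standard Adam update rule for a single scalar weight.
# Assumption: mu, b1, b2, and eps play their usual roles; the Fortran
# implementation in aenet may differ in detail.

def adam_step(w, grad, m, v, t, mu=0.001, b1=0.9, b2=0.999, eps=1.0e-8):
    m = b1 * m + (1.0 - b1) * grad          # first moment estimate
    v = b2 * v + (1.0 - b2) * grad * grad   # second moment estimate
    m_hat = m / (1.0 - b1 ** t)             # bias correction, t starts at 1
    v_hat = v / (1.0 - b2 ** t)
    w = w - mu * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Minimize f(w) = (w - 3)^2 starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, mu=0.01)
print(round(w, 3))
```

Because the step is normalized by the running gradient magnitude, the first update has a size close to mu regardless of how large the gradient is.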

BFGS

  • No configurable parameters.

  • Note: Not supported on ARM-based Macs.

EKF (Extended Kalman Filter)

  • lambda_ (float): Forgetting factor. The trailing underscore is needed because lambda is a reserved word in Python. Default: 0.99

  • lambda0 (float): Initial forgetting factor. Default: 0.999

  • P (float): Initial covariance. Default: 100.0

  • mnoise (float): Measurement noise. Default: 0.0

  • pnoise (float): Process noise. Default: 1.0e-5

  • wgmax (int): Maximum weight change. Default: 500

LM (Levenberg-Marquardt)

  • batchsize (int): Number of structures per batch. Default: 256

  • learnrate (float): Learning rate. Default: 0.1

  • iter (int): Number of iterations per epoch. Default: 3

  • conv (float): Convergence criterion. Default: 1e-3

  • adjust (int): Adjustment parameter. Default: 5

OnlineSD (Online Steepest Descent)

  • gamma (float): Learning rate. Default: 1.0e-5

  • alpha (float): Momentum parameter. Default: 0.25
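
The interplay of the two parameters can be illustrated with the textbook form of steepest descent with momentum. This is a sketch under the assumption that gamma scales the gradient and alpha scales the previous update; the exact update used by the Fortran code may differ. In online mode, such an update would be applied after each training sample.

```python
# Sketch of steepest descent with momentum for a single scalar weight.
# Assumption: gamma scales the gradient and alpha scales the previous update;
# the exact update used by the Fortran code may differ.

def sd_momentum_step(w, grad, prev_update, gamma=1.0e-5, alpha=0.25):
    update = -gamma * grad + alpha * prev_update
    return w + update, update


# Minimize f(w) = (w - 3)^2; a larger gamma is used so the toy problem
# converges in a few hundred steps.
w, update = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    w, update = sd_momentum_step(w, grad, update, gamma=0.05, alpha=0.25)
print(round(w, 6))
```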

The train() method requires a configured aenet installation. Use aenet config on the command line to set the paths to the aenet executables.

MPI Parallelization

Training can be accelerated using MPI parallelization if the train.x executable is built with MPI support. This allows training to run across multiple CPU cores or nodes on HPC systems.

Prerequisites

  1. The train.x executable must be compiled with MPI support

  2. MPI must be enabled in the aenet-python configuration:

$ aenet config --enable-mpi

  3. (Optional) Customize the MPI launcher for your system:

# For SLURM systems
$ aenet config --set-mpi-launcher "srun -n {num_proc} {exec}"

# Default is "mpirun -np {num_proc} {exec}"

Using MPI in Training

To enable MPI parallelization, pass the num_processes parameter to the train() method:

from aenet.mlip import ANNPotential, TrainingConfig, Adam

# Define architecture
arch = {"Si": [(10, 'tanh'), (10, 'tanh')]}
potential = ANNPotential(arch)

# Standard training (sequential, no MPI)
config = TrainingConfig(iterations=1000)
potential.train('data.train', config=config)

# MPI training with 8 processes
config = TrainingConfig(iterations=1000)
potential.train('data.train', config=config, num_processes=8)

# MPI training with custom configuration
config = TrainingConfig(
    iterations=1000,
    method=Adam(mu=0.005, batchsize=32),
    testpercent=10
)
potential.train('data.train', config=config, num_processes=16)

The num_processes parameter specifies how many MPI processes to use. The actual command executed will be based on the configured MPI launcher. For example, with the default launcher and num_processes=8, the command would be:

mpirun -np 8 /path/to/train.x train.in
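
The expansion of a launcher template into a command can be sketched as follows. The function name build_mpi_command and the choice to append the input file after the {exec} placeholder are assumptions made for illustration; only the placeholder names {num_proc} and {exec} and the resulting command line are taken from the text above.

```python
import shlex

# Sketch of launcher-template expansion. The function name and the choice to
# append the input file after {exec} are assumptions made for illustration.

def build_mpi_command(launcher, num_proc, executable, input_file):
    # Quote the executable so a path containing spaces stays one argument.
    filled = launcher.format(num_proc=num_proc, exec=shlex.quote(executable))
    return shlex.split(filled) + [input_file]


cmd = build_mpi_command(
    "mpirun -np {num_proc} {exec}", 8, "/path/to/train.x", "train.in"
)
print(" ".join(cmd))
```

Returning an argument list rather than a single string lets the command be passed to subprocess.run without invoking a shell.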

Inference with Trained Potentials

Once you have trained ANN potentials, you can use them to make predictions (inference) on new atomic structures. The prediction functionality is integrated into the ANNPotential class, providing a unified interface for both training and inference.