PyTorch-Based Featurization
Note
Featurization as described here makes use of PyTorch. Make sure to
install core torch support plus the matching torch-scatter and
torch-cluster wheels as described in Installation & Set-up.
Note
Alternative: For featurization using ænet’s Fortran-based tools, see Structure featurization.
Overview
The aenet.torch_featurize module provides a pure Python/PyTorch
implementation of the Chebyshev descriptor (AUC method) for atomic
environments [1,2]. In contrast to the Fortran-based implementation,
this implementation exposes gradients via PyTorch’s automatic
differentiation mechanism and GPU support. On CPUs, it is typically
slower than the Fortran implementation.
The PyTorch implementation is a drop-in replacement for the traditional Fortran-based featurization workflow and yields (numerically) identical results.
[1] N. Artrith, A. Urban, and G. Ceder, Phys. Rev. B 96, 2017, 014112 (link1).
[2] A. M. Miksch, T. Morawietz, J. Kästner, A. Urban, N. Artrith, Mach. Learn.: Sci. Technol. 2, 2021, 031001 (link2).
Example notebooks
For a longer workflow-oriented walkthrough, including file-based input, batch processing, gradient computation, and optional GPU execution, see example-04-torch-featurization.ipynb.
Additional notebooks are available in the notebooks directory within the repository.
Basic Usage
High-Level API with AtomicStructure Objects (Recommended)
The TorchAUCFeaturizer class provides a high-level API that is compatible
with the Fortran-based AenetAUCFeaturizer. This is the recommended approach
for most users as it works directly with AtomicStructure objects:
>>> import numpy as np
>>> from aenet.geometry import AtomicStructure
>>> from aenet.torch_featurize import TorchAUCFeaturizer
>>> structure = AtomicStructure(
... np.array([
... [0.0, 0.0, 0.12],
... [0.0, 0.76, -0.47],
... [0.0, -0.76, -0.47],
... ]),
... ['O', 'H', 'H'],
... )
>>> descriptor = TorchAUCFeaturizer(
... typenames=['O', 'H'],
... rad_order=10,
... rad_cutoff=4.0,
... ang_order=3,
... ang_cutoff=1.5,
... )
>>> featurized = descriptor.featurize_structure(structure)
>>> featurized.atom_features.shape
(3, 30)
The TorchAUCFeaturizer inherits from AtomicFeaturizer and returns
FeaturizedAtomicStructure objects, providing full API compatibility with
the Fortran-based workflow. This makes it easy to switch between implementations
or integrate with existing code. For file-based input and longer multi-structure
workflows, prefer the notebook example above.
Low-Level API with PyTorch Tensors (For Advanced Users)
For advanced users who need direct access to PyTorch operations and gradients,
the ChebyshevDescriptor class provides a lower-level interface:
>>> import torch
>>> from aenet.torch_featurize import ChebyshevDescriptor
>>> descriptor = ChebyshevDescriptor(
... species=['O', 'H'],
... rad_order=10,
... rad_cutoff=4.0,
... ang_order=3,
... ang_cutoff=1.5,
... )
>>> positions = torch.tensor([
... [0.0, 0.0, 0.12],
... [0.0, 0.76, -0.47],
... [0.0, -0.76, -0.47],
... ], dtype=torch.float64)
>>> species = ['O', 'H', 'H']
>>> features = descriptor.forward_from_positions(positions, species)
>>> features.shape
torch.Size([3, 30])
This low-level API is useful when you need gradient computation for force training or other differentiable operations. The notebook example extends this with an explicit gradient workflow.
Periodic Systems
For crystals with periodic boundary conditions, use the low-level API:
>>> import torch
>>> from aenet.torch_featurize import ChebyshevDescriptor
>>> positions = torch.tensor([
... [0.0, 0.0, 0.0],
... [0.0, 2.0, 2.0],
... [2.0, 0.0, 2.0],
... [2.0, 2.0, 0.0],
... ], dtype=torch.float64)
>>> species = ['Cu', 'Cu', 'Au', 'Au']
>>> cell = torch.tensor([
... [4.0, 0.0, 0.0],
... [0.0, 4.0, 0.0],
... [0.0, 0.0, 4.0],
... ], dtype=torch.float64)
>>> pbc = torch.tensor([True, True, True], dtype=torch.bool)
>>> descriptor = ChebyshevDescriptor(
... species=['Au', 'Cu'],
... rad_order=8,
... rad_cutoff=3.5,
... ang_order=5,
... ang_cutoff=3.5,
... )
>>> features = descriptor.forward_from_positions(
... positions, species, cell=cell, pbc=pbc
... )
>>> features.shape
torch.Size([4, 30])
Or use the high-level API with TorchAUCFeaturizer which handles
periodic structures automatically from AtomicStructure objects.
GPU Acceleration
Enable GPU acceleration by specifying the device when creating the descriptor:
import torch
from aenet.torch_featurize import ChebyshevDescriptor
if torch.cuda.is_available():
# Create descriptor on GPU
descriptor = ChebyshevDescriptor(
species=['O', 'H'],
rad_order=10,
rad_cutoff=4.0,
ang_order=3,
ang_cutoff=1.5,
device='cuda',
)
# Input tensors are moved to the configured device internally
features = descriptor.forward_from_positions(positions, species)
The complete GPU walkthrough is kept in example-04-torch-featurization.ipynb so the base docs-example job can remain CPU-only.
Both TorchAUCFeaturizer and ChebyshevDescriptor support GPU
acceleration via the device parameter.
Batch Featurization
For efficient processing of multiple structures (e.g., during training), use the
BatchedFeaturizer class which wraps a ChebyshevDescriptor and processes
structures in batch:
import torch
from aenet.torch_featurize import ChebyshevDescriptor, BatchedFeaturizer
# Create base descriptor
descriptor = ChebyshevDescriptor(
species=['O', 'H'],
rad_order=10,
rad_cutoff=4.0,
ang_order=3,
ang_cutoff=1.5
)
# Wrap in BatchedFeaturizer for efficient batch processing
batch_featurizer = BatchedFeaturizer(descriptor)
# Prepare batch of structures (different sizes allowed)
batch_positions = [
torch.tensor(
[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
dtype=torch.float64,
),
torch.tensor(
[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
dtype=torch.float64,
),
]
batch_species = [
['O', 'H', 'H'],
['O', 'H'],
]
# Process entire batch at once
features, batch_indices = batch_featurizer(batch_positions, batch_species)
print(features.shape) # torch.Size([5, 30])
print(batch_indices.tolist()) # [0, 0, 0, 1, 1]
The BatchedFeaturizer returns:
features: Concatenated feature tensor of shape(total_atoms, n_features)batch_indices: Tensor indicating which structure each atom belongs to
This is particularly useful in training loops where you need to process batches
of structures efficiently. For periodic systems, you can also provide
batch_cells and batch_pbc lists. The notebook example keeps the longer
batch, gradient, and GPU-oriented workflow in one place.
Performance Considerations
Angular cutoff has the largest impact on performance (scales as N²)
GPU acceleration most beneficial for systems with >100 atoms
Batch processing with
BatchedFeaturizerimproves throughput