Analytical Gradients for Chebyshev AUC Descriptors
Overview
This document summarizes the semi-analytical, vectorized gradient implementation
for the AUC (Artrith-Urban-Ceder) Chebyshev descriptor in aenet-python.
It explains the design goals, numerical stability measures, and the code
structure that computes gradients of both radial and angular symmetry functions
with respect to atomic positions.
Motivation
Performance: Avoid the slow feature-by-feature autograd loop by vectorizing across all pairs and triplets.
Robustness: Prevent NaNs arising from zero-distance image interactions in periodic systems by avoiding differentiating through norms in autograd.
Key Interfaces
Module:
src/aenet/torch_featurize/featurize.pyClass:
ChebyshevDescriptorFeature computation entry points: -
forward_from_positions(positions, species, cell, pbc)-forward(positions, species, neighbor_indices, neighbor_vectors)Gradient computation:
compute_feature_gradients(positions, species, cell, pbc)returns(features, gradients)with shapefeatures: (N, F),gradients: (N, F, N, 3)Internal methods: -
_compute_radial_gradients(...)-_compute_angular_gradients(...)
Numerical Stability
Distances are always computed with small epsilons to avoid singularities: -
d = sqrt(r · r + eps_dist)witheps_dist = 1e-20- Unit vectors normalized asu = r / (|r| + eps_norm)witheps_norm = 1e-12Angular cosine clamped to the valid range: -
cos_theta = dot(u_ij, u_ik).clamp(-1, 1)No autograd is used for the geometric derivatives (e.g., w.r.t. coordinates); instead, closed-form expressions are applied (see below).
Cutoff functions are smooth (cosine cutoff), and their derivatives are used analytically.
Radial Gradients (Summary)
For each neighbor pair (i, j) with displacement r_ij and distance
d_ij = ||r_ij||:
Basis and derivative from
RadialBasis: -G_rad(d_ij)anddG_rad/ddChain rule to coordinates: -
dG/dr_ij_vec = (dG/dd) * (r_ij / |r_ij|)
Contributions are accumulated to atom i (negative sign) and atom j
(positive sign) for both unweighted and typespin-weighted features. The
accumulation uses vectorized index_add_ into a flattened center/target
index to achieve good performance.
Angular Gradients (Summary)
For each neighbor triplet (i, j, k) with displacements r_ij, r_ik:
Distances and unit vectors: -
d_ij = ||r_ij||,d_ik = ||r_ik||-u_ij = r_ij / (|r_ij| + eps_norm), etc.Cosine of the angle at
i: -cos_theta = dot(u_ij, u_ik)(clamped to[-1, 1])Basis partial derivatives from
AngularBasis: -dG/dcos,dG/dr_ij,dG/dr_ik
Geometric derivatives (see detailed derivation)
Final chain rule to coordinates
Implementation Notes
Code:
ChebyshevDescriptor._compute_angular_gradients- Vectorized derivation for all triplets in a structure. - Avoidstorch.autograd.gradon geometric terms; only uses analytical expressions and partials provided byAngularBasis.forward_with_derivatives. - Accumulation for unweighted and typespin-weighted angular features is done via flattenedindex_add_into an(n_atoms * n_atoms, 3)workspace per feature index, reshaped back to(n_atoms, n_atoms, 3).Feature ordering (consistent with Fortran): - Single species:
[radial, angular]- Multi-species:[rad_unweighted, ang_unweighted, rad_weighted, ang_weighted]
Testing and Validation
Periodic gradient tests in
src/aenet/torch_featurize/tests/test_gradients.py::TestPeriodicGradientsNaN reproduction check:
reproduce_nan.py(verifies that no NaNs occur under PBC self-interactions)Stress tests:
test_stress_gradients.py(small cells, dense neighbor environments, multi-species code paths)Finite-difference spot checks to confirm analytical correctness.
Performance Considerations
The radial part is fully vectorized.
The angular accumulation currently loops over
n_angfor index-add efficiency and clarity; if profiling identifies it as a bottleneck, a batched scatter approach can be implemented to remove this small loop.
Analytical Gradient of Angular Symmetry Functions
Key definitions
r_ij = ||r_j - r_i||, r_ik = ||r_k - r_i||
u_ij = (r_j - r_i) / r_ij, u_ik = (r_k - r_i) / r_ik
cos(theta) = u_ij · u_ik (clamped to [-1, 1])
Angular symmetry function:
G_ang = T_n(cos(theta)) * f_c(r_ij) * f_c(r_ik)
Partials from basis:
dG/d(cos) = dT_n/d(cos(theta)) * f_c(r_ij) * f_c(r_ik)
dG/dr_ij = T_n(cos(theta)) * df_c(r_ij)/dr_ij * f_c(r_ik)
dG/dr_ik = T_n(cos(theta)) * f_c(r_ij) * df_c(r_ik)/dr_ik
Geometric derivatives:
dcos/dr_j = (1/r_ij) * (u_ik - cos(theta) * u_ij)
dcos/dr_k = (1/r_ik) * (u_ij - cos(theta) * u_ik)
dcos/dr_i = - (dcos/dr_j + dcos/dr_k)
Final gradients:
dG/dr_j = (dG/dcos) * (dcos/dr_j) + (dG/dr_ij) * u_ij
dG/dr_k = (dG/dcos) * (dcos/dr_k) + (dG/dr_ik) * u_ik
dG/dr_i = (dG/dcos) * (dcos/dr_i) - (dG/dr_ij) * u_ij - (dG/dr_ik) * u_ik
Notes
Distances and unit vectors are evaluated with small epsilons for numerical stability (see analytical_gradients.rst).
These expressions avoid autograd on geometric quantities and are used directly in the vectorized implementation.