Advanced Structure Transformations

This guide covers advanced features including transformation chains, stochastic sampling, reproducibility, and iterator composition patterns.

See Basic Structure Transformations for an introduction to individual transformations.

Transformation Chains

TransformationChain allows sequential application of multiple transformations using depth-first streaming. This is useful for generating complex structural variations by composing simple transformations.

Basic Chain Example

from aenet.geometry import AtomicStructure
from aenet.geometry.transformations import (
    AtomDisplacementTransformation,
    CellVolumeTransformation,
    TransformationChain
)

structure = AtomicStructure.from_file('structure.xsf')

# Create a chain of transformations
chain = TransformationChain([
    AtomDisplacementTransformation(displacement=0.05),
    CellVolumeTransformation(min_percent=-2, max_percent=2, steps=3)
])

# Process structures lazily
for s in chain.apply_transformation(structure):
    process(s)

# Or get all structures at once
all_structures = list(chain.apply_transformation(structure))
print(f"Generated {len(all_structures)} structures")
# Output for an 8-atom structure: Generated 72 structures (3N = 24 displaced × 3 volumes)

How chains work:

The chain applies transformations sequentially using depth-first streaming:

First transformation generates structures from input
Each generated structure is immediately passed to the second transformation
Results flow through the entire chain before the next structure is processed

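This depth-first approach makes the total output count the product of the per-transformation counts, and it keeps memory use low because results are yielded one at a time instead of being collected between stages.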
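The streaming order described above can be illustrated with plain Python generators. This is a minimal sketch, not the library's implementation: `stream_chain`, `displace`, and `scale` are hypothetical stand-ins that operate on strings instead of structures.

```python
from typing import Callable, Iterable, Iterator, List

Transform = Callable[[object], Iterable[object]]

def stream_chain(transforms: List[Transform], item: object) -> Iterator[object]:
    """Push each output of the first transform through the remaining
    transforms before asking the first transform for its next output."""
    if not transforms:
        yield item
        return
    first, rest = transforms[0], transforms[1:]
    for out in first(item):
        yield from stream_chain(rest, out)

# Toy stand-ins for transformations (hypothetical, for illustration only)
def displace(x):
    for d in ("a", "b"):
        yield f"{x}+{d}"

def scale(x):
    for v in (1, 2, 3):
        yield f"{x}*{v}"

outputs = list(stream_chain([displace, scale], "s"))
print(outputs)
# ['s+a*1', 's+a*2', 's+a*3', 's+b*1', 's+b*2', 's+b*3']
```

All three scalings of the first displaced item appear before the second displaced item is generated, and the total count is 2 × 3 = 6.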

Controlling Output Size

Since chains can produce many structures, use standard Python tools to limit output:

Using itertools.islice():

import itertools

chain = TransformationChain([
    AtomDisplacementTransformation(displacement=0.05),
    CellVolumeTransformation(min_percent=-5, max_percent=5, steps=10)
])

# Get only first 100 structures
first_100 = list(itertools.islice(
    chain.apply_transformation(structure), 100
))

Using a counter:

results = []
for i, s in enumerate(chain.apply_transformation(structure)):
    if i >= 1000:
        break
    results.append(s)

Filtering with conditions:

# Lazily keep the first 100 structures that meet the criteria,
# examining at most 10000 candidates
candidates = itertools.islice(chain.apply_transformation(structure), 10000)
good_structures = list(itertools.islice(
    (s for s in candidates if meets_criteria(s)), 100
))

Iterator Composition

The iterators returned by apply_transformation() compose naturally with Python’s itertools:

Concatenating multiple chains:

import itertools

chain1 = TransformationChain([transform1, transform2])
chain2 = TransformationChain([transform3, transform4])

# Yield all structures from chain1, then all structures from chain2
combined = itertools.chain(
    chain1.apply_transformation(structure),
    chain2.apply_transformation(structure)
)
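Note that itertools.chain() exhausts its first argument before moving on to the next one. If the outputs should instead alternate between sources, a round-robin helper can be used. The sketch below is one possible approach; `interleave` is a hypothetical helper, not part of aenet, and it is shown on plain strings.

```python
import itertools

def interleave(*iterables):
    """Yield one item from each iterable in turn, dropping exhausted ones."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this source is finished; do not keep it
            still_active.append(it)
        iterators = still_active

print(list(itertools.chain("abc", "12")))  # ['a', 'b', 'c', '1', '2']
print(list(interleave("abc", "12")))       # ['a', '1', 'b', '2', 'c']
```

The same helper accepts the iterators returned by apply_transformation() in place of the strings.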

Processing in batches:

def batch_iter(iterable, batch_size):
    """Yield batches of structures."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        yield batch

# Process 10 structures at a time
for batch in batch_iter(chain.apply_transformation(structure), 10):
    process_batch(batch)
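On Python 3.12 and later, the standard library provides itertools.batched(), which covers the same use case and yields tuples instead of lists. A small sketch with a fallback for older interpreters (the fallback is an illustration that mirrors the helper above):

```python
import itertools

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback: yield tuples of up to n items each."""
        iterator = iter(iterable)
        while batch := tuple(itertools.islice(iterator, n)):
            yield batch

batches = list(batched(range(7), 3))
print(batches)  # [(0, 1, 2), (3, 4, 5), (6,)]
```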

Parallel processing:

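from multiprocessing import Pool

def process_structure(s):
    # Expensive computation (placeholder function)
    return expensive_calculation(s)

# pool.map() collects its input into a list before dispatching work,
# so cap the number of structures with islice() first
if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(
            process_structure,
            itertools.islice(chain.apply_transformation(structure), 1000)
        )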

Stochastic Transformations

RandomDisplacementTransformation generates random atomic displacement vectors. The displacements can optionally be orthonormalized to form a set of mutually orthogonal perturbations. Unlike the deterministic transformations, it relies on random number generation, so reproducibility requires explicit control of the random state.

Basic Usage

from aenet.geometry.transformations import RandomDisplacementTransformation

# Generate random structures with 0.1 Å RMS displacement
# By default, generates 3N-3 orthonormal structures
transform = RandomDisplacementTransformation(
    rms=0.1,
    random_state=42,  # Seed for reproducibility
    orthonormalize=True,  # Generate orthonormal basis (default)
    remove_translations=True  # Remove translational modes (default)
)

# Get all random structures
random_structures = list(transform.apply_transformation(structure))
print(f"Generated {len(random_structures)} random structures")

# Generate non-orthonormalized random samples
transform_random = RandomDisplacementTransformation(
    rms=0.1,
    max_structures=50,  # Specify number of samples
    orthonormalize=False,  # Just random perturbations
    random_state=42
)

Parameters:

rms: Target root-mean-square displacement in Angstroms
max_structures: Maximum number of structures to generate (default: None, which yields 3N-3 structures with the default remove_translations=True)
random_state: Integer seed or numpy Generator for reproducibility
orthonormalize: If True, generate orthonormal basis via QR decomposition (default: True)
remove_translations: If True, remove 3 uniform translation modes (default: True)

Algorithm Details

When orthonormalize=True (default):

The transformation generates orthonormal displacement vectors in 3N-dimensional space (N atoms, 3 coordinates per atom):

Generate random 3N×M matrix (M = max_structures)
Apply QR decomposition to obtain orthonormal columns
Optionally project out 3 translational modes
Re-orthonormalize remaining vectors via second QR decomposition
Normalize each vector to target RMS
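The steps above can be sketched with NumPy. This is an illustrative reconstruction under stated assumptions, not the library's code: the function name `orthonormal_displacements` and its signature are hypothetical, and the coordinate ordering (x1, y1, z1, x2, ...) is assumed.

```python
import numpy as np

def orthonormal_displacements(n_atoms, rms, max_structures=None,
                              remove_translations=True, seed=None):
    """Return an array of shape (M, N, 3) of mutually orthogonal displacements."""
    rng = np.random.default_rng(seed)
    dim = 3 * n_atoms
    n_max = dim - 3 if remove_translations else dim
    m = n_max if max_structures is None else min(max_structures, n_max)
    if m <= 0:
        return np.empty((0, n_atoms, 3))
    # Steps 1-2: random 3N x M matrix, orthonormal columns via QR
    q, _ = np.linalg.qr(rng.standard_normal((dim, m)))
    if remove_translations:
        # Orthonormal basis of the three uniform translations, shape (3N, 3)
        t = np.tile(np.eye(3), (n_atoms, 1)) / np.sqrt(n_atoms)
        # Step 3: project the translational components out of every column
        q = q - t @ (t.T @ q)
        # Step 4: re-orthonormalize the projected columns
        q, _ = np.linalg.qr(q)
    # Step 5: unit vectors have RMS 1/sqrt(N), so scale by rms * sqrt(N)
    vectors = q * rms * np.sqrt(n_atoms)
    return vectors.T.reshape(m, n_atoms, 3)

d = orthonormal_displacements(n_atoms=4, rms=0.1, seed=0)
print(d.shape)  # (9, 4, 3)
```

For four atoms the sketch returns 3N-3 = 9 displacement sets, each summing to zero over the atoms, and it returns an empty array for a single atom when translations are removed.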

When orthonormalize=False:

Random displacement samples are generated independently:

Generate random 3N-dimensional vector from normal distribution
Optionally remove center-of-mass translation
Normalize to target RMS
Repeat for each structure (no orthogonality constraint)
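The independent sampling procedure can be sketched in the same way. The generator below is a hypothetical illustration, not the library's code; it assumes equal atomic weights, so removing the center-of-mass translation reduces to subtracting the mean displacement.

```python
import numpy as np

def random_displacements(n_atoms, rms, n_samples,
                         remove_translation=True, seed=None):
    """Yield up to n_samples independent (N, 3) displacement arrays."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        d = rng.standard_normal((n_atoms, 3))
        if remove_translation:
            d -= d.mean(axis=0)  # subtract the uniform shift (equal weights assumed)
        current = np.sqrt(np.mean(np.sum(d ** 2, axis=1)))
        if current == 0.0:
            continue  # e.g. a single atom with the translation removed
        yield d * (rms / current)

samples = list(random_displacements(n_atoms=5, rms=0.1, n_samples=4, seed=1))
print(len(samples), samples[0].shape)  # 4 (5, 3)
```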

RMS displacement is defined as:

\[\text{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \|\mathbf{d}_i\|^2}\]

where \(\mathbf{d}_i\) is the displacement vector for atom \(i\).

Orthogonality ensures displacement vectors are mutually independent:

\[\mathbf{v}_i \cdot \mathbf{v}_j = \delta_{ij}\]

where \(\delta_{ij}\) is the Kronecker delta (1 if i=j, 0 otherwise).
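The two definitions above are connected by a simple scaling relation, written out here as a worked step. For a 3N-dimensional displacement vector \(\mathbf{v}\) assembled from the atomic displacements \(\mathbf{d}_1, \ldots, \mathbf{d}_N\):

\[\|\mathbf{v}\|^2 = \sum_{i=1}^{N} \|\mathbf{d}_i\|^2 = N \cdot \text{RMS}^2\]

A unit vector therefore has an RMS displacement of \(1/\sqrt{N}\), and rescaling orthonormal vectors to the target RMS multiplies each by \(\sqrt{N} \cdot \text{RMS}\). The rescaled vectors \(\tilde{\mathbf{v}}_i\) remain mutually orthogonal but no longer have unit length:

\[\tilde{\mathbf{v}}_i \cdot \tilde{\mathbf{v}}_j = N \cdot \text{RMS}^2 \, \delta_{ij}\]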

Translation Mode Removal

When remove_translations=True, the three uniform translational modes are removed:

\[\mathbf{t}_x = (\underbrace{1, 0, 0, 1, 0, 0, \ldots}_{\text{N atoms}}), \quad \mathbf{t}_y = (0, 1, 0, 0, 1, 0, \ldots), \quad \mathbf{t}_z = (0, 0, 1, 0, 0, 1, \ldots)\]

After projection, 3N-3 orthonormal vectors remain. For a single atom (3 degrees of freedom), this results in zero vectors, so the transformation yields no structures.

Re-orthonormalization after projection is critical to maintain mutual orthogonality of the remaining vectors.

Reproducibility

For scientific reproducibility, always control randomness in stochastic transformations.

Using Integer Seeds

# Same seed → identical results
transform1 = RandomDisplacementTransformation(rms=0.1, random_state=42)
transform2 = RandomDisplacementTransformation(rms=0.1, random_state=42)

results1 = list(transform1.apply_transformation(structure))
results2 = list(transform2.apply_transformation(structure))

# results1 and results2 are identical

Using numpy Generator

For more control, use a numpy Generator:

import numpy as np

# Create explicit generator
rng = np.random.default_rng(seed=12345)

transform = RandomDisplacementTransformation(
    rms=0.1,
    max_structures=50,
    random_state=rng
)

# Generator state is advanced after use
results = list(transform.apply_transformation(structure))

# Reusing same generator produces different results (state advanced)
more_results = list(transform.apply_transformation(structure))
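The difference between an integer seed and a shared generator can be reproduced with NumPy alone. The sketch below assumes that random_state is normalized in a way similar to np.random.default_rng(), which creates a fresh generator from an integer and passes an existing Generator through unchanged; `draw` is a hypothetical helper, and whether the library follows exactly this pattern is an assumption.

```python
import numpy as np

def draw(random_state, size=3):
    """Draw normal samples; ints seed a fresh generator, Generators are reused."""
    rng = np.random.default_rng(random_state)
    return rng.standard_normal(size)

# Integer seed: every call starts from the same state
same_a, same_b = draw(42), draw(42)
print(np.array_equal(same_a, same_b))  # True

# Shared generator: the second call continues where the first stopped
shared = np.random.default_rng(12345)
first, second = draw(shared), draw(shared)
print(np.array_equal(first, second))   # False
```

To replay a run that used a generator, recreate the generator from the original seed instead of reusing the advanced object.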

Reproducibility Checklist

For fully reproducible workflows:

Pin package versions in requirements.txt or environment.yml
Always set random_state for stochastic transformations
Document seeds in scripts and logs
Version control transformation parameters
Record numpy version (RNG implementation may vary)
Log all generated structures with metadata

Example:

import json
import numpy as np
from aenet.geometry.transformations import RandomDisplacementTransformation

# Configuration
config = {
    'rms': 0.1,
    'max_structures': 100,
    'seed': 42,
    'orthonormalize': True,
    'remove_translations': True,
    'numpy_version': np.__version__
}

# Save configuration
with open('transform_config.json', 'w') as f:
    json.dump(config, f, indent=2)

# Apply transformation
transform = RandomDisplacementTransformation(
    rms=config['rms'],
    max_structures=config['max_structures'],
    random_state=config['seed'],
    orthonormalize=config['orthonormalize'],
    remove_translations=config['remove_translations']
)

structures = list(transform.apply_transformation(structure))

Complete Workflow Example

Generate training data for a machine learning potential:

from aenet.geometry import AtomicStructure
from aenet.geometry.transformations import (
    RandomDisplacementTransformation,
    CellVolumeTransformation,
    IsovolumetricStrainTransformation,
    TransformationChain
)
import itertools

# Load reference structure
structure = AtomicStructure.from_file('reference.xsf')

# Define workflow: random displacements + volume + strain
chain = TransformationChain([
    RandomDisplacementTransformation(
        rms=0.08,
        max_structures=20,
        random_state=42
    ),
    CellVolumeTransformation(
        min_percent=-3,
        max_percent=3,
        steps=3
    ),
    IsovolumetricStrainTransformation(
        direction=1,
        len_min=0.95,
        len_max=1.05,
        steps=3
    )
])

# Generate up to 500 structures
print("Generating training structures...")
training_structures = list(itertools.islice(
    chain.apply_transformation(structure), 500
))
print(f"Generated {len(training_structures)} structures")

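# Save for training (create the output directory if it does not exist)
import os
os.makedirs('training_data', exist_ok=True)
for i, s in enumerate(training_structures):
    s.to_file(f'training_data/struct_{i:04d}.xsf')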

print("Training data generation complete")

This workflow:

Generates 20 random displacements (orthonormal, RMS=0.08 Å)
Applies 3 volume scalings to each
Applies 3 isovolumetric strains to each
Limits total output to 500 structures
Saves all structures for training

Expected output: 20 × 3 × 3 = 180 structures. Because this is below the limit of 500, islice() does not truncate the output.
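The size of a chain's output can be estimated before running it by multiplying the per-stage counts and applying any islice() limit. A minimal sketch (`expected_count` is a hypothetical helper, not part of aenet):

```python
import math

def expected_count(per_stage_counts, limit=None):
    """Product of the per-stage counts, capped by an optional limit."""
    total = math.prod(per_stage_counts)
    return total if limit is None else min(total, limit)

print(expected_count([20, 3, 3], limit=500))  # 180
print(expected_count([20, 3, 3], limit=100))  # 100
```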

Performance Considerations

Computational Complexity:

AtomDisplacementTransformation: O(N) per structure, yields 3N structures
CellVolumeTransformation: O(1) per structure, yields M structures
IsovolumetricStrainTransformation: O(1) per structure, yields M structures
ShearStrainTransformation: O(1) per structure, yields M structures
RandomDisplacementTransformation: O(N·M²) for QR decomposition of the 3N×M matrix (when orthonormalize=True); this reaches O(N³) when M = 3N-3

where N = number of atoms, M = number of steps/structures.

Memory Usage:

Iterator-based design: Only one structure in memory at a time
Use list() only when you need all structures at once
Process structures as they’re generated for maximum efficiency

Recommendations:

For small systems (< 100 atoms): No special considerations needed
For medium systems (100-1000 atoms): Use itertools.islice() to limit output
For large systems (> 1000 atoms): Process structures one at a time
QR decomposition for N=100 atoms, M=100 structures: < 1 second
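The QR timing can be checked on a given machine with a short NumPy benchmark. This sketch times only the decomposition of a random 3N×M matrix, not the full transformation, and the result depends on the hardware and the linear algebra backend.

```python
import time
import numpy as np

n_atoms, n_structures = 100, 100
rng = np.random.default_rng(0)
matrix = rng.standard_normal((3 * n_atoms, n_structures))

start = time.perf_counter()
q, _ = np.linalg.qr(matrix)  # reduced QR: q has shape (3N, M)
elapsed = time.perf_counter() - start

print(f"QR of a {matrix.shape[0]}x{matrix.shape[1]} matrix took {elapsed:.4f} s")
```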

Memory-Efficient Patterns

Process and discard:

# Don't store all structures
for s in chain.apply_transformation(structure):
    result = expensive_calculation(s)
    save_result(result)
    # Structure is discarded after processing

Accumulate results, not structures:

# Store computed properties, not structures
energies = []
for s in chain.apply_transformation(structure):
    energy = compute_energy(s)
    energies.append(energy)

Stream to disk:

# Write structures as they're generated
for i, s in enumerate(chain.apply_transformation(structure)):
    if i >= 1000:
        break
    s.to_file(f'output_{i:04d}.xsf')

Combining Deterministic and Stochastic

Mix deterministic and stochastic transformations for comprehensive sampling:

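from aenet.geometry.transformations import (
    CellVolumeTransformation,
    IsovolumetricStrainTransformation,
    RandomDisplacementTransformation,
    TransformationChain
)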

# Deterministic baseline
deterministic_chain = TransformationChain([
    CellVolumeTransformation(min_percent=-5, max_percent=5, steps=5),
    IsovolumetricStrainTransformation(direction=1, len_min=0.9, len_max=1.1, steps=5)
])

# Stochastic perturbations
stochastic = RandomDisplacementTransformation(rms=0.05, max_structures=10, random_state=42)

# Combine: deterministic structures + random displacements for each
all_structures = []
for base_structure in deterministic_chain.apply_transformation(structure):
    # Original deterministic structure
    all_structures.append(base_structure)
    # Plus random variations
    for displaced in stochastic.apply_transformation(base_structure):
        all_structures.append(displaced)

print(f"Total: {len(all_structures)} structures")
# 25 deterministic + 25×10 random = 275 structures

Troubleshooting

Problem: Too many structures generated

Solution: Use itertools.islice() to limit output

Problem: Out of memory

Solution: Process structures one at a time, don’t use list()

Problem: Non-reproducible results

Solution: Always set random_state for stochastic transformations

Problem: Structures too similar

Solution: Increase displacement/strain magnitudes or reduce steps

Problem: QR decomposition slow

Solution: Reduce max_structures for RandomDisplacementTransformation, or use orthonormalize=False for faster random sampling

Problem: No structures generated for a single atom

Solution: With remove_translations=True, a single atom has 3N-3=0 displacement vectors; use remove_translations=False to get 3 displacement vectors