
Builder

System assembly: polymer chain construction from CGSmiles topology and monomer libraries.

Quick reference

| Symbol | Summary | Preferred for |
| --- | --- | --- |
| PolymerBuilder | Build chains from CGSmiles + library + connector + placer | Full control over assembly |
| polymer(cgsmiles, ...) | CGSmiles → chain in one call | Quick prototyping |
| Connector | Port selection rules + reaction binding | Defining which ports react |
| Placer | Geometric placement (separator + orienter) | Controlling inter-monomer geometry |
| CovalentSeparator | Covalent radii-based distance | Default monomer spacing |
| LinearOrienter | Linear chain orientation | Default growth direction |

Canonical example

from molpy.builder.polymer import (
    PolymerBuilder, Connector, Placer,
    CovalentSeparator, LinearOrienter,
)
from molpy.builder import polymer

builder = PolymerBuilder(
    library={"EO": eo_template},
    connector=Connector(port_map={("EO","EO"): (">","<")}, reacter=rxn),
    placer=Placer(separator=CovalentSeparator(buffer=-0.1),
                  orienter=LinearOrienter()),
)
result = builder.build("{[#EO]|10}")
chain = result.polymer

# Or use the one-call entry function:
result = polymer("{[#EO]|10}", library={"EO": eo_template}, reacter=rxn)
chain = result.polymer

Full API

Crystal

crystal

Crystal lattice builder.

Tile a Bravais lattice over a range of unit cells and (optionally) clip the result to a geometric molpy.core.region.Region.

Example

from molpy.builder import Lattice, build_crystal
from molpy.core.region import BoxRegion

lat = Lattice.fcc(a=3.52, species="Ni")
structure = build_crystal(lat, repeats=(4, 4, 4))

Or clip a 30 Å cube instead of giving explicit repeats (the tile range is then inferred from the region bounds):

structure = build_crystal(lat, BoxRegion(lengths=[30, 30, 30]))

Lattice

Lattice(cell, basis=None)

Bravais lattice: a cell matrix plus a list of basis Site objects.

The cell matrix stores the three lattice vectors as rows:

cell = [[a1x, a1y, a1z],
        [a2x, a2y, a2z],
        [a3x, a3y, a3z]]

Construct it directly from a cell matrix, or use the from_vectors, sc, bcc, fcc, or rocksalt constructors.

bcc classmethod
bcc(a, species)

Body-centered cubic lattice (2 atoms / cell).

cart_to_frac
cart_to_frac(cart)

Cartesian → fractional: cart @ cell⁻¹.

fcc classmethod
fcc(a, species)

Face-centered cubic lattice (4 atoms / cell).

frac_to_cart
frac_to_cart(frac)

Fractional → Cartesian: frac @ cell.
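Both conversions are single matrix products under the row-vector convention used for the cell. As a dependency-free sketch (the helper names below are illustrative, not molpy functions), the round trip on a non-orthogonal cell looks like this:

```python
# Row-vector convention: cart = frac @ cell, frac = cart @ inv(cell).
Matrix = list[list[float]]
Vector = list[float]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Row vector times 3x3 matrix (v @ m)."""
    return [sum(v[i] * m[i][j] for i in range(3)) for j in range(3)]


def inverse_3x3(m: Matrix) -> Matrix:
    """Inverse via the adjugate; raises if the cell is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("cell matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def frac_to_cart(frac: Vector, cell: Matrix) -> Vector:
    return vec_mat(frac, cell)


def cart_to_frac(cart: Vector, cell: Matrix) -> Vector:
    return vec_mat(cart, inverse_3x3(cell))


# A non-orthogonal cell (lattice vectors as rows) exercises the general case.
cell = [[4.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.5, 5.0]]
cart = frac_to_cart([0.5, 0.5, 0.5], cell)
print(cart)                      # [2.5, 1.75, 2.5]
print(cart_to_frac(cart, cell))  # back to approximately [0.5, 0.5, 0.5]
```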

from_vectors classmethod
from_vectors(a1, a2, a3, basis=None)

Build a lattice from three lattice vectors.

rocksalt classmethod
rocksalt(a, species_a, species_b)

Rocksalt (NaCl) structure — two interpenetrating FCC sublattices.

sc classmethod
sc(a, species)

Simple cubic lattice (1 atom / cell).

with_site
with_site(site)

Return a new lattice with site appended to the basis.

Site dataclass

Site(label, species, fractional, charge=0.0, attrs=None)

Lattice basis site in fractional coordinates.

Attributes:

Name Type Description
label str

Site identifier (e.g. "A", "B1").

species str

Chemical species or type name (e.g. "Ni", "Cl").

fractional tuple[float, float, float]

Fractional coordinates (u, v, w) relative to the cell.

charge float

Partial charge (default 0.0).

attrs dict[str, Any] | None

Optional auxiliary attributes.

build_crystal

build_crystal(lattice, region=None, *, repeats=None)

Tile lattice and (optionally) clip to a Cartesian region.

Parameters:

Name Type Description Default
lattice Lattice

Bravais lattice with basis sites.

required
region Region | None

Geometric region in Cartesian space (e.g. molpy.core.region.BoxRegion, SphereRegion, or any Region combination built with &, |, and ~). Atoms outside the region are discarded.

None
repeats tuple[int, int, int] | None

Number of unit cells along each lattice vector, (nx, ny, nz). If omitted, the tile range is inferred from region.bounds. At least one of region or repeats must be provided.

None

Returns:

Type Description
Atomistic

Atomistic containing the kept atoms and a box set to the full tiled super-cell (cell scaled row-wise by repeats).
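To make tiling and clipping concrete, here is a minimal sketch of the procedure described above, not molpy's actual code. It assumes basis sites are shifted by integer cell offsets (i, j, k) and converted with frac @ cell, and it substitutes an axis-aligned box (lo, hi) for a general Region; the tile function and its arguments are hypothetical.

```python
from itertools import product


def tile(cell, basis, repeats, box=None):
    """Return (species, cartesian position) pairs and the super-cell.

    cell    -- three lattice vectors as rows
    basis   -- list of (species, fractional coords) pairs
    repeats -- (nx, ny, nz) unit cells along each lattice vector
    box     -- optional (lo, hi) corners of an axis-aligned clip region
    """
    atoms = []
    for i, j, k in product(*(range(n) for n in repeats)):
        for species, (u, v, w) in basis:
            frac = (i + u, j + v, k + w)
            # Row-vector convention: cart = frac @ cell
            cart = tuple(sum(frac[r] * cell[r][c] for r in range(3)) for c in range(3))
            if box is not None:
                lo, hi = box
                if not all(lo[c] <= cart[c] < hi[c] for c in range(3)):
                    continue  # outside the region: discard
            atoms.append((species, cart))
    # Box = full tiled super-cell: each lattice vector scaled by its repeat count
    supercell = [[n * x for x in row] for n, row in zip(repeats, cell)]
    return atoms, supercell


a = 3.52  # Ni lattice constant (Å), as in the fcc example earlier in this section
cell = [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]
fcc_basis = [("Ni", (0.0, 0.0, 0.0)), ("Ni", (0.0, 0.5, 0.5)),
             ("Ni", (0.5, 0.0, 0.5)), ("Ni", (0.5, 0.5, 0.0))]
atoms, supercell = tile(cell, fcc_basis, (4, 4, 4))
print(len(atoms))    # 256 = 4 atoms/cell x 64 cells
print(supercell[0])  # [14.08, 0.0, 0.0]
```

Passing box=((0, 0, 0), (a, a, a)) to the same call keeps only the 4 atoms of the first cell, because the upper bound is treated as exclusive in this sketch.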

Polymer

polymer

Polymer assembly module.

Provides linear polymer assembly with both topology-only and chemical reaction connectors, plus optional geometric placement via Placer strategies.

BuildPolymer dataclass

BuildPolymer(reaction_preset='dehydration', use_placer=True)

Bases: Tool

Build a polymer chain from CGSmiles notation and a monomer library.

Preferred for
  • Assembling a single chain from pre-prepared monomers.
  • Iterating over a system plan to build chains one at a time.
Avoid when
  • You want an end-to-end build from a string (use polymer() or BuildSystem).
  • You need custom reaction logic (use PolymerBuilder directly).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset (default "dehydration").

use_placer bool

Enable geometric placement of monomers.

run
run(cgsmiles, library)

Build a polymer chain.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer.

required

Returns:

Type Description
dict[str, Any]

Dict with "polymer" (Atomistic), "total_steps" (int), and "connection_history" (list).

BuildPolymerAmber dataclass

BuildPolymerAmber(reaction_preset='dehydration', force_field='gaff2', charge_method='bcc', conda_env=None, work_dir='amber_work')

Bases: Tool

Build a polymer chain using the AmberTools backend.

Uses antechamber, parmchk2, prepgen, and tleap to assemble a polymer from a CGSmiles string and a monomer library. Returns both MolPy structures and AMBER topology/coordinate files.

Preferred for
  • Polymer systems that need AMBER force field parameters (GAFF/GAFF2).
  • Workflows that feed into AMBER or LAMMPS with AMBER-style inputs.
Avoid when
  • You do not need force field parameters (use BuildPolymer).
  • AmberTools is not installed.

Attributes:

Name Type Description
reaction_preset str | None

Named preset for leaving group detection. When None, hydrogen atoms bonded to port atoms are auto-detected.

force_field str

Amber force field ("gaff" or "gaff2").

charge_method str

Antechamber charge method.

conda_env str | None

Conda environment containing AmberTools.

work_dir str

Directory for intermediate files.

run
run(cgsmiles, library)

Build a polymer using AmberTools.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer. Each monomer must have port="<" (head) and port=">" (tail) annotations.

required

Returns:

Type Description
dict[str, Any]

Dict with "frame", "forcefield", "prmtop_path", "inpcrd_path", "pdb_path", and "monomer_count".

BuildSystem dataclass

BuildSystem(reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)

Bases: Tool

End-to-end polymer system construction from G-BigSMILES.

Parses a G-BigSMILES string and delegates to the GBigSmilesCompiler to produce a list of Atomistic chains.

Preferred for
  • Building a complete polydisperse system in one call.
  • When you do not need to inspect the system plan before building.
Avoid when
  • You need to inspect or modify the plan first (use PlanSystem + BuildPolymer).
  • You need the Amber backend (use BuildPolymerAmber).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset.

add_hydrogens bool

Add explicit hydrogens during monomer preparation.

optimize bool

Optimize monomer geometry.

random_seed int | None

Random seed for reproducibility.

run
run(gbigsmiles)

Build a polymer system from a G-BigSMILES string.

Parameters:

Name Type Description Default
gbigsmiles str

G-BigSMILES notation (e.g. "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|").

required

Returns:

Type Description
list[Atomistic]

List of Atomistic structures (one per chain).

Chain dataclass

Chain(dp, monomers, mass)

Represents a single polymer chain.

Attributes:

Name Type Description
dp int

Degree of polymerization (number of monomers)

monomers list[str]

List of monomer identifiers in the chain

mass float

Total mass of the chain (g/mol)

Connector

Connector(reacter, *, port_map=None, overrides=None)

Select ports and execute reactions between adjacent monomers.

Port selection strategy (applied in order):

  1. Explicit port_map lookup for (left_label, right_label).
  2. Compatibility: > on the left pairs with < on the right.
  3. Single port: each side has exactly one unconsumed port.
  4. Common name: both sides share a port name (for $ ports).
  5. Otherwise, raise AmbiguousPortsError.
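The selection order can be sketched as a plain function over port-name dictionaries. This is an illustration under assumptions, not the library's implementation: ports are modelled as name → list of unconsumed atoms, step 3 counts port names that still have atoms, step 4 succeeds only if exactly one name is shared, and AmbiguousPortsError is a local stand-in class. Atom indices (left_idx, right_idx) are omitted.

```python
class AmbiguousPortsError(Exception):
    """Local stand-in for the library exception of the same name."""


def select_ports(left_label, right_label, left_ports, right_ports, port_map=None):
    """Return (left_port_name, right_port_name) following the documented order.

    left_ports / right_ports map port name -> list of unconsumed port atoms.
    """
    # 1. Explicit port_map lookup keyed by the label pair
    if port_map and (left_label, right_label) in port_map:
        return port_map[(left_label, right_label)]
    # 2. Directional compatibility: ">" on the left pairs with "<" on the right
    if left_ports.get(">") and right_ports.get("<"):
        return ">", "<"
    open_left = [name for name, atoms in left_ports.items() if atoms]
    open_right = [name for name, atoms in right_ports.items() if atoms]
    # 3. Single port: exactly one unconsumed port on each side
    if len(open_left) == 1 and len(open_right) == 1:
        return open_left[0], open_right[0]
    # 4. Common name (e.g. "$" ports) shared by both sides
    common = sorted(set(open_left) & set(open_right))
    if len(common) == 1:
        return common[0], common[0]
    # 5. Nothing decisive
    raise AmbiguousPortsError(f"cannot choose ports for {left_label}-{right_label}")


print(select_ports("EO", "EO", {"<": [0], ">": [5]}, {"<": [0], ">": [5]}))  # ('>', '<')
print(select_ports("A", "B", {"$": [3]}, {"$": [1], "x": [2]}))              # ('$', '$')
```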

connect
connect(left, right, left_type, right_type, port_atom_L, port_atom_R, typifier=None)

Execute the chemical reaction between two structures.

get_reacter
get_reacter(left_type, right_type)

Get the appropriate Reacter for a structure pair.

select_ports
select_ports(left, right, left_ports, right_ports, ctx)

Select which ports to connect.

Parameters:

Name Type Description Default
left Atomistic

Left Atomistic structure.

required
right Atomistic

Right Atomistic structure.

required
left_ports Mapping[str, list[Atom]]

Available ports on left (name -> list[Atom]).

required
right_ports Mapping[str, list[Atom]]

Available ports on right (name -> list[Atom]).

required
ctx ConnectorContext

Context with step info and labels.

required

Returns:

Type Description
tuple[str, int, str, int, None]

(left_port_name, left_idx, right_port_name, right_idx, None)

ConnectorContext

Bases: dict[str, Any]

Shared context passed to the connector during linear build.

Keys:

  • step: int (current connection step index)
  • left_label: str (label of the left monomer)
  • right_label: str (label of the right monomer)
  • sequence: list[str] (full sequence being built)

CovalentSeparator

CovalentSeparator(buffer=0.0)

Separator based on typical bond lengths (for bonded atoms).

Uses realistic bond lengths based on element types. Typical bond lengths:

  • C-C: 1.54 Å (single), 1.34 Å (double)
  • C-O: 1.43 Å (single), 1.23 Å (double)
  • C-N: 1.47 Å (single)
  • O-H: 0.96 Å
  • N-H: 1.01 Å

Initialize covalent separator.

Parameters:

Name Type Description Default
buffer float

Additional buffer distance in Angstroms (default 0.0). Can be negative to allow for slight bond compression.

0.0
get_separation
get_separation(left_struct, right_struct, left_port, right_port)

Calculate separation based on typical bond lengths.

Parameters:

Name Type Description Default
left_struct Atomistic

Previous structure in sequence

required
right_struct Atomistic

Next structure to place

required
left_port Atom

Connection port on left structure

required
right_port Atom

Connection port on right structure

required

Returns:

Type Description
float

Separation distance = typical_bond_length + buffer
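Since the separation is a lookup plus the buffer, a sketch is short. The table below holds only the single-bond values listed for this class plus O-H and N-H; bond order is ignored, and the 1.5 Å fallback for unlisted pairs is a hypothetical choice rather than documented behavior.

```python
# Typical single-bond lengths (Å); frozenset keys make the lookup order-independent.
TYPICAL_BOND_LENGTHS = {
    frozenset({"C"}): 1.54,        # C-C
    frozenset({"C", "O"}): 1.43,   # C-O
    frozenset({"C", "N"}): 1.47,   # C-N
    frozenset({"O", "H"}): 0.96,
    frozenset({"N", "H"}): 1.01,
}
DEFAULT_BOND_LENGTH = 1.5  # hypothetical fallback for unlisted element pairs


def covalent_separation(elem_left: str, elem_right: str, buffer: float = 0.0) -> float:
    """Separation = typical_bond_length + buffer."""
    key = frozenset({elem_left, elem_right})
    return TYPICAL_BOND_LENGTHS.get(key, DEFAULT_BOND_LENGTH) + buffer


print(covalent_separation("C", "C"))               # 1.54
print(covalent_separation("O", "C", buffer=-0.1))  # about 1.33 (slightly compressed)
```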

DPDistribution

Bases: Protocol

Protocol for distributions that sample degree of polymerization directly.

Distributions implementing this protocol can sample DP values without requiring monomer mass information. This is suitable for distributions defined in DP space (e.g., Poisson, Uniform).

dp_pmf
dp_pmf(dp_array)

Probability mass function for DP values.

Parameters:

Name Type Description Default
dp_array ndarray

Array of DP values

required

Returns:

Type Description
ndarray

Array of probability mass values

sample_dp
sample_dp(rng)

Sample degree of polymerization from distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
int

Degree of polymerization (>= 1)
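Because DPDistribution is a typing.Protocol, any class with matching methods satisfies it structurally; no subclassing is needed. The sketch below mirrors that idea with a hypothetical DPSampler protocol and a UniformDP class modelled on UniformPolydisperse; it uses random.Random in place of numpy.random.Generator so that it runs without NumPy.

```python
import random
from typing import Protocol, runtime_checkable


@runtime_checkable
class DPSampler(Protocol):
    """Dependency-free analogue of DPDistribution."""

    def sample_dp(self, rng: random.Random) -> int: ...

    def dp_pmf(self, dp_values: list[int]) -> list[float]: ...


class UniformDP:
    """Satisfies DPSampler structurally; it never inherits from it."""

    def __init__(self, min_dp: int, max_dp: int) -> None:
        if not 1 <= min_dp <= max_dp:
            raise ValueError("need 1 <= min_dp <= max_dp")
        self.min_dp, self.max_dp = min_dp, max_dp

    def sample_dp(self, rng: random.Random) -> int:
        return rng.randint(self.min_dp, self.max_dp)  # inclusive on both ends

    def dp_pmf(self, dp_values: list[int]) -> list[float]:
        p = 1.0 / (self.max_dp - self.min_dp + 1)
        return [p if self.min_dp <= k <= self.max_dp else 0.0 for k in dp_values]


dist = UniformDP(5, 10)
print(isinstance(dist, DPSampler))       # True (checks method presence only)
print(dist.sample_dp(random.Random(0)))  # some DP in [5, 10]
```

Note that runtime_checkable only verifies that the methods exist, not their signatures, so a static type checker is the stronger guard.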

FlorySchulzPolydisperse

FlorySchulzPolydisperse(a, random_seed=None)

Flory-Schulz (geometric) distribution for degree of polymerization.

PMF: P(N = k) = a^2 * k * (1 - a)^(k-1), k = 1, 2, ...

Parameters:

Name Type Description Default
a float

Probability parameter (0 < a < 1), related to extent of reaction.

required
random_seed int | None

Optional random seed.

None
dp_pmf
dp_pmf(dp_array)

Flory-Schulz PMF.

sample_dp
sample_dp(rng)

Sample DP from Flory-Schulz distribution (>= 1).
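One exact way to sample this PMF: the sum of two independent geometric variables on {1, 2, ...}, each with success probability a, satisfies P(G1 + G2 = n) = (n - 1) a² (1 - a)^(n-2), so k = G1 + G2 - 1 follows the Flory-Schulz PMF above, with mean (2 - a)/a. The sketch below uses that construction with the standard library; whether FlorySchulzPolydisperse samples the same way is not stated here.

```python
import math
import random


def flory_schulz_pmf(k: int, a: float) -> float:
    """P(N = k) = a^2 * k * (1 - a)^(k - 1) for k >= 1."""
    return a * a * k * (1.0 - a) ** (k - 1) if k >= 1 else 0.0


def geometric(a: float, rng: random.Random) -> int:
    """Inverse-CDF sample of P(G = g) = a * (1 - a)^(g - 1), g >= 1 (needs 0 < a < 1)."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(1.0 - a))) + 1


def sample_flory_schulz(a: float, rng: random.Random) -> int:
    # G1 + G2 - 1 has exactly the PMF above (shifted negative binomial, r = 2).
    return geometric(a, rng) + geometric(a, rng) - 1


a = 0.1
print(sum(flory_schulz_pmf(k, a) for k in range(1, 2000)))  # approximately 1.0
rng = random.Random(42)
samples = [sample_flory_schulz(a, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to the exact mean (2 - a) / a = 19
```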

GrowthKernel

Bases: Protocol

Protocol for local transition function in port-level stochastic growth.

A GrowthKernel decides which monomer (if any) to add next for a given reactive port on the growing polymer. This encapsulates the reaction probability logic from G-BigSMILES notation.

choose_next_for_port
choose_next_for_port(polymer, port, candidates, rng=None)

Choose next monomer for a given port.

Parameters:

Name Type Description Default
polymer Atomistic

Current polymer structure

required
port Atom

Port to extend from

required
candidates Sequence[MonomerTemplate]

Available monomer templates

required
rng Generator | None

Random number generator for sampling

None

Returns:

Name Type Description
MonomerPlacement MonomerPlacement | None

Add this template at target port

None MonomerPlacement | None

Terminate this port (implicit end-group)
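The kernel contract (return a placement to extend, None to terminate) is enough to drive breadth-first growth. The sketch below is a toy under stated assumptions: Template and Placement are hypothetical stand-ins for MonomerTemplate and MonomerPlacement, the "polymer" is just a list of labels, ports are (monomer index, descriptor id) pairs, and ToyKernel extends each port with a fixed probability.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for MonomerTemplate: a label and a port count."""
    label: str
    n_ports: int


@dataclass(frozen=True)
class Placement:
    """Hypothetical stand-in for MonomerPlacement."""
    template: Template
    target_descriptor_id: int


class ToyKernel:
    """Extends a port with probability p_extend, otherwise terminates it."""

    def __init__(self, template: Template, p_extend: float) -> None:
        self.template = template
        self.p_extend = p_extend

    def choose_next_for_port(self, polymer, port, candidates, rng=None):
        rng = rng or random.Random()
        if rng.random() < self.p_extend:
            return Placement(candidates[0], target_descriptor_id=0)
        return None  # implicit end group on this port


def grow(seed: Template, kernel, rng: random.Random, max_dp: int = 50) -> list[str]:
    """Breadth-first growth: each open port is offered to the kernel once."""
    polymer = [seed.label]
    open_ports = deque((0, d) for d in range(seed.n_ports))  # (monomer index, descriptor id)
    while open_ports and len(polymer) < max_dp:
        port = open_ports.popleft()
        placement = kernel.choose_next_for_port(polymer, port, [kernel.template], rng)
        if placement is None:
            continue
        new_index = len(polymer)
        polymer.append(placement.template.label)
        # The connecting descriptor is consumed; the remaining ones become open ports.
        for d in range(placement.template.n_ports):
            if d != placement.target_descriptor_id:
                open_ports.append((new_index, d))
    return polymer


eo = Template("EO", n_ports=2)
chain = grow(eo, ToyKernel(eo, p_extend=0.9), random.Random(1))
print(len(chain))  # a DP between 1 and max_dp
```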

LinearOrienter

Orienter for linear polymer arrangement.

Aligns the next monomer so that:

  1. The two port atoms are separated by the specified distance.
  2. The port connection axis of the next monomer aligns with that of the previous monomer.
  3. The chain extends linearly.

get_orientation
get_orientation(left_struct, right_struct, left_port, right_port, separation)

Calculate linear alignment transformation.

Strategy:

  1. Get the outward direction vector from the left port anchor.
  2. Place the right structure so its port anchor is at the target position.
  3. Align the right structure's port direction with the left port direction.

Parameters:

Name Type Description Default
left_struct Atomistic

Previous structure in sequence

required
right_struct Atomistic

Next structure to place

required
left_port Atom

Connection port on left structure

required
right_port Atom

Connection port on right structure

required
separation float

Distance between port anchors

required

Returns:

Type Description
tuple[ndarray, ndarray]

Tuple of (translation_vector, rotation_matrix)
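The transformation can be sketched with a Rodrigues rotation that maps one unit vector onto another. The following are assumptions, since the source does not pin them down: each port has a known outward direction vector, "aligned" means the right monomer's outward direction is rotated to point back toward the left port (antiparallel to the left outward direction), and the returned pair is applied as x' = R x + t. All function names are hypothetical.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def rotation_aligning(a, b):
    """3x3 rotation R with R @ unit(a) == unit(b) (Rodrigues form)."""
    a, b = unit(a), unit(b)
    v, c = cross(a, b), dot(a, b)
    if c < -1.0 + 1e-9:
        # Antiparallel: rotate 180 degrees about any axis perpendicular to a.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        u = unit(cross(a, helper))
        return [[2 * u[i] * u[j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
    k = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]  # skew matrix [v]x
    k2 = [[dot(k[i], [k[r][j] for r in range(3)]) for j in range(3)] for i in range(3)]
    s = 1.0 / (1.0 + c)
    return [[(1.0 if i == j else 0.0) + k[i][j] + s * k2[i][j] for j in range(3)] for i in range(3)]


def linear_orientation(left_anchor, left_dir, right_anchor, right_dir, separation):
    """Return (translation, rotation) placing the right port `separation` along left_dir."""
    rot = rotation_aligning(right_dir, [-x for x in left_dir])
    target = [p + separation * d for p, d in zip(left_anchor, unit(left_dir))]
    rotated_anchor = matvec(rot, right_anchor)
    translation = [t - r for t, r in zip(target, rotated_anchor)]
    return translation, rot


t, R = linear_orientation([0, 0, 0], [1, 0, 0], [5, 5, 5], [0, 1, 0], 1.54)
new_anchor = [x + y for x, y in zip(matvec(R, [5, 5, 5]), t)]
print(new_anchor)  # approximately [1.54, 0.0, 0.0]
```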

MassDistribution

Bases: Protocol

Protocol for distributions that sample molecular weight directly.

Distributions implementing this protocol sample mass values directly from the distribution without converting through DP. This is suitable for distributions defined in mass space (e.g., Schulz-Zimm).

mass_pdf
mass_pdf(mass_array)

Probability density function for mass values.

Parameters:

Name Type Description Default
mass_array ndarray

Array of mass values (g/mol)

required

Returns:

Type Description
ndarray

Array of probability density values

sample_mass
sample_mass(rng)

Sample molecular weight from distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
float

Molecular weight (g/mol, > 0)

MonomerPlacement dataclass

MonomerPlacement(template, target_descriptor_id)

Decision for next monomer placement during stochastic growth.

Represents the output of a GrowthKernel's decision: which template to add and which port on that template to connect.

Attributes:

Name Type Description
template MonomerTemplate

MonomerTemplate to add

target_descriptor_id int

Which port descriptor on the new monomer to connect

Example

placement = MonomerPlacement(
    template=eo_template,
    target_descriptor_id=1,  # connect via port descriptor 1
)
print(f"Add {placement.template.label} at port {placement.target_descriptor_id}")

MonomerTemplate dataclass

MonomerTemplate(label, structure, port_descriptors, mass, metadata=dict())

Template for a monomer with port descriptors and metadata.

This represents a monomer type that can be instantiated multiple times during stochastic growth. Each instantiation creates a fresh copy of the structure.

Attributes:

Name Type Description
label str

Monomer label (e.g., "EO2", "PS")

structure Atomistic

Base Atomistic structure (will be copied on instantiation)

port_descriptors dict[int, PortDescriptor]

Mapping from descriptor_id to PortDescriptor

mass float

Molecular weight (g/mol)

metadata dict[str, Any]

Additional metadata (optional)

Example

template = MonomerTemplate(
    label="EO",
    structure=eo_monomer,
    port_descriptors={
        0: PortDescriptor(0, "<", role="left"),
        1: PortDescriptor(1, ">", role="right"),
    },
    mass=44.05,
)
fresh_copy = template.instantiate()
print(f"Template: {template.label}, mass={template.mass} g/mol")

get_all_descriptors
get_all_descriptors()

Get all port descriptors for this template.

Returns:

Type Description
list[PortDescriptor]

List of all PortDescriptor objects sorted by descriptor_id

Example

template = MonomerTemplate(...)
descriptors = template.get_all_descriptors()
for desc in descriptors:
    print(f"Port {desc.descriptor_id}: {desc.port_name}")

get_port_by_descriptor
get_port_by_descriptor(descriptor_id)

Get port descriptor for a specific descriptor ID.

Parameters:

Name Type Description Default
descriptor_id int

Descriptor ID to look up

required

Returns:

Type Description
PortDescriptor | None

PortDescriptor if found, None otherwise

Example

template = MonomerTemplate(...)
left_port = template.get_port_by_descriptor(0)
if left_port:
    print(f"Port: {left_port.port_name}, role: {left_port.role}")

instantiate
instantiate()

Create a fresh copy of the structure.

Each instantiation is independent with separate atoms and bonds, allowing the same template to be used multiple times in a polymer.

Returns:

Type Description
Atomistic

New Atomistic instance with independent atoms and bonds

Example

template = MonomerTemplate(label="EO", structure=eo_monomer, ...)
copy1 = template.instantiate()
copy2 = template.instantiate()
print(copy1 is not copy2)  # True: independent objects

Placer

Placer(separator, orienter)

Combined placer for positioning structures during assembly.

Uses a Separator to determine distance and an Orienter to determine orientation.

Initialize placer.

Parameters:

Name Type Description Default
separator Separator

Separator for calculating distance

required
orienter LinearOrienter

Orienter for calculating orientation

required
place_monomer
place_monomer(left_struct, right_struct, left_port, right_port)

Position right_struct relative to left_struct.

Modifies right_struct's atomic coordinates in-place.

Parameters:

Name Type Description Default
left_struct Atomistic

Previous structure in sequence

required
right_struct Atomistic

Next structure to place

required
left_port Atom

Connection port on left structure

required
right_port Atom

Connection port on right structure

required

PlanSystem dataclass

PlanSystem(random_seed=None)

Bases: Tool

Plan a polydisperse polymer system from distribution parameters.

Returns chain specifications (DP, monomer sequence, mass) without creating any atoms. Use this to validate distribution parameters before committing to an expensive build.

Preferred for
  • Previewing system composition before building.
  • Iterating on distribution parameters cheaply.
Avoid when
  • You want chains built directly (use BuildSystem or polymer_system).

Attributes:

Name Type Description
random_seed int | None

Random seed for reproducibility.

run
run(monomer_weights, monomer_mass, distribution_type, distribution_params, target_total_mass, end_group_mass=0.0, max_rel_error=0.02)

Plan a polydisperse polymer system.

Parameters:

Name Type Description Default
monomer_weights dict[str, float]

Weight fractions for each monomer label.

required
monomer_mass dict[str, float]

Molar mass (g/mol) per monomer label.

required
distribution_type str

Distribution name (e.g. "schulz_zimm").

required
distribution_params dict[str, float]

Distribution parameters as {"p0": ..., "p1": ...}.

required
target_total_mass float

Target total system mass (g/mol).

required
end_group_mass float

Mass of end groups per chain (g/mol).

0.0
max_rel_error float

Maximum relative error for total mass.

0.02

Returns:

Type Description
dict[str, Any]

Dict with "chains" (list of chain dicts), "total_mass", and "target_mass".

PoissonPolydisperse

PoissonPolydisperse(lambda_param, random_seed=None)

Poisson distribution for the degree of polymerization (DP).

Zero-truncated: sampled k=0 is mapped to k=1.

Parameters:

Name Type Description Default
lambda_param float

Mean of the Poisson distribution (> 0).

required
random_seed int | None

Optional random seed.

None
dp_pmf
dp_pmf(dp_array)

Zero-truncated Poisson PMF.

sample_dp
sample_dp(rng)

Sample DP from zero-truncated Poisson distribution (>= 1).
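A stdlib sketch of the documented sampling rule follows, using Knuth's multiplication method for the Poisson draw (adequate for small lambda). Mapping a sampled 0 to 1 and renormalizing the PMF over k ≥ 1 are different distributions: the first moves all of P(K = 0) onto k = 1, the second spreads it proportionally over every k ≥ 1. The sketch computes both so the difference is visible; the function names are hypothetical.

```python
import math
import random


def poisson_knuth(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small lambda."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_dp_mapped(lam: float, rng: random.Random) -> int:
    # Documented rule: a sampled k = 0 is mapped to k = 1.
    return max(1, poisson_knuth(lam, rng))


def zero_truncated_pmf(k: int, lam: float) -> float:
    """P(K = k | K >= 1) = lam^k e^(-lam) / (k! (1 - e^(-lam))) for k >= 1."""
    if k < 1:
        return 0.0
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_p) / (1.0 - math.exp(-lam))


lam = 3.0
mapped_p1 = math.exp(-lam) * (1 + lam)    # P(0) + P(1) of the plain Poisson
truncated_p1 = zero_truncated_pmf(1, lam)
print(round(mapped_p1, 4), round(truncated_p1, 4))  # 0.1991 0.1572
```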

PolydisperseChainGenerator

PolydisperseChainGenerator(seq_generator, monomer_mass, end_group_mass=0.0, distribution=None)

Middle layer: Chain-level generator.

Responsible for:

  • Sampling chain size, either in DP space via a DPDistribution (sample_dp) or in mass space via a MassDistribution (sample_mass).
  • Using a SequenceGenerator to build the chain sequence.
  • Computing the chain mass from the monomer mass table and the optional end-group mass.

Does NOT know anything about total system mass. Only returns one chain at a time.

Initialize polydisperse chain generator.

Parameters:

Name Type Description Default
seq_generator SequenceGenerator

Sequence generator for generating monomer sequences

required
monomer_mass dict[str, float]

Dictionary mapping monomer identifiers to their masses (g/mol)

required
end_group_mass float

Mass of end groups (g/mol), default 0.0

0.0
distribution DPDistribution | MassDistribution | None

Distribution implementing DPDistribution or MassDistribution protocol

None
build_chain
build_chain(rng)

Sample DP, generate monomer sequence, and compute mass.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator (np.random.Generator)

required

Returns:

Type Description
Chain

Chain object with dp, monomers, and mass

sample_dp
sample_dp(rng)

Sample a degree of polymerization from the distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator (np.random.Generator)

required

Returns:

Type Description
int

Degree of polymerization (>= 1)

sample_mass
sample_mass(rng)

Sample a target chain mass from a mass-based distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator (np.random.Generator)

required

Returns:

Type Description
float

Target chain mass in g/mol (>= 0)

PolymerBuildResult dataclass

PolymerBuildResult(polymer, connection_history=list(), total_steps=0)

Result of building a polymer.

PolymerBuilder

PolymerBuilder(library, connector, typifier=None, placer=None)

Build polymers from CGSmiles notation with support for arbitrary topologies.

This builder parses CGSmiles strings and constructs polymers using a graph-based approach, supporting:

  • Linear chains: {[#A][#B][#C]}
  • Branched structures: {[#A]([#B])[#C]}
  • Cyclic structures: {[#A]1[#B][#C]1}
  • Repeat operators: {[#A]|10}

Example

builder = PolymerBuilder(
    library={"EO2": eo2_monomer, "PS": ps_monomer},
    connector=connector,
    typifier=typifier,
)
result = builder.build("{[#EO2]|8[#PS]}")

Initialize the polymer builder.

Parameters:

Name Type Description Default
library Mapping[str, Atomistic]

Mapping from CGSmiles labels to Atomistic monomer structures

required
connector Connector

Connector for port selection and chemical reactions

required
typifier TypifierBase | None

Optional typifier for automatic retypification

None
placer Placer | None

Optional Placer for positioning structures before connection

None
build
build(cgsmiles)

Build a polymer from a CGSmiles string.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation string (e.g., "{[#EO2]|8[#PS]}")

required

Returns:

Type Description
PolymerBuildResult

PolymerBuildResult containing the assembled polymer and metadata

Raises:

Type Description
ValueError

If CGSmiles is invalid

SequenceError

If labels in CGSmiles are not found in library

PortDescriptor dataclass

PortDescriptor(descriptor_id, port_name, role=None, bond_kind=None, compat=None)

Descriptor for a reactive port on a monomer template.

Port descriptors identify ports with unique IDs and store metadata about port behavior (role, bond type, compatibility).

Attributes:

Name Type Description
descriptor_id int

Unique ID within template (e.g., 0, 1, 2)

port_name str

Port name on atom (e.g., "<", ">", "branch")

role str | None

Port role (e.g., "left", "right", "branch")

bond_kind str | None

Bond type (e.g., "-", "=", "#")

compat set[str] | None

Compatibility set for port matching

Example

desc = PortDescriptor(
    descriptor_id=0,
    port_name="<",
    role="left",
    bond_kind="-",
    compat={"donor"},
)
print(f"Descriptor {desc.descriptor_id}: port '{desc.port_name}' ({desc.role})")

PrepareMonomer dataclass

PrepareMonomer(add_hydrogens=True, optimize=True, gen_topology=True)

Bases: Tool

Parse a BigSMILES monomer string and produce an Atomistic structure.

Pipeline: parse BigSMILES → convert to Atomistic with port markers → generate 3D coordinates via RDKit (if available) → compute angles/dihedrals.

Preferred for
  • Preparing monomers for BuildPolymer or polymer().
  • One-step SMILES-to-3D when you need port annotations.
Avoid when
  • You already have an Atomistic structure (use the RDKit adapter directly).
  • You need custom 3D embedding parameters (use Generate3D).

Attributes:

Name Type Description
add_hydrogens bool

Add explicit hydrogens during 3D generation.

optimize bool

Optimize geometry after 3D embedding.

gen_topology bool

Compute angles and dihedrals.

run
run(smiles)

Prepare a monomer from a BigSMILES string.

Parameters:

Name Type Description Default
smiles str

BigSMILES string (e.g. "{[<]CCOCC[>]}").

required

Returns:

Type Description
Atomistic

Atomistic structure with ports marked and optional 3D coordinates.

ProbabilityTableKernel

ProbabilityTableKernel(probability_tables, end_group_templates=None)

GrowthKernel based on G-BigSMILES probability tables.

This kernel uses pre-computed probability tables that map each port descriptor to weighted choices over (template, target_descriptor_id) pairs. Weights are integers that are normalized to probabilities during sampling.

Initialize probability table kernel.

Parameters:

Name Type Description Default
probability_tables dict[int, list[tuple[MonomerTemplate, int, int]]]

Maps descriptor_id -> [(template, target_desc, integer_weight)]. Integer weights are normalized to probabilities during sampling.

required
end_group_templates dict[int, MonomerTemplate] | None

Maps descriptor_id -> end-group template (no ports)

None
choose_next_for_port
choose_next_for_port(polymer, port, candidates, rng=None)

Choose next monomer based on probability table.

Parameters:

Name Type Description Default
polymer Atomistic

Current polymer structure

required
port Atom

Port to extend from

required
candidates Sequence[MonomerTemplate]

Available monomer templates

required
rng Generator | None

Random number generator (uses default if None)

None

Returns:

Type Description
MonomerPlacement | None

MonomerPlacement or None (terminate)
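The sampling step amounts to a weighted choice over the table entries for one port. In this sketch templates are replaced by labels, and returning None when a descriptor has no entries (or zero total weight) is an assumption; the real kernel may instead consult end_group_templates. The function name is hypothetical.

```python
import random


def choose_from_table(table, descriptor_id, rng):
    """Pick (template_label, target_descriptor_id) for a port, or None to terminate.

    table maps descriptor_id -> [(template_label, target_descriptor_id, integer_weight)].
    random.choices normalizes the integer weights implicitly.
    """
    entries = table.get(descriptor_id)
    if not entries:
        return None  # assumption: no outgoing entries means the port terminates
    weights = [w for _, _, w in entries]
    if sum(weights) <= 0:
        return None
    label, target, _ = rng.choices(entries, weights=weights, k=1)[0]
    return label, target


table = {
    1: [("EO", 0, 3), ("PO", 0, 1)],  # from port 1: EO three times as likely as PO
    2: [],                             # port 2 never extends
}
rng = random.Random(0)
picks = [choose_from_table(table, 1, rng) for _ in range(4000)]
print(picks.count(("EO", 0)) / len(picks))  # roughly 0.75
```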

SchulzZimmPolydisperse

SchulzZimmPolydisperse(Mn, Mw, random_seed=None)

Schulz-Zimm molecular weight distribution for polydisperse polymer chains.

Implements MassDistribution: sampling is done directly in molecular-weight space.

The probability density is:

$$
f(M) = \frac{z^{z+1}}{\Gamma(z+1)} \, \frac{M^{z-1}}{M_n^{z}} \, \exp\left(-\frac{z M}{M_n}\right),
$$

where z = Mn / (Mw - Mn). This is equivalent to a Gamma distribution with shape z and scale theta = Mw - Mn.

Parameters:

Name Type Description Default
Mn float

Number-average molecular weight (g/mol).

required
Mw float

Weight-average molecular weight (g/mol), must satisfy Mw > Mn.

required
random_seed int | None

Optional random seed.

None
mass_pdf
mass_pdf(mass_array)

Probability density function for mass values.

sample_mass
sample_mass(rng)

Sample molecular weight from Schulz-Zimm (Gamma) distribution.
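Because the density is a Gamma distribution with shape z and scale θ = Mw - Mn, the standard library's random.gammavariate(alpha, beta) (shape, scale) can sample it directly. The sketch below, with hypothetical helper names, evaluates the density in log space to avoid overflow for large z; the Gamma mean zθ equals Mn and the weight average θ(z + 1) equals Mw, which the samples should reproduce.

```python
import math
import random


def schulz_zimm_params(mn: float, mw: float) -> tuple[float, float]:
    """Return (shape z, scale theta) of the equivalent Gamma distribution."""
    if not mw > mn > 0:
        raise ValueError("need Mw > Mn > 0")
    return mn / (mw - mn), mw - mn


def schulz_zimm_pdf(m: float, mn: float, mw: float) -> float:
    z, _ = schulz_zimm_params(mn, mw)
    if m <= 0:
        return 0.0
    # log of z^(z+1)/Gamma(z+1) * M^(z-1)/Mn^z * exp(-z M / Mn)
    log_f = ((z + 1) * math.log(z) - math.lgamma(z + 1)
             + (z - 1) * math.log(m) - z * math.log(mn) - z * m / mn)
    return math.exp(log_f)


def sample_schulz_zimm(mn: float, mw: float, rng: random.Random) -> float:
    z, theta = schulz_zimm_params(mn, mw)
    return rng.gammavariate(z, theta)  # alpha = shape, beta = scale


rng = random.Random(123)
masses = [sample_schulz_zimm(1500.0, 3000.0, rng) for _ in range(20000)]
print(sum(masses) / len(masses))                  # close to Mn = 1500
print(sum(m * m for m in masses) / sum(masses))   # close to Mw = 3000
```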

SequenceGenerator

Bases: Protocol

Protocol for sequence generators.

A sequence generator controls how monomers are arranged in a single chain.

expected_composition
expected_composition()

Return expected long-chain monomer fractions.

Returns:

Type Description
dict[str, float]

Dictionary mapping monomer identifiers to expected fractions

generate_sequence
generate_sequence(dp, rng)

Generate a monomer sequence of specified degree of polymerization.

Parameters:

Name Type Description Default
dp int

Degree of polymerization (number of monomers)

required
rng Generator

numpy random Generator

required

Returns:

Type Description
list[str]

List of monomer identifiers (strings)

StochasticChain dataclass

StochasticChain(polymer, dp, mass, growth_history=list())

Result of stochastic BFS growth.

Contains the assembled polymer structure along with metadata about the growth process.

Attributes:

Name Type Description
polymer Atomistic

The assembled Atomistic structure

dp int

Degree of polymerization (number of monomers added)

mass float

Total molecular weight (g/mol)

growth_history list[dict[str, Any]]

Metadata for each monomer addition step

Example

chain = StochasticChain(
    polymer=final_structure,
    dp=25,
    mass=1101.25,
    growth_history=[...],
)
print(f"Built polymer: DP={chain.dp}, mass={chain.mass:.1f} g/mol")

SystemPlan dataclass

SystemPlan(chains, total_mass, target_mass)

Represents a complete system plan with all chains.

Attributes:

Name Type Description
chains list[Chain]

List of all chains in the system

total_mass float

Total mass of all chains (g/mol)

target_mass float

Target total mass that was requested (g/mol)

SystemPlanner

SystemPlanner(chain_generator, target_total_mass, max_rel_error=0.02, max_chains=None, enable_trimming=True)

Top layer: System-level planner.

Responsible for:

  • Enforcing a target total mass for the overall system.
  • Iteratively requesting chains from PolydisperseChainGenerator.
  • Maintaining a running sum of total mass.
  • Stopping when the mass reaches the target window, and optionally trimming the final chain.

Does NOT micromanage sequence probabilities or DP distribution; only orchestrates at the ensemble level.

Initialize system planner.

Parameters:

Name Type Description Default
chain_generator PolydisperseChainGenerator

Chain generator for building chains

required
target_total_mass float

Target total system mass (g/mol)

required
max_rel_error float

Maximum relative error allowed (default 0.02 = 2%)

0.02
max_chains int | None

Maximum number of chains to generate (None = no limit)

None
enable_trimming bool

Whether to enable chain trimming to better hit target mass

True
plan_system
plan_system(rng)

Repeatedly ask chain_generator for new chains until accumulated mass reaches target_total_mass within max_rel_error.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator (np.random.Generator)

required

Returns:

Type Description
SystemPlan

SystemPlan with all chains and total mass
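The planning loop can be sketched as follows. It simplifies in several labeled ways: chains are reduced to their DP, a single monomer mass is used, trimming shortens the final chain to the largest DP that stays under target × (1 + max_rel_error), and with trimming disabled the loop simply stops before overshooting. None of this is taken from molpy's source; plan_system here is a hypothetical free function.

```python
import random
from typing import Callable, Optional


def plan_system(sample_dp: Callable[[random.Random], int], monomer_mass: float,
                target_total_mass: float, rng: random.Random,
                max_rel_error: float = 0.02, end_group_mass: float = 0.0,
                max_chains: Optional[int] = None, enable_trimming: bool = True):
    """Return (list of chain DPs, total mass) accumulated until the target window."""
    lower = target_total_mass * (1.0 - max_rel_error)
    upper = target_total_mass * (1.0 + max_rel_error)
    dps: list[int] = []
    total = 0.0
    while total < lower:
        if max_chains is not None and len(dps) >= max_chains:
            break
        dp = sample_dp(rng)
        mass = dp * monomer_mass + end_group_mass
        if total + mass > upper:
            if not enable_trimming:
                break  # stop short rather than overshoot (sketch choice)
            # Trim the final chain to the largest DP that still fits under the cap.
            dp = int((upper - total - end_group_mass) // monomer_mass)
            if dp < 1:
                break
            mass = dp * monomer_mass + end_group_mass
        dps.append(dp)
        total += mass
    return dps, total


rng = random.Random(0)
dps, total = plan_system(lambda r: r.randint(20, 60), 44.05, 50_000.0, rng)
print(len(dps), round(total, 1))
```

When the mass window (here 2000 g/mol) is wider than one monomer mass, a trimmed final chain always lands inside it.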

UniformPolydisperse

UniformPolydisperse(min_dp, max_dp, random_seed=None)

Uniform distribution over degree of polymerization (DP).

All integer DP values between min_dp and max_dp (inclusive) are equally likely.

Parameters:

Name Type Description Default
min_dp int

Lower bound (>= 1).

required
max_dp int

Upper bound (>= min_dp).

required
random_seed int | None

Optional random seed.

None
dp_pmf
dp_pmf(dp_array)

PMF: equal probability for all integer DP in [min_dp, max_dp].

sample_dp
sample_dp(rng)

Sample DP uniformly from [min_dp, max_dp].

VdWSeparator

VdWSeparator(buffer=0.0)

Separator based on van der Waals radii.

Calculates separation as sum of VdW radii of the two port anchor atoms, plus an optional buffer distance.

NOTE: VdW radii are designed for non-bonded contacts (~3-4 Å). For bonded atoms, use CovalentSeparator instead.

Initialize VdW separator.

Parameters:

Name Type Description Default
buffer float

Additional buffer distance in Angstroms (default: 0.0)

0.0
get_separation
get_separation(left_struct, right_struct, left_port, right_port)

Calculate separation based on VdW radii.

Parameters:

Name Type Description Default
left_struct Atomistic

Previous structure in sequence

required
right_struct Atomistic

Next structure to place

required
left_port Atom

Connection port on left structure

required
right_port Atom

Connection port on right structure

required

Returns:

Type Description
float

Separation distance = vdw_left + vdw_right + buffer
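The formula above is simple arithmetic. The sketch below compares it with a covalent-radius sum to show why the note recommends CovalentSeparator for bonded neighbors. The radii are standard literature values (Bondi van der Waals radii; Cordero et al. covalent radii, sp3 carbon) used here as assumptions; molpy's own radius tables may differ.

```python
# Literature radii in Angstrom (assumed values; the library's tables may differ).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}       # Bondi
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}  # Cordero et al.


def vdw_separation(left: str, right: str, buffer: float = 0.0) -> float:
    """vdw_left + vdw_right + buffer, mirroring VdWSeparator.get_separation."""
    return VDW_RADII[left] + VDW_RADII[right] + buffer


def covalent_separation(left: str, right: str, buffer: float = 0.0) -> float:
    """Sum of covalent radii: an estimate of the bond length."""
    return COVALENT_RADII[left] + COVALENT_RADII[right] + buffer


for pair in [("C", "C"), ("C", "O")]:
    print(pair, round(vdw_separation(*pair), 2), round(covalent_separation(*pair), 2))
```

For a C–C pair the VdW sum (3.40 Å) is more than twice the covalent estimate (1.52 Å, close to a typical C–C single bond), which is why VdW spacing suits non-bonded contacts rather than new bonds.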

WeightedSequenceGenerator

WeightedSequenceGenerator(monomer_weights)

Sequence generator based on monomer weights/proportions.

Each selection is independent (no memory of previous selections).

expected_composition
expected_composition()

Return expected long-chain monomer fractions.

generate_sequence
generate_sequence(dp, rng)

Generate a sequence of the specified degree of polymerization.

Parameters:

Name Type Description Default
dp int

Degree of polymerization (number of monomers)

required
rng Generator

NumPy random number generator (np.random.Generator)

required

Returns:

Type Description
list[str]

List of monomer identifiers
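Because each selection is independent, the generator behaves like repeated draws from a categorical distribution, and the expected composition is just the selection probabilities. A minimal sketch, assuming NumPy; `WeightedSequenceSketch` is a hypothetical name, and normalizing the weights to probabilities is an assumption about how they are interpreted.

```python
import numpy as np


class WeightedSequenceSketch:
    """Hypothetical stand-in for WeightedSequenceGenerator."""

    def __init__(self, monomer_weights: dict) -> None:
        total = float(sum(monomer_weights.values()))
        if total <= 0:
            raise ValueError("weights must sum to a positive number")
        self.labels = list(monomer_weights)
        # Assumption: weights are relative, so normalize them to probabilities.
        self.probs = np.array([monomer_weights[k] / total for k in self.labels])

    def expected_composition(self) -> dict:
        # With independent draws, each monomer's expected fraction is its probability.
        return {k: float(p) for k, p in zip(self.labels, self.probs)}

    def generate_sequence(self, dp: int, rng: np.random.Generator) -> list:
        # dp independent categorical draws; no memory of earlier picks.
        idx = rng.choice(len(self.labels), size=dp, p=self.probs)
        return [self.labels[i] for i in idx]
```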

generate_3d

generate_3d(mol, add_hydrogens=True, optimize=True)

Generate 3D coordinates for a molecular structure via RDKit.

Thin re-export of :func:molpy.adapter.rdkit.generate_3d for use inside polymer-building workflows.

Parameters:

Name Type Description Default
mol Atomistic

Atomistic structure (typically from parser.parse_molecule)

required
add_hydrogens bool

Add implicit hydrogens before embedding

True
optimize bool

Run force-field geometry optimization after embedding

True

Returns:

Type Description
Atomistic

New Atomistic with 3D coordinates and (optionally) explicit hydrogens

Raises:

Type Description
ImportError

if RDKit is not installed

polymer

polymer(spec, *, library=None, reaction_preset='dehydration', use_placer=True, add_hydrogens=True, optimize=True, random_seed=None, backend='default', amber_config=None)

Build a single polymer chain from a string specification.

Auto-detects notation type (for the default backend):

  • G-BigSMILES (contains | annotation): polymer("{[<]CCOCC[>]}|10|")
  • CGSmiles + inline fragments (contains .{#): polymer("{[#EO]|10}.{#EO=[<]COC[>]}")
  • Pure CGSmiles (requires library kwarg): polymer("{[#EO]|10}", library={"EO": eo_monomer})

For the Amber backend:

  • polymer("{[#EO]|10}", library={"EO": eo}, backend="amber")
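The detection rules above can be approximated by a small classifier. This is a hypothetical sketch, not molpy's parser. Pure CGSmiles such as "{[#EO]|10}" also contains a "|", so the sketch checks for inline fragments first and only treats a "|" that appears outside every brace pair as a G-BigSMILES annotation.

```python
def detect_notation(spec: str) -> str:
    """Classify a polymer spec string (hypothetical rules, see lead-in)."""
    if ".{#" in spec:
        return "cgsmiles_with_fragments"
    depth = 0
    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "|" and depth == 0:
            # A "|" outside all braces is treated as a G-BigSMILES annotation.
            return "gbigsmiles"
    # Otherwise assume pure CGSmiles, which needs a monomer library.
    return "cgsmiles"


for s in ["{[<]CCOCC[>]}|10|", "{[#EO]|10}.{#EO=[<]COC[>]}", "{[#EO]|10}"]:
    print(s, "->", detect_notation(s))
```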

Parameters:

Name Type Description Default
spec str

Polymer specification string.

required
library Mapping[str, Atomistic] | None

Monomer library (required for pure CGSmiles and Amber).

None
reaction_preset str

Reaction preset name.

'dehydration'
use_placer bool

Enable geometric placement (default backend only).

True
add_hydrogens bool

Add hydrogens during 3D generation.

True
optimize bool

Optimize geometry.

True
random_seed int | None

Random seed for reproducibility.

None
backend Backend

Builder backend — "default" or "amber".

'default'
amber_config Any

Optional AmberPolymerBuilderConfig for fine-grained control of the Amber backend. When None, defaults are used.

None

Returns:

Type Description
Atomistic | Any

Atomistic (default backend) or AmberBuildResult (amber backend).

polymer_system

polymer_system(spec, *, reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)

Build a multi-chain polymer system from G-BigSMILES.

Example::

chains = polymer_system(
    "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|",
    random_seed=42,
)

Parameters:

Name Type Description Default
spec str

G-BigSMILES specification string.

required
reaction_preset str

Reaction preset name.

'dehydration'
add_hydrogens bool

Add hydrogens during 3D generation.

True
optimize bool

Optimize geometry.

True
random_seed int | None

Random seed for reproducibility.

None

Returns:

Type Description
list[Atomistic]

List of Atomistic structures (one per chain).

prepare_monomer

prepare_monomer(bigsmiles, typifier=None, *, add_hydrogens=True, optimize=True, gen_angle=True, gen_dihe=True)

Parse, embed in 3D, augment topology, and optionally typify a monomer.

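Bundles the four-step pattern common to polymer-building workflows (the typing step runs only when a typifier is given)::

import molpy as mp
from molpy.adapter.rdkit import generate_3d

m = mp.parser.parse_monomer(bigsmiles)
m = generate_3d(m, add_hydrogens=True, optimize=True)
m = m.get_topo(gen_angle=True, gen_dihe=True)
if typifier is not None:  # force-field typing is optional
    m = typifier.typify(m)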

Parameters:

Name Type Description Default
bigsmiles str

BigSMILES string (e.g. "{[][<]OCCOCCOCCO[>][]}").

required
typifier

Optional typifier instance (e.g. OplsAtomisticTypifier). When provided, force-field types are assigned before returning.

None
add_hydrogens bool

Add implicit hydrogens during 3D generation.

True
optimize bool

Run force-field geometry optimization after embedding.

True
gen_angle bool

Generate angle interactions from bonds.

True
gen_dihe bool

Generate dihedral interactions from bonds.

True

Returns:

Type Description
Atomistic

Fully prepared Atomistic monomer ready for reactions or export.

Polymer DSL tools

High-level polymer-building tools and entry functions (PrepareMonomer, BuildPolymer, PlanSystem, BuildSystem, BuildPolymerAmber, polymer, polymer_system, prepare_monomer, generate_3d).

dsl

Polymer building tools.

Tools that wrap the parser, adapter, builder, and reacter modules into single-call workflows for common polymer construction tasks.

Tools (auto-registered in ToolRegistry):

  • PrepareMonomer: BigSMILES → 3D Atomistic with ports
  • BuildPolymer: CGSmiles + library → assembled chain
  • BuildPolymerAmber: CGSmiles + library → chain plus AMBER topology/coordinate files (via AmberTools)
  • PlanSystem: distribution parameters → chain plan (no atoms)
  • BuildSystem: G-BigSMILES → list of built chains

Convenience functions:

  • polymer(): auto-detects the notation and builds a single chain
  • polymer_system(): G-BigSMILES → multi-chain system

BuildPolymer dataclass

BuildPolymer(reaction_preset='dehydration', use_placer=True)

Bases: Tool

Build a polymer chain from CGSmiles notation and a monomer library.

Preferred for
  • Assembling a single chain from pre-prepared monomers.
  • Iterating over a system plan to build chains one at a time.
Avoid when
  • You want end-to-end build from a string (use polymer() or BuildSystem).
  • You need custom reaction logic (use PolymerBuilder directly).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset (default "dehydration").

use_placer bool

Enable geometric placement of monomers.

run
run(cgsmiles, library)

Build a polymer chain.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer.

required

Returns:

Type Description
dict[str, Any]

Dict with "polymer" (Atomistic), "total_steps" (int), and "connection_history" (list).

BuildPolymerAmber dataclass

BuildPolymerAmber(reaction_preset='dehydration', force_field='gaff2', charge_method='bcc', conda_env=None, work_dir='amber_work')

Bases: Tool

Build a polymer chain using the AmberTools backend.

Uses antechamber, parmchk2, prepgen, and tleap to assemble a polymer from a CGSmiles string and a monomer library. Returns both MolPy structures and AMBER topology/coordinate files.

Preferred for
  • Polymer systems that need AMBER force field parameters (GAFF/GAFF2).
  • Workflows that feed into AMBER or LAMMPS with AMBER-style inputs.
Avoid when
  • You do not need force field parameters (use BuildPolymer).
  • AmberTools is not installed.

Attributes:

Name Type Description
reaction_preset str | None

Named preset for leaving group detection. When None, hydrogen atoms bonded to port atoms are auto-detected.

force_field str

Amber force field ("gaff" or "gaff2").

charge_method str

Antechamber charge method (default "bcc", i.e. AM1-BCC).

conda_env str | None

Conda environment containing AmberTools.

work_dir str

Directory for intermediate files.

run
run(cgsmiles, library)

Build a polymer using AmberTools.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer. Each monomer must have port="<" (head) and port=">" (tail) annotations.

required

Returns:

Type Description
dict[str, Any]

Dict with "frame", "forcefield", "prmtop_path", "inpcrd_path", "pdb_path", and "monomer_count".

BuildSystem dataclass

BuildSystem(reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)

Bases: Tool

End-to-end polymer system construction from G-BigSMILES.

Parses a G-BigSMILES string and delegates to the GBigSmilesCompiler to produce a list of Atomistic chains.

Preferred for
  • Building a complete polydisperse system in one call.
  • When you do not need to inspect the system plan before building.
Avoid when
  • You need to inspect or modify the plan first (use PlanSystem + BuildPolymer).
  • You need the Amber backend (use BuildPolymerAmber).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset.

add_hydrogens bool

Add explicit hydrogens during monomer preparation.

optimize bool

Optimize monomer geometry.

random_seed int | None

Random seed for reproducibility.

run
run(gbigsmiles)

Build a polymer system from a G-BigSMILES string.

Parameters:

Name Type Description Default
gbigsmiles str

G-BigSMILES notation (e.g. "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|").

required

Returns:

Type Description
list[Atomistic]

List of Atomistic structures (one per chain).

PlanSystem dataclass

PlanSystem(random_seed=None)

Bases: Tool

Plan a polydisperse polymer system from distribution parameters.

Returns chain specifications (DP, monomer sequence, mass) without creating any atoms. Use this to validate distribution parameters before committing to an expensive build.

Preferred for
  • Previewing system composition before building.
  • Iterating on distribution parameters cheaply.
Avoid when
  • You want chains built directly (use BuildSystem or polymer_system).

Attributes:

Name Type Description
random_seed int | None

Random seed for reproducibility.

run
run(monomer_weights, monomer_mass, distribution_type, distribution_params, target_total_mass, end_group_mass=0.0, max_rel_error=0.02)

Plan a polydisperse polymer system.

Parameters:

Name Type Description Default
monomer_weights dict[str, float]

Weight fractions for each monomer label.

required
monomer_mass dict[str, float]

Molar mass (g/mol) per monomer label.

required
distribution_type str

Distribution name (e.g. "schulz_zimm").

required
distribution_params dict[str, float]

Distribution parameters as {"p0": ..., "p1": ...}.

required
target_total_mass float

Target total system mass (g/mol).

required
end_group_mass float

Mass of end groups per chain (g/mol).

0.0
max_rel_error float

Maximum relative error for total mass.

0.02

Returns:

Type Description
dict[str, Any]

Dict with "chains" (list of chain dicts), "total_mass", and "target_mass".
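For distribution_type "schulz_zimm", a common parameterization treats chain molar mass as gamma-distributed with number-average Mn and weight-average Mw: shape z = Mn / (Mw − Mn) and scale Mn / z, so that Mw / Mn = 1 + 1/z. The sketch below samples that form with NumPy. How p0 and p1 map onto Mn and Mw in PlanSystem is not stated here, so the sketch takes Mn and Mw by name; `sample_schulz_zimm` and `averages` are hypothetical helpers.

```python
import numpy as np


def sample_schulz_zimm(mn: float, mw: float, size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw chain molar masses (g/mol) from a Schulz-Zimm (gamma) distribution."""
    if not mw > mn > 0:
        raise ValueError("need Mw > Mn > 0 (dispersity above 1)")
    z = mn / (mw - mn)  # shape; dispersity Mw/Mn = 1 + 1/z
    theta = mn / z      # scale; number-average mass = z * theta = Mn
    return rng.gamma(shape=z, scale=theta, size=size)


def averages(masses: np.ndarray) -> tuple:
    """Number- and weight-average molar masses of a sample."""
    mn = masses.mean()
    mw = (masses ** 2).sum() / masses.sum()
    return float(mn), float(mw)


rng = np.random.default_rng(0)
m = sample_schulz_zimm(1500.0, 2000.0, 200_000, rng)
print(averages(m))
```

Sampling many chains and recomputing Mn and Mw is a cheap way to check distribution parameters before an expensive build, which is the use case PlanSystem targets.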

PrepareMonomer dataclass

PrepareMonomer(add_hydrogens=True, optimize=True, gen_topology=True)

Bases: Tool

Parse a BigSMILES monomer string and produce an Atomistic structure.

Pipeline: parse BigSMILES → convert to Atomistic with port markers → generate 3D coordinates via RDKit (if available) → compute angles/dihedrals.

Preferred for
  • Preparing monomers for BuildPolymer or polymer().
  • One-step SMILES-to-3D when you need port annotations.
Avoid when
  • You already have an Atomistic struct (use RDKit adapter directly).
  • You need custom 3D embedding parameters (use Generate3D).

Attributes:

Name Type Description
add_hydrogens bool

Add explicit hydrogens during 3D generation.

optimize bool

Optimize geometry after 3D embedding.

gen_topology bool

Compute angles and dihedrals.

run
run(smiles)

Prepare a monomer from a BigSMILES string.

Parameters:

Name Type Description Default
smiles str

BigSMILES string (e.g. "{[<]CCOCC[>]}").

required

Returns:

Type Description
Atomistic

Atomistic structure with ports marked and optional 3D coordinates.
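Port markers come from BigSMILES bonding descriptors such as [<], [>] and [$] (BigSMILES also allows indexed forms like [<1]; an empty [] marks a terminal). Because BuildPolymerAmber requires every monomer to carry both a "<" and a ">" port, a quick pre-check of the input string can save a failed build. A sketch using a regular expression; `bonding_descriptors` and `has_head_and_tail` are hypothetical helpers, not part of molpy, and they do not validate the rest of the string.

```python
import re

# A bonding descriptor is "[", one of <, > or $, an optional index, then "]".
_DESCRIPTOR = re.compile(r"\[([<>$])(\d*)\]")


def bonding_descriptors(bigsmiles: str) -> list:
    """Return descriptors in order of appearance, e.g. ["<", ">"]."""
    return [sym + idx for sym, idx in _DESCRIPTOR.findall(bigsmiles)]


def has_head_and_tail(bigsmiles: str) -> bool:
    """True if the string contains at least one "<" and one ">" descriptor."""
    symbols = {d[0] for d in bonding_descriptors(bigsmiles)}
    return {"<", ">"} <= symbols


print(bonding_descriptors("{[][<]OCCOCCOCCO[>][]}"))
```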

Tool framework

Tool and ToolRegistry are the internal base classes that the builder DSL tools are built on. They are not public top-level exports.

_tool

Tool framework for executable builder operations.

Provides:

  • ToolRegistry: auto-discovery registry for Tool subclasses
  • Tool: frozen-dataclass ABC for executable tools (builders, transforms)

Tool dataclass

Tool()

Bases: ABC

Base class for executable tools (builders, transforms).

Concrete subclasses are auto-registered in ToolRegistry and discovered by the MCP server. Tool is intended for molecular operations that produce or transform structures.

Usage::

from dataclasses import dataclass

@dataclass(frozen=True)
class MyTool(Tool):  # Tool is the base class defined in this _tool module
    param: int = 10

    def run(self, text: str) -> dict:
        return {"result": text, "param": self.param}

tool = MyTool(param=5)
result = tool("hello")  # delegates to run()

run abstractmethod
run(*args, **kwargs)

Core tool logic. Subclasses must implement this method.

ToolRegistry

Auto-discovery registry for Tool subclasses.

Concrete Tool subclasses register themselves automatically via __init_subclass__. The MCP server iterates this registry to discover and expose available tools.

Usage::

for name, cls in ToolRegistry.get_all().items():
    print(f"{name}: {cls.__doc__}")

get classmethod
get(name)

Look up a Tool subclass by class name.

get_all classmethod
get_all()

Return all registered concrete Tool subclasses.
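The auto-registration described above follows a standard Python pattern: a base class whose __init_subclass__ hook records each new subclass. The sketch below reproduces it in isolation. `BaseTool`, `Registry`, `AbstractStep` and `Echo` are hypothetical names, and the rule "concrete means run() is no longer abstract" is an assumption about how the real registry decides what to record.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, ClassVar, Dict


class Registry:
    """Hypothetical stand-in for ToolRegistry: class name -> class."""

    _tools: ClassVar[Dict[str, type]] = {}

    @classmethod
    def register(cls, tool_cls: type) -> None:
        cls._tools[tool_cls.__name__] = tool_cls

    @classmethod
    def get(cls, name: str) -> type:
        return cls._tools[name]

    @classmethod
    def get_all(cls) -> Dict[str, type]:
        return dict(cls._tools)  # copy so callers cannot mutate the registry


@dataclass(frozen=True)
class BaseTool(ABC):
    """Hypothetical stand-in for Tool: frozen dataclass with an abstract run()."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register only subclasses whose run() is no longer abstract.
        if not getattr(cls.run, "__isabstractmethod__", False):
            Registry.register(cls)

    @abstractmethod
    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Core tool logic; subclasses must implement."""

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.run(*args, **kwargs)  # calling the tool delegates to run()


class AbstractStep(BaseTool):
    """Intermediate base without run(): not registered."""


@dataclass(frozen=True)
class Echo(AbstractStep):
    param: int = 10

    def run(self, text: str) -> dict:
        return {"result": text, "param": self.param}
```

The hook fires when the class statement executes, before the @dataclass decorator runs; this still works because the decorator modifies and returns the same class object.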