Topology-Driven Assembly with CGSmiles¶
This guide shows how CGSmiles expressions can specify linear, cyclic, and branched polymer architectures under a common builder configuration.
!!! note "Prerequisites"
This guide requires RDKit (for generate_3d), the oplsaa.xml force field, and familiarity with Stepwise Polymer Construction.
One builder, multiple architectures¶
In the stepwise guide, the reaction kernel was always the same — only the loop structure changed. CGSmiles pushes that idea further: the same builder configuration produces different architectures depending solely on the topology expression.
linear: {[#EO2]|4[#PS]}
ring: {[#EO2]1[#PS][#EO2][#PS][#EO2]1}
branch: {[#PS][#EO3]([#PS])[#PS]}
The builder does not change between these three products. Only the string changes.
Parse the topology string before you build anything¶
A CGSmiles expression encodes both the monomer labels and the connectivity pattern between them as a graph. Validating that graph before any chemistry runs costs almost nothing, and it catches label typos and structural mistakes — missing ring-closure digits, wrong branching parentheses — before they surface as cryptic errors deep inside the builder.
from molpy.parser import parse_cgsmiles
expressions = {
"linear": "{[#EO2]|4[#PS]}",
"ring": "{[#EO2]1[#PS][#EO2][#PS][#EO2]1}",
"branch": "{[#PS][#EO3]([#PS])[#PS]}",
}
for name, expr in expressions.items():
ir = parse_cgsmiles(expr)
labels = [node.label for node in ir.base_graph.nodes]
print(f"{name}: nodes={len(ir.base_graph.nodes)}, labels={labels}")
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[1], line 1 ----> 1 from molpy.parser import parse_cgsmiles 2 3 expressions = { 4 "linear": "{[#EO2]|4[#PS]}", File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/__init__.py:4 1 """MolPy — Composable molecular modeling in Python.""" 3 # Submodules ----> 4 from . import adapter, data, engine, io, legacy, parser, potential, typifier 6 # Core 7 from .core.atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/__init__.py:12 10 # Optional RDKit adapter 11 try: # pragma: no cover ---> 12 from .rdkit import MP_ID, Generate3D, OptimizeGeometry, RDKitAdapter, generate_3d 14 _HAS_RDKIT = True 15 except ModuleNotFoundError: # rdkit missing File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/rdkit.py:18 15 from rdkit import Chem 16 from rdkit.Chem import AllChem ---> 18 from molpy.core.atomistic import Atomistic 20 from .base import Adapter 22 MP_ID = "mp_id" File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/__init__.py:1 ----> 1 from .atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper 2 from .box import Box 3 from .cg import Bead, CGBond, CoarseGrain File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/atomistic.py:232 228 """Third substituent atom.""" 229 return self.endpoints[3] --> 232 class Atomistic(Struct, MembershipMixin, SpatialMixin, ConnectivityMixin, molrs.Graph): 233 """All-atom molecular structure with full topological information. 234 235 Atomistic is the primary container for molecular systems in MolPy. It (...) 245 Atom, Bond, Angle, Dihedral, Struct, Frame 246 """ 248 def __init__(self, **props) -> None: AttributeError: module 'molrs' has no attribute 'Graph'
With the graph validated, the next step is giving each node a physical structure.
Every label in the topology string needs a molecular template¶
The parser produces a graph whose nodes carry labels like EO2, EO3, and PS. The builder resolves each label by looking it up in a library of typed Atomistic objects. If a label is present in the CGSmiles string but absent from the library, the build fails immediately. All templates use $ as the reactive port marker.
import molpy as mp
from molpy.typifier import OplsAtomisticTypifier
ff = mp.io.read_xml_forcefield("oplsaa.xml")
typifier = OplsAtomisticTypifier(ff, strict_typing=True)
BIGSMILES = {
"EO2": "{[][$]OCCO[$][]}",
"EO3": "{[][$]OCC(CO[$])(CO[$])[]}",
"PS": "{[][$]OCC(c1ccccc1)CO[$][]}",
}
def build_monomer(bigsmiles, typifier):
monomer = mp.parser.parse_monomer(bigsmiles)
monomer = mp.adapter.generate_3d(monomer, add_hydrogens=True, optimize=False)
monomer = monomer.get_topo(gen_angle=True, gen_dihe=True)
monomer = typifier.typify(monomer)
return monomer
library = {label: build_monomer(bs, typifier) for label, bs in BIGSMILES.items()}
for label, mon in library.items():
ports = [a.get("port") for a in mon.atoms if a.get("port")]
print(f"{label}: atoms={len(mon.atoms)}, ports={ports}")
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[2], line 1 ----> 1 import molpy as mp 2 from molpy.typifier import OplsAtomisticTypifier 3 4 ff = mp.io.read_xml_forcefield("oplsaa.xml") File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/__init__.py:4 1 """MolPy — Composable molecular modeling in Python.""" 3 # Submodules ----> 4 from . import adapter, data, engine, io, legacy, parser, potential, typifier 6 # Core 7 from .core.atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/__init__.py:12 10 # Optional RDKit adapter 11 try: # pragma: no cover ---> 12 from .rdkit import MP_ID, Generate3D, OptimizeGeometry, RDKitAdapter, generate_3d 14 _HAS_RDKIT = True 15 except ModuleNotFoundError: # rdkit missing File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/rdkit.py:18 15 from rdkit import Chem 16 from rdkit.Chem import AllChem ---> 18 from molpy.core.atomistic import Atomistic 20 from .base import Adapter 22 MP_ID = "mp_id" File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/__init__.py:1 ----> 1 from .atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper 2 from .box import Box 3 from .cg import Bead, CGBond, CoarseGrain File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/atomistic.py:232 228 """Third substituent atom.""" 229 return self.endpoints[3] --> 232 class Atomistic(Struct, MembershipMixin, SpatialMixin, ConnectivityMixin, molrs.Graph): 233 """All-atom molecular structure with full topological information. 234 235 Atomistic is the primary container for molecular systems in MolPy. It (...) 245 Atom, Bond, Angle, Dihedral, Struct, Frame 246 """ 248 def __init__(self, **props) -> None: AttributeError: module 'molrs' has no attribute 'Graph'
With a complete library, the only remaining definition is the chemistry that connects one template to another.
The reaction chemistry is the same regardless of topology¶
The reaction kernel here is identical to the dehydration condensation defined in the Stepwise Polymer Construction guide. The select_hydroxyl_group function finds the -OH leaving group on the left monomer's reaction site; select_one_hydrogen picks one H from the right monomer's port atom. Together they remove -OH and -H, form a new C–O bond, and release water. None of that logic depends on whether the final product is a linear chain, a ring, or a branch — the topology is entirely the responsibility of the CGSmiles string.
??? note "Reaction setup (identical to the stepwise guide)"
The select_hydroxyl_group function finds the -OH leaving group on the left monomer's reaction site. The select_one_hydrogen function picks one H from the right monomer's port atom. Together they implement dehydration condensation: -OH + -H are removed, forming a new C-O bond and releasing water.
from molpy.core.atomistic import Atom, Atomistic
from molpy.builder.polymer import (
Connector,
CovalentSeparator,
LinearOrienter,
Placer,
PolymerBuilder,
)
from molpy.reacter import (
Reacter,
find_neighbors,
form_single_bond,
select_neighbor,
select_self,
)
def select_hydroxyl_group(struct: Atomistic, site: Atom) -> list[Atom]:
for nb in find_neighbors(struct, site):
if nb.get("element") != "O":
continue
hs = [a for a in find_neighbors(struct, nb, element="H")]
if hs:
return [nb, hs[0]]
raise ValueError("No hydroxyl group found")
def select_one_hydrogen(struct: Atomistic, site: Atom) -> list[Atom]:
hs = [a for a in find_neighbors(struct, site, element="H")]
if not hs:
raise ValueError("No hydrogen found")
return [hs[0]]
rxn = Reacter(
name="dehydration",
anchor_selector_left=select_neighbor("C"),
anchor_selector_right=select_self,
leaving_selector_left=select_hydroxyl_group,
leaving_selector_right=select_one_hydrogen,
bond_former=form_single_bond,
)
rules = {(l, r): ("$", "$") for l in library for r in library}
connector = Connector(port_map=rules, reacter=rxn)
placer = Placer(
separator=CovalentSeparator(buffer=-0.1),
orienter=LinearOrienter(),
)
builder = PolymerBuilder(
library=library,
connector=connector,
placer=placer,
typifier=typifier,
)
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[3], line 1 ----> 1 from molpy.core.atomistic import Atom, Atomistic 2 from molpy.builder.polymer import ( 3 Connector, 4 CovalentSeparator, File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/__init__.py:4 1 """MolPy — Composable molecular modeling in Python.""" 3 # Submodules ----> 4 from . import adapter, data, engine, io, legacy, parser, potential, typifier 6 # Core 7 from .core.atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/__init__.py:12 10 # Optional RDKit adapter 11 try: # pragma: no cover ---> 12 from .rdkit import MP_ID, Generate3D, OptimizeGeometry, RDKitAdapter, generate_3d 14 _HAS_RDKIT = True 15 except ModuleNotFoundError: # rdkit missing File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/adapter/rdkit.py:18 15 from rdkit import Chem 16 from rdkit.Chem import AllChem ---> 18 from molpy.core.atomistic import Atomistic 20 from .base import Adapter 22 MP_ID = "mp_id" File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/__init__.py:1 ----> 1 from .atomistic import Angle, Atom, Atomistic, Bond, Dihedral, Improper 2 from .box import Box 3 from .cg import Bead, CGBond, CoarseGrain File ~/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/core/atomistic.py:232 228 """Third substituent atom.""" 229 return self.endpoints[3] --> 232 class Atomistic(Struct, MembershipMixin, SpatialMixin, ConnectivityMixin, molrs.Graph): 233 """All-atom molecular structure with full topological information. 234 235 Atomistic is the primary container for molecular systems in MolPy. It (...) 245 Atom, Bond, Angle, Dihedral, Struct, Frame 246 """ 248 def __init__(self, **props) -> None: AttributeError: module 'molrs' has no attribute 'Graph'
The builder is now fully configured. The three expressions from the opening section are the only inputs that differ between the three products.
The CGSmiles string alone determines the architecture¶
Passing the three expressions to the same builder produces three structurally distinct polymers. The linear expression encodes a chain; the ring expression encodes a cycle by repeating the ring-closure digit; the branch expression encodes a trifunctional junction by using parentheses. The builder resolves each graph edge as one call to the reaction kernel and one call to the placer — it has no separate mode for rings or branches.
for name, expr in expressions.items():
result = builder.build(expr)
polymer = result.polymer
print(f"{name}: atoms={len(polymer.atoms)}, steps={result.total_steps}")
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[4], line 1 ----> 1 for name, expr in expressions.items(): 2 result = builder.build(expr) 3 polymer = result.polymer 4 print(f"{name}: atoms={len(polymer.atoms)}, steps={result.total_steps}") NameError: name 'expressions' is not defined
The same builder, the same reaction, the same library — only the CGSmiles string changes. This is the key advantage of topology-driven assembly: new architectures do not require new code. Once a product exists in memory, writing it to disk follows a single pattern regardless of topology.
Exporting each product to LAMMPS follows the same pattern¶
Each product can be exported using the same pattern.
import numpy as np
from pathlib import Path
output_dir = Path("03_output")
output_dir.mkdir(exist_ok=True)
for name, expr in expressions.items():
result = builder.build(expr)
typed_polymer = typifier.typify(result.polymer)
frame = typed_polymer.to_frame()
atoms = frame["atoms"]
if "mol_id" not in atoms:
atoms["mol_id"] = np.ones(atoms.nrows, dtype=int)
coords = np.column_stack([atoms["x"], atoms["y"], atoms["z"]])
lo = coords.min(axis=0) - 5.0
hi = coords.max(axis=0) + 5.0
frame.box = mp.Box(matrix=hi - lo, origin=lo)
mp.io.write_lammps_system(output_dir / name, frame, ff)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[5], line 7 3 4 output_dir = Path("03_output") 5 output_dir.mkdir(exist_ok=True) 6 ----> 7 for name, expr in expressions.items(): 8 result = builder.build(expr) 9 typed_polymer = typifier.typify(result.polymer) 10 frame = typed_polymer.to_frame() NameError: name 'expressions' is not defined
Troubleshooting¶
Debug in this order:
- Parse CGSmiles and verify node/bond counts first
- Confirm each label exists in the library
- Confirm connector rules exist for each reacting label pair
- Print selected site/leaving atoms if reaction fails
- Tune
CovalentSeparator(buffer=...)if geometry overlaps
See also: Stepwise Polymer Construction, Crosslinked Networks.