Simulation explodes with 2 TorchForces #98

jharrymoore · 2023-03-13T19:28:42Z

Hi,

I'm seeing simulations explode when I add two separate TorchForces for different small molecules to the system - specifically the molecule corresponding to the first force explodes in the minimiser. This seems to happen only when I use MACE models for both components - switching to the ani2x potential for one or both TorchForce works fine. The build script for the conda environment is in my fork of openmmtools https://github.com/jharrymoore/openmmtools/blob/dual_topology/mace-install.sh

...and here is the run script that should reproduce the problem

from openmm.app import PDBFile, Modeller, PME
from openmm import app
from openmm import unit
import sys
from openmmtools.openmm_torch.repex import MixedSystemConstructor
from openmm import LangevinMiddleIntegrator
from openmm.openmm import Platform

from openmmtools.openmm_torch.utils import (
    initialize_mm_forcefield,
)
import torch
from openff.toolkit.topology import Molecule
from tempfile import mkstemp
from ase.io import read
from openmmml import MLPotential
from openmmml.models.mace_potential import MacePotentialImplFactory
from openmmml.models.anipotential import ANIPotentialImplFactory
import logging

torch.set_default_dtype(torch.float64)

logger = logging.getLogger("DEBUG")

MLPotential.registerImplFactory("mace", MacePotentialImplFactory())
MLPotential.registerImplFactory("ani2x", ANIPotentialImplFactory())

mol1_file = "test_files/butane.sdf"
mol2_file = "test_files/hexane.sdf"

mol1 = Molecule.from_file(mol1_file)
mol2 = Molecule.from_file(mol2_file)

_, tmpfile = mkstemp(suffix=".xyz")
mol1._to_xyz_file(tmpfile)
atoms_mol1 = read(tmpfile)
_, tmpfile = mkstemp(suffix=".xyz")
mol2._to_xyz_file(tmpfile)
atoms_mol2 = read(tmpfile)

modeller = Modeller(mol1.to_topology().to_openmm(), mol1.conformers[0] + 1.0*unit.nanometer)
modeller_mol2 = Modeller(mol2.to_topology().to_openmm(), mol2.conformers[0])
modeller.add(modeller_mol2.topology, modeller_mol2.positions)
mol1_res = [r for r in modeller.topology.residues() if r.name == "butane"][0]
mol2_res = [r for r in modeller.topology.residues() if r.name == "hexane"][0]
print([a.index for a in mol1_res.atoms()], [a.index for a in mol2_res.atoms()])
mol1_index, mol2_index = [a.index for a in mol1_res.atoms()], [a.index for a in mol2_res.atoms()]

forcefield = initialize_mm_forcefield([mol1, mol2], forcefields=[
            "amber/protein.ff14SB.xml",
            "amber/tip3p_standard.xml",
            "amber14/DNA.OL15.xml",
        ])
modeller.addSolvent(forcefield=forcefield, padding=1.2 * unit.nanometers)


system = forcefield.createSystem(modeller.topology,
            nonbondedMethod=PME,
            nonbondedCutoff=1.0*unit.nanometer,
            constraints=None,
)

# create a mixed system from the MACE potential

system = MixedSystemConstructor(
    system=system,
    topology=modeller.topology,
    nnpify_id=mol1_res.name,
    nnpify_type="resname",
    atoms_obj=atoms_mol1,
    nnp_potential="mace",
    filename="test_files/mace-model.model",
    dtype=torch.float64,
    interpolate=False,
    nl="torch_nl_n2"
).mixed_system

system = MixedSystemConstructor(
    system=system,
    topology=modeller.topology,
    nnpify_id=mol2_res.name,
    nnpify_type="resname",
    atoms_obj=atoms_mol2,
    nnp_potential="mace",
    filename="test_files/mace-model.model",
    dtype=torch.float64,
    interpolate=False,
    nl="torch_nl_n2"
).mixed_system
        
# write system to file
with open("test_files/created_system.pdb", "w") as f:
    PDBFile.writeFile(modeller.topology, modeller.positions, f)

# create simulation
integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 10 / unit.picosecond, 1 * unit.femtosecond)

simulation = app.Simulation(modeller.topology, system, integrator, platform=Platform.getPlatformByName("CPU"))

simulation.context.setPositions(modeller.positions)

# turn off the nonbonded forces between the two molecules by setting exceptions

print("Minimising system")
simulation.minimizeEnergy()
print("Minimisation complete")

# write out minimised system
with open("test_files/minimised_system.pdb", "w") as f:
    PDBFile.writeFile(modeller.topology, simulation.context.getState(getPositions=True).getPositions(), f)

simulation.reporters.append(app.PDBReporter('test_files/trajectory.pdb', 1000, enforcePeriodicBox=False))
# add stdout reporter
simulation.reporters.append(app.StateDataReporter(sys.stdout, 1000, step=True, potentialEnergy=True, temperature=True, speed=True, totalSteps=100000, remainingTime=True, separator='\t'))

simulation.step(100000)

Any insight into what might be going wrong would be greatly appreciated!

Many thanks,
Harry

test_files.zip

peastman · 2023-03-13T20:27:36Z

When you say it explodes, what exactly does that mean? What behavior do you observe?

Is it possible the MACE model uses some kind of global state that gets incorrectly shared between the models?

jharrymoore · 2023-03-13T21:37:24Z

The minimised structure of the affected molecule ends up with a very distorted geometry (notably the same geometry:

This results in NaN positions within a couple of steps of MD. It also seems to be deterministic - simulation minimises to exactly the same coordinates across multiple runs.

I am not sure about global states - nothing obvious comes to mind, do you have any suggestions for what might cause this?

Potentially unrelated, but I do also find that when I run multiple mixed MACE/MM simulations simultaneously on the same GPU (e.g. using openmmtool's repex with MPI), I see similar geometry distortion during minimisation and subsequent NaN positions.

peastman · 2023-03-13T21:59:24Z

Potentially unrelated, but I do also find that when I run multiple mixed MACE/MM simulations simultaneously on the same GPU (e.g. using openmmtool's repex with MPI), I see similar geometry distortion during minimisation and subsequent NaN positions.

That does suggest some kind of state is getting shared between the two simulations. Try this experiment. First, run two independent simulations at the same time on the same GPU. Each one should be an independent script launched separately. That shouldn't allow any communication between them, so you shouldn't see the artifacts.

Now have a single Python script run two simulations at once. First create a single System object. Now use threading to create two threads. Each thread should create a Context for the System, minimize it, and simulate it. Since they're both in the same process, that does create opportunities for state to get shared between them.

sef43 · 2023-03-14T10:10:37Z

Hi @jharrymoore

I haven't been able to run your exact script due to some versioning issues of the other packages, but using my own openmm-ml fork with mace in I have been able to reproduce this, and then maybe correct it. The problem occurs for me when the torchscripted models saved and then loaded by openmmtorch.torchForce have the same name. The ones made here:

        # Convert it to TorchScript and save it.
        module = torch.jit.script(maceforce)
        module.save(filename)

        # Create the TorchForce and add it to the System.
        force = openmmtorch.TorchForce(filename)
        force.setForceGroup(forceGroup)
        force.setUsesPeriodicBoundaryConditions(is_periodic)
        #force.setOutputsForces(True)
        system.addForce(force)

If they have different names the minimization appears to work correctly. Here is an example script with both behaviours:

To run it you will need this version of openmm-ml: https://github.com/sef43/openmm-ml/tree/mace/examples/mace

pip install git+https://github.com/sef43/openmm-ml@mace

from openmm.app import PDBFile, Modeller, PME
from openmm import app
from openmm import unit
import sys
from openmm import LangevinMiddleIntegrator
from openmm.openmm import Platform
from openff.toolkit.topology import Molecule
from ase.io import read
from tempfile import mkstemp
from openmmforcefields.generators import SMIRNOFFTemplateGenerator
from openmmml import MLPotential



for torchscript_filename1,torchscript_filename2, version in zip(['maceforce.pt', 'maceforce1.pt' ],
                                                                ['maceforce.pt', 'maceforce2.pt'], 
                                                                ['same_name', 'different_name']):
    

    mol1_file = "test_files/butane.sdf"
    mol2_file = "test_files/hexane.sdf"

    mol1 = Molecule.from_file(mol1_file)
    mol2 = Molecule.from_file(mol2_file)

    _, tmpfile = mkstemp(suffix=".xyz")
    mol1._to_xyz_file(tmpfile)
    atoms_mol1 = read(tmpfile)
    _, tmpfile = mkstemp(suffix=".xyz")
    mol2._to_xyz_file(tmpfile)
    atoms_mol2 = read(tmpfile)


    modeller = Modeller(mol1.to_topology().to_openmm(), (atoms_mol1.positions+10.0)*unit.angstrom)
    modeller_mol2 = Modeller(mol2.to_topology().to_openmm(), atoms_mol2.positions*unit.angstrom)
    modeller.add(modeller_mol2.topology, modeller_mol2.positions)
    print(modeller.topology)

    mol1_indexs=[i for i in range(0,14)]
    mol2_indexs=[i for i in range(14,34)]

    ff = app.ForceField("amber/protein.ff14SB.xml",
                "amber/tip3p_standard.xml",
                "amber14/DNA.OL15.xml",)


    smirnoff = SMIRNOFFTemplateGenerator(molecules=[mol1, mol2])
    ff.registerTemplateGenerator(smirnoff.generator)

    modeller.addSolvent(forcefield=ff, padding=1.2 * unit.nanometers)


    mm_system = ff.createSystem(modeller.topology,
                nonbondedMethod=PME,
                nonbondedCutoff=1.0*unit.nanometer,
                constraints=None,
    )

    potential = MLPotential('mace', model_path='test_files/mace-model.model')

    # model mol1 with MACE
    print("adding mol1 mace force, saving to torchscript file: ", torchscript_filename1)
    temp_system = potential.createMixedSystem(modeller.topology, mm_system, mol1_indexs, filename=torchscript_filename1)

    # model mol2 with MACE
    print("adding mol2 mace force, saving to torchscript file: ", torchscript_filename2)
    mm_ml_system = potential.createMixedSystem(modeller.topology, temp_system, mol2_indexs, filename=torchscript_filename2)


    # write system to file
    with open("test_files/created_system_"+version+".pdb", "w") as f:
        PDBFile.writeFile(modeller.topology, modeller.positions, f)

    # create simulation
    integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 10 / unit.picosecond, 1 * unit.femtosecond)
    simulation = app.Simulation(modeller.topology, mm_ml_system, integrator, platform=Platform.getPlatformByName("CUDA"))
    simulation.context.setPositions(modeller.positions)

    # Comute the potential energy
    state = simulation.context.getState(getEnergy=True)
    print("initial E = ", state.getPotentialEnergy())

    print("Minimising system")
    simulation.minimizeEnergy()
    print("Minimisation complete")
    state = simulation.context.getState(getEnergy=True)
    print("Minized E = ", state.getPotentialEnergy())

    # write out minimised system
    with open("test_files/minimised_system"+version+".pdb", "w") as f:
        PDBFile.writeFile(modeller.topology, simulation.context.getState(getPositions=True).getPositions(), f)

The output is this:

<Topology; 2 chains, 2 residues, 34 atoms, 32 bonds>
adding mol1 mace force, saving to torchscript file:  maceforce.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
adding mol2 mace force, saving to torchscript file:  maceforce.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
initial E =  -9696.358828774246 kJ/mol
Minimising system
Minimisation complete
Minized E =  -66457.51997970016 kJ/mol
<Topology; 2 chains, 2 residues, 34 atoms, 32 bonds>
adding mol1 mace force, saving to torchscript file:  maceforce1.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
adding mol2 mace force, saving to torchscript file:  maceforce2.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
initial E =  -9673.335930073983 kJ/mol
Minimising system
Minimisation complete
Minized E =  -66226.91515553842 kJ/mol

And the minimized structures look like this:
Same names:

Different names:

Can you try repeating your test but with different filenames for the intermediate torchscript files?

jharrymoore · 2023-03-14T11:29:58Z

Hi Stephen, this has indeed done the trick, thank you very much for you help!

sef43 · 2023-03-14T11:49:58Z

Great! hopefully this should fix your other issues of running multiple simulations at the same time, I believe the openmmtorch.TorchForce loader just stores the filename of the torchscript file, in then loads the torchscript module when OpenMM makes the context, which happens when you start actually running the simulation, so when another file is saved with the same name in between calling MLPotential and starting the run the first torchscript module will have been overwritten by the second one. Hence mol1 is the one that ends up distorted.

raimis · 2023-03-14T12:46:22Z

The will be fixed when #97 is released.

raimis added the help wanted Extra attention is needed label Mar 14, 2023

jharrymoore closed this as completed Mar 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simulation explodes with 2 TorchForces #98

Simulation explodes with 2 TorchForces #98

jharrymoore commented Mar 13, 2023 •

edited

Loading

peastman commented Mar 13, 2023

jharrymoore commented Mar 13, 2023 •

edited

Loading

peastman commented Mar 13, 2023

sef43 commented Mar 14, 2023

jharrymoore commented Mar 14, 2023

sef43 commented Mar 14, 2023 •

edited

Loading

raimis commented Mar 14, 2023

Simulation explodes with 2 TorchForces #98

Simulation explodes with 2 TorchForces #98

Comments

jharrymoore commented Mar 13, 2023 • edited Loading

peastman commented Mar 13, 2023

jharrymoore commented Mar 13, 2023 • edited Loading

peastman commented Mar 13, 2023

sef43 commented Mar 14, 2023

jharrymoore commented Mar 14, 2023

sef43 commented Mar 14, 2023 • edited Loading

raimis commented Mar 14, 2023

jharrymoore commented Mar 13, 2023 •

edited

Loading

jharrymoore commented Mar 13, 2023 •

edited

Loading

sef43 commented Mar 14, 2023 •

edited

Loading