Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulation explodes with 2 TorchForces #98

Closed
jharrymoore opened this issue Mar 13, 2023 · 7 comments
Closed

Simulation explodes with 2 TorchForces #98

jharrymoore opened this issue Mar 13, 2023 · 7 comments
Labels
help wanted Extra attention is needed

Comments

@jharrymoore
Copy link

jharrymoore commented Mar 13, 2023

Hi,

I'm seeing simulations explode when I add two separate TorchForces for different small molecules to the system - specifically the molecule corresponding to the first force explodes in the minimiser. This seems to happen only when I use MACE models for both components - switching to the ani2x potential for one or both TorchForce works fine. The build script for the conda environment is in my fork of openmmtools https://github.com/jharrymoore/openmmtools/blob/dual_topology/mace-install.sh

...and here is the run script that should reproduce the problem

from openmm.app import PDBFile, Modeller, PME
from openmm import app
from openmm import unit
import sys
from openmmtools.openmm_torch.repex import MixedSystemConstructor
from openmm import LangevinMiddleIntegrator
from openmm.openmm import Platform

from openmmtools.openmm_torch.utils import (
    initialize_mm_forcefield,
)
import torch
from openff.toolkit.topology import Molecule
from tempfile import mkstemp
from ase.io import read
from openmmml import MLPotential
from openmmml.models.mace_potential import MacePotentialImplFactory
from openmmml.models.anipotential import ANIPotentialImplFactory
import logging

torch.set_default_dtype(torch.float64)

logger = logging.getLogger("DEBUG")

MLPotential.registerImplFactory("mace", MacePotentialImplFactory())
MLPotential.registerImplFactory("ani2x", ANIPotentialImplFactory())

mol1_file = "test_files/butane.sdf"
mol2_file = "test_files/hexane.sdf"

mol1 = Molecule.from_file(mol1_file)
mol2 = Molecule.from_file(mol2_file)

_, tmpfile = mkstemp(suffix=".xyz")
mol1._to_xyz_file(tmpfile)
atoms_mol1 = read(tmpfile)
_, tmpfile = mkstemp(suffix=".xyz")
mol2._to_xyz_file(tmpfile)
atoms_mol2 = read(tmpfile)

modeller = Modeller(mol1.to_topology().to_openmm(), mol1.conformers[0] + 1.0*unit.nanometer)
modeller_mol2 = Modeller(mol2.to_topology().to_openmm(), mol2.conformers[0])
modeller.add(modeller_mol2.topology, modeller_mol2.positions)
mol1_res = [r for r in modeller.topology.residues() if r.name == "butane"][0]
mol2_res = [r for r in modeller.topology.residues() if r.name == "hexane"][0]
print([a.index for a in mol1_res.atoms()], [a.index for a in mol2_res.atoms()])
mol1_index, mol2_index = [a.index for a in mol1_res.atoms()], [a.index for a in mol2_res.atoms()]

forcefield = initialize_mm_forcefield([mol1, mol2], forcefields=[
            "amber/protein.ff14SB.xml",
            "amber/tip3p_standard.xml",
            "amber14/DNA.OL15.xml",
        ])
modeller.addSolvent(forcefield=forcefield, padding=1.2 * unit.nanometers)


system = forcefield.createSystem(modeller.topology,
            nonbondedMethod=PME,
            nonbondedCutoff=1.0*unit.nanometer,
            constraints=None,
)

# create a mixed system from the MACE potential

system = MixedSystemConstructor(
    system=system,
    topology=modeller.topology,
    nnpify_id=mol1_res.name,
    nnpify_type="resname",
    atoms_obj=atoms_mol1,
    nnp_potential="mace",
    filename="test_files/mace-model.model",
    dtype=torch.float64,
    interpolate=False,
    nl="torch_nl_n2"
).mixed_system

system = MixedSystemConstructor(
    system=system,
    topology=modeller.topology,
    nnpify_id=mol2_res.name,
    nnpify_type="resname",
    atoms_obj=atoms_mol2,
    nnp_potential="mace",
    filename="test_files/mace-model.model",
    dtype=torch.float64,
    interpolate=False,
    nl="torch_nl_n2"
).mixed_system
        
# write system to file
with open("test_files/created_system.pdb", "w") as f:
    PDBFile.writeFile(modeller.topology, modeller.positions, f)

# create simulation
integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 10 / unit.picosecond, 1 * unit.femtosecond)

simulation = app.Simulation(modeller.topology, system, integrator, platform=Platform.getPlatformByName("CPU"))

simulation.context.setPositions(modeller.positions)

# turn off the nonbonded forces between the two molecules by setting exceptions

print("Minimising system")
simulation.minimizeEnergy()
print("Minimisation complete")

# write out minimised system
with open("test_files/minimised_system.pdb", "w") as f:
    PDBFile.writeFile(modeller.topology, simulation.context.getState(getPositions=True).getPositions(), f)

simulation.reporters.append(app.PDBReporter('test_files/trajectory.pdb', 1000, enforcePeriodicBox=False))
# add stdout reporter
simulation.reporters.append(app.StateDataReporter(sys.stdout, 1000, step=True, potentialEnergy=True, temperature=True, speed=True, totalSteps=100000, remainingTime=True, separator='\t'))

simulation.step(100000)

Any insight into what might be going wrong would be greatly appreciated!

Many thanks,
Harry

test_files.zip

@peastman
Copy link
Member

When you say it explodes, what exactly does that mean? What behavior do you observe?

Is it possible the MACE model uses some kind of global state that gets incorrectly shared between the models?

@jharrymoore
Copy link
Author

jharrymoore commented Mar 13, 2023

The minimised structure of the affected molecule ends up with a very distorted geometry (notably the same geometry:
image
This results in NaN positions within a couple of steps of MD. It also seems to be deterministic - simulation minimises to exactly the same coordinates across multiple runs.

I am not sure about global states - nothing obvious comes to mind, do you have any suggestions for what might cause this?

Potentially unrelated, but I do also find that when I run multiple mixed MACE/MM simulations simultaneously on the same GPU (e.g. using openmmtool's repex with MPI), I see similar geometry distortion during minimisation and subsequent NaN positions.

@peastman
Copy link
Member

Potentially unrelated, but I do also find that when I run multiple mixed MACE/MM simulations simultaneously on the same GPU (e.g. using openmmtool's repex with MPI), I see similar geometry distortion during minimisation and subsequent NaN positions.

That does suggest some kind of state is getting shared between the two simulations. Try this experiment. First, run two independent simulations at the same time on the same GPU. Each one should be an independent script launched separately. That shouldn't allow any communication between them, so you shouldn't see the artifacts.

Now have a single Python script run two simulations at once. First create a single System object. Now use threading to create two threads. Each thread should create a Context for the System, minimize it, and simulate it. Since they're both in the same process, that does create opportunities for state to get shared between them.

@raimis raimis added the help wanted Extra attention is needed label Mar 14, 2023
@sef43
Copy link

sef43 commented Mar 14, 2023

Hi @jharrymoore

I haven't been able to run your exact script due to some versioning issues of the other packages, but using my own openmm-ml fork with mace in I have been able to reproduce this, and then maybe correct it. The problem occurs for me when the torchscripted models saved and then loaded by openmmtorch.torchForce have the same name. The ones made here:

        # Convert it to TorchScript and save it.
        module = torch.jit.script(maceforce)
        module.save(filename)

        # Create the TorchForce and add it to the System.
        force = openmmtorch.TorchForce(filename)
        force.setForceGroup(forceGroup)
        force.setUsesPeriodicBoundaryConditions(is_periodic)
        #force.setOutputsForces(True)
        system.addForce(force)

If they have different names the minimization appears to work correctly. Here is an example script with both behaviours:

To run it you will need this version of openmm-ml: https://github.com/sef43/openmm-ml/tree/mace/examples/mace

pip install git+https://github.com/sef43/openmm-ml@mace
from openmm.app import PDBFile, Modeller, PME
from openmm import app
from openmm import unit
import sys
from openmm import LangevinMiddleIntegrator
from openmm.openmm import Platform
from openff.toolkit.topology import Molecule
from ase.io import read
from tempfile import mkstemp
from openmmforcefields.generators import SMIRNOFFTemplateGenerator
from openmmml import MLPotential



for torchscript_filename1,torchscript_filename2, version in zip(['maceforce.pt', 'maceforce1.pt' ],
                                                                ['maceforce.pt', 'maceforce2.pt'], 
                                                                ['same_name', 'different_name']):
    

    mol1_file = "test_files/butane.sdf"
    mol2_file = "test_files/hexane.sdf"

    mol1 = Molecule.from_file(mol1_file)
    mol2 = Molecule.from_file(mol2_file)

    _, tmpfile = mkstemp(suffix=".xyz")
    mol1._to_xyz_file(tmpfile)
    atoms_mol1 = read(tmpfile)
    _, tmpfile = mkstemp(suffix=".xyz")
    mol2._to_xyz_file(tmpfile)
    atoms_mol2 = read(tmpfile)


    modeller = Modeller(mol1.to_topology().to_openmm(), (atoms_mol1.positions+10.0)*unit.angstrom)
    modeller_mol2 = Modeller(mol2.to_topology().to_openmm(), atoms_mol2.positions*unit.angstrom)
    modeller.add(modeller_mol2.topology, modeller_mol2.positions)
    print(modeller.topology)

    mol1_indexs=[i for i in range(0,14)]
    mol2_indexs=[i for i in range(14,34)]

    ff = app.ForceField("amber/protein.ff14SB.xml",
                "amber/tip3p_standard.xml",
                "amber14/DNA.OL15.xml",)


    smirnoff = SMIRNOFFTemplateGenerator(molecules=[mol1, mol2])
    ff.registerTemplateGenerator(smirnoff.generator)

    modeller.addSolvent(forcefield=ff, padding=1.2 * unit.nanometers)


    mm_system = ff.createSystem(modeller.topology,
                nonbondedMethod=PME,
                nonbondedCutoff=1.0*unit.nanometer,
                constraints=None,
    )

    potential = MLPotential('mace', model_path='test_files/mace-model.model')

    # model mol1 with MACE
    print("adding mol1 mace force, saving to torchscript file: ", torchscript_filename1)
    temp_system = potential.createMixedSystem(modeller.topology, mm_system, mol1_indexs, filename=torchscript_filename1)

    # model mol2 with MACE
    print("adding mol2 mace force, saving to torchscript file: ", torchscript_filename2)
    mm_ml_system = potential.createMixedSystem(modeller.topology, temp_system, mol2_indexs, filename=torchscript_filename2)


    # write system to file
    with open("test_files/created_system_"+version+".pdb", "w") as f:
        PDBFile.writeFile(modeller.topology, modeller.positions, f)

    # create simulation
    integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 10 / unit.picosecond, 1 * unit.femtosecond)
    simulation = app.Simulation(modeller.topology, mm_ml_system, integrator, platform=Platform.getPlatformByName("CUDA"))
    simulation.context.setPositions(modeller.positions)

    # Comute the potential energy
    state = simulation.context.getState(getEnergy=True)
    print("initial E = ", state.getPotentialEnergy())

    print("Minimising system")
    simulation.minimizeEnergy()
    print("Minimisation complete")
    state = simulation.context.getState(getEnergy=True)
    print("Minized E = ", state.getPotentialEnergy())

    # write out minimised system
    with open("test_files/minimised_system"+version+".pdb", "w") as f:
        PDBFile.writeFile(modeller.topology, simulation.context.getState(getPositions=True).getPositions(), f)

The output is this:

<Topology; 2 chains, 2 residues, 34 atoms, 32 bonds>
adding mol1 mace force, saving to torchscript file:  maceforce.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
adding mol2 mace force, saving to torchscript file:  maceforce.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
initial E =  -9696.358828774246 kJ/mol
Minimising system
Minimisation complete
Minized E =  -66457.51997970016 kJ/mol
<Topology; 2 chains, 2 residues, 34 atoms, 32 bonds>
adding mol1 mace force, saving to torchscript file:  maceforce1.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
adding mol2 mace force, saving to torchscript file:  maceforce2.pt
Running MACEForce on device:  cuda  with dtype:  torch.float64
initial E =  -9673.335930073983 kJ/mol
Minimising system
Minimisation complete
Minized E =  -66226.91515553842 kJ/mol

And the minimized structures look like this:
Same names:

same_name

Different names:
different_name

Can you try repeating your test but with different filenames for the intermediate torchscript files?

@jharrymoore
Copy link
Author

Hi Stephen, this has indeed done the trick, thank you very much for you help!

@sef43
Copy link

sef43 commented Mar 14, 2023

Great! hopefully this should fix your other issues of running multiple simulations at the same time, I believe the openmmtorch.TorchForce loader just stores the filename of the torchscript file, in then loads the torchscript module when OpenMM makes the context, which happens when you start actually running the simulation, so when another file is saved with the same name in between calling MLPotential and starting the run the first torchscript module will have been overwritten by the second one. Hence mol1 is the one that ends up distorted.

@raimis
Copy link
Contributor

raimis commented Mar 14, 2023

The will be fixed when #97 is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants