Complete Guide to Protein-Ligand MD Simulations with Gromacs

10.02.26 11:05 AM - By Scientiflow

Step-by-step guide on Protein-Ligand MD simulation

protein-ligand.jpg

Introduction

Protein-ligand simulation is a computational approach used to study the interaction between protein (receptor or enzyme) and small molecule i.e ligand. They are important tools in drug discovery. It helps the researchers to examine the binding efficiency of drug candidate to the target protein and study their interactions in the dynamic mode. It aims to analyze the stability of protein ligand complex.

Importance of Protein ligand simulation

It helps to identify the potential drug candidates

It provides the detailed mechanism of how the drug binds to the target

They are faster and cheaper in comparison to experimental methods like X-ray and NMR

Prerequisites for Protein-ligand simulation

Docked Complex

Optimized protein without missing atoms or residues

Protein with proper charged residues

Optimized ligand molecule

Optimized protein and ligand is docked to find the optimal binding pose and binding affinity

The protein ligand complex with the favorable binding pose and binding energy is taken to simulation to study the time-dependent behavior of protein-ligand complex

Softwares required

Gromacs

Steps in protein-ligand simulation

Data Collection

Select the best protein-ligand complex after docking

Separate the protein ligand complex through the following commands

grep UNL 3HTB.pdb > jz4.pdb

Here, UNL represents ligand molecule

jz4.pdb : ligand molecule

Additionally we need to remove the water molecules like below:

grep -v HOH 3HTB.pdb > 3HTB_clean.pdb

Removes any other atoms apart from the amino-acids. HOH represents the water molecule in this case

3HTB_clean.pdb : The docked protein alone

Protein topology creation

This step provides required information for the simulation software to calculate the forces and energies acting on each atom

gmx pdb2gmx -f 3HTB_clean.pdb -o 3HTB_processed.gro -ignh

Here, pdb2gmx : Converts input file from pdb format to gmx (gromacs format)

-f : input file name; o: output file name

-ignh: Ignores hydrogen atom in the protein and pdb2gmx adds hydrogen based on the field selected

This command will ask the user to select the forcefield as given below

1.jpg

In this case, CHARMM27 is selected (option 8) . In all-atom force field, each atom including hydrogen is represented and parameter is assigned. In contrast united atom force field will combine the non-polar hydrogens with their parent atoms.

All atom force field will offer higher accuracy while studying molecular interactions like hydrogen bond, Van der waal and electrostatic interaction. It requires high computation power and time. 

In next step, software will ask to select the water model

Water model is a mathematical representation of water molecules

It is essential for studying the protein folding and ligand binding

3.jpg

In this case,

TIP3P : Transferable Intermolecular Potential is selected

Water molecule is represented as three interaction sites

All the water models are rigid body water models where the bond angle and bond length remains unchanged during simulation

It is widely used in biomolecular simulations and is commonly used with AMBER, CHARMM and OPLS force field

TIP3P and SPC requires less computational power and is commonly used in protein-ligand simulations

Output files generated

Topology file (.top)

It contains topology of the system like atom types, charge, bond length between atoms, bond angle, dihedral angles of the atom

Gromacs file (.gro)

Gromacs compatible co-ordinate file contains the atomic co-ordinates of all protein residues and rebuilt hydrogen atoms based on the selected force field

Position restraint file (posre.itp)

It contains parameters to restrain the position of selected atoms during energy minimization

Next task.....

Ligand topology creation

Gromacs contain predefined forcefield parameters such as bond length, bond angle and torsion angle for standard amino acids, nucleotide sequences alone. Ligands with non-standard atoms requires alternate option for topology creation.

ACPYPE : Amber-99SB-ILDN Swissparam : Charmm27 CGENFF : Charmm37

Obabel conversion

Before we begin to parameterize the ligand, we need to convert the ligand file from pdb to mol2 file, with adding H atoms through open-babel. This can be achieved through your below command:

obabel jz4.pdb -O jz4.mol2 -h

Then we need to fix the bonds through running the perl script, via below command:

perl sort_mol2_bonds.pl jz4.mol2 jz4_fix.mol2

Now we need to upload this jz4_fix.mol2 file to Swiss param, which will generate the parameter files for ligand molecule, required for us to integrate with Gromacs to perform MD simulation.

  1. Simply upload our latest jz4_fix.mol2 file to Swiss Param website.
  2. Choose MMFF based approach.
  3. Press Launch Parameterize to submit to Swiss param.
  4. Download the tar file, and you will find the jz4_fix.itp and jz4_fix.pdb files inside the directory.
  5. First, copy the jz4_fix.pdb to the current working directory, where we run protein-ligand MD simulation.
  6. Open this file jz4_fix.itp, and copy the contents of [ atomtypes ] and [ pairtypes ] and paste it in a new file named jz4.prm and save them.
  7. Now again open the file jz4_fix.itp, except the content of [ atomtypes ] and [ pairtypes ], copy everything and create a new file named jz4.itp, and save it.
  8. Now transfer these two files -> jz4.prm and jz4.itp to current working directory where you run Protein-Ligand MD Simulation.

Now, open your topol.top file, and include the jz4.prm after the charmm27.ff/forcefield.itp file, as indicated below.


; Include forcefield parameters 
#include "charmm27.ff/forcefield.itp" 

; Include ligand parameters 
#include "jz4.prm" 

[ moleculetype ] 
; Name            nrexcl
Protein_chain_A     3

Similarly, include the jz4.itp file between the posre.itp and charmm27.ff/tip3p.itp file in your topol.top as indicated below:

; Include Position restraint file

#ifdef POSRES
#include "posre.itp"
#endif

; Include ligand parameters
#include "jz4.itp"
; Include water topology

#include "charmm27.ff/tip3p.itp"

One last thing, at the end of topol.top file, you will notice it contains information about the different molecules contained in the system. Since we are planning to incorporate ligand file into our system, we need to include it like below: 

[ molecules ]
; Compound        #mols
Protein_chain_A     1
LIG                1

Complex building

Now, we need to generate Gromacs format for ligand from PDB we obtained from Swiss param results. This is exactly the same way as we did for protein at the first step. This can be achieved by simply running the below command.

gmx editconf -f jz4_fix.pdb -o jz4.gro

Now we need to merge the structure of both Protein and Ligand, to make them as single complex.

To achieve this, open the Protein file in gromacs format (i.e 3HTB_processed.gro). Copy all the contents of the file, and paste it in new file named complex.gro file.

Now, open the ligand file (i.e jz4.gro), you will see below content:

Glycine aRginine prOline Methionine Alanine Cystine Serine
  22
    1LIG     C7    1   2.154  -2.723  -0.414
    1LIG     C8    2   2.203  -2.677  -0.537
    1LIG     C9    3   2.264  -2.552  -0.545
    1LIG    C10    4   2.275  -2.474  -0.430
    1LIG    C11    5   2.166  -2.645  -0.300
    1LIG    C12    6   2.227  -2.519  -0.306
    1LIG     C4    7   2.427  -2.410  -0.004
    1LIG    C13    8   2.253  -2.443  -0.179
    1LIG    C14    9   2.396  -2.472  -0.138
    1LIG    OAB   10   2.338  -2.351  -0.438
    1LIG      H   11   2.362  -2.451   0.073
    1LIG     H1   12   2.414  -2.301  -0.008
    1LIG     H2   13   2.531  -2.432   0.022
    1LIG     H3   14   2.108  -2.821  -0.408
    1LIG     H4   15   2.195  -2.739  -0.625
    1LIG     H5   16   2.304  -2.516  -0.639
    1LIG     H6   17   2.131  -2.683  -0.204
    1LIG     H7   18   2.183  -2.475  -0.101
    1LIG     H8   19   2.236  -2.336  -0.195
    1LIG     H9   20   2.412  -2.580  -0.133
    1LIG    H10   21   2.465  -2.432  -0.213
    1LIG    H11   22   2.337  -2.316  -0.348
0.00000   0.00000   0.00000

Copy the contents from 1LIG and paste everything except the last line - which indicates the size of the box.

So your final complex.gro file should look something like below:

 LYSOZYME
 2614
    1MET      N    1   0.556  -1.596  -0.893
    1MET     H1    2   0.522  -1.511  -0.855
    1MET     H2    3   0.520  -1.608  -0.986
    1MET     H3    4   0.527  -1.673  -0.836
......
......
......
......
  163ASN      C 2612   0.621  -0.740  -0.126
  163ASN    OT1 2613   0.624  -0.616  -0.140
  163ASN    OT2 2614   0.683  -0.703  -0.011
    1LIG     C7    1   2.154  -2.723  -0.414
    1LIG     C8    2   2.203  -2.677  -0.537
    1LIG     C9    3   2.264  -2.552  -0.545
    1LIG    C10    4   2.275  -2.474  -0.430
    1LIG    C11    5   2.166  -2.645  -0.300
    1LIG    C12    6   2.227  -2.519  -0.306
    1LIG     C4    7   2.427  -2.410  -0.004
    1LIG    C13    8   2.253  -2.443  -0.179
    1LIG    C14    9   2.396  -2.472  -0.138
    1LIG    OAB   10   2.338  -2.351  -0.438
    1LIG      H   11   2.362  -2.451   0.073
    1LIG     H1   12   2.414  -2.301  -0.008
    1LIG     H2   13   2.531  -2.432   0.022
    1LIG     H3   14   2.108  -2.821  -0.408
    1LIG     H4   15   2.195  -2.739  -0.625
    1LIG     H5   16   2.304  -2.516  -0.639
    1LIG     H6   17   2.131  -2.683  -0.204
    1LIG     H7   18   2.183  -2.475  -0.101
    1LIG     H8   19   2.236  -2.336  -0.195
    1LIG     H9   20   2.412  -2.580  -0.133
    1LIG    H10   21   2.465  -2.432  -0.213
    1LIG    H11   22   2.337  -2.316  -0.348
   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

One last thing before we move further. We need to update the total number of atoms present in complex.gro file through updating at the top after protein, which currently contains 2614, it should be added with 22 atoms of ligand to finally obtain 2636:

 LYSOZYME
 2636
    1MET      N    1   0.556  -1.596  -0.893
    1MET     H1    2   0.522  -1.511  -0.855
    1MET     H2    3   0.520  -1.608  -0.986
.....
.....
.....
    1LIG     H9   20   2.412  -2.580  -0.133
    1LIG    H10   21   2.465  -2.432  -0.213
    1LIG    H11   22   2.337  -2.316  -0.348
   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

Now we are done with all the process related to creating the complex.gro file. Hope you all succeed in creating it too!

Box creation and Solvent addition

Now, Create the solavtion box for the protein-ligand complex

gmx editconf -f complex.gro -o newbox.gro -c -d 1.0 -bt cubic

editconf : Used while editing the co-ordinate file

f: Input ; o: Output file name

c: Centres protein in box; -d 1.0 : Places 0.1 nm away from the edges of the protein to box.

bt : Box type : Cubic. Other options like dodecahedron or triclinic can also be explored.

The complx_box.gro can be visualized in visualization software like pymol to visualize the box

gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro

cp: Configuration of protein in complex_box.gro

cs: Solvent configuration

Output file solv.gro will now contain the solvent molecules and can be visualized.

The topol.top file also automatically added with the information about solvent molecules as given in the image at the end

8.jpg

Adding Ions

Ions are added to the system. Ions.mdp is obtained here

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr

grompp: Process the co-ordinate files and topol.top to generate atomic level input (ions.tpr) in order to add ions

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral

Explanation for above command is:

genion : tool process topol.top by removing water and adding ions

ions.tpr is generated from the previous step

pname : name of positive ion

nname: name of negative ion

The net charge of the protein is neutralized and specific ion concentration can also be added on the basis of study

The above command will prompts for what molecule to be replaced with ions, select SOL by typing the appropriate number in the prompt.

Energy Minimization

To perform energy minimization, you need to have em.mdp file downloaded from the official gromacs tutorial.

Now run the below two commands:

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr

gmx mdrun -v -deffnm em

You will see energy minimization taking place with Gromacs.

NVT Equilibration

Restraining the ligand

gmx make_ndx -f jz4.gro -o index_jz4.ndx

gmx make_ndx -f jz4.gro -o index_jz4.ndx
 > 0 & ! a H*
 > q

gmx genrestr -f jz4.gro -n index_jz4.ndx -o posre_jz4.itp -fc 1000 1000 1000

It will prompt you for selection, choose the newly created index group - System_&_!H* - mostly it will be numbered at 3.

Now we need to include this posre_jz4.itp file in our topol.top file, closely after we have added jz4.itp as below:

; Ligand position restraints
#ifdef POSRES
#include "posre_jz4.itp"
#endif 

; Include ligand parameters
#include "jz4.itp"
; Ligand position restraints

#ifdef POSRES
#include "posre_jz4.itp"
#endif

; Include water topology
#include "charmm27.ff/tip3p.itp"

So, we added posre_jz4.itp file to be included in topol.top file as indicated above, between the ligand parameters and water topology file.

Now run the below command, it will prompt for selection like:

gmx make_ndx -f em.gro -o index.ndx
...
  0 System              : 47615 atoms
  1 Protein             :  2614 atoms
  2 Protein-H           :  1301 atoms
  3 C-alpha             :   163 atoms
  4 Backbone            :   489 atoms
  5 MainChain           :   651 atoms
  6 MainChain+Cb        :   803 atoms
  7 MainChain+H         :   813 atoms
  8 SideChain           :  1801 atoms
  9 SideChain-H         :   650 atoms
 10 Prot-Masses         :  2614 atoms
 11 non-Protein         : 45001 atoms
 12 Other               :    22 atoms
 13 LIG                 :    22 atoms
 14 CL                  :     6 atoms
 15 Water               : 44973 atoms
 16 SOL                 : 44973 atoms
 17 non-Water           :  2642 atoms
 18 Ion                 :     6 atoms
 19 Water_and_ions      : 44979 atoms


 nr : group      '!': not  'name' nr name   'splitch' nr    Enter: list groups   'a': atom       '&': and  'del' nr         'splitres' nr   'l': list residues
 't': atom type  '|': or   'keep' nr        'splitat' nr    'h': help  'r': residue              'res' nr         'chain' char
 "name": group             'case': case sensitive           'q': save and quit   'ri': residue index

When it prompts for selection like above, choose

> 1 | 13 > q

Now, download the nvt.mdp file from official gromacs. Before you can directly use this for your work, you need to update the tc-grps in your mdp file like tc-grps = Protein_LIG Water_and_ions to achieve our desired "Protein Non-Protein" effect.

This above change has to be performed in all the three mdp files, nvt.mdp, npt.mdp and md.mdp.

gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr gmx mdrun -deffnm nvt -v

NPT Equilibration

From now on, running NPT equilibration is quite simple and straight forward.

You just download the npt.mdp file from official gromacs website, and modified the tc-grps as above, and execute the two below commands one-by-one.

gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr

gmx mdrun -deffnm npt

MD Simulation

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_10.tpr

gmx mdrun -deffnm md_0_10 -v

Post-trajectory Analysis

It centers the molecule of interest in the simulation box

Fix broken molecules that may have been split across the box edges due to periodic boundary conditions (PBC).

"PBC artifacts, such as molecules appearing broken or jumping across box boundaries, are corrected.

gmx trjconv -s md_0_10.tpr -f md_0_10.xtc -o md_0_10_center.xtc -center -pbc mol -ur compact

Select Protein, press enter

Select System, press enter

MD-Analysis

Protein - RMSD

RMSD is a measure used to quantify the difference between two structures, typically in molecular dynamics (MD) simulations, to assess how much a structure deviates from a reference structure over time (initial structure).

Evaluates the overall stability of the systems

Provides insight into the stability of the protein backbone and how it is affected by ligand binding

A low RMSD indicates that the protein structure is stable and remains close to its native conformation during the simulation

Focusing on the backbone reduces noise caused by flexible side chains, providing a clearer picture of the protein’s overall structural dynamics

Root Mean Square Deviation

gmx rms -s md_0_10.tpr -f md_0_10_center.xtc -n index.ndx -o protein_rmsd.xvg -tu ns

Select Backbone

again

Select Backbone

protein_rmsd.xvg is generated and is visualized using Xmgrace in linux and Qtgrace in Windows

Protein_RMSF

Root Mean Square Fluctuation

  • The RMSF values provide insight into the flexibility of each residue in the protein over the course of the simulation. Typically, loop regions in proteins tend to exhibit higher fluctuations due to their dynamic nature, whereas regions involved in secondary structures, such as α-helices and β-sheets, show lower fluctuations, indicating structural stability


gmx rmsf -s md_0_10.tpr -f md_0_10_center.xtc -o protein_rmsf.xvg -res

Select C-alpha and press enter

.xvg is generated

Radius of gyration

Rg measures the distribution of the atoms in the protein relative to its center of mass over time. A lower and stable Rg indicates that the protein maintains a compact and stable structure during the simulation, while significant fluctuations or a higher Rg value suggest the protein might be unfolding or becoming less compact

gmx gyrate -s md_0_10.tpr -f md_0_10_center.xtc -o gyrate.xvg

Select Protein and press enter

Hydrogen bond

gmx hbond -s md_0_10.tpr -f md_0_10_center.xtc -n index.ndx -num  hbond.xvg -tu ns

Select Protein and press enter

Select LIG and press enter

hbond.xvg is created for further analysis


Protein-ligand simulations are powerful tools in drug discovery, providing insights into binding interactions, stability, and dynamics at the molecular level. By leveraging these simulations, researchers can predict binding affinities, optimize lead compounds, and accelerate the development of novel therapeutics. As computational methods continue to advance, protein-ligand simulations will play an increasingly vital role in modern drug design, bridging the gap between in silico predictions and experimental validation.


Every interaction begins with a bond — in science, as in life.


Scientiflow