Rescoring Poses in Rosetta

This is going to be a very Rosetta-specific article; if you haven't used Rosetta before, this is probably not for you. In short: I recently had to run a lot of simulations in Rosetta, ranging from simple relaxations and protein design to comparative modelling. Since the hard part of a Rosetta simulation is mostly getting it to run at all, I usually do not add much to a script once it works. RosettaScripts, however, offers the nice <SIMPLE_METRICS> functionality, where you can specify all kinds of metrics to be evaluated for the generated poses.

So, to evaluate these metrics afterwards without adding overhead to a script that is barely running as it is, we can simply write a new rescore.xml containing all the metrics our heart desires:

<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015.wts" />
</SCOREFXNS>
<RESIDUE_SELECTORS>
<Chain name="Protein" chains="A"/>
<Chain name="DNA_Strain1" chains="B"/>
<Chain name="DNA_Strain2" chains="C"/>
<Chain name="DNA" chains="B,C"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
</TASKOPERATIONS>
<FILTERS>
</FILTERS>
<SIMPLE_METRICS>
<TimingProfileMetric name="timing" />
<RMSDMetric name="rmsd" rmsd_type="rmsd_all" use_native="1" />
<InteractionEnergyMetric name="E_DNA"
residue_selector="Protein"
residue_selector2="DNA"
scorefxn="ref15"
/>
<TotalEnergyMetric name="total_energy" />
<PerResidueEnergyMetric name="energy_per_res" output_as_pdb_nums="true" use_native="true" />
</SIMPLE_METRICS>
<MOVERS>
<RunSimpleMetrics name="run_metrics" metrics="timing,rmsd,E_DNA,total_energy,energy_per_res" prefix="rescore_" />
</MOVERS>
<PROTOCOLS>
<Add mover_name="run_metrics" />
</PROTOCOLS>
<OUTPUT scorefxn="ref15" />
</ROSETTASCRIPTS>

Here, a PDB consisting of three chains (A, B and C) is to be rescored. The protein was a polymerase with DNA bound, which is
the reason for the different chains: the protein is chain A, and the two strands of the DNA are labelled as chains B and C.
We therefore need multiple entries in the <RESIDUE_SELECTORS> section to address these regions separately.
Initially I wanted to calculate the interface energy not only between the protein and the whole DNA but also between the protein and each single DNA strand. However, it seems you can't use the same simple metric several times in one script. At least, I always got parsing errors from the Rosetta XML script verification stage.
Only after removing all but one of the identical simple metrics would my script run.
To rescore the previously generated poses, we need a pdb_list as input, which is simply a text file containing the paths
to all the PDB poses, one per line. To run the rescoring, it is convenient to write all the flags into a single file, rosetta_rescore.options:

-database /rosetta_src_2020.08.61146_bundle/main/database/
-parser:protocol rescore.xml
-in:file:native best.pdb
-in:file:l pdb_list_full.dat
-out:file:scorefile rescore.sc
-out:file:score_only
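
The pdb_list_full.dat named in the flags above is not created by Rosetta; it has to exist before the run. One way to build it is with find. This is only a sketch: the function name make_pdb_list is made up for illustration, and it assumes all poses carry a .pdb extension and live somewhere below a single directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of Rosetta): collect every .pdb file below a
# directory into a list file, one path per line, sorted for reproducibility.
make_pdb_list() {
    local pose_dir="$1"    # directory that holds the generated poses
    local list_file="$2"   # list file to write, e.g. pdb_list_full.dat
    find "$pose_dir" -type f -name '*.pdb' | sort > "$list_file"
}
```

Called as `make_pdb_list poses pdb_list_full.dat` (with poses standing in for wherever your output PDBs ended up), it writes paths relative to the current directory, so the rescoring should then be started from that same directory.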

With the options file in place, we can run the rescoring directly:

mpirun -np $N_CORES rosetta_scripts.mpi.linuxgccrelease @rosetta_rescore.options

As I did my runs on the LB2 cluster at TU Darmstadt, I wrote a Slurm script for this, which looks as follows.

#!/bin/bash
#SBATCH -J RosettaCM
#SBATCH -n 192
#SBATCH --time=04:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH -A project0XXXX
#SBATCH -p test24
#SBATCH -C avx512
module purge
export MODULEPATH=$MODULEPATH:/work/projects/$USERNAME/projectmodules
module load rosetta
CMD="srun rosetta_scripts.mpi.linuxgccrelease @rosetta_rescore.options"
echo $CMD
eval $CMD
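
Once the job has finished, the metrics end up as columns in rescore.sc. A small sketch for ranking the poses by one of those columns is shown here. It assumes the usual scorefile layout, in which data lines start with SCORE:, the first such line is the header, and the pose description is the last field. The function name rank_by_column is made up for illustration, and the exact column names of your metrics should be read from the header of your own file rather than guessed.

```shell
#!/bin/bash
# Hypothetical helper: print "<value> <description>" for one scorefile column,
# sorted in ascending numeric order (lowest value first).
rank_by_column() {
    local scorefile="$1"   # e.g. rescore.sc
    local column="$2"      # column name exactly as it appears in the header
    awk -v col="$column" '
        $1 != "SCORE:" { next }            # skip SEQUENCE: and any other lines
        !idx {                             # first SCORE: line is the header
            for (i = 2; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        $idx == col { next }               # skip a header repeated later on
        { print $idx, $NF }                # description is the last field
    ' "$scorefile" | sort -n
}
```

For example, `rank_by_column rescore.sc total_score | head` would list the ten lowest-scoring poses, provided total_score is present in your header.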

And there you have it. This is nothing special, but I forget how to do it often enough that I thought
it would be worth documenting somewhere, where it might eventually help someone else too.
