Theoretical defect of the sliding-window approach

Many sequence-based predictors use a sliding-window approach, which is easy to implement. The approach can be traced back to fragment-based protein structure prediction, which is built on the homology search of fragments. Binding-site prediction, however, is a different problem from structure prediction: residues that are close in space and together form the binding interface can be far apart in sequence. A sliding-window scheme cannot take such sequence-distant neighbors into account, yet these neighbors form the environment of the target residue and can determine binding. As a result, the prediction for a target residue may stay the same even when all of its spatial neighbors have been mutated. This also helps explain why structure-based predictions are more accurate than sequence-based ones. Sequence-based predictions still have room for improvement by moving beyond the sliding-window approach to close this gap.
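The argument can be made concrete with a minimal sketch. It is a toy illustration and not the feature extraction of any method discussed here: the hairpin coordinates, the window half-width of 2, the 6.0 Å cutoff and the function names `window_features` and `spatial_neighbors` are all assumptions chosen for the example.

```python
from math import dist

def window_features(seq, i, half_width=2, pad="-"):
    """Residues within +/- half_width of position i in sequence (the sliding window)."""
    padded = pad * half_width + seq + pad * half_width
    return padded[i : i + 2 * half_width + 1]

def spatial_neighbors(coords, i, cutoff=6.0):
    """Indices of residues whose representative atom lies within `cutoff` of residue i."""
    return [j for j, c in enumerate(coords)
            if j != i and dist(coords[i], c) <= cutoff]

# Hypothetical 10-residue hairpin: residues 0-4 run along one strand,
# residues 5-9 run back along a second strand 5 A away.
seq = "ACDEFGHIKL"
coords = ([(k * 3.8, 0.0, 0.0) for k in range(5)]
          + [((4 - k) * 3.8, 5.0, 0.0) for k in range(5)])

target = 1
mutant = seq[:8] + "E" + seq[9:]  # mutate residue 8 (K -> E), far from the target in sequence

# The sliding window around the target is identical before and after the mutation.
same_window = window_features(seq, target) == window_features(mutant, target)

# The spatial environment of the target includes residue 8 and therefore changes.
neighbors = spatial_neighbors(coords, target)
env_wild = "".join(seq[j] for j in neighbors)
env_mutant = "".join(mutant[j] for j in neighbors)

print(same_window, neighbors, env_wild, env_mutant)
```

In this toy chain the window around the target contains the same residues for the wild type and the mutant, so any predictor that sees only the window must return the same score, whereas the spatial environment of the target changes with the mutation.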

This figure illustrates that the sliding-window approach is less likely to capture the real neighboring environment, using poly(A)-binding protein (PDB ID: 1cvj, chain A) as an example. Residue F102 is mutated to His, Glu and Asp. Variations around F102 are easy to understand, but two other regions that are close in space are also affected. RBscore considers the spatial neighbors and shows differences in regions 127-129 and 172-179. BindN+ shows some difference in region 172-179. PPRInt and RNABindRPlus show hardly any difference in prediction. Sequence-based methods are less likely to capture the effect of a mutation on regions far away in sequence. As demonstrated by deep mutational scanning, the single point mutations to Glu and Asp clearly alter binding and should also affect the neighboring residues. The sliding-window approach has difficulty capturing such differences.

Similarly, the template-based approach also relies on homology search and may directly use the binding sites of the template. However, mutation of some neighboring residues can directly lead to different binding, which this approach cannot capture.
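A minimal sketch of this limitation is given below. It assumes the simplest form of template transfer, in which binding-site labels are copied from the template to the query through a pairwise alignment; the function name `transfer_binding_sites`, the toy sequences and the site indices are hypothetical and do not reproduce any specific template-based tool.

```python
def transfer_binding_sites(aligned_template, aligned_query, template_sites):
    """Copy binding-site indices from template to query through a gapped alignment.

    Both inputs are aligned strings of equal length with '-' for gaps;
    template_sites holds 0-based residue indices in the ungapped template.
    """
    t_idx = q_idx = 0
    mapped = []
    for t_char, q_char in zip(aligned_template, aligned_query):
        if t_char != "-" and q_char != "-" and t_idx in template_sites:
            mapped.append(q_idx)
        # Advance each counter only when its sequence has a residue in this column.
        t_idx += t_char != "-"
        q_idx += q_char != "-"
    return mapped

template = "ACDEFG"
template_sites = {2, 4}   # hypothetical binding residues of the template
wild_type = "ACDEFG"
mutant = "ACDQFG"         # residue 3, a neighbor of both sites, is mutated

sites_wild = transfer_binding_sites(template, wild_type, template_sites)
sites_mutant = transfer_binding_sites(template, mutant, template_sites)
print(sites_wild, sites_mutant)
```

Because the transferred labels depend only on the alignment columns, the wild type and the mutant receive the same predicted binding sites, regardless of whether the mutated neighbor changes the actual binding.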