Binary classification of binding versus non-binding sites
Current methods for predicting NA-binding residues are typically assessed by pooling all residues from all proteins and measuring the area under the ROC curve (AUC). However, different NA-binding proteins may have different affinities for nucleic acids, so treating all proteins in the same way can be misleading.
For binding site prediction, the main aim is to locate the key binding region rather than every detail of the binding sites, so we only need to discriminate binding residues from non-binding residues within the same protein. A simple illustration is shown in the Figure:
Figure. Illustration of how NA binding energy distributions affect the accuracy deduced from AUC. The red and blue lines show the distributions of NA binding affinities of residues in two different proteins. Dashed lines show the two cutoffs for the binding sites, and the green regions are the binding sites. Since the two proteins have different affinities for NA, their energy cutoffs differ and their residues should not be compared together. Otherwise, protein 1 would include false positive residues, while protein 2 would include false negative residues.
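The effect described in the Figure can be sketched numerically. The following is a minimal illustration, not the evaluation code used in this work: the scores and labels are hypothetical toy values, the helper `auc` is an illustrative name, and a higher score is assumed to mean "more likely binding". Each toy protein separates its own binding residues perfectly, yet pooling the residues lowers the AUC because the two score distributions are shifted relative to each other.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the fraction of (binding, non-binding) residue pairs that are
    ranked correctly; ties count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (predicted scores, true labels) per protein.
# Protein 2 binds more weakly, so all of its scores are shifted downward.
proteins = {
    "protein_1": ([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 0, 0]),
    "protein_2": ([0.55, 0.45, 0.3, 0.2, 0.1], [1, 1, 0, 0, 0]),
}

# AUC computed within each protein separately.
per_protein_auc = {name: auc(s, y) for name, (s, y) in proteins.items()}

# AUC computed after pooling all residues from both proteins.
pooled_scores = [s for sc, _ in proteins.values() for s in sc]
pooled_labels = [y for _, lb in proteins.values() for y in lb]
pooled_auc = auc(pooled_scores, pooled_labels)

print(per_protein_auc)
print(pooled_auc)
```

In this toy case the pooled AUC falls below both per-protein values only because the weakly binding residues of protein 2 score lower than some non-binding residues of protein 1, not because either protein is predicted poorly.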
For example, aaRNA shows a similar level of wAUC on the data sets meta_R44 (0.82) and Sungwook_R267 (0.83), but its sensitivity on meta_R44 (0.8) is much higher than on Sungwook_R267 (0.52), while specificity shows the opposite trend (0.73 vs. 0.89).
In fact, the program shows stable prediction accuracy on both sets; the binary-defined sensitivity trades off against specificity and is determined by a preset cutoff. If two different cutoffs were set (as in the Figure), the resulting specificities and sensitivities could be at similar levels.
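The dependence of sensitivity and specificity on the cutoff can be illustrated with a short sketch. The data below are hypothetical toy values and do not reproduce the meta_R44 or Sungwook_R267 results; the function name `sensitivity_specificity` and the cutoff values are assumptions chosen for illustration, and a residue is assumed to be called binding when its score exceeds the cutoff.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) when residues scoring above
    `cutoff` are predicted as binding."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy sets with shifted score distributions.
set_a = ([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 0, 0])
set_b = ([0.55, 0.45, 0.3, 0.2, 0.1], [1, 1, 0, 0, 0])

# One shared, preset cutoff: the two sets land at opposite ends of the
# sensitivity/specificity trade-off.
shared_a = sensitivity_specificity(*set_a, cutoff=0.65)
shared_b = sensitivity_specificity(*set_b, cutoff=0.65)

# A separate cutoff per set (illustrative values): both metrics agree.
own_a = sensitivity_specificity(*set_a, cutoff=0.75)
own_b = sensitivity_specificity(*set_b, cutoff=0.4)

print("shared cutoff:", shared_a, shared_b)
print("own cutoffs:  ", own_a, own_b)
```

With the shared cutoff, the first set has high sensitivity and reduced specificity while the second shows the reverse, even though the ranking of residues within each set is equally good; set-specific cutoffs remove the discrepancy in this toy case.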