Methodology

MUpro is a set of machine learning programs to predict how single-site amino acid mutation affects protein stability. We developed two machine learning methods: Support Vector Machines and Neural Networks. Both of them were trained on a large mutation dataset and show accuracy above 84% via 20 fold cross validation, which is better than other methods in the literature. One advantage of our methods is that they do not require tertiary structures to predict protein stability changes. Our experimental results show that the prediction accuracy using sequence information alone is comparable to that of using tertiary structures. So even you do not have protein tertiary structures available, you still can use this server to get rather accurate prediction. Of course, if you provide tertiary structures, our methods will take advantage of them and you might get slightly better predictions.

Input Format

Mutation name, optional.
Mutation position.
Original amino acid. The amino acid is a single standard letter. Don't use 3-letter abbreviation.
Substitute amion acid.
Sequence: a raw text of sequence, white space are ignored.
Structure file in PDB format, optional.

Output Format

Request name
Protein sequence
Prediction of the value of energy change (delta delta G) using Support Vector Machine (Recommended)
Prediction of the sign of energy change using Support Vector Machines and neural networks: method used, effects of mutation on protein stability, and a confidence score between -1 and 1 to measure the confidence of the prediction.

Evaluation and Performance

Our methods are evaluated on a large dataset (compiled by Capriotti et al., Bioinformatics, vol. 20, pages 190-201, 2004) containing 1615 mutations using 20 fold cross validation procedure. Under this procedure, the dataset is splited evenly into 20 folds. Any one fold is used as test dataset, another remaining 19 folds are used as training dataset. Thus there are 20 pairs of testing and training datasets. For each pair, the SVM and neural network are trained on the training dataset and tested on the testing dataset. The performance on all test datasets are combined and reported as the performance of tested methods. Here are the prediction accuracy (correct num / total num) by support vector machines using sequence information and tertiary structure information respectively.

The accuracy for Support Vector Machine using sequence information only is 84.2%.
The accuracy for Support Vector Machine using tertiary structure information is 84.5%.

References

J. Cheng A. Randall, P. Baldi. Prediction of Protein Stability Changes for Single-Site Mutations Using Support Vector Machines. Proteins, vol. 62, no. 4, pp. 1125-1132, 2005. [PDF]
J. Cheng, A. Z. Randall, M. Sweredoski, and P. Baldi. SCRATCH: a Protein Structure and Structural Feature Prediction Server. Nucleic Acids Research, vol. 33 (server issue), w72-76, 2005.

Download MUpro 1.0

Download MUpro 1.0 source code and executable (200K). : Prediction of values and signs of energy changes for single site mutations from protein sequences.

Download MUpro dataset.