Scientists predict driver mutations of future SARS-CoV-2 variants of concern

A team of scientists from the USA has recently predicted the driver mutations that may appear in future variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The prediction is based on currently available genetic surveillance data on amino acid mutations present in SARS-CoV-2 variants. A detailed description of the study is currently available on the medRxiv* preprint server.

With the progression of the coronavirus disease 2019 (COVID-19) pandemic, several mutations have appeared in the protein sequences of SARS-CoV-2. The majority of these mutations remain neutral, and thus, do not improve the functional and/or immunogenic fitness of SARS-CoV-2. However, mutations that appear under positive selection pressure can potentially increase viral fitness, and thus, can contribute to viral evolution. Such driver mutations are primarily responsible for the emergence of novel viral variants with functional and immunogenic benefits.

Among SARS-CoV-2 variants, some are designated “Variants of Concern (VOCs)” and some “Variants of Interest (VOIs)”, depending on the extent of their negative clinical impact. The majority of VOCs contain multiple spike mutations that are significantly associated with increased infectivity, virulence, and/or immune escape ability. Early identification of driver mutations that may appear in future VOCs/VOIs would therefore be a great advantage for public health strategies.

In the current study, the scientists have analyzed currently available genomic surveillance data to predict potential driver mutations that may appear in future SARS-CoV-2 variants. They have hypothesized that the prediction will help identify dominant driver mutations that are responsible for viral evolution over time.

For forecast modeling, they tested the importance of features drawn from epidemiology, evolution, immunology, and neural network-based protein sequence modeling. Specifically, they defined the patterns of rapid mutation spread at both global and regional levels, and assessed the relative predictive significance of amino acid mutation features related to immunity, transmissibility, evolution, and epidemiology. Using historical data from previous waves, they backtested a forecast model that predicts the future spread of mutations. Finally, they demonstrated how predicted mutations may affect clinical antibodies.

Transmission of spreading mutations

The scientists analyzed more than 900,000 spike sequences to define how mutations spread within SARS-CoV-2 VOCs/VOIs at both global and regional levels. They defined “spreading mutations” as those showing a specified fold change in frequency across multiple countries during the third wave of the pandemic, relative to the preceding three months.

For example, they estimated that the frequency of the spreading mutation P681R increased fourfold in 15 countries and 20-fold in 7 countries. This mutation is currently predominant in the B.1.617.1/2/3 variants of SARS-CoV-2. Taken together, these observations indicate that the forecasting model detected the increase in the frequency of P681R well before the emergence of P681R-associated COVID-19 cases in India.
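The fold-change definition above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the pseudocount, the thresholds, and the toy frequencies are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical sketch: flag a "spreading" mutation by its fold change in frequency.
# Thresholds, pseudocount, and data below are illustrative, not the study's values.


def fold_change(ref_freq: float, cur_freq: float, pseudo: float = 1e-4) -> float:
    """Ratio of current to reference frequency; the pseudocount avoids division by zero."""
    return (cur_freq + pseudo) / (ref_freq + pseudo)


def countries_spreading(ref: Dict[str, float], cur: Dict[str, float], min_fold: float) -> List[str]:
    """Countries in which the mutation's frequency rose by at least `min_fold`."""
    return sorted(
        country
        for country in cur
        if country in ref and fold_change(ref[country], cur[country]) >= min_fold
    )


def is_spreading(ref: Dict[str, float], cur: Dict[str, float],
                 min_fold: float = 4.0, min_countries: int = 2) -> bool:
    """A mutation counts as spreading if enough countries pass the fold-change threshold."""
    return len(countries_spreading(ref, cur, min_fold)) >= min_countries


# Toy frequencies: fraction of sequences carrying the mutation in each country,
# for the reference window (previous three months) and the current window.
reference = {"A": 0.010, "B": 0.020, "C": 0.050}
current = {"A": 0.050, "B": 0.500, "C": 0.060}

spreading_in = countries_spreading(reference, current, min_fold=4.0)
flagged = is_spreading(reference, current, min_fold=4.0, min_countries=2)
print(spreading_in, flagged)
```

Counting the countries that pass each threshold separately, as in "fourfold in 15 countries and 20-fold in 7 countries", amounts to calling `countries_spreading` once per value of `min_fold`.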

Similarly, the scientists analyzed the patterns of mutation spread at the regional level within the USA. In addition to well-documented mutations, they identified a panel of lesser-known mutations, including T478K, whose frequency appeared to have risen sharply in some US regions. Overall, the forecasting model allowed them to characterize the dynamics of mutations in SARS-CoV-2 variants and to track the spread of less-documented mutations at both global and regional levels.

Characterization of spreading mutations

The scientists identified the characteristics of mutations that may predict the emergence of potential viral variants. Focusing specifically on the spike receptor-binding domain (RBD), they found that angiotensin-converting enzyme 2 (ACE2) binding affinity and epitope-antibody binding affinity could potentially predict mutational spreading.

Compared with biological characteristics, epidemiological characteristics, including the epidemiology-based “Epi Score”, showed the highest predictive ability. By analyzing lineage expansion and recurrent mutations with this metric, the scientists could indicate whether a specific mutation improves viral fitness.

Overall, after analyzing a panel of biological and epidemiological characteristics of amino acid mutations, they concluded that mutational spreading is predicted most effectively by combining immunity, transmissibility, evolution, protein language model, and epidemiologic features.

Emergence of viral variants

By analyzing global and regional dynamics of mutations, the scientists observed that global epidemiology metrics are better than state-level metrics in predicting mutational spreading at the state level.
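One way such a comparison could be made is to score how well each metric ranks mutations that later spread above those that do not. The sketch below uses the area under the ROC curve (AUROC) for this. The choice of AUROC, the variable names, and all numbers are assumptions for illustration, so the output does not reproduce the study's result.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: five mutations in one state; 1 = later spread in that state.
spread_labels = [1, 1, 0, 0, 0]
global_scores = [0.9, 0.7, 0.4, 0.3, 0.1]  # hypothetical metric from worldwide data
state_scores = [0.6, 0.2, 0.5, 0.3, 0.1]   # hypothetical metric from that state's data

global_auc = auroc(global_scores, spread_labels)
state_auc = auroc(state_scores, spread_labels)
print(f"global metric AUROC: {global_auc:.2f}, state metric AUROC: {state_auc:.2f}")
```

In this toy data, the state-level score is noisier, as might happen when a single state contributes few sequences, which is one plausible reason a global metric could rank mutations better.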

Importantly, using the forecasting model, they successfully predicted potential VOC-related mutations more than five months before those mutations reached a global frequency of 1%.
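The lead time of such a forecast can be measured as the gap between the date a mutation is flagged and the first date its global frequency reaches 1%. The following is a minimal sketch under that assumption. The function names, the monthly sampling, and the toy series are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

FrequencySeries = List[Tuple[date, float]]  # (month, global frequency), in time order


def first_crossing(series: FrequencySeries, threshold: float = 0.01) -> Optional[date]:
    """First date at which the global frequency reaches the threshold, if any."""
    for day, freq in series:
        if freq >= threshold:
            return day
    return None


def lead_time_months(flagged_on: date, series: FrequencySeries,
                     threshold: float = 0.01) -> Optional[int]:
    """Whole calendar months between the forecast date and the threshold crossing."""
    crossed = first_crossing(series, threshold)
    if crossed is None:
        return None
    return (crossed.year - flagged_on.year) * 12 + (crossed.month - flagged_on.month)


# Toy monthly global frequencies for one mutation (made-up numbers).
toy_series = [
    (date(2020, 10, 1), 0.0001),
    (date(2020, 11, 1), 0.0003),
    (date(2020, 12, 1), 0.0008),
    (date(2021, 1, 1), 0.0020),
    (date(2021, 2, 1), 0.0040),
    (date(2021, 3, 1), 0.0080),
    (date(2021, 4, 1), 0.0150),
]

lead = lead_time_months(date(2020, 10, 1), toy_series)
print(lead)
```

A mutation that never reaches the threshold yields `None` rather than a lead time, which keeps unresolved forecasts separate from confirmed ones.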

Prediction of mutational spreading

Using global metrics on the current data, the scientists prepared a panel of 22 forecasted mutations that may contribute to SARS-CoV-2 VOCs over the coming months. They observed that most of the forecasted mutations are associated with a consistent increase in frequency at the global level. Moreover, they found that some of these mutations interfere with the binding ability of clinical antibodies.

These forecasted mutations are not present in currently circulating viral variants, such as B.1.1.7, B.1.351, P.1, or B.1.427/B.1.429. Based on the prediction, the scientists suggest that, for better management of the pandemic, the possible contribution of the top forecasted mutations to viral infectivity or immune escape ability should be analyzed as a priority.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Maher MC. 2021. Predicting the mutational drivers of future SARS-CoV-2 variants of concern. MedRxiv. doi:,

Written by

Dr. Sanchari Sinha Dutta

Dr. Sanchari Sinha Dutta is a science communicator who believes in spreading the power of science in every corner of the world. She has a Bachelor of Science (B.Sc.) degree and a Master of Science (M.Sc.) degree in biology and human physiology. Following her Master's degree, Sanchari went on to pursue a Ph.D. in human physiology. She has authored more than 10 original research articles, all of which have been published in world-renowned international journals.
