The Language of a Mutated Virus
Written by Gloria Wang
Natural Language Processing (NLP) is a branch of AI that deals specifically with communication between computers and people using human language. But aside from being able to understand languages like English, Chinese, and German, NLP algorithms are now able to understand the language of genes. A team of researchers from MIT recently used NLP algorithms, adapted to model protein sequences and genetic codes, to predict mutations that allow viruses to avoid detection by antibodies in the immune system, a process known as viral immune escape.
As with all machine learning systems, NLP models must be trained. But instead of training this model on sentences and phrases, the MIT researchers used tens of thousands of genetic sequences from three different viruses: influenza, HIV, and SARS-CoV-2, the coronavirus that causes COVID-19. Their goal was to identify mutations that allow viral immune escape, or, in terms of linguistics, “mutations that change a virus’s meaning without making it grammatically incorrect” (Heaven 2021).
Figure 1
Image of “Mutated” sentences compared to the original sentence
Source: Hie 2021
For example, take the following sentences “mutated” from the original “winegrowers revel in good season”: “winegrowers revel in strong season,” and “winegrowers revel in flu season.” Both variations have the same grammatical structure, but the second changes the meaning of the sentence significantly more than the first. In the same way, the viral mutations flagged as likely to allow immune escape are those that change the “meaning” of the virus the most while remaining grammatically valid.
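The idea of looking for a large change in meaning that stays grammatical can be sketched in a few lines of Python. This is a toy illustration, not the researchers' implementation: the embedding vectors, the probabilities, the third (deliberately ungrammatical) sentence, and the choice to combine the two criteria by adding their ranks are all assumptions made for the example.

```python
# Toy illustration only: every number below is made up for the example and is
# not the output of any trained language model.

def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_ascending(values):
    """Return the rank of each value, where 1 is the smallest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_candidates(original_embedding, candidates):
    """Order candidates from most to least 'escape-like'.

    A candidate scores highly when it is both far from the original in
    meaning (large embedding distance) and still plausible (high probability).
    Adding the two ranks is an assumption made for this sketch.
    """
    names = list(candidates)
    semantic_change = [
        l1_distance(original_embedding, candidates[n]["embedding"]) for n in names
    ]
    grammaticality = [candidates[n]["probability"] for n in names]
    change_ranks = rank_ascending(semantic_change)
    grammar_ranks = rank_ascending(grammaticality)
    combined = {n: change_ranks[i] + grammar_ranks[i] for i, n in enumerate(names)}
    return sorted(names, key=lambda n: combined[n], reverse=True)


# Hypothetical embedding for "winegrowers revel in good season".
original = [0.9, 0.1, 0.0]

# Hypothetical embeddings and probabilities for three "mutated" sentences.
candidates = {
    "winegrowers revel in strong season": {"embedding": [0.8, 0.2, 0.0], "probability": 0.30},
    "winegrowers revel in flu season": {"embedding": [0.1, 0.2, 0.9], "probability": 0.20},
    "winegrowers revel in of season": {"embedding": [0.2, 0.1, 0.8], "probability": 0.001},
}

ranking = rank_candidates(original, candidates)
for sentence in ranking:
    print(sentence)
```

With these made-up numbers, the “flu season” sentence ranks first because it combines a large change in meaning with a reasonable probability, while the ungrammatical sentence ranks last despite also changing the meaning a great deal.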
Comparing their predictions of escape mutations to real viruses in the lab, the researchers found area under the curve (AUC) scores ranging from 0.69 to 0.85, better than many state-of-the-art models. This approach shows serious potential for public health. Understanding which mutations can go undetected by last year’s antibodies can help determine how well previous vaccines and antibodies will fare this year.
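For readers unfamiliar with area under the curve scores, the sketch below computes one common version, the area under the ROC curve, on a handful of invented predictions. It assumes the standard interpretation of that score: the probability that a randomly chosen true escape mutation is ranked above a randomly chosen non-escape mutation, so that 0.5 is chance and 1.0 is a perfect ranking. The exact evaluation protocol used in the study is described in Hie et al. (2021) and may differ; the scores and labels here are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model scores, higher meaning "more likely an escape mutation".
    labels: 1 for a confirmed escape mutation, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # escape mutation correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and lab-confirmed labels for five mutations.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]

toy_auc = auc(scores, labels)
print(round(toy_auc, 2))
```

In this toy data, five of the six escape/non-escape pairs are ordered correctly, which gives a score at the upper end of the range reported above.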
Most notably, the team ran the model on new variants of the coronavirus, including the highly infectious UK variant, as well as variants from Denmark, Singapore, Malaysia, and South Africa. In all of them, the model found a high potential for viral immune escape.
And this is just the beginning. With this technology, scientists can get a better understanding of the world around us, extending NLP technologies beyond simply human language, and into the language of a mutated virus.
References
Hie, B., et al. (2021). Learning the language of viral evolution and escape. Science, Vol. 371, Issue 6526, pp. 284-288, DOI: 10.1126/science.abd7331. Retrieved 18 January 2021, from https://science.sciencemag.org/content/371/6526/284.
Heaven, W. (2021). AIs that read sentences are now catching coronavirus mutations. MIT Technology Review. Retrieved 18 January 2021, from https://www.technologyreview.com/2021/01/14/1016162/ai-language-nlp-coronavirus-hiv-flu-mutations-antinbodies-immune-vaccines/.