Generative AI will be designing new drugs all on its own in the near future

By Trevor Laurence Jockims

Eli Lilly chief information and digital officer Diogo Rau recently took part in some experiments at the office, though not the kind of drug research you might expect to find among the lab work at a major pharmaceutical company.

Lilly has been using generative AI to search through millions of molecules. AI can generate, in five minutes, as many molecules as Lilly could synthesize in an entire year in traditional wet labs, so it makes sense to test the limits of artificial intelligence in medicine. But there’s no way to know whether the abundance of AI-generated designs will work in the real world, and that’s something skeptical company executives wanted to learn more about.

The top AI-generated designs were molecules with what Rau described as “weird-looking structures.” They matched little in the company’s existing molecular database, yet they looked like potentially strong drug candidates. The designs were taken to Lilly research scientists, and executives, including Rau, expected the scientists to dismiss the AI results.

“They can’t possibly be this good?” he remembered thinking before the AI results were presented.

The scientists were expected to point out everything wrong with the AI-generated designs, but their response surprised Lilly executives: “‘It’s interesting; we hadn’t thought about designing a molecule that way,’” Rau recalled them saying as he related the story, previously unreported, to attendees at last November’s CNBC Technology Executive Council Summit.

“That was an epiphany for me,” Rau said. “We always talk about training the machines, but another art is where the machines produce ideas based on a data set that humans wouldn’t have been able to see or visualize. This spurs even more creativity by opening pathways in medicine development that humans may not have otherwise explored.”

According to executives working at the intersection of AI and health care, the field is on a trajectory toward medicines generated entirely by AI in the near future; some say that within a few years at most, this will become the norm in drug discovery. Generative AI is rapidly becoming applicable to the development and discovery of new medications, a shift that will reshape not only the pharmaceutical industry but also foundational ideas that have been built into the scientific method for centuries.

When Google’s DeepMind broke the protein mold

The moment this trajectory first became clear came years before ChatGPT broke through into the public consciousness. It was “the AlphaFold moment” in 2021, according to Kimberly Powell, vice president of health care at Nvidia. With AlphaFold, Google’s DeepMind AI unit, which had become famous for showing in the Chinese strategy game of Go how different AI’s creative thinking could be from a human’s, pioneered the application of large transformer-based AI models to biology. “AlphaFold was this pivotal moment when we could train these transformer models with very large data sets and go from amino acid sequence to a protein structure, which is at the core of doing drug development and design,” Powell said.

The advances related to AI are taking place within a field of biology that has been increasingly digitized at what Powell describes as “unprecedented scales and resolutions.”

It’s a medical revolution that includes spatial genomics, which scans millions of cells within tissue in 3-D, and AI model-building that benefits from a catalog of chemicals already in digital form, which generative AI transformer models can now go to work on. “This training can be done using unsupervised and self-supervised learning, and it can be done not only rapidly but imaginatively: the AI can ‘think’ of drug models that a human would not,” Powell said.

ChatGPT offers an analogy for understanding how AI drug development works. “It’s essentially been trained on every book, every webpage, every PDF document, and it’s encoded the knowledge of the world in such a way that you can ask it questions and it can generate you answers,” Powell said.

The GPT version of drug discovery

Drug discovery is a process of observing interactions and changes in biological behavior, but what would take months or years in a lab can be represented in computer models that simulate that behavior. “And when you can simulate their behavior, you can predict how things might work together and interact,” Powell said. “We now have this ability to represent the world of drugs (biology and chemistry) because we have AI supercomputers using AI and a GPT-like method, and with all of the digital biology data, we can represent the world of drugs in a computer for the very first time.”

It’s a radical departure from the classic empirical method that has dominated the last century of drug discovery: extensive experimentation, gathering of data, human analysis of that data, and then another design process based on the results. That experimentation happens within the walls of a company and is followed by several decision points that scientists and executives hope will lead to successful clinical trials. “It’s a very artisanal process,” Powell said. As a result, the process has a 90% failure rate.

AI backers believe it will save time and improve success rates, turning the classic process into engineering that is more systematic and repeatable and allowing drug researchers to build on a higher success rate. Citing results from recent studies published in Nature, Powell noted that Amgen found a drug discovery process that once might have taken years can be cut down to months with the help of AI. Even more important, given the cost of drug development, which can range from $30 million to $300 million per trial, the success rate jumped when AI was introduced early in the process. After a two-year traditional development process, the probability of success was 50/50. At the end of the faster AI-augmented process, the success rate rose to 90%, Powell said.

“The progress of drug discovery, we predict, should massively go up,” Powell said. Some of generative AI’s well-known flaws, such as its propensity to “hallucinate,” could prove powerful in drug discovery. “Over the last many decades, we have kind of been looking at the same targets, but what if we can use the generative approach to open up new targets?” she added.

‘Hallucinating’ new drugs

Protein discovery is an example. Biological evolution works by identifying a protein that works well, and then nature moves on. It doesn’t test all the other proteins that may also work, or work better. AI, on the other hand, can begin its work with proteins that exist only within models, an approach that would be untenable in a classic empirical model. By the numbers, AI has a much bigger discovery space to explore. The potential number of proteins that could act as a therapy is essentially infinite, Powell said: 10 to the power of 160, or a one followed by 160 zeroes. That shatters the existing limit of working only with the proteins nature has given humanity. “You can use these models to hallucinate proteins that might have all of the functions and features we need. It can go where a human mind wouldn’t, but a computer can,” Powell said.

The University of Texas at Austin recently purchased one of the largest Nvidia computing clusters for its new Center for Generative AI.

“Just as ChatGPT is able to learn from strings of letters, chemicals can be represented as strings, and we can learn from them,” said Andy Ellington, professor of molecular biosciences. AI is learning to distinguish drugs from non-drugs, and to create new drugs, in the same way that ChatGPT can create sentences, Ellington said. “As these advances are paired with ongoing efforts in predicting protein structures, it should soon be possible to identify drug-like compounds that can be fit to key targets,” he said.
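Ellington’s comparison can be made concrete with SMILES, a widely used line notation that writes a molecule as a string of characters; aspirin, for example, is CC(=O)Oc1ccccc1C(=O)O. The minimal Python sketch below assumes SMILES as the string format and uses a tokenizing regular expression adapted from common practice in chemical language modeling, not any code UT Austin has published. It splits each string into tokens and maps them to integer ids, the same first step a language model applies to text.

```python
import re

# Assumed tokenizer pattern (common practice, not UT Austin's code):
# bracketed atoms like [NH4+] stay whole, and the two-letter elements
# Br and Cl are matched before the single letters B and C.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, the 'words' a model would read."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters the pattern does not cover,
    # so verify that the tokens rebuild the original string exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab: dict[str, int] = {}
    for smiles in corpus:
        for tok in tokenize_smiles(smiles):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Turn a molecule into the integer sequence a transformer would consume."""
    return [vocab[tok] for tok in tokenize_smiles(smiles)]


if __name__ == "__main__":
    corpus = [
        "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
        "ClC(Cl)Cl",              # chloroform
        "[NH4+]",                 # ammonium ion
    ]
    vocab = build_vocab(corpus)
    for smiles in corpus:
        print(smiles, tokenize_smiles(smiles), encode(smiles, vocab))
```

Keeping Cl and bracketed atoms as single tokens is why a plain character-by-character split is usually avoided; a real chemical language model would then be trained on millions of such integer sequences.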

Daniel Diaz, a postdoctoral fellow in computer science who leads the deep proteins group at UT’s Institute for Foundations of Machine Learning, said most current AI work on drugs is centered on small-molecule discovery, but he thinks the bigger impact will be in the development of novel biologics (protein-based drugs), where he is already seeing how AI can speed up the process of finding the best designs.

A UT Austin group is currently running animal experiments on a breast cancer therapeutic: an engineered version of a human protein that degrades a key metabolite breast cancer depends on, essentially starving the cancer. Traditionally, when scientists need a protein for therapeutics, they look for several features, including stability, so that the protein doesn’t fall apart easily. Achieving that requires genetically engineering the protein, a cumbersome lab process of mapping its structure and identifying the best options from all the possible genetic modifications.

Now, AI models are helping narrow down the possibilities, so scientists more quickly know the optimal modifications to try. In the experiment Diaz cited, a more stable, AI-enhanced version of the protein produced a roughly sevenfold improvement in yield, giving researchers more protein to test and use. “The results are looking very promising,” he said. And because the drug is based on a human protein, the chances of patients becoming allergic to it are minimized, which matters because allergic responses to protein-based drugs are a big problem.

Nvidia recently released what it calls “microservices” for health care AI, including for drug discovery, as one component of its aggressive ambitions for AI adoption in the health sector. The microservices allow researchers to screen trillions of drug compounds and predict protein structures. Computational software design company Cadence is integrating Nvidia AI into a molecular design platform that allows researchers to generate, search and model data libraries with hundreds of billions of compounds. It’s also offering research capabilities related to DeepMind’s AlphaFold 2 protein model.

“AlphaFold is hard for a biologist to just use, so we’ve simplified it,” Powell said. “You can go to a webpage and input an amino acid sequence and the actual structure comes out. If you were to do that with an instrument, the instrument would cost you $5 million, and you’d need three FTE [full-time equivalent workers] to run, and you might get the structure in a year. We’ve made that instantaneous in a webpage.”

Ultimately, AI-designed drugs will rise or fall based on the traditional final step in drug development: performance in human trials.

“You still have to generate ground proof,” Powell said.

She compared the current level of progress to the training of self-driving cars, where data is constantly being collected to reinforce and refine models. “The exact same thing is happening in drug discovery,” she said. “You can use these methods to explore new space … hone it, hone it … do more intelligent experimentation, take that experiment data and feed it back into the models, and around the loop goes.”
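The loop Powell describes (explore, hone, experiment, feed the data back) is often called a design-make-test-learn cycle. The Python sketch below is a toy illustration under loud assumptions: lab_assay is a hypothetical stand-in for a real experiment that scores short protein sequences against a hidden target, HIDDEN_TARGET, and the “model” is a deliberately simple per-position average rather than a generative AI model. It is not how Nvidia or any drug maker implements the loop.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HIDDEN_TARGET = "MKTAYIAK"         # hypothetical "ideal" design, never read by the model


def lab_assay(seq: str) -> int:
    """Hypothetical stand-in for a slow, costly wet-lab experiment.

    Scores a design by how many positions match HIDDEN_TARGET; real assays
    are noisy and measure actual biological activity.
    """
    return sum(a == b for a, b in zip(seq, HIDDEN_TARGET))


def fit_model(data: list[tuple[str, int]]) -> dict[tuple[int, str], float]:
    """Toy surrogate model: mean measured score for each (position, residue) pair."""
    sums: dict[tuple[int, str], float] = {}
    counts: dict[tuple[int, str], int] = {}
    for seq, score in data:
        for i, aa in enumerate(seq):
            sums[(i, aa)] = sums.get((i, aa), 0.0) + score
            counts[(i, aa)] = counts.get((i, aa), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model: dict[tuple[int, str], float], seq: str, prior: float) -> float:
    """Estimate a design's score without the lab; unseen pairs fall back to the prior."""
    return sum(model.get((i, aa), prior) for i, aa in enumerate(seq)) / len(seq)


def mutate(seq: str, rng: random.Random) -> str:
    """Explore: replace one randomly chosen position with a random residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]


def design_loop(rounds: int = 10, batch: int = 5, seed: int = 0) -> list[tuple[str, int]]:
    rng = random.Random(seed)
    # Start from a handful of random designs measured "in the lab".
    data = []
    for _ in range(batch):
        seq = "".join(rng.choice(ALPHABET) for _ in range(len(HIDDEN_TARGET)))
        data.append((seq, lab_assay(seq)))

    for _ in range(rounds):
        model = fit_model(data)  # learn from every result so far
        prior = sum(score for _, score in data) / len(data)
        parents = sorted(data, key=lambda d: d[1], reverse=True)[:batch]
        proposals = {mutate(seq, rng) for seq, _score in parents for _ in range(50)}
        tested = {seq for seq, _score in data}
        # Hone: rank untested proposals with the model and assay only the top few.
        ranked = sorted(proposals - tested, key=lambda s: (-predict(model, s, prior), s))
        for seq in ranked[:batch]:
            data.append((seq, lab_assay(seq)))  # feed results back into the loop
    return data


if __name__ == "__main__":
    results = design_loop()
    best_seq, best_score = max(results, key=lambda d: d[1])
    print(f"{len(results)} designs assayed; best {best_seq} scored "
          f"{best_score}/{len(HIDDEN_TARGET)}")
```

Each round mirrors the quote: mutations explore new sequence space, the model hones the choice, only a small batch goes to the simulated experiment, and the results flow back before the next pass. A real pipeline would substitute a learned generative model and genuine assay data.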

But biology is still a small slice of the broader AI model field. In multimodal and natural language processing, the AI industry’s models run to a trillion parameters or more; biology models, by comparison, are in the tens of billions.

“We are in the early innings,” Powell said. “An average word is less than ten letters long. A genome is 3 billion letters long.”
