Inner workings of AI an enigma even to its creators

NEW YORK — Even the leading researchers building generative artificial intelligence, a technology poised to change the world, admit that they do not understand how digital minds think.
"People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work," Dario Amodei, co-founder of AI startup Anthropic, wrote in an essay posted online last month.
"This lack of understanding is essentially unprecedented in the history of technology."
Unlike traditional software programs, which follow preordained paths of logic dictated by programmers, generative AI models are trained to find their own way to success once prompted.
In a recent podcast, Chris Olah, who was part of ChatGPT-maker OpenAI before joining Anthropic, described generative AI as "scaffolding" on which circuits grow.
Olah is considered an authority in so-called mechanistic interpretability, a method of reverse engineering AI models to figure out how they work.
This science, born about a decade ago, seeks to determine exactly how AI gets from a query to an answer.
"Grasping the entirety of a large language model is an incredibly ambitious task," said Neel Nanda, a senior research scientist at the Google DeepMind AI lab.
It is "somewhat analogous to trying to fully understand the human brain," Nanda told Agence France-Presse, noting that neuroscientists have yet to succeed on that front.
Delving into digital minds to understand their inner workings has gone from a little-known field just a few years ago to being a hot area of academic study.
"Students are very much attracted to it because they perceive the impact that it can have," said Boston University computer science professor Mark Crovella.
The area of study is also gaining traction because of its potential to make generative AI even more powerful, and because peering into digital brains can be intellectually exciting, Crovella added.
Mechanistic interpretability involves not just studying the results served up by generative AI but also scrutinizing the calculations performed while the technology mulls queries, he said.
"You could look into the model ... observe the computations that are being performed and try to understand those."
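To give a rough sense of what observing those computations can mean, here is a minimal sketch in Python. It is a toy, not a language model and not any lab's actual tooling: the two-layer network, its weights and the function name forward_with_trace are all hypothetical, chosen only so that every intermediate value can be recorded and read.

```python
# Toy illustration only: a hand-written two-layer network, not a language model.
# All weights and names below are hypothetical, picked for readability.

def relu(x):
    """Pass positive values through, clamp negative values to zero."""
    return x if x > 0.0 else 0.0

# Hidden layer: two neurons, each with two input weights and a bias.
W_HIDDEN = [[1.0, -1.0], [-1.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
# Output layer: one neuron that reads both hidden neurons.
W_OUT = [1.0, 1.0]
B_OUT = 0.0

def forward_with_trace(inputs):
    """Run the network and return (output, trace of every hidden neuron)."""
    trace = []
    hidden = []
    for i, (weights, bias) in enumerate(zip(W_HIDDEN, B_HIDDEN)):
        pre = sum(w * x for w, x in zip(weights, inputs)) + bias
        act = relu(pre)
        # Record the intermediate values instead of discarding them.
        trace.append({"neuron": i, "pre_activation": pre, "activation": act})
        hidden.append(act)
    output = sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT
    return output, trace

if __name__ == "__main__":
    output, trace = forward_with_trace([3.0, 1.0])
    print("output:", output)
    for row in trace:
        print(row)
```

Reading the trace for a few inputs shows that neuron 0 is active only when the first input is larger and neuron 1 only when the second is, so together they compute the absolute difference of the two inputs. Real models have billions of such values, which is why finding the right way to read them is the hard part.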
Better understanding
Startup Goodfire uses AI software capable of representing data in the form of reasoning steps to better understand generative AI processing and correct errors.
The tool is also intended to prevent generative AI models from being used maliciously or from deciding on their own to deceive humans about what they are up to.
"It does feel like a race against time to get there before we implement extremely intelligent AI models into the world with no understanding of how they work," said Goodfire Chief Executive Eric Ho.
Amodei of Anthropic said in his essay that recent progress has made him optimistic that the key to fully deciphering AI will be found within two years.
Crovella of Boston University said researchers can already access representations of every digital neuron in AI brains.
"Unlike the human brain, we actually have the equivalent of every neuron instrumented inside these models," he said. "Everything that happens inside the model is fully known to us. It's a question of discovering the right way to interrogate that."
Harnessing the inner workings of generative AI could clear the way for the technology's adoption in areas where tiny errors can have dramatic consequences, such as national security, Amodei said.
"Powerful AI will shape humanity's destiny," Amodei wrote.
"We deserve to understand our own creations before they radically transform our economy, our lives, and our future."
Agencies Via Xinhua