In a recent post, Anthropic CEO Dario Amodei reaffirmed the company's commitment to advancing transparency around large language models and other AI tools, an effort that has included modest support for California's AI safety bill, SB 1047. Amodei stressed that as AI capabilities advance almost daily, understanding these intricate systems becomes essential.
“I am very concerned about deploying such systems without a better handle on interpretability,” Amodei wrote. The statement reflects a growing fear among AI developers: the models powering AI are black boxes. He characterized them as “more grown than built,” meaning their internal behavior emerges from training rather than from the deliberate scrutiny of a normal engineering and design process.
To make the inner workings of AI systems transparent, Amodei proposed routinely running the equivalent of “brain scans” or “MRIs” on advanced models. He noted that Anthropic has made early progress in tracing how models arrive at their answers, but cautioned that the biggest hurdles are yet to come: the company estimates there are millions of computational circuits embedded in AI models, and most have yet to be mapped.
In the essay, Amodei highlighted the gaps in researchers’ understanding of how leading AI models work, noting that this poor understanding creates real barriers in practice. He acknowledged that achieving reliable interpretability could take five to ten years, well after his own projection that highly capable AI could arrive as soon as 2026 or 2027.
Despite these obstacles, Amodei remains hopeful about Anthropic’s ambitious goal of reliably detecting most AI model problems by 2027. The company has been investing heavily in interpretability research; it recently made its first investment in a startup focused exclusively on this space.
In one example, Anthropic researchers pinpointed a specific circuit that a model uses to track which U.S. cities are located in which U.S. states. The finding is a small but important step toward understanding how AI models form associations and reach decisions.
Amodei is keenly aware of how fast the technology is moving. He has warned against the prospect of an out-of-control global AI race and has called on the U.S. to impose export controls on advanced chips bound for China to reduce that risk. Invoking his description of near-future AI as “a country of geniuses in a datacenter,” he framed the moment as a wake-up call to confront the consequences of breakneck AI development.