Anthropic develops ‘AI microscope’ to reveal how large language models think

In what might be a significant AI breakthrough, Anthropic researchers said that they have developed a new tool to help understand how large language models (LLMs) actually work.
The AI startup behind Claude said the new tool is capable of deciphering how LLMs think. Taking inspiration from the field of neuroscience, Anthropic said it was able to build a kind of AI microscope that would “let us identify patterns of activity and flows of information.”
“Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to,” the company said in a blog post published Thursday.
Despite their capabilities, today’s LLMs are often described as black boxes: because they learn from vast amounts of data rather than being explicitly programmed, AI researchers have yet to figure out exactly how the models arrive at a particular response. Other grey areas of understanding pertain to AI hallucinations, fine-tuning, and jailbreaking.
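The blog post describes the microscope only at a high level. Purely as an illustration of what inspecting a model’s internal “patterns of activity” can look like, the short Python sketch below records intermediate activations from the open GPT-2 model using PyTorch forward hooks. The model choice, the libraries, and the hook-based approach are assumptions made for demonstration; they are not a description of Anthropic’s actual method.

```python
# Illustrative sketch only: NOT Anthropic's tool. It shows the general idea of
# observing a transformer's internal "patterns of activity" by recording the
# hidden activations of each layer, using the open GPT-2 model as a stand-in.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

activations = {}  # layer index -> hidden-state tensor for the prompt

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # Each GPT-2 block returns a tuple; the first element is the hidden state
        activations[layer_idx] = output[0].detach()
    return hook

# Attach a hook to every transformer block to watch information flow layer by layer
handles = [block.register_forward_hook(make_hook(i)) for i, block in enumerate(model.h)]

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for idx, act in sorted(activations.items()):
    print(f"layer {idx}: activation tensor of shape {tuple(act.shape)}")

# Detach the hooks once the activations have been collected
for h in handles:
    h.remove()
```

Interpretability researchers typically analyse recordings like these for features that activate consistently on a given concept, which is the kind of activity pattern Anthropic says its microscope is designed to surface.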