We are finally beginning to understand how large language models work: they don't simply predict word after word

Anthropic uses circuit tracing to probe the internal logic of its Claude 3.5 Haiku model.

Anthropic has used circuit tracing to explore how Claude 3.5 Haiku reaches its answers, revealing non-linear problem-solving inside the model. Claude works with abstract concepts shared across languages, takes an unusual route through arithmetic problems, and plans ahead when generating poetry, rather than simply predicting one word after another. Research scientist Joshua Batson notes these findings are only the tip of the iceberg, since mapping even a single response demands hours of work. The research offers a deeper view of AI reasoning and challenges the assumption that models merely chain words together.

In an effort to demystify the internal workings of large language models (LLMs), Anthropic has applied a method called circuit tracing to its Claude 3.5 Haiku model. The conventional view is that LLMs operate by predicting one word after another; circuit tracing lets Anthropic observe the model's internal process and reveals more complex mechanisms behind its responses. The approach follows how internal components interact, step by step, somewhat like tracing neural pathways in a brain, in order to understand how the model constructs its outputs. It has surfaced a number of peculiar, decidedly non-human strategies that Claude uses to arrive at its answers.
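To make the idea concrete, here is a deliberately simplified sketch: a hand-written graph of hypothetical "components," each feeding into the next, traversed from input to output. None of these names correspond to real Claude features, and real circuit tracing works on features recovered from the model itself; this is only an illustration of what "following component interactions step by step" means in principle.

```python
# Toy abstraction of the circuit-tracing idea, purely illustrative: the
# "components" and edges below are hand-written stand-ins, not features
# recovered from a real model.

EDGES = {
    "token:small": ["concept:size"],
    "concept:size": ["concept:antonym-of-size"],
    "concept:antonym-of-size": ["output:big"],
}

def trace(start: str, graph: dict) -> list[str]:
    """Follow the chain of active components until nothing further is fed."""
    path = [start]
    while graph.get(path[-1]):
        path.append(graph[path[-1]][0])
    return path

print(" -> ".join(trace("token:small", EDGES)))
# token:small -> concept:size -> concept:antonym-of-size -> output:big
```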

Among the behaviors examined, three standout cases illustrate Claude's unusual problem-solving. When asked for an antonym, such as "What's the opposite of small?", Claude does not rely on language-specific machinery. Instead it activates language-neutral pathways that first settle on an abstract concept of "bigness," and only then selects the corresponding word in the language of the prompt. This suggests the model represents and applies concepts that transcend linguistic borders rather than storing a separate answer for each language.
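As a rough illustration of that two-stage pattern, abstract concept first and language-specific word second, the toy sketch below uses hypothetical lookup tables; the names and structure are assumptions made for illustration, not the model's actual representation.

```python
# Toy sketch only: a language-neutral antonym step followed by a
# language-specific rendering step. All tables and names are hypothetical.

ANTONYM_CONCEPTS = {"SMALLNESS": "LARGENESS"}                # language-free concepts
SURFACE_FORMS = {
    "LARGENESS": {"en": "big", "fr": "grand", "zh": "大"},   # per-language words
}

def opposite(concept: str, language: str) -> str:
    abstract = ANTONYM_CONCEPTS[concept]      # shared, language-neutral step
    return SURFACE_FORMS[abstract][language]  # the language is chosen last

print(opposite("SMALLNESS", "en"))  # big
print(opposite("SMALLNESS", "fr"))  # grand
```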

Claude's approach to arithmetic also deviates from the standard method. When asked to add 36 and 59, the model does not work through place values and carries. Instead it runs two processes in parallel: one approximates by rounding to the nearest tens, "40ish and 60ish," and narrows toward "92ish," while another tracks the final digit, recognizing that the sum must end in 5. Combining the two yields the correct total of 95. Yet if asked how it solved the problem, Claude gives a conventional step-by-step explanation, pointing to a divergence between the process it describes and the one it actually uses.
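The two parallel paths can be mimicked in a few lines of code. This is a toy re-creation of the behavior described above, not Anthropic's actual circuitry; the function names and the width of the "fuzzy" band are assumptions made for illustration.

```python
# Toy re-creation of the two parallel paths described for 36 + 59: a rough
# magnitude estimate and an exact last-digit check, combined at the end.

def approximate_path(a: int, b: int) -> range:
    """Rough-magnitude path: round each operand to the nearest ten and keep
    a loose band of candidates around the estimate ("40ish + 60ish")."""
    estimate = round(a, -1) + round(b, -1)
    return range(estimate - 10, estimate + 1)

def last_digit_path(a: int, b: int) -> int:
    """Precise path: track only what the final digit must be (6 + 9 -> ...5)."""
    return (a + b) % 10

def combine(a: int, b: int) -> int:
    """Pick the candidate in the fuzzy band whose last digit matches."""
    digit = last_digit_path(a, b)
    return next(n for n in approximate_path(a, b) if n % 10 == digit)

print(combine(36, 59))  # 95
```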

In writing poetry, Claude shows an unexpected degree of foresight. Asked to complete a rhyming couplet that begins "He saw a carrot and had to grab it," Claude decides in advance that "rabbit" will rhyme with "grab it," then constructs the rest of the line around that choice, producing "His hunger was like a starving rabbit." This suggests that, contrary to earlier assumptions, LLMs do not only predict the next consecutive word; they can plan toward a target several words ahead.
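A plan-the-ending-first strategy is easy to sketch, again as a toy illustration rather than the model's actual method; the rhyme lookup table and the fixed line template below are hypothetical.

```python
# Toy sketch of planning backwards from the rhyme: pick the target word that
# rhymes with the end of the first line, then build the second line around it.

RHYMES = {"grab it": ["rabbit", "habit"]}  # hypothetical rhyme lookup

def write_second_line(first_line: str) -> str:
    ending = " ".join(first_line.split()[-2:])          # "grab it"
    target = RHYMES[ending][0]                          # decide "rabbit" up front
    return f"His hunger was like a starving {target}."  # fill in the line around it

print(write_second_line("He saw a carrot and had to grab it"))
# His hunger was like a starving rabbit.
```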

Despite these insights, the full complexity of LLM operations remains largely unexplored. As research scientist Joshua Batson notes, uncovering the reasoning behind even a single response is labor-intensive, and the findings so far are merely the tip of the iceberg. Continued work on circuit tracing and related techniques may further close the gap in understanding LLMs while improving their reliability and usefulness across diverse fields.

Sources: TechSpot, Anthropic, MIT