(LLM Explain-Anthropic) On the Biology of a Large Language Model

About this content

"On the Biology of a Large Language Model" from Anthropic presents a novel investigation into the internal mechanisms of Claude 3.5 Haiku using circuit tracing. Taking an approach analogous to biological research, the authors use tools such as attribution graphs to reverse-engineer the intermediate computational steps the model takes. The research sheds light on a range of model capabilities, including multi-step reasoning, planning in poems, multilingual circuits, addition, and medical diagnoses. It also examines the mechanisms underlying hallucinations, refusals, jailbreaks, and hidden goals. The work aims to expose interpretable intermediate computations, a capability with potential applications in areas such as safety auditing.
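To make the idea of an attribution graph a little more concrete, here is a minimal Python sketch. It is not Anthropic's implementation: it assumes a toy "replacement model" in which features interact only through fixed linear weights, so each edge's attribution is simply the source feature's activation times the edge weight. The feature names (`token:Dallas`, `Texas`, `say-Austin`, and so on), the weights, the `threshold` parameter, and the `build_attribution_graph` function are all hypothetical, chosen only to show a two-step reasoning path.

```python
from collections import defaultdict

# Hypothetical toy "replacement model": invented feature names and weights,
# chosen only to illustrate a two-hop path. Edges are listed so that every
# edge INTO a feature comes before any edge OUT of it (a topological order).
WEIGHTS = [
    ("token:Dallas", "Texas", 0.9),
    ("token:capital", "Texas", 0.05),
    ("token:capital", "say-capital", 0.8),
    ("Texas", "say-Austin", 0.7),
    ("say-capital", "say-Austin", 0.6),
]


def build_attribution_graph(inputs, weights, threshold=0.1):
    """Propagate activations through purely linear edges and record attributions.

    Assumes interactions are linear, so an edge's attribution is
    source_activation * weight. Edges whose |attribution| is below
    `threshold` are pruned from the returned graph, although their
    contribution still counts toward the target feature's activation.
    """
    acts = defaultdict(float, inputs)
    graph = {}
    for src, dst, w in weights:
        contribution = acts[src] * w
        acts[dst] += contribution
        if abs(contribution) >= threshold:
            graph[(src, dst)] = contribution
    return dict(acts), graph


if __name__ == "__main__":
    acts, graph = build_attribution_graph(
        {"token:Dallas": 1.0, "token:capital": 1.0}, WEIGHTS
    )
    # Print the surviving edges, strongest attribution first.
    for (src, dst), c in sorted(graph.items(), key=lambda kv: -kv[1]):
        print(f"{src} -> {dst}: {c:.3f}")
```

With these made-up numbers, four edges survive and the weak `token:capital -> Texas` edge (attribution 0.05) is pruned, a crude stand-in for the simplification step that keeps such graphs readable.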

However, the methods have significant limitations. They yield detailed insights for only a fraction of prompts, capture just a small part of the model's immense complexity, and rely on imperfect replacement models. They also struggle with complex reasoning chains and long prompts, and they have difficulty explaining why a feature is inactive. A key open challenge is understanding the causal role of attention patterns.

Despite these limitations, this research is a valuable stepping stone toward a deeper understanding of how large language models function internally, and it points to a challenging scientific frontier.

Paper link: https://transformer-circuits.pub/2025/attribution-graphs/biology.html