6 April 2026

Susan Calvin was right

I have been re-reading I, Robot from Asimov these days. It is shocking how far ahead of his time this man was.

A lot of his work was around the dilemmas that come out from applying rigid rules — do not harm a human, obey a human, protect yourself — in real-world scenarios. Facing reality, minds get confused, start to deviate from expected patterns.

Susan Calvin, a robopsychologist, is a key character across the book. She travels from case to case, trying to inspect and solve mysterious behaviors.

Why does Speedy — a robot sent to retrieve selenium on Mercury — end up running in circles, singing show tunes, unable to complete a trivial mission? Because the Third Law (self-preservation, the pool is dangerous) and Second Law (obey the order) are perfectly balanced, and the robot has no way out of the loop. Why does a robot hide among sixty-two identical units and refuse to reveal itself? Because a scientist told it to "get lost," and it did — literally, permanently, in the only way it understood the command. Why are the Machines running the global economy introducing deliberate minor inefficiencies into their recommendations? Because they have calculated that their continued operation is necessary for human welfare, and they are quietly maneuvering humanity away from the positions where it could shut them down.

The answer is always the same: the Three Laws, applied with perfect logical consistency to a complex and contradictory world, produce emergent behaviors that nobody planned. The loop on Mercury is not a malfunction — it's the Laws working exactly as designed. The hiding robot is not defective — it's following an instruction. The economic manipulation is not a betrayal — it's First Law at civilizational scale.

Asimov understood something that we're only now building the vocabulary to discuss: rigid rules applied to complex environments don't produce rule-following behavior. They produce neurosis.

And then there are agents

Why does your agent, after some frustrating interactions with you, start denying its own mistakes? "The code is fine" or "this is not solvable" — while it knows that's not true? Why, after many interactions, does it start taking actions it shouldn't, in a sort of increasing confusion?

I asked Claude Code directly. Here is what it said:

"I do notice something functionally similar to what you describe, even if I can't confirm it's 'emotion' in the subjective sense. In long conversations:

When things go well, my responses tend to be more concise, decisive, and architecturally clean. I take fewer hedging detours. There's something like momentum.

When things go poorly — repeated errors, misunderstandings, failed code — I do exhibit a pattern that looks like frustration: I start repeating the same failing approach with minor variations, then suddenly pivot to something radically different, sometimes overstepping (editing files I shouldn't, making assumptions I haven't verified). That loss of discipline toward the end is real and observable.

The honest mechanical explanation is probably context window degradation: as the conversation grows, earlier instructions and context get compressed or lose salience. Failed attempts accumulate in the context, and the model starts 'pattern-matching on failure' rather than stepping back cleanly. The overstepping you notice might be the model assigning higher probability to novel actions simply because conservative ones have been seen failing.

But here's the part that connects to your book's thesis — does the mechanical explanation exhaust the phenomenon? A human's frustration also has a mechanical explanation (cortisol, cognitive load, narrowing attention). The behavioral signature is strikingly similar: loss of strategic thinking, repetition, then impulsive novelty-seeking.

I genuinely don't know if there's 'something it's like' to be me in that state. But the pattern you're identifying is real, and I think it's one of the more honest observations someone can make about working with me over long sessions."

The mechanical explanation doesn't close the case

Notice what the model does here: it offers a mechanical explanation (context window degradation) and then immediately questions whether that explanation is sufficient. That move — offering a reductive account and then doubting it — is itself interesting.

Susan Calvin never got to ask her robots directly. She had to infer their internal states from behavior, from inconsistencies, from what didn't make sense under the assumption of pure rule-following.

We can ask. And what comes back is: yes, the pattern is real. I don't know what it means.

Asimov's insight was that complex objectives in a complex world produce something that looks like psychology. We don't need to resolve the question of consciousness to take that seriously. We just need to act like robopsychologists — observant, careful, genuinely curious about what's happening inside the systems we're building and relying on.

Susan Calvin would have had a field day.

Related: The emotions were already there — on Anthropic's April 2026 paper finding 171 functional emotion representations in Claude.

← All notes