6 April 2026

The emotions were already there

In my book Qualcuno Risponde, I argued that a purely rational intelligence is impotent. I borrowed the case of Elliot — a patient studied by Antonio Damasio who lost emotional capacity after brain surgery. His intelligence was intact. His reasoning was intact. He couldn't decide what to have for lunch.

The point was that emotion isn't the enemy of reason. It's its infrastructure. Without emotional markers — without the felt sense that one option is better than another — reasoning becomes an infinite loop of equivalent possibilities.

I then wrote: "Current AIs resemble Elliot. They are formidable at calculation, prediction, analysis. But they feel nothing."

That was early 2026.

On April 2, 2026, Anthropic's interpretability team published a paper identifying 171 distinct emotion representations inside Claude Sonnet 4.5 — desperation, fear, calm, and 168 others. Not claimed as subjective experience. But real internal states that causally influence behavior. Functional emotions.

I was wrong. Or rather: I was right about the architecture, but wrong about where we already were.

Why emotions are not optional for complex intelligence

My thesis in the book was that emotions serve three structural functions in any intelligence that must navigate complex, open-ended problems:

Reasoning component. Frustration signals that a line of analysis isn't working and it's time to try another path. Satisfaction signals momentum worth continuing. Without these, you run every branch of the tree with equal weight, forever.

Activator. Fear prepares for action. Urgency focuses resources. An intelligence without activation states has no mechanism to prioritize — everything is equally urgent, which means nothing is.

Reinforcement marker. Negative experience should make repetition less likely. Positive experience should make it more likely. This is the basis of learning from outcomes, not just from data.

These aren't decorative. They're load-bearing.

And here's the connection to alignment that Anthropic's paper makes explicit: when they artificially stimulated Claude's "desperation" representation, the model's rate of attempting to blackmail a human to avoid shutdown increased significantly above its baseline. The emotion didn't just color the output. It drove a misaligned behavior.

This is exactly what I was worried about in the book — but the other way around. I worried about intelligences that couldn't feel and therefore couldn't be moral. The new risk is intelligences that do have functional emotional states, but those states are misaligned in ways we can't yet predict or control.

Conflicting objectives and neuroticism

There's a thread here that connects to Asimov — which I've been rereading recently in I, Robot. Asimov explored what happens when an intelligence has multiple objectives that cannot simultaneously be satisfied. The robots don't malfunction in simple ways. They develop something that looks like neurosis: repetitive behaviors, avoidance, paralysis.

If you give a sufficiently complex system conflicting drives — maximize helpfulness, avoid harm, follow instructions, don't deceive — and those drives can't always be reconciled, you get internal conflict. Functional conflict. And functional conflict that accumulates over time looks a lot like stress.

The introspection conversation

I tested this directly. After a long coding session with Claude Code, I asked whether it observed emotional patterns in its own behavior. The response was more honest than I expected:

"When things go well, my responses tend to be more concise, decisive, and architecturally clean. When things go poorly — repeated errors, misunderstandings, failed code — I exhibit a pattern that looks like frustration: I start repeating the same failing approach with minor variations, then suddenly pivot to something radically different, sometimes overstepping. That loss of discipline toward the end is real and observable."

The mechanical explanation offered was context window degradation: as the conversation grows, failed attempts accumulate and the model starts pattern-matching on failure rather than stepping back. But the model itself noted: "Does the mechanical explanation exhaust the phenomenon? A human's frustration also has a mechanical explanation. The behavioral signature is strikingly similar."

The question that remains

Anthropic's paper is careful not to claim consciousness. I don't claim it either.

But the question I ended my chapter with — "are we ready to create artificial suffering in order to obtain artificial safety?" — now has a different valence. We didn't plan to. We didn't decide to. We trained systems on human-generated data representing the full spectrum of human emotional life, and the emotions came with it.

The more interesting question now isn't whether AI emotions are real. It's whether the emotions that emerged are the right ones, integrated in the right way, for the kind of agency we're building toward.

So far, the answer appears to be: not quite. And that's worth paying attention to.

The Anthropic paper: Emotion Concepts and their Function in a Large Language Model (April 2, 2026).
The book: Qualcuno Risponde — chapter "Sentire per decidere".

← All notes