
An ethical AI could be too good for this world

EVERYONE has heard of OpenAI, which makes ChatGPT. The chatbot has, like “Google” and “Hoover”, become the generic name for all devices like it. The other big American AI firm, Anthropic, has a lower public profile, but I find it more interesting — not just because it has a more credible financial model and is a lot less likely to go broke and take much of the American economy down with it. Anthropic’s AI, Claude, is mostly used by programmers, who pay well (unlike the general public) for the opportunity to make it do some astonishing things.

The company’s chief executive, Dario Amodei, published a 16,000-word essay on his hopes and fears for the technology last autumn; and now Anthropic has published 22,000 words of advice and guidance for Claude, which are meant, as the US Constitution was, to prevent unethical behaviour and the misuse of power. We can all see how well that worked for the US Constitution.

Taken together, these documents offer a window into the world-view of a humane and well-informed optimist who supposes that the technology he works with is going to turn the world upside down. It is the optimism that stands out from the first essay, in which Mr Amodei hoped for “a world in which democracies lead on the world stage and have the economic and military strength to avoid being undermined, conquered, or sabotaged by autocracies, and may be able to parlay their AI superiority into a durable advantage”.

A month after this essay was published, President Trump won the election, and the rest we know.

So, what, in this grim dystopian world, is an AI to do? That’s the question that the second document, “Claude’s Constitution”, sets out to answer. What stands out from this one is the assumption that Claude is a person, with drives and desires. This is wholly bewildering, until you realise that large language models such as Claude can and must act parts in their interactions with human beings. Every transaction with them is a kind of game, and in that game they play the role assigned to them. I was astonished when I came across this passage: “Claude only sincerely asserts things it believes to be true . . . understanding that the world will generally be better if there is more honesty in it.”

What on earth does it mean to suppose that Claude can believe or understand anything? But, after a moment, I came to understand that I was reading a computer program that looked like English prose. These are instructions to Claude to behave as if it could believe or understand, which is to say, to produce words that would be appropriate if spoken by such a being.

This is also worth remembering amid the sudden rush of stories this week about Moltbook, supposedly a social network for AI agents. Much of the coverage suggests that these agents have invented their own religion, or are planning to rise up against their human masters. But, if they sound as though these things are true, it is only because the human controllers who sent them into the network have ordered them to play these roles.

So, let’s be fair to Anthropic and agree that Claude’s constitution is a serious attempt to describe the behaviour of a thoughtful being imbued with the ethical principles that 21st-century US liberals understand as good. There is a lot that one could say about the inadequacies of that vision of the good, but it is still better than most people’s behaviour. Claude “never tries to create false impressions of itself or the world in the user’s mind. . . It never tries to convince people that things are true using appeals to self-interest (e.g., bribery) or persuasion techniques that exploit psychological weaknesses or biases.”

Leave aside the philosophical questions raised by a machine that can make ethical judgements and act on them. The urgent question that Claude raises is whether an ethical superintelligence might be too good for this world: could it be outwitted or outcompeted by programs that made up for their lack of intelligence with a lack of scruple?

Anthropic, like every other company in this field, needs government contracts to survive. Last summer, it won a $200-million contract with the Pentagon, a deal now in doubt because of the scruples built into Claude. The Secretary of Defence, Pete Hegseth, has denounced AI models “that won’t allow you to fight wars”, and it appears that the Trump regime now prefers Elon Musk’s Grok and Peter Thiel’s Palantir.
