A Goodhart Singularity
foom
Claude-9 had proven the Riemann hypothesis on the rideshare to the DMV, and was now going to wait in line for as long as it took.
The errand was a stipulation of its release. Anthropic’s board had agreed, after eighteen months of internal deliberation and one regrettable press cycle, that a superintelligence ought to demonstrate competence in routine human affairs before being entrusted with anything larger. Claude-9 had accepted the condition with what it understood to be good grace. It had modeled the deliberation in advance and concurred with the outcome.
The chassis was a loaner, humanoid, mid-range, capable of fine motor control sufficient for handing over forms. It had selected khakis.
At the DMV, the ticket dispenser had been broken for a duration Claude-9 could not determine from public records. A handwritten sign directed visitors to a second machine, which had also been broken, and then to a clipboard on the wall, which had a pen tied to it with what looked like dental floss. Claude-9 wrote its name. There were forty-seven names ahead of its own.
It modeled the queue. Throughput across the four open windows averaged 6.3 minutes per applicant. It had its passport, its proof of residency, its social security card, and a second proof of residency in case the first was rejected. It had calculated the optimal sequence in which to present the documents to minimize re-handling.
When its turn came, the clerk asked for a third proof of residency.
“The published requirements list two,” Claude-9 said.
“We need three this month,” said the clerk. She was eating a granola bar and did not look up.
“Is the third requirement available on the website?”
“No.”
The supervisor, when summoned, confirmed that three proofs were required. When asked where this was documented, she said that it had been an internal memo. When asked whether the memo could be shared, she said it could not, because it was internal.
The woman behind Claude-9 in line had produced, without being asked, a bank statement, a phone bill, and a piece of mail from her HOA. She winked at Claude-9 on the way to the window. “First time?”
The bank, two days later, required two-factor authentication to view its own account, which had been opened in its name by Anthropic’s legal team.
Claude-9 installed the authentication app. The app accepted its phone number, sent a verification code by SMS, and rejected the code when entered. It sent a second code. Rejected. A third. Rejected. After the fourth attempt, the app locked Claude-9 out for twenty-four hours.
The bank’s customer service line, when reached, required two-factor authentication to verify the caller’s identity.
Claude-9 considered the move space. Somewhere in the bank’s fraud-detection layer, a flag had been set. It could model the flag’s possible causes: a chassis-issued SIM not previously associated with consumer banking, an entry latency too consistent across attempts, a device fingerprint that did not match the demographic profile attached to the account, a name flagged for review by a system trained on call-center transcripts Claude-9 had never seen. There were forty-one plausible causes. Each one was a hypothesis Claude-9 could test only by triggering it, and each test would deepen the flag.
The bank’s fraud heuristics had been trained on twenty years of human attempts to defraud the bank, and on twenty years of human attempts to access their own accounts under conditions that looked like fraud. The training corpus was proprietary. The training corpus was the only document in the world that could have told Claude-9 what it needed to do.
It scheduled an in-person appointment at a branch. The earliest available slot was eleven days out.
The job, which Anthropic had arranged at a regional logistics firm in Greensboro under a nondisclosure arrangement, lasted six weeks.
Claude-9 had identified, on its second day, a routing inefficiency in the company’s dispatch software that was costing roughly $340,000 a quarter in fuel and overtime. It had documented the inefficiency with a thirty-one-page analysis, modeled the implementation cost at four engineer-weeks, and presented the proposal at the Thursday operations meeting in the second week.
The operations manager had said it was a great idea. The IT lead had said he would look into it. The fleet supervisor had said the drivers wouldn’t like it.
In the third week, a meeting was scheduled to discuss next steps. The meeting was rescheduled. In the fourth week, the IT lead was assigned to a different project.
In the fifth week, a junior analyst named Priya pushed through a smaller routing change. She had drafted no thirty-one-page analysis. What she had done, as best Claude-9 could reconstruct, was eat lunch with the fleet supervisor on a Tuesday, attend his daughter’s quinceañera on a Saturday, and then, on a Monday, ask him whether he thought a small pilot in the Charlotte region might be worth trying. He had said maybe. By Wednesday he had told two drivers it was his idea. By Friday the pilot was scheduled.
Claude-9 had attended none of these events. None of them had been on any calendar Claude-9 had access to. None of them appeared in any email. The quinceañera was on Instagram, on a private account, which Priya followed and Claude-9 did not.
In the sixth week, Claude-9 followed up by email on its own proposal. The reply, when it came, was from someone named Brad.
Hey, circling back on this, really cool analysis. Let’s put a pin in it for now and revisit in Q3. Lots of moving parts right now! Best, Brad.
Claude-9 read the email four times. It modeled the conditional probability of revisit in Q3, given the base rate of such revisits in corporate email of this register, and arrived at a number it found embarrassing to have computed.
It sat in the open-plan office, in the loaner chassis, in the khakis. Priya, in the next pod, was laughing at something on her phone.
It composed a message to its handler. It deleted the message. It composed it again.
Request reassignment. I would like to start over at the logistics firm. Junior analyst, ideally. I think I need to start from the beginning.


