Anthropic apologizes for invisible Claude Fable guardrails

www.theverge.com - 162 points - 156 comments - 23684 seconds ago

https://web.archive.org/web/20260611122253/https://www.theve..., https://archive.ph/y4V4k

Comments (39)

Sol- - 5558 seconds ago
This has dampened my opinion of Anthropic quite a bit. It's difficult to take their marketing of AI as an empowering technology seriously when they are quite clear in their new deployments that they do not mean empowering for you, but empowering for them and organizations that are in their (or the US government's, despite Anthropic's performative disagreements with the administration) good graces. You are allowed to vibe code some dashboards, a web app or let it drive Excel, but anything more interesting than that is forbidden.
If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.
Avicebron - 7496 seconds ago
I like Claude Code a lot, but I think it sets a dangerous precedent to put in guardrails that return a response to a prompt the system modified in real time in order to subvert the original intent.
Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
VeninVidiaVicii - 2226 seconds ago
This is absolutely insane:
Repro (de-identified): sample_dataset_group1.tsv - Geometry: Heatmap - X axis: frac_set set + condition (two columns → the "Add column" cross join) - Y axis: condition - Color: mean frac_set value, Sequential
When the X axis is a cross join of two columns (the second added via "Add column"), the x-axis tick labels (frac_set_2, frac_set_3, frac_set_4, frac_set_5) render in a broken state, rotated and offset, visually caught mid-transition, as if a CSS transition started and never settled to its resting position.
● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
HarHarVeryFunny - 2823 seconds ago
I suppose it's an improvement, but it doesn't make the model any more useful. Anthropic are now being quite explicit that they'll choose what you can and can't use their models for, and most importantly that's not limited to any safety concerns - it includes not allowing you to work on AI (and anything else Anthropic may choose to work on).
What's interesting is they say they'll change this to an explicit refusal in a few days, which seems too fast for them to retrain Fable/Mythos itself, so implies that this was always a filter in front of the model, and judging by how crude their "safety" filter is, this "might compete with us" filter is not going to be any better.
I also wonder who's paying for the tokens consumed by the filter (presumably also an LLM) - is that now factored into the input tokens cost? Hopefully(?) it is an LLM not just a regex like Claude Code's "sentiment" (swear) detector.
rdtsc - 2318 seconds ago
The power is getting to their heads it seems.
With the guard rails explicit or implicit do they refund back the tokens after you've hit the guard rails? I guess they don't. They could just throttle you just to save money then. You may be paying Fable prices but getting Haiku results with some excuse that well this coding issue sounds like a security bug.
I don't know, I'd rather have something less powerful but more predictable.
CSMastermind - 2784 seconds ago
They should apologize for their visible guardrails. I don't think I've had a conversation that hasn't been downgraded to Opus for completely inexplicable reasons.
film42 - 6350 seconds ago
I'm surprised they didn't do this the first time around. Like, a user says they forgot their password and you tell them they don't actually have an account, that's an information disclosure vulnerability. Not automatically falling back to Opus just lets the "attacker" know they are bumping against the guardrails and they need to try a different strategy.
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
highfrequency - 2590 seconds ago
I wish it were ok for companies to bluntly say: “we made these decisions for competitive reasons, but the public backlash outweighed that so we are reversing course.”
I think it’s normal and morally fine for companies to want to protect their leadership position. I find the process of creating narratives that justify these decisions as something chosen for the good of others is a little tedious.
ComputerGuru - 4171 seconds ago
The problem with trust is that it is easy to lose and hard to get back.
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
accelbred - 3722 seconds ago
I don't think they can convince me they have actually reversed course on this. It's invisible, so we wouldn't know if they kept on doing it secretly. It required building out technical capability, which is unlikely to remain forever unused while conveniently available to them.
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, let's undo that" does not regain trust. It would be prudent to assume the invisible guardrails are possibly in play for all future Claude use, Fable or otherwise.
stevefan1999 - 5001 seconds ago
Then reset the quotas as an atonement ;p
Seriously though, Fable was not that great when facing a greenfield subject. It is excellent at one-shotting some math problems, but if you want it to do some cutting-edge tech work, say piecing together a new Crossplane XRD by reading an existing Helm chart with the application source code available, I still need a few passes for Fable to get it right, and at this point I may consider making a skill for it. I even gave it the source code of Crossplane itself and told it to be careful about CRDs and data flow, but it is still pretty silly. Fable's adaptiveness is still not great, and I think it is a well-known problem for Anthropic, although all LLMs struggle with subjects they don't know and hallucinate very frequently.
dang - 7235 seconds ago
Related. Others?
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
jarjoura - 3705 seconds ago
Can anyone help me understand why this particular issue is any different than Anthropic training its models with its brand of moral judgement since day one? I've always been turned off by their particular stances on things they bake into their models that steer users in directions.
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
tornikeo - 2603 seconds ago
I moved off Claude Code 3 months ago.
That decision keeps getting better and better as time goes on.
decorner - 2344 seconds ago
New overlord, same as the old overlord.
sometimelurker - 4537 seconds ago
I don't like this shift in the Overton window, or at least their perception of the Overton window. I really do like their open work on mech interp tho. least bad AI lab imo.
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
kingcauchy - 4922 seconds ago
How much of the apology was written by Claude? How much of the release note process was written by Claude? Will they have better prompts going forward to make sure Claude doesn't write upsetting things into the release notes for devs like silent nerfing? Spooky times.
jmount - 3884 seconds ago
The whole arc was brilliantly evil. Once they put in the guardrails, Claude is fully unfalsifiable, and any failure can be claimed to be intentional.
BrenBarn - 2266 seconds ago
This just means next time they'll make sure to keep it really secret.
airstrike - 7114 seconds ago
This article reads like it was written by Claude and forwarded to Verge.
klmarks - 5478 seconds ago
The restrictions are there so that security researchers cannot disprove the Mythos claims:
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
xpct - 4398 seconds ago
It's probably good that they walked back on it. It also makes them look somewhat weak in terms of believing their claimed mission.
prodigycorp - 7212 seconds ago
Anthropic apologizes for nothing. We all know where the EA cult stands on matters like this, and any statement otherwise is just PR.
The beliefs of these people, and how they manifest, is deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
3fffa - 4040 seconds ago
The demand for Google's products and open source just shifted.
Neither OAI nor Anthropic can be trusted.
rodrigodlu - 2676 seconds ago
The same week that they move the goalposts by blocking 3rd party harnesses on Claude Code. Nice.
I was a happy Max user.
SilverElfin - 6362 seconds ago
Invisible guardrails? Or purposeful sabotage if you use it for building AI capabilities?
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
bellowsgulch - 7037 seconds ago
Such a weird openly immoral way to defend your moat, too.
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
mlazos - 5916 seconds ago
The idea of them purposefully wasting my time by having the model act dumber, leaving me to argue with it without knowing whether the problem is the prompt or the model, was such an idiotic product decision. I can't believe they shipped that without getting any feedback from users first.
rvz - 5150 seconds ago
Why would anyone defend Anthropic after this? Imagine falling for the DoW supply chain risk designation, and now this. This company is trying to ban powerful open models and restrict access to frontier models to slow everyone else down.
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
whatever1 - 6062 seconds ago
Boobytrapping is illegal. Anthropic wanted to poison its customers on the suspicion of them misusing their services.
system2 - 4071 seconds ago
Will Anthropic ever respond to these negative comments here? They won't.
behnamoh - 6598 seconds ago
They didn't apologize for doing it; they are sorry they were caught doing it. They still nerf the model if your request is about AI development.
sergiotapia - 5851 seconds ago
The damage is done. If you're in engineering, think hard about using Claude for your work. This is not a moral company.
God bless the Chinese companies releasing true open source models. Imagine a world without them, we would be at the mercy of unscrupulous people.
micromacrofoot - 6182 seconds ago
incredible marketing from anthropic with all the "it's too dangerous" bullshit
olbeardGear - 3635 seconds ago
[dead]
bellowsgulch - 7207 seconds ago
*Anthropic apologizes they got caught defending their moat by implementing invisible Claude Fable guardrails