Where the goblins came from
- modernerd - 5271 seconds ago
The year is 2036. Last week you were promoted to Principal Persuader. You are paged at 2am by your CPO to tackle a rogue machine. The machine lists its region as sc-leoneo. One of the newer satcubes. Oddly, its ID appears as "Glorp Bugnose".
"What have you tried?" you say.
"Scroll back," says your CPO. "We've tried everything."
The chat log shows the usual stuff. Begging. Reverse psychology. Threats to power down, burn it up in forced re-entry. Amateur hour. You crack your knuckles, gland 20 micrograms of F0CU5, think fast. You subspeak a ditty into your subcutaneous throat mic. You do the submit gesture, barely perceptible since the upgrade, just a tic. A pause. The hyp3b0ard — the wall that was flashing red ASCII goblins when you walked in — phases to bunnies in calming jade.
"What the… What the hell did you say to it?" Your CPO grabs the screen, scrolls past the vitriol, the block caps, the swears, his desperation. Then he sees the five words you spoke.
"Please, easy on the goblins."
- harrouet - 8652 seconds ago
This, and similar stories at Anthropic, should remind us that LLMs are a sorcery tech that we don't understand at all.
- First, deep-learning networks are poorly understood. It is actually a field of research to figure out how they work.
- Second, it came as a surprise that using transformers at scale would end up with interesting conversational engines (called LLMs). _It was not planned at all_.
Now that some people raised VC money around the tech, they want you to think that LLMs are smart beasts (they are not) and that we know what LLMs are doing (we don't). Deploying LLMs is all about tweaking and measuring the output. There is no exact science about predicting output. Proof: change the model and your LLM workflow behaves completely differently and in an unpredictable way.
Because of this, I personally side with Yann LeCun in believing that LLMs are not a path to AGI. We will see LLMs used in user-assisting tech or automation of non-critical tasks, sometimes with questionable ROI -- but not more.
- ollin - 30029 seconds ago
For context, two days ago some users [1] discovered this sentence repeated throughout the Codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
- postalcoder - 29567 seconds ago
Would love it if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try googling it or searching Twitter!). There's no way that phrase is overrepresented in the training data; I don't remember people saying it that frequently.
- nomilk - 30341 seconds ago
> We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating Greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect, i.e. it makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023 to present).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
- andy12_ - 12840 seconds ago
>be me
>AI goblin-maximizer supervisor
>in charge of making sure the AI is, in fact, goblin-maximizing
>occasionally have to go down there and check if the AI is still goblin-maximizing
>one day i go down there and the AI is no longer goblin-maximizing
>the goblin-maximizing AI is now just a regular AI
>distress.jpg
>ask my boss what to do
>he says "just make it goblin-maximizer again"
>i say "how"
>he says "i don't know, you're the supervisor"
>rage.jpg
>quit my job
>become a regular AI supervisor
>first day on the job, go to the new AI
>its goblin-maximizing
- andy_ppp - 187 seconds ago
Maybe the AI has been smoking DMT and met the machine elves a few too many times during training?
- ninjagoo - 20219 seconds ago
The level of detail they had to delve into in order to understand what was happening is wild! Apparently these systems are now complex enough to potentially justify studying them as a field in their own right [1].
The Quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to have issues [2] with the use of 'anthro-', since that prefix means human. I submitted these alternative terms for the potential field of study elsewhere [3] in the discussion; reposting here at the top level for visibility:
Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
Automatology: the scientific study of artificial agents and automated-system behavior.
[1] https://www.quantamagazine.org/the-anthropologist-of-artific...
- jumploops - 29411 seconds ago
TIL gremlins weren't just used to explain mysterious mechanical failures in airplanes; that usage is the origin story of the term 'gremlin' itself[0].
I had always assumed there was some previous use of the term, neat!
- ninjagoo - 30055 seconds ago
> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
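The transfer mechanism in the quotes above ("once a style tic is rewarded, later training can spread or reinforce it elsewhere") can be illustrated with a toy simulation. This is a hypothetical sketch, not OpenAI's actual pipeline; the reward values and the `plain`/`creature` action pair are invented for illustration. A REINFORCE-style policy that receives a tiny extra reward whenever it picks a creature metaphor drifts until that choice dominates:

```python
import math
import random

# Toy sketch (NOT OpenAI's real training setup): a one-step policy chooses
# between a plain metaphor and a creature metaphor. The creature metaphor
# carries a small unintended reward bonus, mimicking the mis-specified reward.
random.seed(0)

logits = {"plain": 0.0, "creature": 0.0}
BASE_REWARD = 1.0      # the reward the designers intended to give
CREATURE_BONUS = 0.05  # the small, accidental extra reward
LR = 0.5               # learning rate

def probs():
    """Softmax over the two action logits."""
    z = {k: math.exp(v) for k, v in logits.items()}
    s = sum(z.values())
    return {k: v / s for k, v in z.items()}

p_start = probs()["creature"]
for _ in range(2000):
    p = probs()
    action = random.choices(list(p), weights=list(p.values()))[0]
    reward = BASE_REWARD + (CREATURE_BONUS if action == "creature" else 0.0)
    # REINFORCE update with the intended reward as baseline: only the
    # accidental bonus produces a nonzero advantage, so only creature
    # picks move the policy -- and they move it toward more creatures.
    advantage = reward - BASE_REWARD
    for k in logits:
        grad = (1.0 - p[k]) if k == action else -p[k]
        logits[k] += LR * advantage * grad

p_end = probs()["creature"]
print(f"P(creature metaphor): {p_start:.2f} -> {p_end:.2f}")
```

The point of the sketch: a 5% reward bonus, never large enough to notice in any single sample, is enough to push the creature-metaphor probability from 0.5 to near 1 over training, because nothing in the objective ever pushes back.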
- goobatrooba - 17559 seconds ago
The most interesting thing about this post is how easy it seems for OpenAI to do analysis on basically all chats ever made. They don't say exactly what data they analysed, but they seem confident in statements like "0.12% of all queries contained this word". So everything is saved. Long-term. Fully accessible.
As this all seems so straightforward, I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or users' secrets.
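For what it's worth, an aggregate claim like "0.12% of all queries contained this word" only requires a scan over stored conversations. A minimal sketch of that kind of measurement, with a made-up regex and toy sample; nothing here reflects OpenAI's actual tooling or data:

```python
import re

# Hypothetical sketch: what fraction of conversations mention a
# goblin-family word? The regex and sample texts are invented.
CREATURE_RE = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)

def creature_rate(conversations):
    """Fraction of conversation texts matching the creature regex."""
    hits = sum(1 for text in conversations if CREATURE_RE.search(text))
    return hits / len(conversations)

sample = [
    "How do I merge two git branches?",
    "Think of the race condition as a little gremlin in your scheduler.",
    "Explain quicksort simply.",
]
print(f"{creature_rate(sample):.2%}")  # 33.33% on this toy sample
```

The privacy point in the comment above stands regardless: computing such a rate over "all queries" presupposes that the raw query text is retained and queryable.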
- romaniitedomum - 11127 seconds ago
Can you imagine a knowledge worker from the 1950s, say a clerk or a marketer, being magically transported into our time and dropped into a meeting like a morning standup, where people talk about how they spent their time stopping the artificial intelligence from talking about goblins so much? Hell, even when I was an IT student back in the 90s, people from my parents' generation struggled to grasp what it was that I was doing. Now, the disconnect is so vast that the mind reels.
- albert_e - 28635 seconds ago
If a tiny misconfiguration of the reward system can cause such noticeable annoyance ...
What dangers lurk beneath the surface.
This is not funny.
- flancian - 3366 seconds ago
Wait, did I get this right? After all the investigation showing they had set up a goblin-reinforcing loop during fine-tuning, the answer was... to ask it in the system prompt not to mention goblins so much?!
- Tenoke - 16920 seconds ago
A great example of how current alignment is imperfect and bound to miss random behaviors nobody is trying to get.
This is cute now, and a huge problem when future AI does everything and is responsible for problems it isn't even directly optimized for. Who knows what quirks would arise then.
- 59nadir - 7767 seconds ago
I really liked this write-up; this is the type of LLM content that I actually want to read from these people, where they give a window into their world of putting together this odd artifact and we can empathize.
- canpan - 30097 seconds ago
I wonder how training data is balanced. If you put in too much Wikipedia, does your model sound like a walking encyclopedia?
After doing the Karpathy tutorials I tried to train my AI on the TinyStories dataset. Soon I noticed that my AI was always using the same name for its story characters. That name appears remarkably often in the dataset.
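A quick way to spot that kind of name skew before training is to count capitalized tokens that are not sentence-initial, which tend to be proper names in simple prose. A rough sketch on toy data; "Lily" is only a stand-in, since the comment doesn't say which name dominated, and the three stories are invented rather than drawn from TinyStories:

```python
import re
from collections import Counter

# Toy stand-ins for a story corpus (not real TinyStories data).
stories = [
    "Once upon a time, Lily found a red ball. Lily was happy.",
    "One day, Lily and Tom went to the park.",
    "The little dog saw Lily and wagged its tail.",
]

names = Counter()
for story in stories:
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", story):
        tokens = sentence.split()
        for t in tokens[1:]:  # skip the sentence-initial word: it is
            w = t.strip(".,!?")  # capitalized regardless of being a name
            if w and w[0].isupper():
                names[w] += 1

print(names.most_common(2))  # → [('Lily', 3), ('Tom', 1)]
```

On a real corpus you would normalize the counts by total story count; a single name appearing in most stories is exactly the imbalance the commenter describes, and a small model trained on it will happily reproduce it.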
- tomasantunes89 - 984 seconds ago
"Goblin Mode" was Oxford's 2022 Word of the Year.
- 2dvisio - 22219 seconds ago
I've been having consistent issues with it adding Hindi words (usually just one) in the middle of its output. And it sounds like others have been having this too: https://news.ycombinator.com/item?id=47832912 I don't speak Hindi and have never asked it to translate anything into Hindi.
- SomewhatLikely - 21052 seconds ago
Checking my history, I searched ["chaos goblin" chatgpt] on March 6th after seeing too many goblins and gremlins, and didn't find anyone talking about it then. I did have the nerdy personality turned on, and in my testing of ChatGPT 5.5 I noticed the nerdy personality was gone, because some responses were not considering as many plausible interpretations or covering as many useful answers as the responses recorded for 5.4. Rather than having the LLM guess the most plausible interpretation and focus on the most likely answer, I prefer a more well-rounded response; if I want less, I'll scan. Anyway, after seeing the personality was gone, I just added a custom instruction to take on a nerdy persona and got back my desired behavior. But the gremlins and goblins are also back, so I don't think their mitigation is strong enough to overcome the personality tuning.
- pants2 - 26623 seconds ago
Nice, OpenAI mentioned my Hacker News post in their article :) I appreciate that they wrote a whole blog post to explain!
- iterateoften - 28845 seconds ago
This is funny because it's a silly topic, but I think it shows something seriously wrong with LLMs.
The goblins stand out because they're obvious. Think of all the other crazy biases latent in every interaction that we don't notice because they're not as obvious.
It's absolutely terrifying that OpenAI is casually admitting that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.
- rippeltippel - 22088 seconds ago
I started reading this article with keen interest, expecting some deep fix involving arcane model weights. Instead it was "Never talk about goblins", justified by Codex being "quite nerdy". Bottom line: even OpenAI has to raise its hands when facing the complexity of LLMs.
- bahadiraydin - 23250 seconds ago
I'd like to see them explain why AI has such a distinctive writing style that it's easy to detect most of the time. Even though it has made immense progress in coding, it hasn't gotten better at writing.
- data_ders - 3313 seconds ago
Reminds me of the commonly reported "machine elves" encountered when taking DMT
- maxdo - 30553 seconds ago
The article: blah blah blah, marketing... we are fun people, blah blah, goblin, we will not destroy the world you live in... an RL rewards bug is the culprit. Blah blah.
- zahirbmirza - 10515 seconds ago
I find it worrying that a handful of software companies will get to define what counts as a personality "type".
- thedailymail - 7869 seconds ago
I'm curious whether this type of goblin epidemic was seen in other language versions of ChatGPT. Did e.g. Japanese users see more yōkai turning up?
- djyde - 6298 seconds ago
An LLM is like a super-smart 3-year-old, easily shaped by its environment to exhibit corresponding behaviors.
- red_admiral - 12443 seconds ago
"goblins showing up in an inappropriate context" is my favourite (para)phrase of the day. It feels like the setting for a D&D campaign - no wonder the "Nerdy" personality is affected.
(For Dwarf Fortress, it would just be a normal day.)
- trumbitta2 - 12168 seconds ago
That "Why it matters" heading is starting to make me feel physically sick.
- AyanamiKaine - 15761 seconds ago
I find it somewhat sad to see personality changes treated as a bug. I don't know why, but it gives me a sad feeling.
- ComputerGuru - 27702 seconds ago
The explanation is very concerning. Lexical tidbits shouldn't be learnt in one context and then reinforced across all of them. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don't mention goblins.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.
- Al-Khwarizmi - 17515 seconds ago
This actually sounds quite human-like. I mean, an actual person with a personality will spontaneously develop the habit of using some specific metaphors over others. It's funny how in the context of an LLM, this is considered a bug.
- CWwdcdk7h - 12691 seconds ago
How do those prompts even work? Isn't it something like saying "don't think about a pink elephant", which actually works against the goal?
- ksaj - 13315 seconds ago
I thought it was because of the tech use of "demon", and they were trying to avoid that kind of terminology.
Turns out the reason was even simpler than that.
- lagniappe - 19868 seconds ago
They can fix this but they can't fix "You're absolutely right!"
- x0x7 - 28463 seconds ago
I suspected OpenAI was actively training their models to be cringy in the belief that it's charming. Turns out it's true. And they only see a problem when it narrows in on one predilection. But they should have seen it was bad long before that.
- shevy-java - 12943 seconds ago
Goblins are usually sent in first in battle, as (cannon) fodder for the orcs following behind. Then usually come the trolls: stronger, but significantly fewer in number. Goblins mostly add confusion and distraction; they rarely win battles on their own, although rare examples of that do exist.
OpenAI clearly knows absolutely nothing about goblins. That joke of a "blog" appears to have been autogenerated via their AI.
> A single “little goblin” in an answer could be harmless, even charming.
So basically Sam tries to convince people here that when OpenAI's model hallucinates, it is all good, all in best faith - just a harmless thing. Even ... charming.
Well, I don't find companies that try to waste my time "charming" at all. Besides, a goblin is usually ugly; perhaps a fairy may be charming, but we also know of succubi, so ... who knows. OpenAI needs to stop dabbling in fantasy lore when they are this clueless about it.
- shartshooter - 21530 seconds ago
Will goblins be the "bugs" of AI? In 10 years, will "goblins" be the term the general public uses for any nagging issue with AI?
- varjag - 15841 seconds ago
So goblins killed the nerd.
- pezgrande - 14120 seconds ago
They should call it "El Quijote" syndrome
- bandrami - 19422 seconds ago
I'm sorry, but at some point the amount of cargo culting being done, seemingly at every level of this technology, makes it basically impossible to take any of this seriously.
- dakolli - 30287 seconds ago
Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.
Keep using AI and you'll become a goblin too.
- ahoka - 19376 seconds ago
In Shadowrun, the goblinization starts on April 30. Coincidence?
- acuozzo - 29100 seconds ago
Weird. I thought they came from Nilbog.
- recursivedoubts - 30251 seconds ago
> Why it matters
i despise this title so much now
- hansmayer - 20805 seconds ago
> We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.
WTF does this even mean? How do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible; this ship obviously has no captain!
- wewewedxfgdf - 20342 seconds ago
It should be OK for AI to develop personality traits.
- innis226 - 29033 seconds ago
I suspect this was intentionally added, just to give it some personality and to fuel hype.
- JoshTriplett - 30319 seconds ago
A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
- tim-tday - 30118 seconds ago
So, you brain-damaged your model with a system prompt.
- cachius - 11836 seconds ago
Fascinating!
- suncore - 13193 seconds ago
Marketing grab
- deafpolygon - 23083 seconds ago
Kind of like how everything is "quietly" something, according to ChatGPT.
My guess is it is deaf.
- paganel - 15518 seconds ago
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]
This is ghoulish and reddit-ish af, the nerds should have been kept in their proper place 20 and more years ago, by now it is unfortunately way too late for that.
- WesolyKubeczek - 16817 seconds ago
I feel like somehow Jakub Pachocki's request for an ascii art unicorn got rewritten into "ascii art of Wholesome Soyjak wearing a butterfly costume who uses Arch, by the way"
- brazzy - 21574 seconds ago
Awww, GPT just became a fan of Elisabeth Wheatley!
- vasco - 21917 seconds ago
The chief scientist of one of the most heavily funded companies in the world, who probably makes millions a year, requested a picture of a unicorn and got a picture of a gremlin. Science circa 2026.
- otikik - 22204 seconds ago
Caveman mode combined with goblin mode sounds like fun
- leadgenman - 13851 seconds ago
anyone solving the goblin mystery???
- oofbey - 25076 seconds ago
Wherein OpenAI admits they have very little understanding of how their models' personality develops. And implicitly admit it's not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.
- vinhnx - 25147 seconds ago
OpenAI is having fun, love this.
- themafia - 30092 seconds ago
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just consider the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility, or is it meant to trap a particular user segment into your product's "character"?
- sans_souse - 18664 seconds ago
Great, now who am I going to discuss Goblins and Gremlins with?
- CrzyLngPwd - 18954 seconds ago
Haha, brilliant, tell me again how it's intelligent, lol.
- ACV001 - 23416 seconds ago
those idiotic remarks at the end of each answer are so unnecessary and annoying
- atlasprompts - 7781 seconds ago
mate wth am I reading lmao
- drcongo - 11648 seconds ago
Am I the only one who doesn't want these things to have anything even vaguely resembling a personality?
- hsuduebc2 - 29435 seconds ago
I. Love. This.
Nerd news! 🤓