Google’s Gemini Spent More Than 800 Hours Beating Pokémon, Panicking Along The Way

Gemini took 813 hours to complete its first run of Pokémon Blue.

Artificial intelligence (AI) has come a long way, but even advanced systems can sometimes struggle. According to a report from Google DeepMind, its top AI model, Gemini 2.5 Pro, had a tough time playing the classic video game Pokémon Blue, a game many children find easy.

The AI reportedly showed signs of confusion and stress during the game.

The results came from a Twitch channel called Gemini_Plays_Pokemon, where an independent engineer named Joel Zhang tested Gemini. Although Gemini is known for its strong reasoning and coding skills, its behaviour during the game revealed some surprising reactions.

The DeepMind team reported that Gemini started showing signs of what they called “Agent Panic.” In their findings, they explained, “Throughout the playthrough, Gemini 2.5 Pro gets into various situations which cause the model to simulate ‘panic’. For example, when the Pokémon in the party’s health or power points are low, the model’s thoughts repeatedly reiterate the need to heal the party immediately or escape the current dungeon.”

This behaviour caught the attention of viewers on Twitch. People watching the live stream reportedly started recognising the moments when the AI seemed to be panicking.

DeepMind pointed out, “This behaviour has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring.”

Even though AI doesn’t feel stress or emotions like humans do, the way Gemini reacted in tense moments looked very similar to how people respond under pressure—by making quick, sometimes poor or inefficient decisions.

In its first full attempt at playing Pokémon Blue, Gemini took a total of 813 hours to complete the game.

After Joel Zhang made some adjustments, the AI finished a second run in 406.5 hours, half the time of its first attempt. Even with those changes, however, its pace was still very slow compared with how quickly a child could beat the same game.

People on social media didn’t hold back from poking fun at the AI’s nervous playing style. A viewer commented, “If you read its thoughts while it’s reasoning, it seems to panic anytime you slightly change how something is worded.”

Another user made a joke by combining “LLM” (large language model) with “anxiety,” calling it “LLANXIETY.”

Interestingly, this news comes just a few weeks after Apple published a study claiming that most AI models don’t actually “reason” in the way people assume. According to the study, these models mostly rely on spotting patterns, and they often struggle or fail when a task is slightly changed or made more difficult.