For what feels like eons, robotics has been a rather bittersweet saga: brilliant hardware just twiddling its metallic thumbs, patiently waiting for a brain. We’ve all marvelled at mechanical dogs pulling off perfect backflips and factory arms executing balletic, hypnotic precision. Yet, scratch beneath the surface, and you’d find them mostly just rehashing a meticulously coded script. Ask them to step outside the lines, to perform something genuinely novel, and you’d be met with the silent, metallic equivalent of a blank, uncomprehending stare. Well, that era, my dear readers, appears to be grinding to a rather screechy, unceremonious, and frankly, overdue halt.
Enter stage left, a new breed of bots from Google DeepMind, which are less pre-programmed automatons and more… well, they’re practically thoughtful collaborators. During a recent exclusive jaunt through their hallowed California labs, the tech behemoth unveiled a formidable fleet of machines that don’t merely see and do; they grasp, they strategise, and astonishingly, they think before they even twitch a servo. The veritable secret sauce here isn’t some fancy new gear or a beefier motor, but the rather audacious infusion of the very same powerful AI that powers its celebrated Gemini models. The upshot? Robots that can pack your lunch with unnerving, almost spooky dexterity, and then, in a moment of pure comedic genius, literally refuse to do it as Batman.
The Two-Part Brain Behind the Brawn
The seismic shift, as eloquently laid out by Kanishka Rao, Director of Robotics at Google DeepMind, boils down to constructing these mechanical marvels atop expansive Vision-Language-Action (VLA) models. Instead of being painstakingly programmed for one incredibly specific task – say, making a cuppa, but only if it’s Earl Grey at precisely 80 degrees Celsius – these robots are granted a holistic, general understanding of the world. They tap into the colossal reservoir of knowledge embedded within models like Gemini to comprehend concepts, objects, and instructions in a manner that, until recently, was firmly the stuff of speculative fiction.
Google’s ingenious architecture effectively bestows upon the robot a rather sophisticated, two-part cerebral cortex:
- Gemini Robotics-ER (Embodied Reasoning): This, my friends, is the grand strategic planner, the veritable brainbox. When confronted with a complex, long-horizon task—such as “spruce up this table, making sure everything is sorted according to local recycling regulations”—this model assumes the role of the high-level grey matter. It even possesses the rather handy ability to consult external tools, like Google Search, to dig up any pertinent information before meticulously crafting a step-by-step masterplan.
- Gemini Robotics VLA (Vision-Language-Action): And this, the Gemini Robotics VLA, is the tireless executor, the doer of deeds. It takes the concise, sequential instructions spat out by its reasoning counterpart and translates them into the precise, nuanced motor commands required to pull off the physical action with finesse.
This rather brilliant division of labour liberates the robots from the shackles of simplistic, short-horizon directives like “pick up the block” (which, let’s be honest, is hardly rocket science) and enables them to grapple with multi-step, intricate objectives that genuinely demand a good old-fashioned dose of problem-solving.
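To make the division of labour concrete, here is a purely illustrative sketch of the planner/executor split described above. The function names, the hard-coded steps, and the `Step` dataclass are all inventions for this example – they are not Google’s API, and the real models are, of course, neural networks rather than string matching.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One short-horizon instruction handed from planner to executor."""
    instruction: str

def plan(task: str) -> list[Step]:
    """Stand-in for the embodied-reasoning model (the strategist): break a
    long-horizon task into short, executable steps, possibly after
    consulting external tools such as search."""
    if "lunch" in task:
        return [Step("pinch the bag open"),
                Step("insert the sandwich"),
                Step("add the chocolate bar"),
                Step("add the grapes")]
    return [Step(task)]

def execute(step: Step) -> str:
    """Stand-in for the VLA model (the doer): translate one instruction
    into motor commands. Here we merely report what would be done."""
    return f"motor plan for: {step.instruction}"

if __name__ == "__main__":
    for step in plan("pack my lunch"):
        print(execute(step))
```

The point of the structure, not the toy logic, is what matters: the planner never touches a servo, and the executor never has to reason about recycling regulations.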
Thinking Makes It So
Perhaps the most utterly captivating breakthrough in this mechanical renaissance is the ingenious application of “chain of thought” reasoning to physical actions. We’ve witnessed this wizardry within the realm of language models, where prompting an AI to “think step-by-step” miraculously sharpens its output. DeepMind has now, with a dash of pure genius, gifted its robots an “inner monologue.” Before a robot so much as twitches a digit, it first generates a coherent sequence of its reasoning, articulated in plain old natural language.
“We’re making the robot think about the action that it’s about to take before it takes it,” Rao elucidates during the video tour, a glint of genuine excitement in his voice. “Just this act of outputting its thoughts makes it more general and more performant.”
This isn’t merely some esoteric academic parlour trick, mind you. Forcing the robot to articulate its battle plan—“Right, first I need to gently grasp the bread, then carefully manoeuvre it inside the rather diminutive opening of this flimsy Ziploc bag”—serves to meticulously structure complex actions that we mere mortals perform with bewildering, almost unconscious intuition. It’s a bizarre, yet undeniably effective, emergent property: to make a robot truly excel at physical tasks, you first teach it the fine art of talking to itself. Who knew self-help was the key to robotic dexterity?
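For the curious, the “inner monologue” pattern can be sketched in a few lines. Everything below is an assumption for illustration – the thought template, the action dictionary, and the function itself are made up – but it captures the essential move: emit a natural-language reasoning trace first, then the action conditioned on it.

```python
def act_with_inner_monologue(observation: str, goal: str):
    """Illustrative chain-of-thought for physical control: produce a
    natural-language 'thought' before committing to an action, rather
    than mapping observation straight to motor command."""
    thought = (f"I can see {observation}. To achieve '{goal}', I should "
               f"first locate it, then grasp it gently before moving it.")
    # The action is conditioned on the articulated plan above.
    action = {"primitive": "grasp", "target": observation, "force": "gentle"}
    return thought, action

thought, action = act_with_inner_monologue("the bread", "pack the lunchbox")
print(thought)
print(action)
```

In the real system both the thought and the action come from the same model’s output stream; the surprising empirical finding is that generating the former measurably improves the latter.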
Lunch Is Served… Eventually
The proof, as the old adage goes, is in the pudding—or, in this rather delicious case, the meticulously packed lunch. One of the most utterly compelling demonstrations involved an Aloha robot arm, a surprisingly graceful contraption, tasked with the rather mundane yet deceptively complex chore of preparing a lunchbox. This, the team explains, demands what they charmingly refer to as “millimetre-level precision,” particularly when one is wrestling with the aforementioned, infuriatingly flimsy Ziploc bag.
To observe the robot at work is nothing short of a masterclass in the very cutting edge of what’s currently possible. It’s incredibly impressive, yes, but also charmingly, reassuringly imperfect. The robot deftly pinches the bag open, a delicate manoeuvre, then carefully deposits a sandwich inside, before adding a chocolate bar and a handful of grapes. It fumbles ever so slightly, makes a quick, decisive correction, and then, crucially, keeps trying—a monumental leap from the brittle, error-prone bots of just a few years ago that, as our esteemed host Hannah Fry dryly recollected, mostly just excelled at creating rather impressive piles of broken Lego. This newfound, almost organic dexterity isn’t born from rigid, inflexible code, but rather from human demonstration via teleoperation, where a human operator quite literally “embodies” the robot to impart the correct, nuanced movements. It’s like a digital puppeteer, but for packing your sarnies.
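The teleoperation-to-policy idea can be caricatured in miniature. This is a toy nearest-neighbour stand-in, not DeepMind’s method: real systems train a neural policy on thousands of teleoperated demonstrations, and the states and actions below are invented for the sketch.

```python
# Toy behaviour cloning: record (state, action) pairs from a human
# teleoperator, then replay the action whose recorded state lies
# closest to the current one.

def nearest_action(demos, state):
    """Return the demonstrated action for the closest recorded state."""
    return min(demos, key=lambda d: abs(d[0] - state))[1]

# Hypothetical demonstrations: gripper position -> commanded action.
demos = [(0.0, "reach"), (0.5, "pinch bag open"), (1.0, "insert sandwich")]
print(nearest_action(demos, 0.45))  # → "pinch bag open"
```

Crude as it is, it shows why demonstration beats hand-coding for a task like the Ziploc bag: the nuance lives in the recorded human movements, not in any rule anyone wrote down.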
“I Cannot Perform Actions as a Specific Character”
While one demo showcased the robot’s burgeoning dexterity, another, arguably more entertaining, highlight was the system’s remarkable generalisation capabilities and its hilariously literal interpretation of language. When prompted to “put the green block in the orange tray, but do it as Batman would,” the robot paused, perhaps contemplating the existential angst of a caped crusader.
Its response, delivered in a perfectly deadpan robotic monotone, was utterly priceless: “I cannot perform actions as a specific character. However, I can put the green block in the orange tray for you.”
This glorious exchange perfectly encapsulates both the profound power and the current, amusingly human-like limitations of these burgeoning systems. The robot, bless its silicon heart, understood the core instruction flawlessly, yet elegantly discarded the nonsensical, stylistic flourish. It boasts a world-class comprehension of actions and objects, but an utterly blank slate when it comes to cultural personas. It’s a general-purpose robot, after all, not a method actor vying for an Oscar.
This tantalising peek inside DeepMind’s clandestine labs reveals, with a flourish, that the field of robotics is finally, gloriously, getting its long-awaited “software” moment. By artfully leveraging the monumental, jaw-dropping advances in large-scale AI, Google is not just tinkering; it’s meticulously crafting a robust, adaptable platform for robots that can truly learn, seamlessly adapt, and genuinely reason in the messy, unpredictable theatre of the real world. They might not yet be ready to don a cape and impersonate superheroes, but they’re already packing our lunches. And for anyone who’s ever stumbled out the door in a bleary-eyed morning rush, desperately craving a decent sandwich, that, my friends, might just be the most heroic feat of all.