In a move that should make the entire robotics industry sit up and choke on its morning brew, Ant Group, the fintech affiliate of Alibaba, has just unleashed a comprehensive foundational stack for embodied intelligence upon an unsuspecting world. The kicker? It has all been released as open source under the remarkably permissive Apache 2.0 license. This isn't just another incremental update; it's a triple threat of perception, action, and imagination, engineered to serve as the universal brain for the next generation of robots.
While the rest of the tech world was busy watching humanoid robots pull off backflips for the cameras, Ant Group’s Robbyant unit was quietly perfecting the software that will actually make them useful in the wild. They’ve debuted three interconnected foundation models under the LingBot banner, specifically targeting the messy, unpredictable reality of making robots see, act, and plan ahead. It’s a bold, strategic play that signals a shift away from bespoke robot brains towards a standardised, Android-like platform for the entire industry.
The Three-Course Meal for Embodied AI
Ant Group has structured this release as a complete toolkit for embodied intelligence, spanning what it defines as perception, action, and imagination. It’s a holistic approach that manages the entire pipeline, from sensing the environment to physical interaction.
First up is LingBot-Depth, a model dedicated to spatial perception. Then there’s LingBot-VLA, a Vision-Language-Action model that translates high-level commands into physical movement. Finally, we have the pièce de résistance: LingBot-World, an interactive world model that simulates reality for training and strategic planning. Together, they represent a serious attempt to crack the embodied AI nut from end to end.
LingBot-VLA: A Brain Trained on 2.2 Years of Reality
The real showstopper here is LingBot-VLA, and with good reason. It has been trained on a staggering 20,000 hours of real-world robot data. To put that in perspective, that’s over 2.2 years of a robot working non-stop, learning from its mistakes and figuring out the nuances of physical reality. This isn’t synthetic simulation; this is hard-won, real-world experience.
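That "2.2 years" figure is easy to sanity-check; the arithmetic below simply converts 20,000 hours of logged robot time into years of continuous operation.

```python
# Back-of-the-envelope check on the "over 2.2 years" claim.
hours_of_robot_data = 20_000
hours_per_year = 24 * 365  # 8,760 hours in a non-leap year

years_of_continuous_operation = hours_of_robot_data / hours_per_year
print(f"{years_of_continuous_operation:.2f} years")  # ~2.28 years of non-stop operation
```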
This massive dataset was harvested from nine different popular dual-arm robot configurations, which is vital for generalisation. The ultimate goal of a VLA is to create a “universal brain” capable of operating various robot types without the need for expensive retraining for every new piece of hardware. Ant Group claims LingBot-VLA can be adapted for single-arm, dual-arm, and even humanoid platforms—a feat that has long been a sticking point in the field.
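Ant Group hasn't spelled out how that adaptation works here, but one common recipe for cross-embodiment flexibility is a shared vision-language backbone with lightweight, per-platform action heads. The sketch below is purely illustrative: the class names, layer sizes, and action dimensions are assumptions made for the sake of the example, not LingBot-VLA's actual architecture.

```python
import torch
import torch.nn as nn

class SharedVLABackbone(nn.Module):
    """Hypothetical vision-language trunk shared across every robot platform."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.embed_dim = embed_dim
        # Stand-in encoders; a real VLA would use large pretrained vision/language towers.
        self.vision = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim), nn.GELU())
        self.language = nn.Sequential(nn.LazyLinear(embed_dim), nn.GELU())
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, image: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        tokens = torch.stack([self.vision(image), self.language(instruction)], dim=1)
        return self.fusion(tokens).mean(dim=1)  # one fused embedding per sample

class ActionHead(nn.Module):
    """Small per-embodiment head: maps the shared embedding to that robot's action space."""
    def __init__(self, embed_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, action_dim)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.proj(fused)

backbone = SharedVLABackbone()
heads = {
    "single_arm": ActionHead(backbone.embed_dim, action_dim=7),   # 6-DoF pose + gripper
    "dual_arm": ActionHead(backbone.embed_dim, action_dim=14),    # two arms
    "humanoid": ActionHead(backbone.embed_dim, action_dim=30),    # illustrative only
}

image = torch.randn(1, 3, 224, 224)   # RGB observation
instruction = torch.randn(1, 768)     # pre-embedded language command
fused = backbone(image, instruction)
action = heads["dual_arm"](fused)     # only the head is swapped per platform
```

The appeal of this pattern is that only the small head needs retraining when the model moves to a new robot, while the expensive shared backbone stays put; whatever Ant Group's exact method, that is the kind of economy a "universal brain" implies.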
The results are impressive. On the GM-100 real-robot benchmark, LingBot-VLA outpaced its rivals, particularly when paired with its sibling, LingBot-Depth, to sharpen its spatial awareness. It also boasted training speeds 1.5 to 2.8 times faster than existing frameworks—a crucial advantage for developers working with limited compute.
A Mind’s Eye and a Digital Sandbox
Navigating the world is half the battle, and that’s where LingBot-Depth earns its keep. It’s a foundation model designed to generate metric-accurate 3D perception from noisy or incomplete sensor data. Remarkably, it can function with less than 5% of the depth information usually required, tackling the kind of reflective surfaces and transparent objects that typically leave standard sensors in a muddle. This is exactly the kind of robust perception needed for a robot to survive outside the sterile confines of a lab.
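The article doesn't describe LingBot-Depth's interface, so the snippet below only sketches the shape of the problem: a dense RGB image arrives alongside a depth map in which fewer than 5% of pixels carry valid metric readings, and the model's job is to fill in the rest. The `complete_depth` function is a hypothetical stand-in, not the real model.

```python
import numpy as np

def complete_depth(rgb: np.ndarray, sparse_depth: np.ndarray, valid_mask: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a depth-completion model.

    A real model would fuse the RGB image with the few valid depth samples;
    here we simply broadcast the mean of the valid depths as a placeholder.
    """
    filled = np.full(sparse_depth.shape, sparse_depth[valid_mask].mean(), dtype=np.float32)
    filled[valid_mask] = sparse_depth[valid_mask]
    return filled

h, w = 480, 640
rgb = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

# Simulate a sensor that only returns metric depth (in metres) for ~5% of pixels,
# e.g. because reflective or transparent surfaces defeated it elsewhere.
valid_mask = np.random.rand(h, w) < 0.05
sparse_depth = np.where(valid_mask, np.random.uniform(0.3, 5.0, size=(h, w)), 0.0).astype(np.float32)

dense_depth = complete_depth(rgb, sparse_depth, valid_mask)
print(f"valid input pixels: {valid_mask.mean():.1%}, output coverage: 100%")
```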
But perhaps the most ambitious part of this release is LingBot-World. This is an interactive world model that acts as a “digital sandbox” for AI. It can generate nearly 10 minutes of stable, controllable, physics-grounded simulation in real time. This directly addresses the “long-term drift” issue that plagues most video generation models, where a scene tends to dissolve into a fever dream after just a few seconds.
Even more brilliantly, LingBot-World is fully interactive. Running at roughly 16 frames per second with sub-second latency, it allows users to control characters or alter the environment via text prompts with instant feedback. It also features zero-shot generalisation: show it a single photo of a real-world location, and it can spin up a fully interactive world based on that image without any specific training.
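To make the latency claim concrete, here is a minimal sketch of what an interactive rollout loop looks like at 16 frames per second, roughly 62 milliseconds per frame. The `WorldModel` class and its `reset`/`step` methods are assumptions for illustration, not LingBot-World's published API.

```python
import time

class WorldModel:
    """Hypothetical stand-in for an interactive, step-wise world model."""
    def reset(self, seed_image) -> None:
        # A real model would encode the photo into an initial latent scene state.
        self.frame_idx = 0

    def step(self, prompt: str):
        # A real model would decode the next video frame conditioned on the prompt.
        self.frame_idx += 1
        return f"frame {self.frame_idx}"

TARGET_FPS = 16
FRAME_BUDGET_S = 1.0 / TARGET_FPS  # ~62 ms per frame to stay interactive

world = WorldModel()
world.reset(seed_image="photo_of_real_location.jpg")  # zero-shot: one photo seeds the scene

for _ in range(TARGET_FPS * 2):  # two seconds of interaction
    start = time.perf_counter()
    frame = world.step(prompt="walk the character towards the doorway")
    # Render `frame` here; sleep off any leftover budget to hold a steady 16 fps.
    elapsed = time.perf_counter() - start
    time.sleep(max(0.0, FRAME_BUDGET_S - elapsed))
```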
The Android Strategy for Robotics
Why is a fintech giant pouring resources into giving away robot brains for free? The answer lies with its affiliate, Alibaba. As a titan of e-commerce and logistics, Alibaba stands to gain immensely from widespread, affordable, and intelligent automation. By open-sourcing the foundational layer under a permissive license, Ant Group is essentially inviting the world to build the future of robotics on its architecture. It’s a classic ecosystem play.
This release on Hugging Face isn’t just a simple code dump; it’s a production-ready codebase complete with tools for data processing, fine-tuning, and evaluation. Ant Group isn’t just handing out the blueprints; they’re providing the entire factory floor.
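For anyone wanting to poke at the release, fetching a model snapshot from Hugging Face typically takes a couple of lines. Note that the repository ID below is a placeholder guess; check Ant Group's official LingBot listings for the real names.

```python
from huggingface_hub import snapshot_download

# Placeholder repository ID: substitute the actual LingBot repo published by Ant Group.
local_dir = snapshot_download(repo_id="antgroup/LingBot-VLA")
print(f"Weights and tooling downloaded to: {local_dir}")
```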
While competitors are guarding their models behind closed APIs or restrictive licenses, Ant Group’s decision to go fully open could be the catalyst for a Cambrian explosion in robotic innovation. The race is no longer just about who has the cleverest AI, but who can build the most vibrant and productive ecosystem around it. With the LingBot trilogy, Ant Group has just made a very loud opening statement.