Javier Cantón
Plain Concepts CTIO
In March 2025, Google DeepMind introduced Gemini Robotics, a groundbreaking technology set to revolutionize how robots interact with humans in both industrial and domestic environments.
Until now, robots commonly used in factories have been designed with a primary focus on task efficiency, executing specific jobs as quickly and precisely as possible. These machines operate much like the mechanical components of a car, where every action is carefully timed and optimized for efficiency. However, traditional industrial robots assume a static environment, meaning they do not monitor or adapt to changes around them. They are unable to detect obstacles, such as a person crossing their path, which is why they are typically enclosed within safety cages to prevent accidents.
Gemini Robotics aims to change this paradigm by integrating advanced AI, enabling robots to perceive, adapt, and interact dynamically with their surroundings, making them safer and more versatile for real-world applications.
However, the nature of work is changing rapidly. For example, in the automotive industry, vehicle models are evolving in increasingly shorter cycles. This means that production chains must adapt quickly, making highly specialized machines less cost-effective in the long run.
Additionally, challenges arise when robots need to share a workspace with other similar robots. When relying on a basic approach based on predefined task lists and rigid workarounds, coordination and efficiency can become major obstacles.
In a factory, machines are not the only ones at work. Not all tasks can be fully automated due to cost constraints or the need for flexibility. This is where the concept of Cobots (Collaborative Robots) comes into play. A Cobot is a type of robot specifically designed to work alongside humans in a shared workspace, rather than operating autonomously or in isolation like traditional industrial robots.
However, designing Cobots presents new challenges, particularly in ensuring human safety. These robots must be capable of detecting collisions with both humans and other machines within their environment. As a result, they need to dynamically adjust their movements based on real-time conditions. For example, it is common for a Cobot to reduce its working speed when a human approaches too closely, minimizing the risk of accidental contact.
Google DeepMind aims to leverage its most advanced AI models, such as Gemini 2.0, to help robots better understand the physical world. The goal is to develop generalist robots capable of executing various tasks with the same programming while ensuring safety when working alongside humans in dynamic environments.
According to DeepMind, Gemini Robotics has been tested on a wide range of tasks and has demonstrated the ability to tackle challenges it had never encountered during training. For instance, previous robots AI trained only to stack blocks would struggle if asked to arrange items in a fridge. In contrast, Gemini Robotics harnesses the broad reasoning capabilities of Gemini 2.0, enabling it to process novel instructions. In technical evaluations, it more than doubled performance on a comprehensive generalization benchmark, surpassing other state-of-the-art models in adapting to new situations.
Another key differentiator is real-time interactivity. Built on a powerful language model, Gemini can understand instructions given in everyday language and even follow along in a conversation. If a user interrupts a robot mid-task and says, “Actually, place that item on the top shelf instead,” the Gemini system can adjust on the fly. It continuously monitors both its environment and instructions, ensuring it doesn’t blindly execute a plan if conditions change.
Earlier robots were often rigid once a task began, any unexpected change could cause failure (for example, a cleaning robot might repeatedly bump into a chair that had been moved after it mapped the room). In contrast, Gemini’s AI brings a human-like adaptability, it is always “thinking” and re-planning when necessary. This adaptability is possible because the model doesn’t just react reflexively; it actively reasons through situations, thanks to Gemini 2.0’s deep contextual and intent-based understanding.
In recent years, AI models have evolved from simply processing text inputs and generating text-based responses to more advanced architectures capable of handling multiple types of inputs and outputs within the same model.
Google DeepMind has built upon this evolution by using Gemini 2.0 as the foundation for a new AI model that can process various types of input data, including text (natural language), images, audio, and video. This model goes beyond traditional AI by generating action outputs that can be executed directly by a robot. It is a Vision-Language-Action (VLA) model, serving as the “brain” for robots and enabling them to interpret complex commands and perform tasks in human environments.
A crucial innovation in this system is the integration of an intermediate reasoning layer between input and output. This layer is designed to analyze physical space and enforce safety protocols, ensuring that every action is evaluated in real-time before execution. The most groundbreaking aspect of this technology is that its outputs are generated as a continuous stream, dynamically adjusting based on real-time input data.
This concept is incredibly powerful and represents the key breakthrough behind the success of this new technology, allowing robots to adapt on the fly and operate more safely and efficiently in unpredictable environments.
Google DeepMind highlights three core capabilities that define the advancements in Gemini Robotics: generality, interactivity, and dexterity.
Generality refers to the ability of a robot to adapt to new and unforeseen situations. Gemini Robotics leverages the extensive world knowledge embedded within the Gemini model to handle novel objects, diverse instructions, and unfamiliar environments. This capability is crucial for robots to move beyond highly specific, pre-programmed tasks and operate effectively in the dynamic real world. Google reports that Gemini Robotics demonstrates a significant improvement in this area, more than doubling the performance on a comprehensive generalization benchmark compared to other leading vision-language-action models. This focus on generality indicates a broader trend in robotics towards creating more versatile machines. Unlike traditional industrial robots designed for very specific and repetitive actions, Gemini Robotics aims to enable robots that can be more readily adapted and deployed across a wider variety of tasks and settings.
Interactivity describes the robot’s ability to understand and respond to commands and changes in its environment in a seamless and intuitive way. Gemini Robotics can understand and respond to everyday, conversational language and react to sudden changes in instructions or its surroundings, often continuing tasks without needing further input. This includes the ability to understand and respond to natural language instructions in multiple languages. Furthermore, if a robot happens to drop an object or if someone in the environment moves something, the system can replan its actions and adjust accordingly without requiring explicit reprogramming. This level of real-time adaptability is crucial for robots to be truly useful in dynamic, human-centric environments. The advanced language understanding capabilities derived from Gemini 2.0 directly contribute to this seamless interaction. Instead of requiring users to learn specific robotic commands, they can communicate with Gemini-powered robots using natural language, making the technology more accessible and fostering more intuitive human-robot collaboration.
Dexterity refers to the robot’s ability to perform complex tasks that require fine motor skills and precise manipulation. Gemini Robotics demonstrates significant advancements in this area, enabling robots to perform tasks such as folding origami, packing a lunch box, or preparing a salad. Demonstrations of this capability include robots picking fruits and snacks, placing glasses in cases, tying shoelaces, and even attempting to slam dunk a basketball. Many everyday tasks that humans perform effortlessly rely on a high degree of dexterity, and progress in this area significantly expands the potential utility of robots in real-world scenarios. While robots have traditionally excelled at tasks involving large, repetitive movements, fine manipulation has been a persistent challenge. Gemini Robotics’ advancements in dexterity open possibilities for robots to assist in more nuanced and human-oriented tasks.
Google DeepMind has introduced two AI models under the Gemini Robotics initiative:
Comparison table:
The advancements brought by Gemini Robotics open a vast range of real-world applications across multiple industries. These include the development of more capable general-purpose robots and next-generation humanoid robots designed to assist in homes, workplaces, and beyond.
A key collaboration in this effort is Google DeepMind’s partnership with Apptronik, a robotics company, to integrate Gemini Robotics into their Apollo humanoid robot for logistics automation. This partnership highlights the practical implementation of Gemini Robotics in advancing humanoid robots for real-world tasks.
Furthermore, Gemini Robotics-ER is currently being evaluated by a select group of trusted partners, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. This strong industry interest underscores the technology’s potential and its validation by leading robotics companies.
The potential applications span a broad spectrum of tasks, from everyday household chores like meal preparation to complex industrial operations such as warehouse automation. Additionally, Gemini Robotics could play a crucial role in elder care and medical assistance, providing support for healthcare professionals.
These collaborations between Google DeepMind and various robotics companies are crucial for translating cutting-edge AI research into practical, real-world solutions. They also facilitate continuous improvement by gathering valuable feedback to further refine and enhance the technology.
Gemini Robotics has made a significant impact by demonstrating that a single AI model can equip robots with a wide range of capabilities from understanding human commands to adapting to new tasks and manipulating objects with precision. Unlike previous approaches, Gemini Robotics is designed to be more general, integrated, and adaptable, introducing groundbreaking technologies that could shape the future of robotics AI.
The potential applications are vast, spanning business automation, industrial efficiency, and personal assistance in daily life. However, transforming this prototype into a widely adopted reality will require overcoming challenges in safety, business integration, and ethical considerations. The coming years will serve as a crucial testing phase for Gemini Robotics, determining whether it can successfully transition from an experimental breakthrough to a mainstream solution.
If all goes well, this moment could be remembered as the turning point when robots moved beyond the assembly line and began seamlessly assisting in the real world, a world they can finally understand. With Gemini Robotics, the vision of intelligent, helpful robots is no longer confined to science fiction but is becoming a tangible reality, ushering in a new era where AI and robotics work together to enhance human potential.
– Gemini Robotics – Google DeepMind
– Gemini Robotics brings AI into the physical world – Google DeepMind
– storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf
Javier Cantón
Plain Concepts CTIO
Cookie | Duration | Description |
---|---|---|
__cfduid | 1 year | The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information. |
__cfduid | 29 days 23 hours 59 minutes | The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information. |
__cfduid | 1 year | The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information. |
__cfduid | 29 days 23 hours 59 minutes | The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information. |
_ga | 1 year | This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors. |
_ga | 1 year | This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors. |
_ga | 1 year | This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors. |
_ga | 1 year | This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors. |
_gat_UA-326213-2 | 1 year | No description |
_gat_UA-326213-2 | 1 year | No description |
_gat_UA-326213-2 | 1 year | No description |
_gat_UA-326213-2 | 1 year | No description |
_gid | 1 year | This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form. |
_gid | 1 year | This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form. |
_gid | 1 year | This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form. |
_gid | 1 year | This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form. |
attributionCookie | session | No description |
cookielawinfo-checkbox-analytics | 1 year | Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Analytics" category . |
cookielawinfo-checkbox-necessary | 1 year | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-necessary | 1 year | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-non-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary". |
cookielawinfo-checkbox-non-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary". |
cookielawinfo-checkbox-non-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary". |
cookielawinfo-checkbox-non-necessary | 1 year | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Non Necessary". |
cookielawinfo-checkbox-performance | 1 year | Set by the GDPR Cookie Consent plugin, this cookie is used to store the user consent for cookies in the category "Performance". |
cppro-ft | 1 year | No description |
cppro-ft | 7 years 1 months 12 days 23 hours 59 minutes | No description |
cppro-ft | 7 years 1 months 12 days 23 hours 59 minutes | No description |
cppro-ft | 1 year | No description |
cppro-ft-style | 1 year | No description |
cppro-ft-style | 1 year | No description |
cppro-ft-style | session | No description |
cppro-ft-style | session | No description |
cppro-ft-style-temp | 23 hours 59 minutes | No description |
cppro-ft-style-temp | 23 hours 59 minutes | No description |
cppro-ft-style-temp | 23 hours 59 minutes | No description |
cppro-ft-style-temp | 1 year | No description |
i18n | 10 years | No description available. |
IE-jwt | 62 years 6 months 9 days 9 hours | No description |
IE-LANG_CODE | 62 years 6 months 9 days 9 hours | No description |
IE-set_country | 62 years 6 months 9 days 9 hours | No description |
JSESSIONID | session | The JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application. |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
viewed_cookie_policy | 1 year | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
viewed_cookie_policy | 1 year | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
VISITOR_INFO1_LIVE | 5 months 27 days | A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface. |
wmc | 9 years 11 months 30 days 11 hours 59 minutes | No description |
Cookie | Duration | Description |
---|---|---|
__cf_bm | 30 minutes | This cookie, set by Cloudflare, is used to support Cloudflare Bot Management. |
sp_landing | 1 day | The sp_landing is set by Spotify to implement audio content from Spotify on the website and also registers information on user interaction related to the audio content. |
sp_t | 1 year | The sp_t cookie is set by Spotify to implement audio content from Spotify on the website and also registers information on user interaction related to the audio content. |
Cookie | Duration | Description |
---|---|---|
_hjAbsoluteSessionInProgress | 1 year | No description |
_hjAbsoluteSessionInProgress | 1 year | No description |
_hjAbsoluteSessionInProgress | 1 year | No description |
_hjAbsoluteSessionInProgress | 1 year | No description |
_hjFirstSeen | 29 minutes | No description |
_hjFirstSeen | 29 minutes | No description |
_hjFirstSeen | 29 minutes | No description |
_hjFirstSeen | 1 year | No description |
_hjid | 11 months 29 days 23 hours 59 minutes | This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjid | 11 months 29 days 23 hours 59 minutes | This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjid | 1 year | This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjid | 1 year | This cookie is set by Hotjar. This cookie is set when the customer first lands on a page with the Hotjar script. It is used to persist the random user ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjIncludedInPageviewSample | 1 year | No description |
_hjIncludedInPageviewSample | 1 year | No description |
_hjIncludedInPageviewSample | 1 year | No description |
_hjIncludedInPageviewSample | 1 year | No description |
_hjSession_1776154 | session | No description |
_hjSessionUser_1776154 | session | No description |
_hjTLDTest | 1 year | No description |
_hjTLDTest | 1 year | No description |
_hjTLDTest | session | No description |
_hjTLDTest | session | No description |
_lfa_test_cookie_stored | past | No description |
Cookie | Duration | Description |
---|---|---|
loglevel | never | No description available. |
prism_90878714 | 1 month | No description |
redirectFacebook | 2 minutes | No description |
YSC | session | YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages. |
yt-remote-connected-devices | never | YouTube sets this cookie to store the video preferences of the user using embedded YouTube video. |
yt-remote-device-id | never | YouTube sets this cookie to store the video preferences of the user using embedded YouTube video. |
yt.innertube::nextId | never | This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen. |
yt.innertube::requests | never | This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen. |