
    Gemini 2.0 is here and promises to be able to do (almost) anything

    Intro

    Google has used the final days of the year to launch its most
    anticipated artificial intelligence model: Gemini 2.0. This
    next-generation model promises a big step forward in intelligence and
    capabilities.

    If the previous model focused on multimodality, version 2.0 is built
    around AI agents, which can act more autonomously and solve complex
    problems with less human intervention. With this release, Google places
    itself at the forefront of the race for the most advanced AI models on
    the market. Here are all the details!

    Gemini 2.0 Introduction

    On the occasion of the launch of Gemini 2.0, Sundar Pichai, CEO
    of Google and Alphabet, shared the following: “Information is at the heart of
    human progress. That’s why for more than 26 years we’ve been focused on our
    mission to organize the world’s information and make it accessible and
    useful. And that’s why we continue to push the boundaries of AI to organize
    that information across every input and make it accessible through any
    output so that it can be truly useful to you. (…) Today, millions of
    developers are building with Gemini, which helps us reimagine all of our
    products (including the 7 that have 2 billion users) and create new ones.
    In the last year, we have invested in developing more agentic models,
    meaning they can better understand the world around you, think ahead, and
    act on your behalf, under your supervision. Today we are excited to launch
    our next era of models designed for this new era of agents: we are
    introducing Gemini 2.0, our most capable model yet. With new advances in
    multimodality (such as native audio and image output) and native tool use,
    it will allow us to create new AI agents that bring us closer to our vision
    of a universal assistant (…)”.

    In this video you can see a summary of the new capabilities of the
    model:

    Gemini 2.0 Flash

    The first model released by the company is Gemini 2.0 Flash, the
    smallest of the new family, yet it still outperforms the current Pro
    model. According to Demis Hassabis, CEO of Google DeepMind, this model is
    more versatile and capable than previous models and can natively generate
    images and multilingual audio: “Flash even outperforms 1.5 Pro in key
    benchmarks, with twice the speed, and also comes with new capabilities. In
    addition to supporting multimodal inputs such as images, video, and audio,
    2.0 Flash now supports multimodal outputs, such as natively generated
    images mixed with text and multilingual audio synthesized from text (TTS).
    It is also natively integrated with tools such as Google Search or code
    execution, as well as third-party user-defined functions.”

    This model is already available experimentally via the Gemini API, with
    multimodal input and text output, as well as native text-to-speech
    conversion and image generation.

    It will be widely available in January, along with more model
    sizes.
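    To give a feel for what this looks like in practice, here is a minimal
    sketch of calling the experimental model through the Gemini API with
    multimodal input (an image plus a text prompt) and text output. It
    assumes the google-generativeai Python SDK and the experimental model id
    gemini-2.0-flash-exp; treat both as assumptions to adapt to your
    environment.

    ```python
    # Minimal sketch: multimodal input (image + text) with text output via
    # the Gemini API. Assumes the google-generativeai SDK (pip install
    # google-generativeai pillow) and the experimental model id
    # "gemini-2.0-flash-exp".
    import os

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    model = genai.GenerativeModel("gemini-2.0-flash-exp")

    # Mix an image and a text prompt in one request; the response is text.
    photo = Image.open("photo.jpg")  # any local image file
    response = model.generate_content(
        [photo, "Describe what you see in this image."]
    )
    print(response.text)
    ```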

    AI agents for Gemini 2.0

    The biggest new feature of Gemini 2.0 lies in its AI agents. The model
    now includes native UI action capabilities, along with other enhancements
    such as multimodal reasoning, long-context understanding, complex
    instruction following and planning, compositional function calling,
    native tool use, and improved latency.
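    Of these capabilities, compositional function calling is the easiest to
    try today through the Gemini API. The sketch below uses the
    google-generativeai Python SDK and its automatic function calling
    support; the get_weather helper is a hypothetical stand-in used purely
    for illustration, not part of Google's agent stack.

    ```python
    # Minimal function-calling sketch with the google-generativeai SDK.
    # The model may decide to call the declared Python function; the SDK
    # executes it and feeds the result back to the model. get_weather is
    # a hypothetical stub, not a Google API.
    import os

    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    def get_weather(city: str) -> str:
        """Return a (stubbed) weather report for the given city."""
        return f"Sunny and 22 degrees Celsius in {city}."

    model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])
    chat = model.start_chat(enable_automatic_function_calling=True)

    reply = chat.send_message("What is the weather like in Madrid right now?")
    print(reply.text)  # the answer incorporates get_weather's output
    ```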

    These AI agents will have a major influence over the next few
    years, and Google is exploring this field with several prototypes that can
    help people perform tasks like never before.

    This work is still in its early stages, but one example is the updated
    Project Astra, a prototype that explores the future capabilities of a
    universal AI assistant.

    There is also Project Mariner, which explores the future of human-agent
    interaction, starting with the browser, and Jules, an AI-powered code
    agent that helps developers with their tasks, integrated directly into a
    GitHub workflow.

    Project Astra

    A few months ago, Google launched this project, presenting it as an
    evolution of virtual assistants: an AI that can analyze our surroundings
    and act on them, for example by finding lost objects or describing
    situations.

    With the arrival of Gemini 2.0, Project Astra has also been
    improved:

    • Improved dialogue: the assistant can now converse in several
      languages and has a better understanding of accents and less common
      words.
    • New tool use: it can now use Google Search, Google Lens, or Maps.
    • Improved memory: it now has up to 10 minutes of in-session memory and
      can remember past conversations with you thanks to personalization.
    • Improved latency: thanks to new streaming capabilities and native
      audio understanding, the AI agent can understand language with a
      latency similar to that of a human conversation.

    Project Mariner

    As briefly mentioned above, Project Mariner is a research prototype
    built with Gemini 2.0 that explores the future of human-agent
    interaction.

    It is able to understand and reason about information on the browser
    screen, including pixels, text, code, images, and forms, and then acts
    on that understanding through a Chrome extension that completes tasks
    for you.

    It’s still at an early stage, but the results are looking very
    promising.

    This raises the challenge of building it safely and responsibly: the
    agent can only type, scroll, or click in the active browser tab, and it
    asks the user for a final confirmation before performing certain
    sensitive actions.

    https://youtube.com/watch?v=IDpo7pC1P10

    With all these advances, Google and DeepMind have also emphasized their
    commitment to safety and responsibility when developing AI agents. As
    such, they are taking an exploratory and incremental approach to product
    development: testing multiple prototypes, iterating on safety training,
    working with trusted testers and external experts, and conducting
    thorough risk, safety, and assurance assessments.

    Without a doubt, Gemini 2.0 and the new prototypes open the door to a
    new generation of smarter, more autonomous AI models, one that we look
    forward to exploring and discovering. We will be sharing demos using
    this new version very soon.

    Alex Amigo

    Digital Marketing Manager