Google strikes back at OpenAI with “Project Astra” AI agent prototype


A video still of the Project Astra demo at the Google I/O conference keynote in Mountain View on May 14, 2024. Credit: Google

Just one day after OpenAI revealed GPT-4o, which it bills as being able to understand what’s happening in a video feed and converse about it, Google announced Project Astra, a research prototype that features similar video comprehension capabilities. It was announced by Google DeepMind CEO Demis Hassabis on Tuesday at the Google I/O conference keynote in Mountain View, California.

Hassabis called Astra “a universal agent helpful in everyday life.” During a demonstration, the research model showcased its capabilities by identifying sound-producing objects, providing creative alliterations, explaining code on a monitor, and locating misplaced items. The AI assistant also showed its potential in wearable devices, such as smart glasses, where it could analyze diagrams, suggest improvements, and generate witty responses to visual prompts.

Google says that Astra uses the camera and microphone on a user’s device to provide assistance in everyday life. By continuously processing and encoding video frames and speech input, Astra creates a timeline of events and caches the information for quick recall. The company says this enables the AI to identify objects, answer questions, and remember things it has seen that are no longer in the camera’s frame.

Project Astra: Google’s vision for the future of AI assistants.

While Project Astra remains an early-stage feature with no specific launch plans, Google has hinted that some of these capabilities may be integrated into products like the Gemini app later this year (in a feature called “Gemini Live”), marking a significant step forward in the development of helpful AI assistants. It’s a stab at creating an agent with “agency” that can “think ahead, reason and plan on your behalf,” in the words of Google CEO Sundar Pichai.

Elsewhere in Google AI: 2 million tokens

During Google I/O, the company made a number of AI-related announcements, some of which we may cover in separate posts in the future. But for now, here’s a quick overview.

At the top of the keynote, Pichai mentioned an “improved” version of February’s Gemini 1.5 Pro (same version number, oddly) that’s coming soon. It will feature a 2 million-token context window, which means it can process large numbers of documents or long stretches of encoded video at once. Tokens are fragments of data that AI language models use to process information, and the context window determines the maximum number of tokens an AI model can process at once. Currently, 1.5 Pro tops out at 1 million tokens (OpenAI’s GPT-4 Turbo has a 128,000-token window for comparison).
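For a rough sense of what those numbers mean, a common rule of thumb (an assumption here, not Gemini’s actual tokenizer) is that one token corresponds to roughly four characters of English text. The short Python sketch below uses that heuristic to estimate how many pages of plain text fit in various context windows; the figures are illustrative only.

# Back-of-the-envelope estimate of how much text fits in a context window.
# Uses the rough ~4 characters-per-token heuristic, which varies by model
# and language, so treat the output as illustrative rather than exact.

CHARS_PER_TOKEN = 4      # rough heuristic, not Gemini's real tokenizer
WORDS_PER_PAGE = 500     # assumed typical page length
CHARS_PER_WORD = 6       # average English word plus a trailing space

def pages_that_fit(context_window_tokens: int) -> float:
    """Estimate how many pages of plain text fit in a context window."""
    chars = context_window_tokens * CHARS_PER_TOKEN
    return chars / (WORDS_PER_PAGE * CHARS_PER_WORD)

for window in (128_000, 1_000_000, 2_000_000):
    print(f"{window:>9,} tokens is roughly {pages_that_fit(window):,.0f} pages")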

We asked AI researcher Simon Willison, who doesn’t work for Google but was featured in a promo video during the keynote, what he thought of the context window announcement. “Two million tokens is exciting,” he replied via text while sitting in the keynote audience. “But it’s worth keeping price in mind: $7 per million tokens means a single prompt could cost you $14!” Google charges $7 per million input tokens for 1.5 Pro on prompts longer than 150,000 tokens through its API.

During the Google I/O 2024 keynote, Google said Gemini Advanced has the “longest context window in the world” at 1 million tokens, soon to be 2 million. Credit: Google

Speaking of tokens, Google announced that the previously announced 1 million-token context window for Gemini 1.5 Pro is finally coming to Gemini Advanced subscribers. Previously, it was only available through the API.

Google also announced a new AI model called Gemini 1.5 Flash, which it billed as a lightweight, faster, and cheaper version of Gemini 1.5. “1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It’s optimized for high-volume, high-frequency tasks at scale,” says Google.
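For developers, Flash is meant to be called like any other Gemini model. Here is a minimal sketch of what that might look like using Google’s google-generativeai Python SDK; the model identifier and setup details are assumptions based on the announcement and may differ from what Google actually ships.

# Minimal sketch of calling Gemini 1.5 Flash via the google-generativeai
# Python SDK (pip install google-generativeai). The model ID is an
# assumption based on the announcement and may differ in practice.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model identifier
response = model.generate_content(
    "Summarize the main AI announcements from Google I/O 2024 in three bullets."
)
print(response.text)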

Willison had a comment on Flash as well: “The new Gemini Flash model is promising there, it’s meant to offer up to 2 million tokens at a lower price.” Flash costs $0.35 per million tokens on prompts up to 128,000 tokens and $0.70 per million tokens for prompts longer than 128,000. That’s one-tenth the price of 1.5 Pro.

“35 cents per million tokens! That’s the biggest news of the day, IMO,” Willison told us.
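As a sanity check on those numbers, here is a small back-of-the-envelope calculation using only the long-prompt input-token prices quoted above; output-token pricing and any later price changes are ignored, so the math is purely illustrative.

# Rough input-token cost comparison using the long-prompt prices quoted
# above. Output-token pricing and later price changes are not included,
# so treat the results as illustrative arithmetic only.

def input_cost(tokens: int, dollars_per_million: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a flat per-million rate."""
    return tokens / 1_000_000 * dollars_per_million

full_window = 2_000_000  # a prompt that fills the upcoming 2M-token window

# Gemini 1.5 Pro: $7 per million input tokens on long prompts
print(f"1.5 Pro:   ${input_cost(full_window, 7.00):.2f}")   # $14.00

# Gemini 1.5 Flash: $0.70 per million input tokens on long prompts
print(f"1.5 Flash: ${input_cost(full_window, 0.70):.2f}")   # $1.40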

Google also announced Gems, which appears to be its take on OpenAI’s GPTs. Gems are custom roles for the Google Gemini chatbot that will play a part that you define, allowing you to personalize Gemini in different ways. Google lists examples of potential Gems as “a gym buddy, sous chef, coding partner or creative writing guide.”

New generative AI models

A screenshot of the Google Imagen 3 website. Credit: Google

Also at the Google I/O keynote on Tuesday, Google announced several new generative AI models for creating images, audio, and video. Imagen 3 is the latest in its line of image synthesis models, which Google says is its “highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models.”

Google also showed off its Music AI Sandbox, which Google bills as “a suite of AI tools to transform how music can be created.” It combines its YouTube music project with its Lyria AI music generator into tools for musicians.

The company also announced Google Veo, a text-to-video generator that creates 1080p videos from prompts at a quality that appears to match OpenAI’s Sora. Google says it is working with actor Donald Glover to create an AI-generated demonstration film that will debut soon. It’s far from Google’s first AI video generator, but it appears to be its most capable one so far.

The sample video above, provided by Google, used the prompt, “A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.”

Google says that starting today, its new AI creative tools are available to select creators in a private preview only, but that waitlists are open.


