OpenAI is releasing a new flagship generative AI model called GPT-4o, set to roll out “iteratively” across the company’s developer and consumer-facing products over the next few weeks.
OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across text and vision, and adds audio.
“GPT-4o reasons across voice, text and vision,” Murati said at a keynote presentation at OpenAI’s offices.
GPT-4, OpenAI’s previous leading model, was trained on a combination of images and text, and could analyze both to accomplish tasks like extracting text from an image or describing its contents. GPT-4o adds speech to the mix.
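To make the image-plus-text capability concrete, here is a minimal sketch of sending an image alongside a prompt through OpenAI’s Chat Completions API using the official Python SDK. The article itself shows no API usage, so the prompt and image URL below are placeholders, and model availability under the `gpt-4o` name is assumed.

```python
from openai import OpenAI

# Minimal sketch (not from the article): send an image plus a text prompt
# to a multimodal chat model via the OpenAI Python SDK (v1+).
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # model name as announced; rollout may vary
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract any text you can read in this image."},
                # Placeholder URL -- substitute a real, reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```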
What, concretely, does this enable? A range of things.
GPT-4o greatly improves the experience in ChatGPT, OpenAI’s viral AI-powered chatbot. ChatGPT has long offered a voice mode that reads the chatbot’s responses aloud using a text-to-speech model. GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.
For example, users can ask ChatGPT — powered by GPT-4o — a question and interrupt ChatGPT while it’s answering.
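For context on why handling speech natively matters: before a single model could both listen and talk, a voice exchange like this had to be stitched together from separate models. The sketch below, which assumes the public OpenAI Python SDK (file names and the voice choice are placeholders), shows that cascaded transcribe-reason-synthesize pipeline; GPT-4o’s pitch is collapsing these steps into one model, which is what makes low-latency, interruptible conversation possible.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1 -- speech-to-text: transcribe the user's spoken question.
with open("question.wav", "rb") as audio_file:  # placeholder recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Step 2 -- reasoning: answer the transcribed question with a chat model.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = answer.choices[0].message.content

# Step 3 -- text-to-speech: read the answer back aloud.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply_text
)
speech.write_to_file("reply.mp3")  # placeholder output path
```

Each hop in this pipeline adds latency and loses information like tone and emphasis, which is why a natively multimodal model is a meaningful change rather than just a convenience.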
In other news, OpenAI is releasing a desktop version of ChatGPT and a refreshed UI.
“We know that these models [are getting] more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with [GPTs],” Murati said.