OpenAI on Thursday (August 28, 2025) announced its “most capable” speech-to-speech AI model, gpt-realtime. The company says the model sounds natural and expressive and is better at following complex instructions.
“It’s better at interpreting system messages and developer prompts—whether that’s reading disclaimer scripts word-for-word on a support call, repeating back alphanumerics, or switching seamlessly between languages mid-sentence,” per the company blog.
The model can also shift its tone in the middle of a sentence.
It can also capture non-verbal cues such as laughs and detect numbers spoken in other languages, including Spanish, Chinese, Japanese and French.
“We trained the model in close collaboration with customers to excel at real-world tasks like customer support, personal assistance, and education—aligning the model to how developers build and deploy voice agents,” the blog stated.
The model will be offered through the Realtime API, which has also been moved to general availability.
OpenAI has also released two new voices, Cedar and Marin, which can be accessed via the API.
Published – August 29, 2025 02:07 pm IST