OpenAI Launched Advanced Voice Mode: Learn How to Use It

OpenAI has finally released the much-awaited Advanced Voice Mode. For now, however, it’s available only to a limited group of ChatGPT Plus users.

  • ChatGPT’s Advanced Voice Mode uses the multimodal GPT-4o model.
  • Advanced Voice Mode (AVM), available in Alpha, can respond to non-verbal cues because it has screen and video understanding capabilities.
  • The Alpha version of ChatGPT’s Voice Mode sounds strikingly human: it pauses and appears to take breaths as a person does while conversing.
  • The new Voice Mode feature will be available to all ChatGPT Plus users in the fall of 2024.

At its Spring 2024 event in May, where it launched the GPT-4o model, OpenAI revealed the capabilities of this advanced Voice Mode. The voice feature stood out among AI voice technologies because of how closely it resembles a human voice and conversational style.

ChatGPT's advanced voice mode

When this human-like voice mode was introduced in May 2024, ChatGPT offered five voice options: Sky, Breeze, Cove, Juniper, and Ember, each with a unique tone and style. Sky was later removed, leaving four voices.

If you remember, the Sky voice drew controversy over its similarity to the voice of actress Scarlett Johansson, who voiced an AI assistant in the film “Her”. She threatened legal action against the company for using her voice without permission, a claim OpenAI denied.

What is Advanced Voice Mode?

OpenAI started rolling out advanced Voice Mode to ChatGPT Plus Alpha users on 30 July 2024. It offers a hyper-realistic voice interaction experience and sets a new standard for natural-sounding speech in the AI field.

This advanced voice feature uses the GPT-4o model, a multimodal GPT that combines chat, vision, and voice capabilities. This 3-in-1 model enables ChatGPT’s voice feature to respond faster than the previous version of Voice Chat.

In a recent announcement on X, OpenAI said that advanced Voice Mode can respond to your emotions and handle interruptions, promising a natural, real-time conversational experience.

10 Use Cases of Advanced Voice Mode

The advanced Voice Mode has four preset voices. Additionally, the company added new filters to block requests to generate music or other copyrighted content. The updated Voice Mode can assist with on-screen content and use the phone camera for contextual responses, although these features are still rolling out to Alpha users of ChatGPT Plus.

Advanced Voice Mode
Image by OpenAI

Many ChatGPT Plus Alpha users have shared their experience with the Advanced Voice Mode of ChatGPT. Here are 10 of the most exciting and innovative use cases for this new voice mode:

Real-Time Translation

You can use advanced Voice Mode for real-time translation of any language, written or spoken. For example, if you’re watching a German movie or playing a game in German, you can ask advanced Voice Mode to translate the German captions into English. Whatever language you show it through your mobile camera, it will read the text back to you in English.

Sports Commentator

You can ask it to act as a commentator for any sport of your choice, such as football, soccer, or cricket. One alpha user asked it to commentate on a soccer match, and it did a great job. It delivered the commentary with excitement, shouting when a goal was scored just like a human commentator does.

News Reporter/Analyst

Since this voice feature uses GPT-4o, it can look up information on the web. You can use it as a news analyst that briefs you on the latest news in whatever category interests you. You can also pick a particular story from your Google News timeline and ask advanced Voice Mode to elaborate on it.

Language Teacher

You can use ChatGPT’s advanced Voice Mode as a language tutor. Converse with it in the second or third language you’re learning, whether that is French, German, Japanese, Italian, or any other language.

You can also listen to and practice different regional accents of a single language. For instance, a ChatGPT Plus Alpha user asked Voice Mode to play several speakers in turn, each with a different US accent: New York, Boston, Midwestern, and California.

The alpha user then asked Advanced Voice Mode to have each of these speakers argue, in their own regional accent, that their region’s dish is the tastiest in the US. So, you can use the updated Voice Mode to listen to, learn, and practice different accents of a language.

Pronunciation Tutor

Although ChatGPT’s advanced voice feature pronounces words in a language very well, it’s not yet up to the mark when it comes to correcting your pronunciation.

You can still try using it to work on your pronunciation in different accents of English. It isn’t an expert corrector yet, but it’s a good starting point for practice.

Beatboxer

ChatGPT with its advanced voice mode can beatbox for you. Crazy, huh? Yes, you can use it for beatboxing, a form of vocal percussion that mimics the sound of drum machines. A human beatboxer uses their voice to create beats by making different sounds, and this AI voice feature can do the same.

Tongue Twister Master

You can use Advanced Voice Mode to say famous tongue twisters, to make them more tongue-twisty, or to generate an entirely new one to play with among your friends.

It can also translate a tongue twister into another language. For instance, an alpha user asked it to turn a Swedish tongue twister into an English one. The results are creative and fun to listen to.

Joke Cracker

Yes, you read that right. You can use ChatGPT’s advanced voice mode as a joke cracker in any language of your choice. It laughs, cracks jokes, and can be a cheerful companion when you’re feeling lonely or an icebreaker at get-togethers with family and friends.

Making Animal Voices

You can ask it to make animal sounds such as a meow, buzz, moo, chirp, howl, or bark. A ChatGPT Plus user with access to advanced Voice Mode asked it to sound like two cats meowing at each other, and it pulled this off remarkably well.

Counting Numbers

Many users are experimenting with it in different ways. Some alpha users asked advanced Voice Mode to count from 1 to 50, getting louder and faster as it went.

It does so in an impressively natural way, pausing in between and taking breaths just like a real person does when counting, sometimes speeding up and sometimes slowing down.

ChatGPT’s Advanced Voice Mode Vs Standard Voice Mode

OpenAI, the company behind ChatGPT, built Advanced Voice Mode on GPT-4o’s audio and video capabilities, which gives it better functionality and performance than the standard mode.

The Standard Voice Mode of ChatGPT chains three separate models: one transcribes speech to text, one processes the text, and one converts the reply back to audio. This pipeline adds latency (2.8 seconds on average with GPT-3.5 and 5.4 seconds with GPT-4) and loses information along the way, such as tone of voice. Advanced Voice Mode relies on a single GPT-4o model that handles text, vision, and audio, reducing latency and preserving more context.

Using a single multimodal GPT not only makes it faster but also improves its functionality in different ways.
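To make the difference concrete, here is a minimal Python sketch of the two designs. It is a toy model under stated assumptions, not OpenAI's implementation: every name in it (AudioInput, transcribe, text_model, synthesize, single_model, total_latency) is hypothetical, and the per-stage timings in the usage note are made-up placeholders. It only illustrates why a chain of three models adds up its delays and drops anything that does not survive transcription, such as tone of voice.

```python
from dataclasses import dataclass


@dataclass
class AudioInput:
    words: str  # what the user said
    tone: str   # how they said it, e.g. "excited" (carried only by the audio)


def transcribe(audio: AudioInput) -> str:
    # Stage 1 (hypothetical): speech-to-text keeps the words, drops the tone.
    return audio.words


def text_model(prompt: str) -> str:
    # Stage 2 (hypothetical): a text-only model sees just the transcript.
    return f"reply to '{prompt}'"


def synthesize(text: str) -> dict:
    # Stage 3 (hypothetical): text-to-speech never learns the user's tone.
    return {"speech": text, "tone_seen": None}


def cascaded_pipeline(audio: AudioInput) -> dict:
    # Standard-mode style: three models chained one after another.
    return synthesize(text_model(transcribe(audio)))


def single_model(audio: AudioInput) -> dict:
    # Advanced-mode style (toy stand-in): one model receives the audio
    # directly, so the tone is still available when it answers.
    return {"speech": f"reply to '{audio.words}'", "tone_seen": audio.tone}


def total_latency(stage_seconds: list) -> float:
    # In a cascade, the stages run in sequence, so their delays add up.
    return sum(stage_seconds)
```

For example, calling cascaded_pipeline(AudioInput("hello", "excited")) returns a reply whose tone_seen is None, while single_model with the same input still reports "excited". Likewise, total_latency([1.0, 1.0, 0.5]) adds three placeholder stage delays into one end-to-end wait, which is the cost a single model avoids.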

Overall, advanced Voice Mode is more human-like, engaging, and immersive, while the standard Voice Mode is functional but less natural and interactive.

How to Access Advanced Voice Mode Alpha?

OpenAI rolled out Advanced Voice Mode to a small, select group of ChatGPT users who subscribe to ChatGPT Plus, a $20 per month plan. The Alpha is this smaller group of ChatGPT Plus users who have been chosen to use and test Advanced Voice Mode.

Unlike SearchGPT, an AI search engine that is so far open to only 10,000 users, Advanced Voice Mode is getting a growing Alpha group, so subscribing to ChatGPT Plus gives you a chance of being selected. If you are selected, you’ll get an email with instructions for using this new voice feature. You’ll also see a notification inviting you to try the new voice mode when you open the ChatGPT mobile app.

If you’re a ChatGPT Plus user and haven’t been selected for Advanced Voice Mode yet, you may be selected soon. Otherwise, you can use it once it is released to all ChatGPT Plus users in fall 2024.

Albert Haley

Albert Haley, the enthusiastic author and visionary behind ChatGPT 4 Online, is deeply fueled by his love for everything related to artificial intelligence (AI). Possessing a unique talent for simplifying complex AI concepts, he is devoted to helping readers of varying expertise levels, whether newcomers or seasoned professionals, in navigating the fascinating realm of AI. Albert ensures that readers consistently have access to the latest and most pertinent AI updates, tools, and valuable insights.