AI Tools · 6 min read

ElevenLabs V3 Tutorial: Best Settings, Audio Tags & Free GPT Tool

Master ElevenLabs V3 with the right voice settings, audio tags, and a free GPT that formats your scripts automatically for maximum expressiveness.


ElevenLabs V3 can whisper, laugh, switch accents mid-sentence, and drop a gunshot sound effect into a single generation. Most people paste in plain text, hit generate, and wonder why it sounds flat. The model isn't the problem. The prompting is.

You can get started with ElevenLabs V3 right now through this link. It's free to try, and V3 alpha is accessible directly from the main interface.

Try ElevenLabs V3 free: the most expressive AI voice model available right now.

The three things that actually control your output

Getting great results from V3 isn't about talent or expensive gear. It comes down to three settings working together: voice selection, stability, and audio tags. Get any one of them wrong and the other two can't save you.

Voice selection: start neutral

The starting voice is your foundation. V3's emotional tags only work when the base voice is neutral enough to shift. In the ElevenLabs voice library, filter for voices tagged "best voices for V3"; these have been verified to respond correctly to emotional direction. You'll see checkmarks on the left side of each result.

If you want to use your own voice, you need to custom-train it first. An untrained clone won't respond to emotional tags the same way. The model needs a clean baseline before it can perform.

Stability: this is where most people waste credits

There are three positions on the stability slider: Creative, Natural, and Robust.

Robust gives you consistent, reproducible output every time, useful if you're running V3 inside an automation where you can't check every generation. But Robust is essentially V2 behavior. It suppresses audio tag responsiveness. If you're using emotional tags and wondering why nothing is happening, this is why.

For maximum expressiveness, stay in Natural or Creative. Creative is more emotionally varied but can occasionally hallucinate delivery. Natural is the middle ground: close to the original voice, balanced, and still responsive to tags. When you're experimenting with V3's new capabilities, start at Natural and push toward Creative as you get comfortable.
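If you drive V3 through the API rather than the web UI, the stability choice becomes a field in the request body sent to the v1 text-to-speech endpoint. A minimal sketch in Python; the `eleven_v3` model ID and the preset-to-float mapping below are my assumptions, not documented values, so verify them against the current API reference before relying on them:

```python
# Assumed mapping from the UI's stability presets to the 0-1 float
# the API's voice_settings object expects. Tune against your own runs.
STABILITY_PRESETS = {"creative": 0.0, "natural": 0.5, "robust": 1.0}

def build_tts_payload(text: str, stability: str = "natural",
                      model_id: str = "eleven_v3") -> dict:
    """Build the JSON body for POST /v1/text-to-speech/{voice_id}."""
    return {
        "text": text,
        "model_id": model_id,  # assumed V3 model identifier
        "voice_settings": {"stability": STABILITY_PRESETS[stability]},
    }
```

Defaulting to `natural` mirrors the advice above: start in the middle, and only move toward `creative` once you know how the voice behaves.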

Audio tags: what separates V3 from everything before it

This is the feature that makes V3 genuinely different. You can direct the voice to laugh, whisper, express sarcasm, shift accents, or drop in sound effects, all within a single generation. The syntax is straightforward: wrap the instruction in brackets inside your script.

Voice delivery tags include things like [laughs], [laughs harder], [wheezing], [sigh], [exhale], [sarcastic], [curious], [excited], and [whisper]. When you use [whisper], everything following that tag is whispered until you introduce a new tag.

The distinction worth knowing: a standalone tag like [laughs] plays as a separate beat. But you can describe the emotional state and have the voice carry it through the following words. So [laughs] Can you believe this? produces a laugh followed by the line. That's a different effect than instructing the voice to deliver the line while laughing.

Sound effects work the same way. You can write:

[applause] Thank you all for coming tonight. [gunshot] [surprised] What was that?

That's a single generation with ambient sound, a gunshot, and a surprised vocal delivery chained together. That's not possible in any previous version of this model.

Accents are also tag-driven. You can specify [French accent], [Russian accent], or [strong German accent] and the voice will shift mid-script. In testing, the model handles these transitions cleanly; it doesn't just speak, it performs.
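With this many tags in play, it helps to audit a script before spending credits. A small helper, sketched below, flags any bracketed tag not covered in this post; the tag set comes from the examples above, and V3 accepts more than these, so treat an unknown tag as a prompt to test rather than an error:

```python
import re

# Tags mentioned in this post; not an exhaustive list of what V3 supports.
KNOWN_TAGS = {
    "laughs", "laughs harder", "wheezing", "sigh", "exhale",
    "sarcastic", "curious", "excited", "whisper",
    "applause", "gunshot", "surprised",
}

def audit_tags(script: str) -> list[str]:
    """Return bracketed tags in a script that this post doesn't cover."""
    tags = re.findall(r"\[([^\]]+)\]", script)
    # Accent tags are free-form ("[French accent]" etc.), so pass them through.
    return [t for t in tags if t not in KNOWN_TAGS and not t.endswith("accent")]
```

Running it over the example above returns an empty list, so everything there is covered; a tag like [explosion] would come back flagged for a test generation first.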

Text structure shapes the output as much as tags do

Tags get the attention, but how you structure the surrounding text matters just as much.

Use ellipses for pauses. A comma gives you a breath; an ellipsis gives you weight. If you want the narrator to sit with a moment, ... does more work than any other punctuation mark.

Capitalize for emphasis. Writing OH MY GOD versus oh my god produces noticeably different stress. The model reads capitalization as an instruction to push harder on those words.

Shift emotions gradually. Going directly from laughing to sad in one line can produce uneven output. Staging the transition (laughing, then wondering, then sad) gives the model a path to follow and produces more natural results.
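The three rules above combine naturally in one short script. A sketch using only tags and punctuation already covered in this post:

```text
[whisper] Listen... this part matters. [excited] Because this is HUGE.
[laughs] Okay, okay... [curious] but what happens next... [sigh] we'll find out.
```

Note the staging on the second line: laughing, then curious, then a sigh, rather than jumping straight from a laugh to a downbeat ending.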

For multi-speaker scripts, V3 now supports labeling speakers directly in your text. Label them as Speaker A and Speaker B (or give them names) and add character descriptions. The more context you give each speaker, the more distinct their intonations become. This is genuinely powerful for dialogue-heavy content.
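A sketch of what that labeling can look like; the exact format here (names plus a parenthetical character description) is my own convention, not an official spec, so adapt it to what the interface accepts:

```text
Speaker A (weary detective, low voice): [sigh] Another late night...
Speaker B (eager rookie, fast talker): [excited] But we FINALLY have a lead!
Speaker A: [sarcastic] Oh, wonderful. Another lead.
```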

One practical constraint: V3 accepts between 200 and 10,000 characters per generation. Keep that range in mind when you're structuring longer pieces.
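That constraint lends itself to a small helper when you're working with long scripts. A minimal sketch (the function name and the sentence-splitting heuristic are my own) that chops text into chunks inside the 200-10,000 character window; keep in mind that a tag set in one chunk, like [whisper], won't carry into the next generation, so restate tags at the top of each chunk as needed:

```python
import re

# V3 accepts 200-10,000 characters per generation (per this post).
MIN_CHARS, MAX_CHARS = 200, 10_000

def chunk_script(text: str) -> list[str]:
    """Split a long script into generation-sized chunks on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > MAX_CHARS:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    # Fold an undersized final chunk into the previous one so every
    # chunk clears the 200-character minimum.
    if len(chunks) > 1 and len(chunks[-1]) < MIN_CHARS:
        chunks[-2] += " " + chunks.pop()
    return chunks
```

Splitting on sentence boundaries keeps bracketed tags attached to the lines they modify, which a blind character-count split would not guarantee.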

The free GPT that removes all of this friction

If you don't want to memorize every tag and punctuation rule, you don't have to. I built a custom GPT specifically for this: the ElevenLabs V3 Scriptwriter GPT, free on Gumroad.

You give it a plain-language prompt, something like "write four expressive sentences about why someone should subscribe to my YouTube channel", and it returns a properly formatted V3 script with the right tags, ellipses, capitalization, and emotional cues already baked in. You can then ask it to push further: "make it more dramatic, add more pauses" and it layers in additional tags and punctuation.

The output pastes directly into ElevenLabs. No manual formatting required.

ElevenLabs V3 Scriptwriter GPT (free): a custom GPT that writes broadcast-quality ElevenLabs V3 scripts with correct tags, punctuation, and emotional cues.

If you want to go deeper on what ElevenLabs can do beyond voice generation, the complete ElevenLabs tutorial guide covers AI dubbing and video translation, a genuinely underused application for creators building multilingual audiences.

The V3 model gives you more control than any previous version. That control requires more deliberate input from you. The GPT handles the input side. All you need to do is choose the right voice, keep stability off Robust, and let the tags do the work.


Watch the full video on YouTube: https://youtu.be/Z2B_pgJ9hbA

This post contains affiliate links. I only recommend tools I actually use.

Moe Lueker
Tags: elevenlabs-v3, ai-voice, audio-tags, text-to-speech, ai-tools
