How to Remove Subtitles from Veo 3 Videos: Fix the Garbled Caption Bug (2026)
Veo 3 burns garbled subtitles into dialogue clips. Here is the complete 2026 playbook to prevent them at generation and remove them in post.
Emma Chen · 15 min read · Jun 25, 2026


You wrote the perfect Veo 3 prompt. The character looks right, the lighting is cinematic, the voice sounds human — and then a strip of scrambled, half-spelled subtitles appears burned across the bottom of the frame. You never asked for captions. You even wrote "no subtitles." They showed up anyway.
If that sounds familiar, you are not doing anything wrong. Veo 3's auto-subtitle behavior is one of the most reported problems with the model, documented everywhere from Reddit threads to MIT Technology Review. The captions are frequently nonsensical, they are burned into the pixels (not a track you can toggle off), and getting rid of them after the fact can cost you extra credits.
This guide is the complete, tested playbook for clean, caption-free Veo 3 video. You will learn exactly why Veo 3 adds subtitles, the prompt techniques that prevent them at generation time, and the post-production methods that remove them when prevention fails. Every prompt below is copy-paste ready. By the end you will have a repeatable workflow that produces talking-character clips with zero burned-in text.
Why Veo 3 Adds Subtitles You Never Asked For
To fix the bug reliably, you have to understand where it comes from. It is not a setting you forgot to switch off — it is a side effect of how the model was trained.
1. The training data was full of captions. Veo 3 learned to generate video and synchronized audio from enormous amounts of real-world footage. A large share of that footage — news clips, social videos, tutorials, movie scenes — already had subtitles or on-screen captions baked in. When the model detects that a clip contains speech, it has learned to associate speech with on-screen text. So it "helpfully" draws captions to match.
2. The captions are burned in, not a separate track. This is the part that catches people off guard. In a normal video editor, subtitles are a toggleable layer. In a Veo 3 export, the text is part of the rendered image itself — the same pixels as the actor's face and the background. There is no "captions: off" button because there is no caption layer. That is why you cannot simply disable them after export.
3. The text is often garbled. Because the model is drawing text rather than typesetting it, the captions are frequently misspelled, repeated, or pure gibberish. This makes them worse than ordinary subtitles — they actively make a professional clip look broken.
4. It is hardest to avoid exactly when you need clean output most. The bug is triggered by dialogue and narration. So the moment you use Veo 3's signature feature — native audio with a character actually speaking — is the moment you are most likely to get unwanted text. That is the core tension this guide solves.
Knowing this, the strategy is two-layered: prevent captions during generation with prompt structure, and remove them in post when a re-roll is not worth the credits. Let's start with prevention, because it is free.
Part 1 — Prevent Subtitles at Generation (The Free Fix)
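Prevention is always cheaper than removal. These four techniques stack, so use all of them together for the most reliable clean output. The first three are plain prompt wording and work in the Gemini app, Google Flow, Google AI Studio, and the Veo 3 API; the fourth needs a surface that exposes a negative prompt field.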
Technique 1: Use a colon for dialogue, never quotation marks
This is the single highest-leverage change. The way you format the spoken line dramatically affects whether captions appear.
When you wrap dialogue in quotation marks or apostrophes, you are showing the model written text — and written text is exactly what it tends to render on screen. When you use a plain colon instead, the model is far more likely to treat the line as spoken audio only.
❌ Triggers captions (quotes):
A barista says: "Your latte is ready, enjoy your morning."
✅ Cleaner (colon, no quotes):
A barista says: Your latte is ready, enjoy your morning.
Avoid apostrophes inside the spoken line too, since they behave like quotation marks. Rephrase contractions where you can ("do not" instead of "don't") if a clip keeps fighting you.
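If you generate many clips, the quote-and-apostrophe rule is easy to apply mechanically before a line ever reaches the prompt. The sketch below is one way to do that in Python; the function names and the small contraction table are illustrative assumptions, not part of any Veo tooling, and you would extend the table for your own scripts.

```python
import re

# Illustrative, deliberately small contraction map (an assumption; extend as needed).
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "I am",
    "you're": "you are",
    "we're": "we are",
}

# Straight and curly quotes and apostrophes.
QUOTE_CHARS = "\"'\u2018\u2019\u201c\u201d"


def clean_spoken_line(line: str) -> str:
    """Expand known contractions, then strip any remaining quotes or apostrophes."""
    text = line.replace("\u2019", "'")  # normalize curly apostrophes first

    def expand(match: re.Match) -> str:
        word = match.group(0)
        replacement = CONTRACTIONS.get(word.lower())
        if replacement is None:
            return word  # unknown word: its apostrophe is stripped below
        # Preserve a leading capital, e.g. "Don't" becomes "Do not".
        if word[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    text = re.sub(r"[A-Za-z]+'[A-Za-z]+", expand, text)
    text = text.translate({ord(ch): None for ch in QUOTE_CHARS})
    return " ".join(text.split())


def format_dialogue(speaker: str, line: str) -> str:
    """Return the colon form '<speaker> says: <line>' with no quotes."""
    return f"{speaker} says: {clean_spoken_line(line)}"
```

Calling `format_dialogue("A barista", '"Your latte is ready, don\'t forget your receipt."')` yields `A barista says: Your latte is ready, do not forget your receipt.` Words that are not in the table (possessives, for example) simply lose their apostrophe, so rephrase those by hand if the result reads badly.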
Technique 2: Put the dialogue at the START of the prompt
Prompt order matters more than most people expect. Multiple creators have found that placing the voiceover or spoken line at the beginning of the prompt — before the visual description — produces dramatically fewer subtitles, and better audio-to-mouth alignment as a bonus. One creator reported ten consecutive subtitle-free generations after moving the speech to the top.
❌ Dialogue buried at the end:
A slow dolly-in on a chef in a warm, busy kitchen, golden hour light through
the window, shallow depth of field. The chef looks up and says: Tonight we cook
something special.
✅ Dialogue first:
The chef says: Tonight we cook something special.
Visual: a slow dolly-in on a chef in a warm, busy kitchen, golden hour light
through the window, shallow depth of field.
Technique 3: Add an explicit "no subtitles" instruction right after the dialogue
Negative instructions do help — but placement matters. Append the constraint immediately after the spoken line, not at the very end of a long paragraph where the model may discount it.
The guide says: Follow me to the overlook. (no subtitles, no captions, no on-screen text)
Useful phrasings that test well, in rough order of strength:
- (no subtitles)
- no captions, no subtitles, no text overlay
- clean frame, no words on screen, no burned-in text
- Do not add any subtitles or captions.
Technique 4: Fill the negative prompt field
If you are generating in Google Flow, Google AI Studio, or via the Veo 3 API, you have a dedicated negative prompt field. Use it. This is separate from the constraints inside your main prompt and gives the model a second, clearer signal.
Negative prompt (copy-paste):
subtitles, captions, closed captions, on-screen text, text overlay, watermark,
words on screen, lower-third text, burned-in text, sign language overlay
For more on how the negative prompt field works across scenarios, see our Veo 3 negative prompt guide, which covers the full syntax and the other artifacts (extra fingers, warped logos, flicker) you can suppress the same way.
The combined "clean dialogue" prompt formula
Put all four techniques together and you get a template that produces caption-free talking clips with a high hit rate:
[SPEAKER] says: [spoken line, plain text, no quotes, no apostrophes].
(no subtitles, no captions, no on-screen text)
Visual: [subject + action], [setting], [lighting], [camera move], [lens/depth of field].
Audio: [ambient sound], [tone of voice], natural delivery.
Negative prompt: subtitles, captions, on-screen text, text overlay, watermark, words on screen
Worked example — a product UGC clip:
The reviewer says: I have used this blender every morning for a month and it
still sounds brand new.
(no subtitles, no captions, no on-screen text)
Visual: a 28-year-old woman in a bright modern kitchen holding a blender,
talking directly to camera, handheld UGC style, soft window light, shallow
depth of field, subtle natural sway.
Audio: quiet kitchen ambience, warm conversational tone, natural delivery.
Negative prompt: subtitles, captions, on-screen text, text overlay, watermark, words on screen
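For batch work, the formula can be captured as a small data structure so that every prompt comes out in the same order. This is a minimal sketch, assuming you assemble prompts in Python before pasting or sending them anywhere; the class and method names are hypothetical, and the negative prompt is returned separately so you can place it in the dedicated field where one exists.

```python
from dataclasses import dataclass

INLINE_CONSTRAINT = "(no subtitles, no captions, no on-screen text)"
NEGATIVE_PROMPT = (
    "subtitles, captions, on-screen text, text overlay, watermark, words on screen"
)
QUOTE_CHARS = "\"'\u2018\u2019\u201c\u201d"


@dataclass
class CleanDialoguePrompt:
    speaker: str
    line: str    # plain text: no quotes, no apostrophes
    visual: str
    audio: str

    def __post_init__(self) -> None:
        # Fail early instead of spending credits on a caption-prone prompt.
        if any(ch in self.line for ch in QUOTE_CHARS):
            raise ValueError("spoken line must not contain quotes or apostrophes")

    def prompt(self) -> str:
        """Dialogue first, constraint right after it, then visual and audio."""
        return "\n".join([
            f"{self.speaker} says: {self.line}",
            INLINE_CONSTRAINT,
            f"Visual: {self.visual}",
            f"Audio: {self.audio}",
        ])

    def negative_prompt(self) -> str:
        """Text for the separate negative prompt field."""
        return NEGATIVE_PROMPT


clip = CleanDialoguePrompt(
    speaker="The reviewer",
    line="I have used this blender every morning for a month and it still sounds brand new.",
    visual="a 28-year-old woman in a bright modern kitchen holding a blender, "
           "talking directly to camera, handheld UGC style, soft window light",
    audio="quiet kitchen ambience, warm conversational tone, natural delivery.",
)
print(clip.prompt())
```

Keeping the order inside `prompt()` means nobody on the team can accidentally bury the dialogue under the visual description.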
This same structure powers high-converting ad formats — if that is your use case, pair it with our Veo 3 UGC ad generator workflow and the broader Veo 3 native audio prompt guide for dialogue, sound effects, and lip-sync control.
Part 2 — Remove Subtitles in Post (When Prevention Fails)
Prevention is not 100% reliable. Some clips still come back with captions, especially long lines of dialogue or scenes with two speakers. Because the text is burned in, you now have a pixel problem, not a text problem. Here are the four removal methods, ordered roughly from cheapest and fastest to most costly.
Method 1: Crop the bottom strip
Veo 3's auto-captions almost always sit in the lower third of the frame. The fastest fix is to crop that strip out.
- In any editor (CapCut, Premiere, DaVinci Resolve, even the Photos app), crop off the bottom 12–18% of the frame.
- Re-frame so your subject stays centered.
Trade-offs: you lose some of the image and you slightly zoom in, which can soften a 720p/1080p clip. It works best for vertical 9:16 content where a tighter crop is natural. Plan for it by composing with headroom — leave empty space at the bottom of the frame when you generate, so a crop costs you nothing important.
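The arithmetic behind the crop is worth seeing once, because it explains the softening. The sketch below assumes you crop with ffmpeg, which this guide does not otherwise cover (any editor listed above does the same thing through its UI). It trims the width by the same percentage so the aspect ratio is preserved, then scales back to the original size; the function names are illustrative.

```python
def bottom_crop_filter(width: int, height: int, crop_percent: int = 15) -> str:
    """Build an ffmpeg -vf string that drops the bottom strip and restores the size.

    The width is trimmed by the same percentage (centered) so the aspect ratio
    is preserved, then the result is scaled back to the original resolution.
    """
    if not 0 < crop_percent < 100:
        raise ValueError("crop_percent must be between 1 and 99")
    keep = 100 - crop_percent
    # Integer math avoids float rounding; round down to even dimensions,
    # which many video codecs require.
    new_w = (width * keep // 100) // 2 * 2
    new_h = (height * keep // 100) // 2 * 2
    x = (width - new_w) // 2  # center horizontally
    y = 0                     # keep the top edge, discard the bottom strip
    return f"crop={new_w}:{new_h}:{x}:{y},scale={width}:{height}"


def zoom_factor(crop_percent: int) -> float:
    """How much the kept region is enlarged when scaled back to full size."""
    return 100 / (100 - crop_percent)


print(bottom_crop_filter(1920, 1080, 15))  # crop=1632:918:144:0,scale=1920:1080
print(round(zoom_factor(15), 2))           # 1.18
```

The resulting string goes into a command such as `ffmpeg -i input.mp4 -vf "crop=1632:918:144:0,scale=1920:1080" -c:a copy output.mp4`. A 15% crop means every remaining pixel is stretched by about 1.18x, which is why a 720p source softens more visibly than a 1080p one.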
Method 2: Cover it with a lower-third or b-roll
Instead of removing the captions, hide them.
- Drop a solid lower-third graphic, brand bar, or caption box of your own (correctly spelled) over the bottom strip.
- Or overlay a few seconds of b-roll in a picture-in-picture band.
This turns the bug into a design choice and is the fastest route for social videos that were going to have a caption bar anyway. Add your real, on-brand captions on top — clean, readable, and spelled correctly.
Method 3: AI text / object removal
Several tools can paint out burned-in text by reconstructing the pixels behind it: dedicated AI video object-removal features, inpainting tools, and "remove text from video" utilities. Results vary with how busy the background is — a plain wall cleans up well; a moving, detailed background may smear. Always preview a few frames before committing a full export.
Trade-offs: the best tools are paid, and processing detailed backgrounds is imperfect. Reserve this for hero shots where cropping would ruin the composition.
Method 4: Re-roll the generation (last resort)
If a clip is critical and post-production cannot save it, regenerate it — but only after you have applied all four Part 1 prevention techniques. Re-rolling a clip with the same flawed prompt just burns credits for the same result. Re-rolling with a colon-formatted, dialogue-first, negative-prompted version is what actually changes the outcome.
To make re-rolls cheaper, run them on a lower-cost tier first to confirm the prompt is clean, then scale up. Our Veo 3 free access guide and Veo 3 pricing breakdown explain how to test prompts without burning premium credits.
Removal decision table
| Situation | Best method | Why |
|---|---|---|
| Vertical social clip, captions in lower third | Crop the bottom strip | Fast, free, natural for 9:16 |
| Video was going to have captions anyway | Cover with your own lower-third | Turns the bug into a feature |
| Cinematic hero shot, full frame matters | AI text removal | Preserves composition |
| Mission-critical clip, post can't fix it | Re-roll with fixed prompt | Only works with corrected prompt |
Part 3 — Two-Speaker Scenes: The Hardest Case
Single-speaker clips are usually solvable with Part 1. Conversations between two characters are harder — Veo 3 may caption both lines, and it sometimes assigns dialogue to the wrong person's mouth.
Two things help:
1. Label the speaker by position and appearance, not just name. The model does not know who "Anna" is, but it can track "the woman on the left in the red jacket."
The woman on the left in the red jacket says: Did you finish the report?
The man on the right in the grey shirt says: Almost, give me ten minutes.
(no subtitles, no captions, no on-screen text)
2. Keep each spoken line short. Long dialogue is the single biggest caption trigger. Break a conversation into multiple short clips — one exchange each — and stitch them in your editor. Shorter lines also improve lip-sync accuracy.
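Both tips can be combined into a small splitter that turns a scripted conversation into one prompt per exchange. This is a sketch under stated assumptions: the function name is hypothetical, speakers are passed as position-and-appearance descriptions rather than names, and the 12-word cap borrows the 8–12 word guideline from the common-mistakes section of this guide.

```python
from typing import List, Tuple

INLINE_CONSTRAINT = "(no subtitles, no captions, no on-screen text)"
MAX_WORDS = 12  # rough per-line cap (an assumption; tune to taste)


def conversation_to_clips(
    turns: List[Tuple[str, str]], turns_per_clip: int = 2
) -> List[str]:
    """Split (speaker description, line) turns into one short prompt per exchange."""
    for speaker, line in turns:
        if len(line.split()) > MAX_WORDS:
            raise ValueError(f"line too long for one clip ({speaker}): {line}")
    clips = []
    for start in range(0, len(turns), turns_per_clip):
        exchange = turns[start:start + turns_per_clip]
        lines = [f"{speaker} says: {line}" for speaker, line in exchange]
        lines.append(INLINE_CONSTRAINT)  # constraint directly after the dialogue
        clips.append("\n".join(lines))
    return clips


LEFT = "The woman on the left in the red jacket"
RIGHT = "The man on the right in the grey shirt"
script = [
    (LEFT, "Did you finish the report?"),
    (RIGHT, "Almost, give me ten minutes."),
    (LEFT, "Send it to me when it is done."),
    (RIGHT, "Will do."),
]
for clip in conversation_to_clips(script):
    print(clip, end="\n\n")
```

Each returned string is the dialogue block for one clip; add the visual and audio description below it as in Part 1, then stitch the clips in your editor.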
If you are building multi-shot scenes with the same characters across clips, the bigger challenge becomes keeping faces and outfits consistent. Our Veo 3 character consistency guide covers the reference-image and seed techniques that keep your speakers looking identical shot to shot, and the Veo 3 street interview prompts guide shows the dialogue-heavy format in action.
Part 4 — Does Veo 3.1 Fix the Subtitle Bug?
Veo 3.1 improved many things — better prompt adherence, stronger character locking, cleaner audio — but the auto-caption behavior still appears for dialogue-heavy prompts. The same prevention stack applies. If anything, 3.1's better instruction-following means the colon-syntax and dialogue-first techniques land more reliably than they did on the original Veo 3.
A quick note on the root cause: this is genuinely hard for Google to fully eliminate. Because the behavior is baked into the training data, a true fix means re-labeling or filtering caption-bearing footage and retraining — slow, expensive work. Translation: do not wait for an official toggle. The prompt and post-production workflow in this guide is the practical fix today, and it will keep working across Veo 3, Veo 3 Fast, and Veo 3.1.
For everything new in the latest version, see our Veo 3.1 new features guide. And if your dialogue is part of a larger image-to-video pipeline, the Veo 3 image-to-video guide explains how reference frames interact with native audio.
Five Common Mistakes That Bring the Captions Back
Even people who know the techniques sabotage their own clips. These are the patterns that quietly reintroduce subtitles, and how to break each one.
Mistake 1: Keeping the quotation marks "just this once." It feels natural to write dialogue in quotes — that is how dialogue looks in a script. But quotes are the strongest single trigger for on-screen text. Train yourself to write the colon form every time, even for a throwaway test. One stray pair of quotation marks is enough to bring the captions back on an otherwise clean prompt.
Mistake 2: Writing a paragraph of dialogue. The longer the spoken line, the higher the chance of captions and the worse the lip-sync. A character delivering four sentences in one breath is asking for trouble. Cap each spoken line at roughly one short sentence (about 8–12 words) per clip and let the editor handle the conversation flow. This also keeps you inside Veo 3's natural 8-second clip length without rushed, mumbled delivery.
Mistake 3: Putting the negative constraint only at the very end. A "no subtitles" tacked onto the end of a 90-word prompt competes with everything else for the model's attention and often loses. Place the constraint immediately after the dialogue line where it has the most influence, and back it up with the dedicated negative prompt field. Redundancy is your friend here.
Mistake 4: Composing edge-to-edge with no headroom. If you frame your subject tightly to the bottom of the screen and captions appear anyway, you have no room to crop them out without cutting off your subject. Always leave a little dead space at the bottom of the frame when you generate dialogue clips. It costs you nothing visually and gives you a free escape hatch in post.
Mistake 5: Re-rolling the exact same prompt. This is the most expensive mistake because it burns credits for no change. Generation is probabilistic, so an identical prompt will occasionally come back clean — but you are gambling, not fixing. Always change the prompt (colon syntax, dialogue first, inline constraint, negative field) before you spend credits on another generation. A corrected prompt changes the odds; an identical one does not.
Avoid all five and your clean-output hit rate jumps from "sometimes" to "almost always." For a deeper look at how prompt structure controls every part of a Veo 3 generation — not just captions — see our Veo 3 prompt engineering guide.
A Complete Clean-Dialogue Workflow (Start to Finish)
Here is the full process, combining everything above into one repeatable routine.
- Write the spoken line first, in plain text — no quotes, no apostrophes.
- Format with a colon: [Speaker] says: [line].
- Append the constraint right after: (no subtitles, no captions, no on-screen text).
- Add the visual block below the dialogue: subject, action, setting, lighting, camera, lens.
- Fill the negative prompt field with the caption blocklist.
- Compose with bottom headroom so a crop is painless if needed.
- Generate on a cheaper tier first to confirm the prompt is clean.
- Inspect the lower third of the result. Clean? Scale up. Captions? Apply a Part 2 removal method or re-roll with the corrected prompt.
- Add your own correctly spelled captions in post if you want them — now you control the text.
Follow this and the subtitle bug stops being a recurring headache and becomes a checkbox you tick once per clip.
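The prompt-side steps of this checklist can also be verified automatically before you spend credits. The linter below is a rough heuristic sketch, not a guarantee of clean output: it only inspects the text, it assumes a single speaker, and it assumes the `Visual:` label from the formula in Part 1. All names are illustrative.

```python
import re
from typing import List

QUOTE_CHARS = "\"'\u2018\u2019\u201c\u201d"
MAX_WORDS = 12  # upper end of the 8-12 word guideline


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    text = " ".join(prompt.split())  # collapse line wraps into single spaces
    says = text.find(" says: ")
    if says == -1:
        return ["no '<speaker> says: <line>' dialogue found"]

    # Dialogue-first heuristic: nothing sentence-like may precede the speaker.
    visual = text.find("Visual:")
    if "." in text[:says] or 0 <= visual < says:
        warnings.append("dialogue should come before the visual description")

    rest = text[says + len(" says: "):]
    # The spoken line ends at the inline constraint or the visual block.
    ends = [i for i in (rest.find("("), rest.find("Visual:")) if i != -1]
    end = min(ends) if ends else len(rest)
    spoken, after = rest[:end].strip(), rest[end:]

    if any(ch in spoken for ch in QUOTE_CHARS):
        warnings.append("spoken line contains quotes or apostrophes")
    if len(spoken.split()) > MAX_WORDS:
        warnings.append(f"spoken line is longer than {MAX_WORDS} words")
    if not re.match(r"\([^)]*no subtitles[^)]*\)", after, flags=re.IGNORECASE):
        warnings.append("no inline (no subtitles ...) constraint right after the dialogue")
    return warnings


good = (
    "The guide says: Follow me to the overlook.\n"
    "(no subtitles, no captions, no on-screen text)\n"
    "Visual: a ranger on a ridge at sunrise, slow push-in."
)
bad = (
    "A slow dolly-in on a chef in a warm kitchen. "
    'The chef looks up and says: "Tonight we cook something special."'
)
print(lint_prompt(good))
for warning in lint_prompt(bad):
    print("-", warning)
```

The clean prompt produces no warnings, while the buried, quoted version trips three checks at once: dialogue order, quotation marks, and the missing inline constraint. Headroom, the negative prompt field, and the final visual inspection still have to be checked by eye.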
Frequently Asked Questions
Why does Veo 3 add subtitles when I clearly wrote "no subtitles"? Because the behavior comes from the training data, not a single instruction the model reliably obeys. A lone "no subtitles" at the end of a long prompt is often discounted. You get far better results by also using colon-not-quotes formatting, putting dialogue first, and filling the negative prompt field. Stack the techniques — no single one is bulletproof.
Can I just turn captions off in settings? No. Veo 3 captions are burned into the rendered pixels, not a separate subtitle track, so there is no toggle. Your only options are preventing them at generation or removing them in post.
Do quotation marks really cause subtitles? In practice, yes: quotes and apostrophes around dialogue make the model more likely to render that text on screen. Switching to a plain colon (A man says: hello) is the most consistent single fix reported by creators.
Will cropping ruin my video quality? A modest bottom crop (12–18%) zooms in slightly, which can soften lower-resolution clips. Avoid quality loss by composing with empty headroom at the bottom of the frame when you generate, or by upscaling after the crop.
Does the subtitle bug affect Veo 3.1 and Veo 3 Fast too? Yes, the behavior still appears for dialogue-heavy prompts across all current Veo 3 variants. The same prevention and removal workflow applies, and 3.1's improved prompt adherence actually makes the prompt-side fixes more reliable.
What about two people talking — why does the wrong character speak? Veo 3 can misassign dialogue when speakers are not clearly distinguished. Label each speaker by position and appearance ("the woman on the left in the red jacket"), keep lines short, and split long conversations into separate clips.
The Bottom Line
Veo 3's auto-subtitles are annoying, but they are predictable — and predictable problems have repeatable solutions. Prevent them at generation with four free techniques (colon not quotes, dialogue first, an inline "no subtitles" constraint, and a filled negative prompt field), and remove them in post with a crop, an overlay, AI text removal, or a corrected re-roll. Master that two-layer workflow and you unlock Veo 3's best feature — characters that genuinely talk — without the garbled text that makes a clip look broken.
Ready to put it into practice? Generate clean, caption-free talking videos with Veo 3 on veo3ai.io and start with the clean-dialogue prompt formula above.