Gemini Omni Camera Motion and Shot Framing Guide


Emma Chen · 17 min read · May 21, 2026

Google's official Gemini Omni prompt guide makes one thing clear: camera language matters. The model can reason about action, location, style, lighting, shot framing and motion, but a stronger prompt still gives it a production plan. This guide turns that idea into a practical camera system for creators using Gemini Omni, text-to-video and image-to-video.

The goal is not to rewrite Google's guide. The official source is the Google DeepMind Gemini Omni prompt guide. This article builds original frameworks, shot examples and camera templates for veo3ai.io readers who want controllable results.

One wording note: Gemini Omni has replaced the Veo label inside the Gemini app, but that does not mean every Veo reference has disappeared from Google's broader ecosystem. For that transition, read our Gemini Omni vs Veo 3.1 comparison. For broader prompt structure, pair this with our Gemini Omni prompt guide to essential elements.


Quick answer: what should a camera prompt include?

A useful Gemini Omni camera prompt should describe six things in one clean sentence or paragraph:

Element	What to specify	Example phrase
Shot size	How close the subject appears	"medium close-up of a chef's hands"
Angle	Where the camera is placed	"low angle from countertop height"
Movement	How the camera travels	"slow dolly push-in"
Subject action	What happens during the shot	"she dusts flour across the dough"
Visual priority	What must stay readable	"keep the product label sharp and centered"
Mood and pacing	How the shot should feel	"calm, premium, deliberate"

A weak prompt says: "Show a coffee cup in a cafe." A stronger camera prompt says: "Medium close-up of a ceramic coffee cup on a sunlit cafe table, camera starts at table height and slowly pushes in as steam rises, shallow depth of field, warm morning light, keep the cup centered and readable."

The second version gives Gemini Omni a shot plan instead of only a scene description.

The CRAFT framework for Gemini Omni camera prompts

Use CRAFT when you need a fast, repeatable structure:

C — Composition: subject, shot size, frame position and background.
R — Route: camera movement path from start to finish.
A — Action: what the subject does while the camera moves.
F — Focus: what must remain sharp, stable or readable.
T — Tone: mood, lighting, speed and visual style.

Template:

[Shot size] of [subject] in [location]. The camera [movement route] while [subject action]. Keep [focus priority] clear. Use [tone, lighting, pacing and style].

Example:

Wide establishing shot of a solo runner on an empty coastal road at sunrise. The camera begins behind the runner, then performs a smooth tracking move alongside her as the ocean appears on the left. Keep the runner sharp and centered, with soft golden light, quiet athletic energy and natural motion blur.

This framework works because it separates camera behavior from scene content. Gemini Omni may understand a high-level goal, but the model still benefits when you define the frame, motion path and visual hierarchy.
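If you keep prompts in a script or spreadsheet, the CRAFT template can be filled programmatically so every variation has all five parts. The sketch below is a hypothetical Python helper, not part of Gemini Omni or any Google API; the class and field names are illustrative assumptions. It only assembles the text you would paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class CraftPrompt:
    """Hypothetical container for the five CRAFT slots (field names are illustrative)."""

    shot_size: str  # C: Composition, how close the subject appears
    subject: str    # C: Composition, who or what is in frame
    location: str   # C: Composition, setting and background
    route: str      # R: Route, camera path from start to finish
    action: str     # A: Action, what the subject does during the move
    focus: str      # F: Focus, what must stay sharp or readable
    tone: str       # T: Tone, mood, lighting, pacing and style

    def __post_init__(self) -> None:
        # CRAFT only helps when every slot is filled, so fail fast on blanks.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"CRAFT field '{name}' is empty")

    def render(self) -> str:
        # Fills the template: [Shot size] of [subject] in [location].
        # The camera [movement route] while [subject action].
        # Keep [focus priority] clear. Use [tone, lighting, pacing and style].
        return (
            f"{self.shot_size} of {self.subject} in {self.location}. "
            f"The camera {self.route} while {self.action}. "
            f"Keep {self.focus} clear. "
            f"Use {self.tone}."
        )


prompt = CraftPrompt(
    shot_size="Medium shot",
    subject="a designer",
    location="a studio lined with sketches",
    route="gently pushes in",
    action="she points to a new product concept",
    focus="her face and the sketch wall",
    tone="soft studio lighting and a confident but approachable tone",
)
print(prompt.render())
```

Keeping the slots as separate fields makes it easy to change only the Route while holding Composition, Action, Focus and Tone fixed, which mirrors the revision workflow later in this guide.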

Shot framing: choose the viewer's distance first

Shot size is the easiest way to control emotion. Before asking for a camera move, decide how close the viewer should feel to the subject.

Extreme wide shot

Use this when scale, location or isolation matters. It is useful for landscapes, city skylines, architecture, travel openings and fantasy environments.

Prompt pattern:

Extreme wide shot of a tiny hiker crossing a snowfield below enormous dark mountains, camera locked off, minimal movement, cold blue dawn light, emphasize scale and solitude.

Wide shot

A wide shot shows the full subject and enough environment to explain the scene. Use it for product lifestyle scenes, action blocking, fashion, sports or group movement.

Prompt pattern:

Wide shot of two cyclists riding through a narrow old-town street, camera tracks backward ahead of them, buildings passing on both sides, lively morning market atmosphere.

Medium shot

Medium shots are the workhorse of marketing videos. They keep the person or product readable while still showing context. Use them for explainers, founder videos, tutorials and social ads.

Prompt pattern:

Medium shot of a designer standing beside a wall of sketches, camera gently pushes in as she points to a new product concept, soft studio lighting, confident but approachable tone.

Close-up

Close-ups create attention. They work well for faces, hands, textures, tools, food, jewelry, buttons, screens and product details. Be specific about what should stay sharp.

Prompt pattern:

Close-up of a hand pressing a matte black camera shutter button, shallow depth of field, camera holds steady, crisp focus on the fingertip and button texture.

Extreme close-up

Extreme close-ups are best for sensory detail: water droplets, fabric weave, eyelashes, glowing pixels, engraved logos or mechanical parts. Use them sparingly because they can become abstract.

Prompt pattern:

Extreme close-up of condensation beads sliding down a cold glass bottle, macro lens feel, slow vertical camera drift, label edge slightly visible but droplets remain the focus.

[Infographic: Gemini Omni camera movement map]

Camera angles: define power, intimacy and clarity

Angle changes the meaning of the same scene.

A low angle makes the subject feel larger, heroic or dominant. Use it for athletes, products with a premium feel, vehicles or dramatic entrances.

A high angle makes the subject feel small, organized or observable. Use it for desks, flat lays, cooking, maps, logistics, crowds or planning scenes.

An eye-level angle feels natural and trustworthy. Use it for tutorials, interviews, lifestyle ads and realistic scenes.

An over-the-shoulder angle creates participation. It is helpful for app demos, workstation scenes, gaming, design, drawing and conversations.

A top-down angle is excellent for clarity. Use it when the viewer must understand layout: recipe steps, unboxing, notebooks, tools, ingredients or UI planning.

Instead of writing "cinematic angle," name the specific angle and what the frame must show:

Eye-level medium close-up of a small business owner packing orders at her desk, camera slowly slides left, keep her hands and branded boxes visible, warm practical lighting, documentary realism.

The phrase "eye-level" tells the model to avoid an overly dramatic view. The phrase "hands and branded boxes visible" tells it what the frame must protect.

Movement grammar: what different camera moves mean

Camera motion is not decoration. Each move communicates a different idea.

Push-in

A push-in moves the camera closer. It increases importance, emotion or reveal. Use it when the audience should notice a key detail or feel a decision becoming serious.

Example:

Medium shot of a founder looking at a prototype on a table, slow dolly push-in as the device lights up, focus shifts from her expression to the glowing interface.

Pull-back

A pull-back reveals context. It starts intimate and then shows the larger world. Use it for transformations, surprises and before-after moments.

Example:

Close-up of a single app notification on a phone, camera slowly pulls back to reveal a busy creative studio using the same dashboard on multiple screens.

Tracking shot

A tracking shot follows a subject. It creates momentum and continuity. Use it for walking, running, vehicles, factory lines, travel, retail shelves or process videos.

Example:

Side tracking shot of a delivery robot moving along a sidewalk, camera keeps pace at wheel height, city lights reflecting on wet pavement, smooth evening motion.

Orbit

An orbit circles around the subject. It feels premium, dramatic or product-focused. Use it for hero shots, fashion reveals, cars, gadgets and character moments.

Example:

Slow 180-degree orbit around a transparent smart speaker on a pedestal, internal lights pulsing softly, glossy reflections, clean futuristic showroom.

Pan

A pan rotates the camera from left to right or right to left. It reveals a scene without moving through space. Use it for landscapes, room reveals, comparison layouts and event setups.

Example:

Slow left-to-right pan across a creator's desk setup, passing a camera, microphone, sketchbook and laptop timeline, soft afternoon light.

Tilt

A tilt pivots the camera up or down from a fixed position. It reveals height, scale or vertical information. Use it for buildings, outfits, product towers, trees, robots or stage entrances.

Example:

Low-angle tilt up from polished shoes to a tailored jacket as a speaker steps onto a conference stage, bright spotlight, confident launch-event energy.

Handheld motion

Handheld motion feels immediate, imperfect and human. Use it for documentary realism, behind-the-scenes footage, street scenes and tense moments. Do not use it for luxury product shots unless you want a raw feeling.

Example:

Handheld close follow shot of a chef moving through a crowded kitchen, slight natural shake, steam and motion, urgent but controlled dinner-service atmosphere.

Locked-off shot

A locked-off shot has no camera movement. It feels composed, observational or clinical. Use it when motion inside the frame is already enough.

Example:

Locked-off wide shot of sunlight moving across an empty minimalist bedroom, curtains shifting gently, quiet morning mood.

The start-to-finish motion rule

Many prompts mention movement but do not say where the camera begins or ends. That creates ambiguity. A better motion prompt includes a start frame, movement path and end frame.

Weak:

Camera moves around a car.

Better:

Camera starts in a low front three-quarter view of the electric car, then performs a slow clockwise orbit to the rear badge, ending on a close-up of the taillight design.

The better version gives Gemini Omni a clear motion path with a defined start and end. It also prevents the camera from wandering away from the most important detail.

Use this formula:

Start at [initial framing]. Move [direction and speed]. End on [final framing or reveal].

Example for image-to-video:

Using the uploaded product image as the design reference, start with a close-up on the logo, pull back into a medium product shot, then end with a slight orbit that shows the side profile. Preserve the product shape, color and label placement.

That last sentence is important for commercial work. When a logo, screen, garment or package matters, tell the model to preserve it.
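When you produce many variations, the start-move-end formula and its preservation sentence can be filled from short fields. This is a minimal sketch using hypothetical Python helpers named motion_prompt and join_items; neither is part of any Google tool. It refuses to build a prompt that is missing a start, move or end, since the point of the rule is to remove that ambiguity.

```python
from typing import Optional, Sequence


def join_items(items: Sequence[str]) -> str:
    # Produces "a", "a and b" or "a, b and c" (no serial comma, matching this article).
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def motion_prompt(start: str, move: str, end: str,
                  preserve: Optional[Sequence[str]] = None) -> str:
    """Hypothetical helper for: Start at [initial framing].
    Move [direction and speed]. End on [final framing or reveal]."""
    for label, value in (("start", start), ("move", move), ("end", end)):
        if not value.strip():
            # All three parts are required; a missing one reintroduces ambiguity.
            raise ValueError(f"missing {label} of the camera move")
    parts = [f"Start at {start}.", f"Move {move}.", f"End on {end}."]
    if preserve:
        # Image-to-video: name every element the model must keep unchanged.
        parts.append(f"Preserve the {join_items(preserve)}.")
    return " ".join(parts)


print(motion_prompt(
    start="a close-up on the logo",
    move="slowly backward into a medium product shot, then into a slight orbit",
    end="the side profile",
    preserve=["product shape", "color", "label placement"],
))
```

The preservation list is optional so the same helper works for text-to-video, where there is no reference image to protect.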

Lens language without overcomplicating the prompt

You do not need a film-school paragraph. A few lens-style phrases can guide the look: wide lens feel for environment, telephoto compression for fashion or sports, macro lens feel for detail, shallow depth of field for premium focus and deep focus when foreground and background both matter.

Prompt example:

Medium close-up of a ceramic watch on a dark stone surface, macro lens feel, shallow depth of field, camera slowly slides right, keep the engraved dial sharp and readable.

Avoid stacking too many technical terms. One clear shot plan is usually stronger than a crowded list of camera jargon.

[Infographic: Gemini Omni shot framing ladder]

Camera prompting for text-to-video vs image-to-video

For text-to-video, your prompt must create the scene and direct the camera. Include subject, setting, style, action, framing and motion.

Text-to-video example:

Wide shot of a compact AI video studio inside a glass office at night. A creator reviews clips on a large monitor while city lights glow outside. Camera begins behind the creator, slowly pushes toward the screen, then ends on the timeline interface. Clean modern lighting, calm professional mood.

For image-to-video, the image already defines part of the world. Your camera prompt should protect the reference while adding motion.

Image-to-video example:

Animate the uploaded product photo into a premium hero shot. Keep the product design, logo, color and proportions consistent. Start with a close-up on the front label, pull back to reveal the full product on a reflective surface, then add a subtle clockwise orbit. Soft studio highlights, no extra text.

The difference is priority. Text-to-video needs world building. Image-to-video needs preservation plus motion.

Three complete Gemini Omni camera prompt templates

1. Product hero template

Close-up of [product] on [surface/environment]. Camera starts at [detail], then [movement] to reveal [final hero angle]. Keep [logo/screen/shape] sharp and accurate. Use [lighting], [background style] and [brand mood].

Example:

Close-up of a silver wireless microphone on a matte graphite desk. Camera starts on the mesh grille, then slowly pulls back and orbits 90 degrees to reveal the full body and glowing power light. Keep the brand mark sharp and centered. Use soft studio lighting, dark premium background and precise commercial pacing.

2. Human story template

[Shot size] of [person] doing [action] in [location]. Camera [movement] from [start position] to [end position]. Emphasize [emotion or story beat]. Keep [important object or face] clear. Use [lighting and style].

Example:

Medium shot of a student editing a short film in a quiet dorm room. Camera starts over her shoulder on the laptop timeline, then slowly pushes in as she smiles at the final cut. Emphasize relief and creative focus. Keep her face and the screen readable. Use warm desk-lamp lighting and realistic documentary style.

3. Scene reveal template

Start with [close detail]. Camera [pull-back/pan/tilt] to reveal [larger scene]. The reveal should show [key contrast or surprise]. Maintain [mood], [pacing] and [visual priority].

Example:

Start with a close-up of raindrops on a train window. Camera pulls back to reveal a futuristic station full of glowing signs and travelers with transparent umbrellas. The reveal should feel quiet and cinematic, with cool reflections, slow pacing and the window texture still visible in the foreground.

Common camera prompt mistakes

Do not rely on "cinematic" alone. Add shot size, angle and movement: "low-angle wide shot, slow push-in, golden backlight, subject centered." Do not stack conflicting camera behaviors, such as a drone sweep, macro detail, fast handheld motion and a locked-off frame, in one shot. Pick one dominant behavior.

Also give the subject something to do. "Camera orbits a desk" is weaker than "camera orbits as the designer places the finished prototype onto the desk." If product text, UI, labels or logos matter, say so and request no extra text. Finally, avoid over-directing every frame. Give Gemini Omni clear shot grammar, then let it fill natural details.
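To catch the first two mistakes before you spend a generation, a small lint pass can scan a draft prompt. The sketch below is a hypothetical Python helper, not a Gemini Omni feature; the keyword patterns are assumptions drawn from this article's examples, and plain keyword matching will miss paraphrases (it would also treat "wide lens feel" as a shot size).

```python
import re
from typing import List

# Behaviors this guide names as clashing when stacked in one shot.
# Keyword patterns are assumptions; plain matching will miss paraphrases.
CONFLICTING_BEHAVIORS = {
    "drone": r"\bdrone\b",
    "macro": r"\bmacro\b",
    "handheld": r"\bhand-?held\b",
    "locked-off": r"\blocked[- ]off\b",
}

# Shot sizes from the framing section ("extreme wide" and "extreme close-up"
# are covered by "wide" and "close-up"), used to spot a bare "cinematic".
SHOT_SIZE = r"\b(wide|medium|close-up)\b"


def lint_camera_prompt(prompt: str) -> List[str]:
    """Return human-readable warnings for a draft camera prompt."""
    text = prompt.lower()
    warnings: List[str] = []
    found = [name for name, pattern in CONFLICTING_BEHAVIORS.items()
             if re.search(pattern, text)]
    if len(found) > 1:
        warnings.append("conflicting camera behaviors: " + ", ".join(found))
    if "cinematic" in text and not re.search(SHOT_SIZE, text):
        warnings.append('"cinematic" used without a shot size')
    return warnings


print(lint_camera_prompt("Cinematic drone shot over a city, handheld and locked-off"))
```

Only the combinations this guide calls out are flagged, so a deliberate compound move such as a pull-back that ends in a slight orbit passes untouched.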

A practical editing workflow

Start with one clean master prompt, not ten variations. Generate a first result, then evaluate three questions:

Is the subject readable?
Does the camera move in the intended direction?
Does the ending frame land on the right detail?

If the answer is no, revise the camera instruction rather than rewriting the whole scene. For example:

Keep the same scene and lighting, but change the camera movement. Start wider, push in more slowly, and end on the product label instead of the background.

This style of revision fits Gemini Omni's conversational editing approach. It also helps teams keep a consistent creative concept while changing only the camera behavior.

Final checklist before you generate

Before running the prompt, check that it has:

One primary shot size
One camera angle
One dominant movement
A clear subject action
A start frame and an end frame
A focus priority
Mood and lighting
Preservation notes for any image-to-video reference

If your prompt has those pieces, it is usually ready.
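Teams that store shot plans as structured notes can run the checklist automatically. This is a minimal sketch with hypothetical field names, not an official format; it only checks that each item is present and non-blank, not that the prompt uses exactly one shot size, angle or movement.

```python
from typing import Dict, List

# The pre-generation checklist expressed as hypothetical field names.
CHECKLIST = [
    "shot_size", "angle", "movement", "action",
    "start_frame", "end_frame", "focus", "mood", "lighting",
]


def missing_items(plan: Dict[str, str], image_to_video: bool = False) -> List[str]:
    """List checklist items that are absent or blank in a shot plan."""
    # Preservation notes only apply when an uploaded reference image exists.
    required = CHECKLIST + (["preserve"] if image_to_video else [])
    return [key for key in required if not plan.get(key, "").strip()]


plan = {
    "shot_size": "close-up",
    "angle": "eye-level",
    "movement": "slow pull-back",
    "action": "steam rises from the cup",
    "start_frame": "the front label",
    "end_frame": "the full product on a reflective surface",
    "focus": "the logo",
    "mood": "premium, calm",
    "lighting": "soft studio highlights",
}
print(missing_items(plan, image_to_video=True))  # ['preserve']
```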

Bottom line

Gemini Omni can understand broader creative goals, but camera prompts still decide whether a video feels random or directed. Think like a director: choose the viewer's distance, place the camera, define the movement, protect the important detail and describe the mood. For most creators, the CRAFT framework is enough: Composition, Route, Action, Focus and Tone.

Use the official Google DeepMind guide as the source of truth for Gemini Omni capabilities, then use the original templates in this article to make camera motion and shot framing more repeatable. If you are still mapping the wider product change, start with our Gemini Omni hub, then compare the transition in Gemini Omni vs Veo 3.1.

FAQ

What is the best camera movement for Gemini Omni prompts?

The best movement depends on the goal. Use a push-in for importance, a pull-back for reveal, tracking for motion, orbit for product drama, pan for environment and locked-off framing for calm observation. Pick one dominant movement per shot.

Should I use technical lens terms in Gemini Omni prompts?

Use simple lens-style language only when it helps. Phrases such as "macro lens feel," "shallow depth of field" or "wide lens feel" are useful. Long lists of film terms can make the prompt less clear.

How do I stop Gemini Omni from changing a product in image-to-video?

Tell it exactly what to preserve: product shape, color, logo, label placement, screen layout or material. Add a sentence such as "Preserve the uploaded product design and keep the logo readable; do not add extra text."

Is Gemini Omni better than Veo for camera control?

Gemini Omni is presented as a Gemini-native video creation and editing experience with strong prompt understanding, while Veo 3.1 is still documented as a Google DeepMind model. The practical answer depends on which product surface, plan, region and workflow you can access.

Can I ask Gemini Omni to fix camera motion after generation?

Yes, if the interface supports iterative editing, you can revise the camera instruction without changing the whole concept. Ask it to keep the same scene, lighting and subject, then change the start frame, movement speed or final framing.
