Comprehensive definitions of all VeoStudio and AI video generation terminology.
Claude Sonnet 4.5's context window, which lets it process up to 1 million tokens of information at once. This enables the AI Script Writer to generate coherent, long-form scripts of up to 99 scenes while maintaining narrative consistency.
VeoStudio's scriptwriting system using Claude Sonnet 4.5 with 1M token context. Generates structured, multi-scene scripts (up to 99 scenes) with detailed descriptions including prompts, camera work, lighting, dialogue, and timeline information.
The proportional relationship between width and height of videos. VeoStudio supports 16:9 (landscape, for YouTube), 9:16 (portrait, for TikTok/Instagram), and 1:1 (square, for social feeds).
Cinematic camera movements and angles specified for each scene (e.g., "static shot," "slow push in," "dutch angle"). Part of every scene's description to guide VEO 3's video generation.
The ability to maintain the same character appearance across multiple scenes or projects. VeoStudio achieves this through Gemini Banana, which creates character images with unique IDs that can be referenced throughout your project.
Your collection of reusable characters created with Gemini Banana. Each character maintains consistent appearance and can be used across multiple scenes and projects.
Anthropic's advanced language model used for VeoStudio's AI Script Writer. Known for excellent creative writing, instruction following, and ability to handle complex multi-scene narratives with its 1M token context window.
Full rights to use generated videos for any commercial purpose including client work, resale, YouTube monetization, marketing campaigns, and feature films. Included automatically with all VeoStudio content.
An image used to guide VEO 3's video generation. VeoStudio uses the scene's first frame as a conditioning image to ensure the video starts with the desired composition, characters, and setting.
VeoStudio's dual-method approach to maintaining flow between scenes: (1) Text continuity where each scene's ending description (endWith) flows into the next scene's opening (nextBegin), and (2) Image continuity where the previous scene's last frame informs the next scene's first frame.
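As a rough illustration of the two continuity methods, the sketch below pairs each scene with the one that follows it. Only the `endWith` and `nextBegin` field names come from the definition above; the `Scene` and `Transition` classes, the `last_frame` field, and the `build_transitions` helper are hypothetical and do not describe VeoStudio's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scene:
    """Hypothetical scene record; only endWith/nextBegin are named in the glossary."""
    number: int
    nextBegin: str                    # how this scene opens, picking up from the previous one
    endWith: str                      # how this scene concludes
    last_frame: Optional[str] = None  # assumed: ID or path of the generated last frame


@dataclass
class Transition:
    from_scene: int
    to_scene: int
    text_handoff: str                 # previous endWith -> next nextBegin
    reference_frame: Optional[str]    # previous last frame, reference for the next first frame


def build_transitions(scenes: List[Scene]) -> List[Transition]:
    """Pair each scene with its successor for text and image continuity."""
    transitions = []
    for prev, nxt in zip(scenes, scenes[1:]):
        transitions.append(Transition(
            from_scene=prev.number,
            to_scene=nxt.number,
            text_handoff=f"{prev.endWith} -> {nxt.nextBegin}",
            reference_frame=prev.last_frame,
        ))
    return transitions


scenes = [
    Scene(1, "A hiker enters the clearing", "She kneels by the stream", "frame_001_last"),
    Scene(2, "Close on her hands in the water", "She looks up, startled"),
]
transitions = build_transitions(scenes)
```

With two scenes there is exactly one transition: its text handoff joins scene 1's `endWith` to scene 2's `nextBegin`, and its reference frame is scene 1's last frame.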
VeoStudio's $19.99/month subscription including unlimited AI scriptwriting, Gemini Banana character generation, frame preview, and all planning tools. Video generation uses separate pay-as-you-go tokens.
Reusable props, objects, or environmental features that can be saved and referenced across multiple scenes. Similar to character libraries but for inanimate objects.
VEO 3's rapid generation mode. Costs 30 tokens ($1.80) per 8-second 1080p clip. Generates videos about twice as fast as Quality mode (1-2 minutes instead of 2-4) at half the cost, with slightly less refinement. Well suited to drafts and B-roll footage.
The opening image of a scene, used as conditioning for VEO 3 video generation. Generated by Gemini 2.5 Flash based on AI descriptions and character references. Ensures videos start with the desired composition.
VeoStudio's scene preview system. AI generates descriptions and Gemini creates images for the first and last frame of each scene before video generation. Allows you to verify composition and continuity without spending tokens on full videos.
Google's multimodal AI model used for character and frame generation. VeoStudio uses the "Image Preview" variant, which excels at text-to-image and image+text-to-image generation with consistent character rendering.
VeoStudio's character consistency system powered by Gemini 2.5 Flash Image Preview. Generates characters that maintain consistent facial features, wardrobe, and style across all scenes. Each character gets a unique ID for project-wide reuse.
Visual flow maintained through frame references. The last frame of one scene is used as a reference when generating the first frame of the next scene, ensuring consistent visual transitions.
The closing image of a scene. Used for continuity validation and as a reference for the next scene's first frame. Helps maintain visual flow between sequential scenes.
VeoStudio's timeline-based editor for stitching scenes into complete movies. Drag-and-drop interface for arranging clips, trimming, and reordering. Exports final videos in multiple formats.
The complete set of information for a scene including set anchor, characters, camera work, lighting, timeline, and technical specifications. Stored in structured format for AI processing.
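The definition above does not say which structured format is used. As a minimal sketch, assuming JSON and assuming the field names shown here (all of them hypothetical), such a record might look like this:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SceneMetadata:
    """Hypothetical layout; field names are illustrative, not VeoStudio's schema."""
    set_anchor: str
    characters: List[str]
    camera_work: str
    lighting: str
    timeline: List[str] = field(default_factory=list)
    technical: Dict[str, str] = field(default_factory=dict)


def to_json(meta: SceneMetadata) -> str:
    """Serialize the record so it can be passed to downstream AI processing."""
    return json.dumps(asdict(meta), indent=2)


example = SceneMetadata(
    set_anchor="forest clearing",
    characters=["char_001"],            # assumed character ID format
    camera_work="slow push in",
    lighting="soft morning light",      # illustrative value
    timeline=["0-4s: hiker enters frame", "4-8s: hiker kneels by stream"],
    technical={"resolution": "1080p", "aspect_ratio": "16:9", "duration_seconds": "8"},
)
serialized = to_json(example)
```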
VEO 3's premium generation mode. Costs 60 tokens ($3.60) per 8-second 1080p clip. Produces the highest quality output with refined details, smooth movements, and better adherence to prompts. Takes 2-4 minutes per clip.
The pixel dimensions of a video. All VeoStudio videos are 1080p (1920×1080 pixels), which is Full HD quality suitable for professional distribution and most platforms.
A single 8-second video segment in your project. Each scene includes a prompt, camera work, lighting, timeline, character placements, and continuity information. Projects can have up to 99 scenes.
The primary location or setting for a scene (e.g., "forest clearing," "modern office"). Used by AI to maintain consistent environment descriptions throughout scene generation.
Narrative flow maintained through scene descriptions. Each scene's "endWith" field describes how it concludes, and the next scene's "nextBegin" field picks up from that point, ensuring smooth story progression.
The temporal structure of a scene specifying character positions, actions, and camera movements at different points during the 8-second clip. Used by AI to create precise, well-timed scene descriptions.
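One way to picture a timeline is as an ordered list of time-stamped beats that must fit inside the 8-second clip. The sketch below is an assumption about how such a structure could be checked; the `Beat` class and `validate_timeline` function are hypothetical names, not part of VeoStudio.

```python
from dataclasses import dataclass
from typing import List

CLIP_SECONDS = 8.0  # each scene is a single 8-second clip


@dataclass
class Beat:
    start: float       # seconds from the start of the clip
    end: float         # seconds from the start of the clip
    description: str   # character position, action, or camera movement


def validate_timeline(beats: List[Beat]) -> None:
    """Raise ValueError if beats overlap, run backwards, or exceed the clip length."""
    previous_end = 0.0
    for beat in beats:
        if beat.start < previous_end:
            raise ValueError(f"beat starting at {beat.start}s overlaps the previous beat")
        if beat.end <= beat.start:
            raise ValueError(f"beat starting at {beat.start}s has no duration")
        if beat.end > CLIP_SECONDS:
            raise ValueError(f"beat ending at {beat.end}s exceeds the {CLIP_SECONDS}s clip")
        previous_end = beat.end


timeline = [
    Beat(0.0, 3.0, "static shot: character stands at the doorway"),
    Beat(3.0, 6.0, "slow push in as the character crosses the room"),
    Beat(6.0, 8.0, "character sits; camera holds"),
]
validate_timeline(timeline)  # passes silently for a well-formed timeline
```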
VeoStudio's pay-as-you-go currency for video generation. Tokens cost $0.06 each. VEO 3 Quality costs 60 tokens ($3.60) per 8-second clip. VEO 3 Fast costs 30 tokens ($1.80) per 8-second clip. AI scriptwriting, character generation, and frame preview do not consume tokens.
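The token arithmetic can be sketched as a small budget estimator. The prices and per-clip token counts are taken from the definition above; the function and constant names are illustrative only. Amounts are kept in whole cents to avoid floating-point rounding.

```python
from typing import Tuple

TOKEN_PRICE_CENTS = 6                            # $0.06 per token
TOKENS_PER_CLIP = {"quality": 60, "fast": 30}    # per 8-second clip


def estimate_cost(quality_clips: int, fast_clips: int = 0) -> Tuple[int, int]:
    """Return (total tokens, total cost in cents) for a mix of clips."""
    if quality_clips < 0 or fast_clips < 0:
        raise ValueError("clip counts must be non-negative")
    tokens = (quality_clips * TOKENS_PER_CLIP["quality"]
              + fast_clips * TOKENS_PER_CLIP["fast"])
    return tokens, tokens * TOKEN_PRICE_CENTS


def format_dollars(cents: int) -> str:
    """Format a whole number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


tokens, cents = estimate_cost(quality_clips=10, fast_clips=5)
print(tokens, format_dollars(cents))  # 750 $45.00
```

For example, 10 Quality clips and 5 Fast clips come to 600 + 150 = 750 tokens, or $45.00.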
Google's third-generation video generation model. VeoStudio uses two variants: veo-3.0-generate-preview (Quality mode, 60 tokens per clip) and veo-3.0-fast-generate-preview (Fast mode, 30 tokens per clip). Both produce 8-second 1080p clips with audio.
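The two variants can be summarized in a lookup table. The model IDs, token costs, and generation times below are the ones given in this glossary; the `ModeSpec` class, the `resolve_mode` helper, and the "quality"/"fast" keys are hypothetical names introduced for illustration, and nothing here calls a real API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModeSpec:
    model_id: str
    tokens_per_clip: int
    minutes_per_clip: Tuple[int, int]  # (fastest, slowest) generation time


MODES: Dict[str, ModeSpec] = {
    "quality": ModeSpec("veo-3.0-generate-preview", 60, (2, 4)),
    "fast": ModeSpec("veo-3.0-fast-generate-preview", 30, (1, 2)),
}


def resolve_mode(name: str) -> ModeSpec:
    """Look up a generation mode by name, rejecting unknown modes."""
    try:
        return MODES[name.lower()]
    except KeyError:
        raise ValueError(f"unknown mode {name!r}; expected one of {sorted(MODES)}")


draft = resolve_mode("fast")       # drafts and B-roll
final = resolve_mode("Quality")    # final renders; lookup is case-insensitive
```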