Motion & AI

Agentic Video

What happens when AI stops waiting for instructions and starts directing the edit.

Jack Vaughan

I'm Jack. I've worked in post-production and motion for over 15 years. Over the last few years, I've been automating and improving more and more of my workflows with AI. Gradually, my work has started to sit somewhere between video and engineering. This page is my attempt to map the current state of agentic video, and where things seem to be heading.

What I mean by "agentic"

Most AI video tools today are reactive. You give a prompt, you get a clip. Useful, but limited. Each output stands alone.

When people talk about agentic video, they are usually pointing at something broader than this. Not a single tool or model, but the use of AI across a video pipeline to observe, decide, and act at multiple stages, rather than producing one isolated result.

A quick caveat. This is not a fixed category with clear edges. The term is new, the space is early, and there is no single agreed definition yet.

It helps to think of agentic video as a spectrum. At one end are tools that apply AI to a single task, like silence cutting, captions, or upscaling. At the other end are emerging systems that begin to coordinate decisions across an entire edit: watching footage, holding context, and influencing multiple steps in the process.

Most real tools and workflows today sit somewhere in between. If you are a video producer, you are probably already using AI in several parts of your pipeline. When those decisions start to connect and compound, that is where something more agentic begins to emerge.

This page attempts to map that full landscape as it exists today, without pretending it is settled, and to explore where this shift in how video is made may be heading.

Why now?

A few things have converged to make agentic video possible in the last year or two. None of them is the whole picture, but together they unlock something new.

Vision models that understand video. New multimodal models can process long footage, generate timestamped analyses, and distinguish A-roll from B-roll or identify story beats. An editing agent has to watch and understand the material before it can cut it.

Longer context and memory. LLMs can hold a full transcript or storyboard. The agent can remember what was said in minute 2 while working on minute 15.

Agents that use tools and APIs. Function calling, code-driven pipelines like Remotion. The AI can actually perform edits, not just suggest them.

Better generative models. Clip quality and consistency are improving. You can fill gaps with generated B-roll or synthetic shots when needed.

Multimodal integration. Audio, visual, and text in one pipeline. Transcribe, trim filler, re-sync. Editing is inherently multi-modal.

Hardware and software. WebGPU, WebCodecs, browser-based editing. Heavy tasks can run server-side while the agent orchestrates; we're getting to the point where an AI editor in the browser is feasible.
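
The tool-use piece is the hinge of all this: the model picks an action, the runtime executes it, and the loop repeats with the result in context. Here is a minimal sketch in Python, with invented tool names and the model's decision stubbed out; it shows the shape of the loop, not any particular product's API.

```python
# Minimal agent loop: the "model" chooses a tool, the runtime applies it
# to the timeline, and the loop continues until there is no next action.
# Tool names and pick_next_action are hypothetical stand-ins.

def cut_silence(timeline, threshold_db=-40):
    """Hypothetical tool: drop segments quieter than threshold_db."""
    return [seg for seg in timeline if seg["loudness_db"] > threshold_db]

def add_captions(timeline):
    """Hypothetical tool: attach caption text from each segment's transcript."""
    return [{**seg, "caption": seg.get("transcript", "")} for seg in timeline]

TOOLS = {"cut_silence": cut_silence, "add_captions": add_captions}

def pick_next_action(goal, step):
    # Stand-in for the model's decision; a real agent would ask an LLM
    # (via function calling) which tool to run next, given the goal.
    plan = ["cut_silence", "add_captions"]
    return plan[step] if step < len(plan) else None

def run_agent(goal, timeline):
    step = 0
    while (tool := pick_next_action(goal, step)) is not None:
        timeline = TOOLS[tool](timeline)  # act, observe, repeat
        step += 1
    return timeline

clips = [
    {"transcript": "welcome back", "loudness_db": -12},
    {"transcript": "", "loudness_db": -55},  # near-silence: should be cut
]
result = run_agent("tight social cut with captions", clips)
```

A real system would replace pick_next_action with an LLM function-calling request and feed the observed timeline state back each turn; the loop structure stays the same.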

Why this matters to me

Motion design involves a lot of repetitive work: keyframing the same animation again, rendering variants, exporting formats. What if that could be delegated to a system that understands your style? The goal isn't to replace the craft. It's to spend less time pushing pixels and more time guiding the idea.

Automating the grind. Most time in video goes into editing; an agent could speed that up.

Lowering the barrier. Agentic tools give anyone a reasonable first cut on demand; you become more of a director. Some call that vibe editing: you set the direction, the system does the cut.

Quality and taste. Agents might learn what makes a video good, not just how to cut; we're not fully there yet.

Scale and personalisation. Same promo in five aspect ratios and three languages; brand consistency at volume.

New possibilities. Creative collaborator: feed hours of footage, ask for a storyline; sketch a concept, AI fills in details.

Accessibility. Small teams and indies could produce polished video without a big budget.

What can agents do?

It's still early, but we can already see capability buckets that agentic systems are tackling. Think of these as building blocks of a full pipeline.

Content analysis and triage. Ingest footage, find best takes, summarize, annotate for search. Classify A-roll vs B-roll, detect when multiple angles cover the same scene. Some tools do this as a first step before any editing.
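
A triage pass can be as simple as mapping a vision model's per-shot analysis onto labels. A sketch, where the shot fields (face_on_screen, speech_ratio) are assumptions for illustration, not any tool's actual schema:

```python
def triage(shots):
    """Label each analyzed shot as A-roll (on-camera speech) or B-roll."""
    for shot in shots:
        a_roll = shot["face_on_screen"] and shot["speech_ratio"] > 0.5
        shot["role"] = "a-roll" if a_roll else "b-roll"
    return shots

shots = [
    {"start": 0.0, "face_on_screen": True, "speech_ratio": 0.9},   # talking head
    {"start": 4.2, "face_on_screen": False, "speech_ratio": 0.0},  # cutaway
]
triage(shots)
```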

Autonomous editing and orchestration. Rough cuts from intent. Multi-step coordination of models: generate a clip, insert it, add transitions. Context across the edit so the AI remembers earlier choices.

Polishing and post-production. Color correction, audio cleanup, filler removal, consistency. One-click to fix an entire timeline's look, or strip "um" and "uh" from speech.
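
Filler removal shows how mechanical some of this polish becomes once you have word-level timestamps, which speech-to-text tools commonly emit. A sketch; the word schema is illustrative:

```python
FILLERS = {"um", "uh", "erm"}

def keep_ranges(words):
    """Return (start, end) time ranges covering every non-filler word."""
    ranges = []
    for w in words:
        if w["text"].lower().strip(".,") in FILLERS:
            continue  # this word gets cut
        if ranges and abs(ranges[-1][1] - w["start"]) < 0.05:
            ranges[-1] = (ranges[-1][0], w["end"])  # merge adjacent keeps
        else:
            ranges.append((w["start"], w["end"]))
    return ranges

words = [
    {"text": "So",   "start": 0.0, "end": 0.3},
    {"text": "um,",  "start": 0.3, "end": 0.6},
    {"text": "here", "start": 0.6, "end": 0.9},
    {"text": "goes", "start": 0.9, "end": 1.2},
]
```

The resulting ranges can then be handed to a renderer or NLE as a cut list; the gap left by "um," is exactly the cut.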

Multi-format adaptation. Resize and reframe for platforms (16:9 to 9:16). Cut a longer video into shorter variants. Localisation: translate, new voiceover, adjust timings.

Optimization and creative suggestion. Goal-driven edits (e.g. maximise retention). "Agents with taste" that suggest creative flourishes. Iterative refinement: you give high-level feedback, the agent executes and delivers a new version.
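
Much of the multi-format bucket above reduces to geometry once a subject tracker supplies coordinates. A sketch of the 16:9-to-9:16 crop arithmetic, assuming the subject position comes from a vision model:

```python
def reframe_crop(width, height, subject_x, target_ratio=9 / 16):
    """(x, y, w, h) of a full-height vertical crop centered on subject_x."""
    crop_w = round(height * target_ratio)
    x = min(max(subject_x - crop_w // 2, 0), width - crop_w)  # clamp to frame
    return (x, 0, crop_w, height)

# 1920x1080 frame with the tracked subject left of center:
crop = reframe_crop(1920, 1080, subject_x=800)
```

The hard part is not the crop but deciding what to center on, shot by shot; that is where the agentic layer earns its keep.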

Conversations around agentic video

Justin Taylor · Episode 35

Justin Taylor runs Hyper Brew and created the Bolt frameworks that power plugins and automation for creative teams. We talk about pipeline at studios like Buck, the Adobe plugin landscape from expressions to UXP, open source as a business model, and where video tooling is headed—including AI and whether we'll all be writing our own tools soon.

Matt Perry · Episode 18

Matt Perry built Framer Motion and now leads Motion (motion.dev). We cover the history of animation on the web, layout animations, the hybrid rendering engine, and why Motion has explicit AI support. Essential listening for anyone thinking about animation as code and LLMs that can write motion.

Ian Waters · Episode 16

Ian Waters is CTO of Cavalry, a motion design tool built from the ground up. We discuss conceptual design as the foundation of the app, the render engine built on Skia, and how a well-defined architecture enables motion tools to evolve in the direction of automation and programmability.

Arie Stavchansky · Episode 19

Arie Stavchansky founded Dataclay; their flagship product Templater is like mail merge for video. We talk about data-driven video, why After Effects remains resilient, the future of AI-generated content alongside data-driven workflows, and building tools that automate reversioning at scale.

Grant Shaddick · Episode 31

Grant Shaddick is CEO of Tella, a cloud-based recording and editing tool. We talk about editing agents, video in the cloud, design constraints that keep tools simple, and how to balance high-quality output with a delightful, low-friction experience—relevant to anyone thinking about agentic editing.

Isaac Gili · Episode 14

Isaac Gili is co-founder and CEO of Shuffll, an AI video studio for B2B marketing. We discuss generating vectorized art and AI-driven animation, the gap between high-touch launches and fast cadence content, and where this category of technology is heading.

Coming soon

Jonny Burger · Coming soon

Founder of Remotion. A conversation about programmatic video and the describe-it, get-code, get-video pipeline.

Where I stand

What I'm actually doing

I've been experimenting with Claude for motion graphics, small workflow tools, and treating video production as a conversation. On my podcast I talk with people building these tools; in my tutorials I share what I'm learning. I'm still figuring out where the boundaries are.

Why I'm uncertain

I'm not here to sell you on a vision. I don't know if "agentic video" will stick as a category or if craft will stay central. My instinct is that craft will still matter but the work will shift: less repetitive execution, more direction and taste. The way I work has already changed.

Something is shifting. I'd rather understand it than ignore it.

Real limitations

Quality and taste: Outputs can miss on storytelling or brand tone; human review is still common. The glass box (code you can edit) helps; not all tools offer it.

Consistency: In longer videos, characters and style can drift. Defining "good" in a way a machine can optimize is tough.

Trust and UX: Editors are used to timelines, not chatbots. Building trust needs demonstrable reliability.

Ethics and law: Copyright, consent, and authenticity are open questions. My aim, in any case, is to augment editors, not replace them.

Dataset

Tools and sources from ongoing research.

I've grouped 100+ tools by what they do: the engines that generate footage, the editors that work with it, and the bits that connect them. Below is one table per category, with a short note on each.

The Foundation Engines (Generative Video Models)

The models that actually generate the video. Rough split: ones built for high-quality, cinematic output (slower, heavier) and ones built for speed and volume (social, short-form).

Tool · Description
CogVideoX (Zhipu AI) · Open-source; efficient enough to run on consumer hardware. Useful for decentralized or local video agents.
DynamiCrafter (Tencent) · Open-source; animates still images with realistic physics and HDR.
Genmo · Replay and 3D-aware video; community template library to remix working parameters.
Google Veo 3.1 · Leader in high-fidelity 1080p+ video with native audio sync. Consistent character persistence across shots for narrative and B-roll.
Hailuo (Minimax) · Cinematic motion specialist; handles complex human movement and character interaction with fewer morphing artifacts. Strong for narrative shorts.
Hotshot · GIFs and short loops; temporal stability over length. For memes and social reactions.
Hunyuan Video (Tencent) · Large foundation model; strong for Asian cultural context and APAC-localized marketing.
I2VGen-XL · Image-to-video; often used inside larger agentic pipelines to animate static assets.
Kaiber · AI video generation; music videos and stylized sequences from prompts.
Kaiber 2.0 · Audio-reactive video: analyzes the track and modulates visuals in real time. Standard for AI music videos and concert visuals.
Kling AI 2.6 · Long clips (up to 3 min), motion brush, lower cost per frame. Volume leader for social content and rapid video prototyping.
Luma Dream Machine · Define the first and last frames; the AI generates the interpolation. Vital when you need a precise visual transition between two known states.
Morphic · Keyframe-to-video; animators set keyframes, diffusion fills between. Bridges keyframing and generative workflows.
OpenAI Sora · Text-to-video and image-to-video; high-fidelity, dynamic clips with audio.
Pika 2.5 · Stylized animation and lip-sync; playful look. Fits social and stylized marketing.
Pika Labs · Generate and edit short video clips from prompts; stylized output.
Runway (Gen-2 / Gen-3, Storyboard) · Text-to-video and image-to-video; timeline edits via natural language; style transfer.
Runway Gen-3 Alpha & Gen-4 · Creative-control niche: Motion Brush and Director Mode let animators dictate camera and element motion in-frame. General world model for artists.
Sora 2 (OpenAI) · Benchmark for complex physics and world-building; strong on object interactions and long-form coherence. Now includes sound.
Stable Video Diffusion (SVD) · Open weights; inconsistent raw output but heavily fine-tuned and hacked. Engine for many niche apps.
Wan 2.6 (Alibaba) · Open-weight model; self-host or fine-tune. Backbone for many custom enterprise video agents.

Agentic Editors & Intelligent Workbenches

Tools that don't just generate clips, they edit. They sit in or alongside your NLE and take on the grunt work: logging, rough cuts, polish. Like an assistant editor that doesn't sleep.

Tool · Description
Adobe Premiere Pro (AI agent) · Adobe's native AI copilot in Premiere: Generative Extend, object removal in the timeline, deep Creative Cloud integration for pros.
AutoCut · Premiere plugin: auto-deletes silences, zooms on punch moments. Speeds up social dialogue rhythm.
Cardboard (YC W26) · Browser-based NLE that uses multimodal LLMs to reason over footage. Upload assets, prompt goals like "a 30s hook"; get a coherent edit in minutes. Vibe editing.
Colourlab AI · Match footage to a reference look; perception-based grading. Lowers the bar for cinematic color.
Descript Underlord · Automates the full audio-visual polish workflow: multi-cam sync, filler removal, studio sound. Edit via text transcript; the gold standard for podcasters.
Eddie (heyeddie.ai) · For professional editors: auto-logs footage, identifies A-roll vs B-roll, builds rough cuts via chat. Exports XML to Premiere; bridges AI and the NLE.
Firefly (Premiere Pro) · Adobe's AI in Premiere: Generative Extend, object removal in the timeline. Fits Creative Cloud workflows.
Gling · Built for YouTubers: uses audio analysis to auto-cut silences, bad takes, and flubs from raw recordings. Saves hours of pre-cutting.
K2 (k2.video) · AI editing assistant: auto-pulls best takes, chooses highlights, assembles rough cuts for teams.
Kino.ai · Browser-based AI editing; imports from Premiere, FCP, Avid; AI search and agentic editing with shareable URLs.
Klap · YouTube to TikTok at volume; AI captions and face-centering.
Munch · Pulls viral-style clips from long-form; trend-aware; auto-crops and captions for TikTok/Reels.
OpusClip · Finds hooks in video podcasts; reformats horizontal to vertical with face-tracking.
Recut · Silence cutting with simple UX; exports XML for DaVinci Resolve.
Runway (Editor) · Beyond generation: a web editor with Green Screen (rotoscoping) and Inpainting. Industry staples for VFX and cleanup.
Spingle AI · Multimodal search over asset libraries; find and organize B-roll. Cuts down the hunting phase.
Submagic · Dynamic captions: emojis, keyword highlights, animated text. Built for retention on short-form.
TimeBolt · Desktop: speed-cut silence and scene removal. Often used before bringing footage into Premiere.
Topaz Video AI · Industry standard for AI upscaling, de-interlacing, and motion interpolation. Restore archival footage or upscale generative output to 4K.
Veed.io AI · AI editing and captioning in the browser; auto highlights and formatting for social.
Vidyo.ai · Auto highlight reels and short clips from long videos (e.g. streams, Zoom); captions and aspect ratios.
Wisecut · Auto-edits talking-head videos: cuts silences, adds background music.
Wondershare Filmora · AI Copilot Editing: conversational assistant, smart cutouts, motion tracking for prosumers.

Orchestration, Workflow & "No-Code" Directors

Where you build or chain your own video workflows. You define the process, not just run a single task. The glue between engines and editors.

Tool · Description
Anthropic (Claude) · LLM with tool use and long context; used to write motion code and scripts, and to orchestrate pipelines.
ComfyUI · Node-based UI for Stable Diffusion and SVD; granular control. Engine for many custom pipelines.
Flowith · Visual canvas for multi-step creative workflows (video, image). Manages state for long-running tasks.
Glif · "GitHub for AI workflows": chain models (e.g. GPT-4, Midjourney, Runway) into mini-apps. Node-based interface for creative video pipelines.
Gumloop · Connects AI models to Slack, Notion, and more; e.g. when a post publishes, generate a video summary.
LangChain · LLM framework with memory and context. Core for custom chat-to-video agents.
Langflow · Drag-and-drop UI for LangChain; useful for prototyping video agents.
Make (Integromat) · Automation with deep AI video API ties (Runway, Synthesia). Trigger-based video workflows.
n8n · Self-hosted workflow automation; for teams that want video-agent pipelines and data staying in-house.
Overlap · Workflow tool for podcasters: clips, captions, vertical formats from long-form.
Relevance AI · B2B agent builder; multi-agent workforces (e.g. researcher → scriptwriter).
Stack AI · Enterprise LLM platform; templates for video analysis and generation workflows.
VideoAgent (framework) · Open-source; graph-based planning turns high-level intents into editing actions.
ViMax (HKUDS/ViMax, GitHub) · Open-source multi-agent video framework; director, screenwriter, and producer roles; character and scene consistency.
Voiceflow · Conversational AI builder; increasingly used to orchestrate video agents and avatars.
Zapier · Canvas for visual AI video workflows; connects most major video APIs.

Digital Humans, Avatars & Performance Transfer

Avatars and digital humans. Moving from static talking heads to more expressive, even full-body, performance. Still evolving quickly.

Tool · Description
Akool · Face swap and realistic avatars for marketing.
Argil · Train a digital twin; create social content without filming. For influencers and thought leaders.
Colossyan · L&D focus; scenario-based learning agents and actor modes for workplace training.
D-ID · Still image to talking video; real-time avatars for chatbots and conversational AI.
Deep-Video (Deevid.ai) · High-fidelity digital personas; aims to minimize the uncanny valley for high-end use.
EMO (Alibaba) · Emote Portrait Alive: expressive portrait video from audio; strong on singing and emotion.
Hedra · Character-first; dramatic performance and storytelling, not corporate talking heads.
HeyGen · Leader in viral marketing avatars. Instant Avatar (clone from 2 min of footage), accurate lip-sync and translation. Ease of use meets high fidelity.
Hyperreal · High-end VFX avatars; volumetric capture and re-animation for film and TV.
Soul Machines · Autonomous digital people for customer service; they breathe, blink, and react to input.
Spirit Me · Mobile-first; quick self-digitization for social content.
Synthesia · Enterprise standard for AI avatars. Expressive avatars and video agents that can interact in real time; a staple for corporate training and comms.
Tavus · Programmatic personalization: thousands of variants (e.g. the lead's name in the script). Big in sales outreach.
VASA-1 (Microsoft) · Research: lifelike talking face from one image plus audio. Not fully public yet.
Virbo (Wondershare) · Avatar generator in the Wondershare ecosystem; tuned for e-commerce and marketing.

Specialized VFX, 3D & Motion Design

The specialist stuff: VFX, 3D, motion design. Jobs that used to need years in After Effects, Nuke, or Blender. Now there are tools that do a lot of it for you.

Tool · Description
Capsule · Enterprise templates; teams create once, anyone generates on-brand variants via a form.
CSM (Common Sense Machines) · Cube model: 2D images or video into 3D assets for virtual worlds.
DeepMotion · Browser-based AI motion capture; video to FBX/BVH for games and 3D.
Domo · 3D to video; keeps geometry and lighting consistent.
EbSynth · Style transfer: one painted frame drives the look for the whole sequence.
Figma (Weave) · AI canvas for design; generative nodes and workflow; design-to-motion pipeline.
Focal · Agentic AI tools that produce full videos from high-level input.
Hunyuan3D · Tencent: 3D assets from text or images for games and video.
Jitter · "Figma for motion design." AI animates UI/UX elements via text prompts (e.g. "slide in with a bounce"), streamlining interface animation for video and web.
Krea AI · Upscale and enhance; fix or add detail to low-res generative video.
LottieFiles · Lottie platform; AI plugins to generate vector motion from text.
Luma Interactive Scenes · Gaussian splatting from video; navigable 3D environments. For location scouting and virtual production.
Marble (World Labs) · Persistent 3D worlds from prompts, images, or video; real-time editing and export.
Meshcapade · AI human avatars from video; body metrics and 3D reconstruction for fashion and fitness.
Meta SAM 3D · Segment Anything in 3D; reconstructs objects and humans from single photos.
Move.ai · Markerless motion capture from standard cameras; high-fidelity animation for 3D characters.
Protn · Generative 3D textures and assets for game and video pipelines.
Rive · Interactive animation; AI helps build state machines for app and game motion.
Runway Green Screen · AI rotoscoping; removes backgrounds from complex footage quickly.
Shuffll · AI video studio for B2B marketing; vectorized art, AI-driven animation, on-brand output.
Skybox (Blockade Labs) · 360° skybox environments for 3D and video backdrops from a prompt.
Spline AI · 3D design with an AI copilot; objects, textures, and scenes from text for web and video.
Tripo AI · Fast text-to-3D; useful for populating scenes in video.
Wonder Studio (Wonder Dynamics) · Automatically animates, lights, and composites CG characters into live-action plates. Puts a VFX artist in a box.

Personalized, Sales & Programmatic Video

Built for scale: thousands of variants for sales, e-commerce, marketing. Data in, video out.

Tool · Description
Arcads · AI actors for ads; UGC-style talent for TikTok/Reels without casting.
BHuman · Personalized video at scale; clone your face and voice for cold outreach to many leads.
Creatify · App-store listing to video ads; streamlines UA creative.
Gan.ai · Enterprise video personalization; high fidelity and security for big brands.
Invideo AI · Prompt to video: stock, script, voiceover; v4 adds AI avatars for influencer-style content.
Maverick · AI video messages for e-commerce; abandoned-cart and personalized offers.
Motion (Creative Strategist) · Uses ad performance data (Meta, TikTok) to suggest new video variations and scripts.
Pictory · Text or blog posts to branded video; content marketing and SEO video.
Pippit · Video agent for product promotion; aligns tone with trending short-form.
Potion · Sales video personalization; screen-recording style for B2B SaaS.
Reachout.ai · Sales engagement: email plus personalized AI video to boost opens and replies.
Revid.ai · Templates for repurposing social content; Auto Mode for fast creation.
Tagshop.AI · UGC-style ads; videos that feel like user reviews or influencer content.
Topview.AI · URL to video: scrapes product links and generates video ads from viral patterns.
Windsor · Post-purchase thank-you videos for e-commerce; loyalty and churn reduction.

Audio Intelligence, Dubbing & Music

Video is half audio. Voice, music, dubbing, cleanup. These tools cover that side.

Tool · Description
Adobe Podcast · Enhance Speech: industry standard for cleaning up bad audio in one click. Video and podcast.
AIVA · AI music for classical and cinematic scores; control over the emotional arc.
AudioShake · Stem separation: vocals, drums, bass from one file. Remix or strip vocals.
Cleanvoice · Removes filler words, mouth sounds, and stuttering from podcasts and voiceover.
DeepDub · Enterprise dubbing for film and TV; keeps emotion in the voice for localization.
Descript (Studio Sound) · Studio Sound removes echo and background noise so iPhone audio sounds like a studio. Essential for video and podcast.
Dubverse · Generative dubbing; speed and many languages. Strong in education.
Dynascore · Dynamic music engine that composes to your edit and recomposes when you change the cut. Music stays in sync with picture.
ElevenLabs · AI voice and Dubbing Studio; translate and dub while keeping voice and tone.
Epidemic Sound (Soundmatch) · AI matches footage to tracks from their human-composed library.
Gaps · Lip-sync correction for dubbed video; aligns picture to new audio.
Papercup · Human-in-the-loop dubbing; AI does the work, humans QC. Broadcast quality.
Rask.ai · Video localization; lip-sync so the mouth matches the translated audio.
Soundraw · Customizable AI music; edit length, tempo, and mood to fit the cut.
Suno · Full songs from text (lyrics and vocals). For custom, royalty-free soundtracks.
Udio · High-fidelity music generation; complex structures and realistic vocals for backing tracks.

Infrastructure & Developer Tools

APIs and dev tools. What developers use to build video into their own apps.

Tool · Description
Apify · Scraping and data; feed product or web data into video agents.
Bannerbear · Image and video automation via API; simple for social bots.
Cloudinary · Media API; AI cropping, background removal, video summarization.
Creatomate · API for video automation; visual template editor for programmatic video.
Fal.ai · Fast inference for media models; powers many real-time generation apps.
Hugging Face · Hub for open-source models; diffusers and community models.
Hyper Brew / Bolt · Plugins and automation for creative teams; pipeline tooling for studios and motion workflows.
JSON2Video · JSON-to-video API; low-code automation.
Livepeer · Decentralized video; GPU networks for low-cost transcoding and generation.
Motion (motion.dev) · Animation library for the web with explicit AI support; LLMs can generate motion code for UI and narrative animation.
Mux · Video streaming; adding agentic capabilities for screen recording and optimization.
Placid · No-code API for branded visuals and video; integrates with Airtable and others.
Plainly · Video versioning from After Effects templates; B2B and high-fidelity branding.
Remotion · React for video: write videos programmatically in React code, bridging web dev and video. Core to describe-it, get-code, get-video pipelines.
Replicate · Run open-source models (Llama, SD, etc.) via API. Core for custom agents.
Shotstack · Cloud video editing API: define timelines in JSON, get rendered video at scale. Programmable video generation for apps.
Twelve Labs · Video search API; multimodal indexing, "Ctrl+F for video," semantic search.

Common questions

What is agentic video?

AI that doesn't just generate a clip when you ask, but orchestrates an entire production: research, scripting, editing, audio. The AI makes creative decisions and holds context across the edit, not just a single prompt. It's the layer that directs, not only generates.

How is this different from tools like Sora or Runway?

Those tools are black box: you prompt, you get a clip. If something's wrong, you re-roll. Agentic video that uses code (e.g. Remotion plus an LLM) is glass box: you get instructions you can see and edit. You're not hoping for a good result; you're auditing and fixing the code that created it. It's the difference between generating a photo and directing a shoot.
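
To make the glass-box idea concrete: the agent's output can be an edit description rather than a finished file. The schema below is invented for illustration; Remotion, Shotstack, and others each have their own formats.

```python
# A glass-box edit: the agent emits data you can inspect and tweak,
# then re-render deterministically. Schema is hypothetical.

edit = {
    "fps": 30,
    "tracks": [
        {"clip": "interview.mp4", "in": 12.0, "out": 18.5},
        {"clip": "broll_city.mp4", "in": 0.0, "out": 3.0},
    ],
}

def duration(edit):
    """Total runtime implied by the edit description, in seconds."""
    return sum(t["out"] - t["in"] for t in edit["tracks"])

# A human (or another agent) can audit and change the cut directly:
edit["tracks"][0]["out"] = 16.0  # tighten the interview segment
```

Re-rolling a black-box model gives you a different clip; editing this structure gives you exactly the change you asked for.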

Is any of this actually possible today?

Partly. The pieces are emerging. I've used Claude to write motion graphics code, Remotion to render video programmatically, various tools to automate parts of the process. It's not one smooth pipeline yet, but it's further along than most people realise.

Will AI replace video editors?

I genuinely don't know. My instinct is that craft will still matter, but the nature of the work will shift. Less time on repetitive execution, more on creative direction and taste. I'm not resolving that tension here. I'm watching carefully.

What can video agents do right now?

Early systems are tackling five kinds of work: content analysis and triage (ingest footage, find best takes, summarize); autonomous editing and orchestration (rough cuts from intent, multi-step coordination of models); polishing and post-production (color, audio cleanup, filler removal); multi-format adaptation (resize for platforms, localization); and optimization and creative suggestion (goal-driven edits, iterative refinement from feedback). No single tool does everything yet, but we're seeing glimpses of each.

Why the word "agentic"?

The term highlights the agency of the AI: it has a degree of autonomy. It doesn't just react to one prompt; it plans and executes multiple steps. I'm not sure the name will stick. People also use "AI video assistant," "video GPT," "autonomous video creator." What matters is the trend: video creation moving from manual, labour-intensive craft toward an AI-assisted process where the technical heavy-lifting can be offloaded.

What are the main limitations?

Quality and taste (human review still common), consistency in longer videos, trust and UX (editors prefer timelines to chatbots), and ethics around copyright, consent, and authenticity. I've summed these up in the "Where I stand" section above.

If you want to go deeper

My podcast features conversations with people building these tools. My tutorials share what I'm learning as I go.

Working on something similar?

I'm always curious to hear from people exploring this space. If you're experimenting with AI and video production, or just thinking about where this is all heading, I'd enjoy the conversation.

Online or in person · Edinburgh