Motion & AI

Agentic Video

What happens when AI stops waiting for instructions and starts directing the edit.

Jack Vaughan

I'm Jack Vaughan. I've worked in post-production and motion for over 15 years. Over the last few years I've been automating and improving more and more of my processes with AI. This page is my attempt to make sense of agentic video.

Agentic video is AI-orchestrated video production: the system directs the edit, makes creative decisions, and holds context across the pipeline, rather than generating a single clip from a prompt.

I've spent over a decade making motion graphics. Keyframes, expressions, render queues. The usual. And lately, something has shifted.

I've started using Claude to write animation code. Remotion to render video from React components. AI tools that don't just generate a clip, but participate in the creative process, making decisions about timing, structure, what to include.

People are calling this agentic video. I'm not sure the name will stick, but the shift feels real.

This page is my attempt to make sense of it.

What I mean by "agentic"

Most AI video tools today are reactive. You give a prompt, you get a clip. Useful, but limited. Each output is an island.

Agentic video is different. The AI understands context, makes decisions, orchestrates multiple steps. You might say "find the best moments from this interview and cut a highlight reel" and it actually does it, watching the footage, making editorial choices, assembling the result.

It's the difference between asking someone to hand you a tool and asking them to help you build something.

There's a useful distinction going around: tools like Sora or Runway are black boxes. You prompt, you get a video. If the cat has three arms, you re-roll and hope. With programmatic video and an LLM that writes code, you get a glass box. You can see the instructions. You can change the colour yourself. That's why the describe-it, get-code, get-video pipeline matters to me.
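
To make the glass box concrete, here's a minimal sketch of the kind of component an LLM might hand back when you describe a title animation. It assumes a Remotion project; the TitleCard component, its props, and the timing values are hypothetical, not output from any particular tool.

```tsx
import React from "react";
import { AbsoluteFill, interpolate, useCurrentFrame } from "remotion";

// Hypothetical title card of the sort an LLM might write from a description.
// Every creative decision is a value you can read and edit directly.
export const TitleCard: React.FC<{ text: string; color: string }> = ({ text, color }) => {
  const frame = useCurrentFrame();

  // Fade in over the first 20 frames; retime it by changing the range.
  const opacity = interpolate(frame, [0, 20], [0, 1], {
    extrapolateRight: "clamp",
  });

  return (
    <AbsoluteFill
      style={{ justifyContent: "center", alignItems: "center", backgroundColor: "#111" }}
    >
      <h1 style={{ color, opacity, fontSize: 120, fontFamily: "sans-serif" }}>{text}</h1>
    </AbsoluteFill>
  );
};
```

If the colour is wrong, you edit the color prop or the timing range; you don't re-roll the whole clip and hope.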

So what? I'm betting on glass-box, code-first workflows. The bottleneck is shifting from execution to direction.

Why now?

A few things have converged to make agentic video possible in the last year or two. None of them is the whole picture, but together they unlock something new.

  • Vision models that understand video. New multimodal models can process long footage, generate timestamped analyses, and distinguish A-roll from B-roll or identify story beats. An editing agent has to watch and understand the material before it can cut it.
  • Longer context and memory. LLMs can hold a full transcript or storyboard. The agent can remember what was said in minute 2 while working on minute 15.
  • Agents that use tools and APIs. Function calling, code-driven pipelines like Remotion. The AI can actually perform edits, not just suggest them; a tool-definition sketch follows this list.
  • Better generative models. Clip quality and consistency are improving. You can fill gaps with generated B-roll or synthetic shots when needed.
  • Multimodal integration. Audio, visual, and text in one pipeline. Transcribe, trim filler, re-sync. Editing is inherently multi-modal.
  • Hardware and software. WebGPU, WebCodecs, browser-based editing. Heavy tasks can run server-side while the agent orchestrates; we're getting to the point where an AI editor in the browser is feasible.
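
To make "agents that use tools" concrete, here's a minimal sketch of a tool definition and handler in TypeScript. The trim_clip tool, its schema, and the handler are hypothetical; the JSON-schema shape mirrors common function-calling conventions rather than any specific vendor's API.

```ts
// Hypothetical tool an editing agent could be given. The model sees the name,
// description, and schema; when it decides to trim something, it emits a call
// with arguments matching the schema, and our code performs the actual edit.
const trimClipTool = {
  name: "trim_clip",
  description: "Trim a clip on the timeline to new in/out points (in seconds).",
  input_schema: {
    type: "object",
    properties: {
      clipId: { type: "string" },
      inPoint: { type: "number" },
      outPoint: { type: "number" },
    },
    required: ["clipId", "inPoint", "outPoint"],
  },
};

type TrimArgs = { clipId: string; inPoint: number; outPoint: number };

function handleToolCall(name: string, args: TrimArgs): void {
  if (name === trimClipTool.name) {
    // In a real pipeline this would update an edit decision list or call a
    // renderer; here it just records the decision the agent made.
    console.log(`Trim ${args.clipId}: ${args.inPoint}s to ${args.outPoint}s`);
  }
}

handleToolCall("trim_clip", { clipId: "interview_A", inPoint: 12.4, outPoint: 31.0 });
```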

Why this matters to me

Motion design involves a lot of repetitive work: keyframing the same animation again, rendering variants, exporting formats. What if that could be delegated to a system that understands your style? The goal isn't to replace the craft. It's to spend less time pushing pixels and more time guiding the idea.

  • Automating the grind. Most time in video goes into editing; an agent could speed that up.
  • Lowering the barrier. Agentic tools give anyone a reasonable first cut on demand; you become more of a director.
  • Quality and taste. Agents might learn what makes a video good, not just how to cut; we're not fully there yet.
  • Scale and personalisation. Same promo in five aspect ratios and three languages; brand consistency at volume. A rendering-loop sketch follows this list.
  • New possibilities. Creative collaborator: feed hours of footage, ask for a storyline; sketch a concept, AI fills in details.
  • Accessibility. Small teams and indies could produce polished video without a big budget.
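
As an illustration of the scale point above, here's a small sketch of rendering one promo into several aspect ratios and languages. The Variant type and renderVariant helper are placeholders; in practice that call could go to Remotion's server-side renderer or any render API.

```ts
// Hypothetical variant list: same promo, different framings and locales.
type Variant = { id: string; width: number; height: number; locale: string };

const variants: Variant[] = [
  { id: "youtube", width: 1920, height: 1080, locale: "en" },
  { id: "reels", width: 1080, height: 1920, locale: "en" },
  { id: "square", width: 1080, height: 1080, locale: "en" },
  { id: "reels-es", width: 1080, height: 1920, locale: "es" },
  { id: "reels-de", width: 1080, height: 1920, locale: "de" },
];

// Placeholder renderer: swap in a real one (e.g. Remotion's server-side API).
async function renderVariant(v: Variant): Promise<string> {
  return `out/promo-${v.id}.mp4`;
}

async function renderAll(): Promise<void> {
  for (const v of variants) {
    const file = await renderVariant(v);
    console.log(`Rendered ${v.width}x${v.height} (${v.locale}) -> ${file}`);
  }
}

renderAll().catch(console.error);
```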

What can agents do?

It's still early, but we can already see capability buckets that agentic systems are tackling. Think of these as building blocks of a full pipeline.

  • Content analysis and triage. Ingest footage, find best takes, summarize, annotate for search. Classify A-roll vs B-roll, detect when multiple angles cover the same scene. Some tools do this as a first step before any editing; a data-shape sketch follows this list.
  • Autonomous editing and orchestration. Rough cuts from intent. Multi-step coordination of models: generate a clip, insert it, add transitions. Context across the edit so the AI remembers earlier choices.
  • Polishing and post-production. Color correction, audio cleanup, filler removal, consistency. One-click to fix an entire timeline's look, or strip "um" and "uh" from speech.
  • Multi-format adaptation. Resize and reframe for platforms (16:9 to 9:16). Cut a longer video into shorter variants. Localisation: translate, new voiceover, adjust timings.
  • Optimization and creative suggestion. Goal-driven edits (e.g. maximise retention). "Agents with taste" that suggest creative flourishes. Iterative refinement: you give high-level feedback, the agent executes and delivers a new version.
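
To show how analysis output can feed an editing step, here's a minimal sketch of the intermediate data an agent might pass along and a crude rough-cut policy over it. The Segment and Cut types, the scores, and the 0.7 threshold are all hypothetical.

```ts
// Hypothetical output of a content-analysis pass over one source file.
type Segment = {
  start: number;          // seconds into the source footage
  end: number;
  transcript: string;
  roll: "A" | "B";        // A-roll (speaker on camera) vs B-roll (cutaway)
  score: number;          // how usable the analysis model thinks the take is (0 to 1)
};

type Cut = { source: string; start: number; end: number };

// Crude rough-cut policy: keep the strongest A-roll segments, in story order.
function roughCut(source: string, segments: Segment[], minScore = 0.7): Cut[] {
  return segments
    .filter((s) => s.roll === "A" && s.score >= minScore)
    .sort((a, b) => a.start - b.start)
    .map((s) => ({ source, start: s.start, end: s.end }));
}

const segments: Segment[] = [
  { start: 12, end: 31, transcript: "We started in a garage...", roll: "A", score: 0.9 },
  { start: 31, end: 40, transcript: "(cutaway: office exterior)", roll: "B", score: 0.6 },
  { start: 55, end: 70, transcript: "Um, so, where was I...", roll: "A", score: 0.3 },
];

console.log(roughCut("interview.mp4", segments));
```

The point is the shape of the handoff: each step reads and writes structured decisions that the next step, and a human, can inspect.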

Conversations around agentic video

Justin Taylor · Episode 35

Justin Taylor runs Hyper Brew and created the Bolt frameworks that power plugins and automation for creative teams. We talk about pipeline at studios like Buck, the Adobe plugin landscape from expressions to UXP, open source as a business model, and where video tooling is headed—including AI and whether we'll all be writing our own tools soon.

Matt Perry · Episode 18

Matt Perry built Framer Motion and now leads Motion (motion.dev). We cover the history of animation on the web, layout animations, the hybrid rendering engine, and why Motion has explicit AI support. Essential listening for anyone thinking about animation as code and LLMs that can write motion.

Ian Waters · Episode 16

Ian Waters is CTO of Cavalry, a motion design tool built from the ground up. We discuss conceptual design as the foundation of the app, the render engine built on Skia, and how a well-defined architecture enables motion tools to evolve in the direction of automation and programmability.

Arie Stavchansky · Episode 19

Arie Stavchansky founded Dataclay; their flagship product Templater is like mail merge for video. We talk about data-driven video, why After Effects remains resilient, the future of AI-generated content alongside data-driven workflows, and building tools that automate reversioning at scale.

Grant Shaddick · Episode 31

Grant Shaddick is CEO of Tella, a cloud-based recording and editing tool. We talk about editing agents, video in the cloud, design constraints that keep tools simple, and how to balance high-quality output with a delightful, low-friction experience—relevant to anyone thinking about agentic editing.

Isaac Gili · Episode 14

Isaac Gili is co-founder and CEO of Shuffll, an AI video studio for B2B marketing. We discuss generating vectorized art and AI-driven animation, the gap between high-touch launches and fast cadence content, and where this category of technology is heading.

Coming soon

Jonny Berger · Coming soon

Founder of Remotion. Conversation about programmatic video and the describe-it, get-code, get-video pipeline coming soon.

Where I stand

What I'm actually doing

I've been experimenting with Claude for motion graphics, small workflow tools, and treating video production as a conversation. On my podcast I talk with people building these tools; in my tutorials I share what I'm learning. I'm still figuring out where the boundaries are.

Why I'm uncertain

I'm not here to sell you on a vision. I don't know if "agentic video" will stick as a category or if craft will stay central. My instinct is that craft will still matter but the work will shift: less repetitive execution, more direction and taste. The way I work has already changed.

Something is shifting. I'd rather understand it than ignore it.

Real limitations

  • Quality and taste: Outputs can miss on storytelling or brand tone; human review is still common. The glass box (code you can edit) helps; not all tools offer it.
  • Consistency: In longer videos, characters and style can drift. Defining "good" in a way a machine can optimize is tough.
  • Trust and UX: Editors are used to timelines, not chatbots. Building trust needs demonstrable reliability.
  • Ethics and law: Copyright, consent, and authenticity are open questions, and so is the fear of displacement; the aim is to augment editors, not replace them.

Dataset

Tools and sources from ongoing research. Get in touch for the full dataset →

Tool · Category · Description
Remotion · Engines & Frameworks · Write video as React components; render programmatically. Core to the describe-it, get-code, get-video pipeline.
Motion (motion.dev) · Engines & Frameworks · Animation library for the web with explicit AI support; LLMs can generate motion code.
Hyper Brew / Bolt · Engines & Frameworks · Plugins and automation for creative teams; pipeline tooling for studios.
Descript Underlord · Editing · AI co-editor: apply polish edits (audio cleanup, jump-cut removal, leveling) in one go.
Adobe Premiere Pro (AI agent) · Editing · Previewed AI co-pilot for Premiere: pull selects, suggest cuts, remove filler, color and audio adjustments.
Eddie (heyeddie.ai) · Editing · AI-powered editing assistant for content creators.
Cardboard (YC W26) · Editing · Agentic video editor: upload footage, give high-level instructions to an AI Director, get a publish-ready cut.
Kino.ai · Editing · Browser-based AI editing; import from Premiere, FCP, Avid; AI search and agentic editing with shareable URLs.
K2 (k2.video) · Editing · AI editing assistant: auto pull best takes, choose highlights, assemble rough cuts for teams.
Veed.io AI · Editing · AI editing and captioning in the browser; auto highlights and formatting for social.
Vidyo.ai · Editing · Auto highlight reels and short clips from long videos (e.g. streams, Zoom); captions and aspect ratios.
Wisecut · Editing · Auto-edit talking-head videos: cut silences, add background music.
Runway (Gen-2 / Gen-3, Storyboard) · Video Gen · Text-to-video and image-to-video; timeline edits via natural language; style transfer.
Pika Labs · Video Gen · Generate and edit short video clips from prompts; stylized output.
Kaiber · Video Gen · AI video generation; music videos and stylized sequences from prompts.
OpenAI Sora · Video Gen · Text-to-video and image-to-video; high fidelity, dynamic clips with audio.
Shuffll · Design · AI video studio for B2B marketing; vectorized art, AI-driven animation, on-brand output.
Capsule · Design · Enterprise templates; teams create once, anyone generates on-brand variants via form.
Focal · Design · Agentic AI tools that produce full videos from high-level input.
Synthesia · Avatars · AI avatars and voice-over; generate presenter-led videos from scripts.
HeyGen · Avatars · Avatar video and voice cloning; translate and dub into multiple languages.
Wonder Studio (Wonder Dynamics) · VFX & Specialist · Auto animate, light, and compose CG characters into live-action; AI-driven VFX.
Glif · Agents & LLMs · Agents that coordinate multiple models; chain image, video, and motion for user goals.
ViMax (HKUDS/ViMax, GitHub) · Agents & LLMs · Open-source multi-agent video framework; director, screenwriter, producer; character and scene consistency.
Shotstack · Engines & Frameworks · Cloud video editing API; JSON timeline to rendered video; AI video generator endpoint.
Figma (Weave) · Design · AI canvas for design; generative nodes and workflow; design-to-motion pipeline.
Jitter · Motion · Browser-based motion design; AI to animate text and icons by description (e.g. "bounce like a ball").
Dynascore · Audio · Dynamic music engine; composes music that syncs to your edit and recomposes when you change the cut.
Anthropic (Claude) · Agents & LLMs · LLM with tool use and long context; used to write motion code, scripts, and orchestrate pipelines.

Common questions

What is agentic video?

AI that doesn't just generate a clip when you ask, but orchestrates an entire production: research, scripting, editing, audio. The AI makes creative decisions and holds context across the edit, not just a single prompt. It's the layer that directs, not only generates.

How is this different from Sora or Runway?

Those tools are black boxes: you prompt, you get a clip. If something's wrong, you re-roll. Agentic video that uses code (e.g. Remotion plus an LLM) is a glass box: you get instructions you can see and edit. You're not hoping for a good result; you're auditing and fixing the code that created it. It's the difference between generating a photo and directing a shoot.

Is this actually possible today?

Partly. The pieces are emerging. I've used Claude to write motion graphics code, Remotion to render video programmatically, various tools to automate parts of the process. It's not one smooth pipeline yet, but it's further along than most people realise.

Will AI replace video editors?

I genuinely don't know. My instinct is that craft will still matter, but the nature of the work will shift. Less time on repetitive execution, more on creative direction and taste. I'm not resolving that tension here. I'm watching carefully.

What can agentic systems actually do right now?

Early systems are tackling five kinds of work: content analysis and triage (ingest footage, find best takes, summarize); autonomous editing and orchestration (rough cuts from intent, multi-step coordination of models); polishing and post-production (color, audio cleanup, filler removal); multi-format adaptation (resize for platforms, localisation); and optimization and creative suggestion (goal-driven edits, iterative refinement from feedback). No single tool does everything yet, but we're seeing glimpses of each.

Why call it "agentic"?

The term highlights the agency of the AI: it has a degree of autonomy. It doesn't just react to one prompt; it plans and executes multiple steps. I'm not sure the name will stick. People also use 'AI video assistant,' 'video GPT,' 'autonomous video creator.' What matters is the trend: video creation moving from manual, labour-intensive craft toward an AI-assisted process where the technical heavy lifting can be offloaded.

What are the biggest limitations?

Quality and taste (human review still common), consistency in longer videos, trust and UX (editors prefer timelines to chatbots), and ethics around copyright, consent, and authenticity. I've summed these up in the "Where I stand" section above.

If you want to go deeper

My podcast features conversations with people building these tools. My tutorials share what I'm learning as I go.

Working on something similar?

I'm always curious to hear from people exploring this space. If you're experimenting with AI and video production, or just thinking about where this is all heading, I'd enjoy the conversation.

Online or in person · Edinburgh