AI Story Videos: Full Pipeline From Research to Final Edit
Build a complete faceless story video using Minimax Audio, Hailuo AI, and CapCut, the exact system behind channels earning $100K+/year from AI content.

There's a faceless animated rabbit channel on YouTube with 2.8 million subscribers that keeps going viral. SocialBlade estimates the channel "Fern", a history storytelling channel with 3.9 million subscribers, earns up to $900,000 per year. Both channels follow the same basic format: short narrated story, cinematic or animated visuals, no face, no crew.
Here's the exact pipeline to replicate it, from finding a topic to exporting the final video, using free tiers of three tools.
Step 1: Find Viral Gold Mines Before You Write a Word
Most creators pick a story they find interesting and wonder why it doesn't perform. The research step is what separates channels that scale from ones that stall.
Open YouTube in an incognito window. Incognito keeps your watch history out of the picture, so YouTube serves fresher, less personalized results. Search your topic: "history stories," "animal stories," or whatever niche you're targeting. Then filter: set duration to 4–20 minutes and upload date to this week. Scroll through what's getting views.
Look for title patterns, not just topics. On Fern, four of the five most popular videos follow the same structure: "The [Thing] That [Did Something]." Titles like "The Hunt for the King of the Web" and "The Kids Who Hacked the CIA" are simple constructions with a strong curiosity gap.
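If you copy result titles into a list while you scroll, a small regular expression can tally how many follow the pattern. This is a rough sketch, not part of any tool mentioned here: the pattern and the function names are assumptions, and it treats "Who" as equivalent to "That".

```python
import re

# Hypothetical pattern for "The [Thing] That/Who [Did Something]" titles.
TITLE_PATTERN = re.compile(
    r"^The\s+(?P<thing>.+?)\s+(?:That|Who)\s+(?P<action>.+)$",
    re.IGNORECASE,
)


def matches_thing_that_pattern(title: str) -> bool:
    """Return True if the title reads 'The X That/Who Y'."""
    return TITLE_PATTERN.match(title.strip()) is not None


def pattern_share(titles: list[str]) -> float:
    """Fraction of titles that follow the pattern (0.0 for an empty list)."""
    if not titles:
        return 0.0
    hits = sum(matches_thing_that_pattern(t) for t in titles)
    return hits / len(titles)
```

Note that "The Kids Who Hacked the CIA" matches this strict version while "The Hunt for the King of the Web" does not, so loosen the pattern if you want to count near-variants too.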
Story themes that consistently hit 500,000+ views share a few traits: a strong emotional hook, something shareable (cute, funny, or relatable), and something that makes people comment, either because it's controversial or because it's personally resonant. In terms of monetization, the highest-CPM topics cluster around finance, history with moral stakes, underdog stories, and what-if scenarios.
Once you have a topic direction, build your storyboard in a doc. Break it into narration segments and scene descriptions side by side: the narration goes into your voice tool, and the scene descriptions go into your video generator. Having both in one place before you touch any tool saves time later.
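If you would rather keep the storyboard as data than as a doc, the same side-by-side layout can be sketched in a few lines. Everything here is illustrative: the `Segment` class, the function names, and the sample lines are placeholders, not an export format of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    scene_number: int
    narration: str          # text destined for the voice tool
    scene_description: str  # text destined for the video generator


def narration_script(segments: list[Segment]) -> list[str]:
    """Narration lines in scene order."""
    ordered = sorted(segments, key=lambda s: s.scene_number)
    return [s.narration for s in ordered]


def scene_prompts(segments: list[Segment]) -> list[str]:
    """Scene descriptions in scene order."""
    ordered = sorted(segments, key=lambda s: s.scene_number)
    return [s.scene_description for s in ordered]


# Placeholder content for illustration only.
storyboard = [
    Segment(2, "Then an apple fell.", "Close-up of an apple dropping from a branch"),
    Segment(1, "Isaac Newton sat in the garden.", "Wide shot of a man under an apple tree"),
]
```

Keeping the scene number on every segment means the audio files and video clips can be matched up later even if they were generated out of order.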
For generating attention-grabbing story hooks with AI, there are prompt workflows built specifically for this that can speed up the writing phase considerably.
Step 2: Generate the Voiceover with Minimax Audio
Minimax Audio has a free tier with monthly credits, and it gives you three distinct paths for voice creation.
Path 1: Pre-built library voices. Browse the voice library and audition characters. For a historical narration style, a "captivating storyteller" voice works well out of the box. Once you select a voice, you can adjust modifiers (deeper or lighter, stronger or softer, nasally or crisper) and set playback speed. A speed of 1.2x is the sweet spot for keeping engagement up without sounding rushed.
Punctuation controls pacing more than you'd expect. Commas and periods add pauses. Removing punctuation tightens delivery. Experiment with both before you commit to a final take.
Path 2: Voice cloning. Record 30 seconds of your own voice directly in the platform. Minimax clones it, captures your accent and cadence, and outputs narration that sounds like you actually read it. The clone quality is close enough that you can build a consistent channel identity without recording every video.
Path 3: Voice design from scratch. If you don't want to use your own voice or any pre-built option, the voice design tool lets you describe what you want. Prompt it with something like "warm female voice with a slight British accent, storybook reader, children's story narrator." It generates three options to preview. Pick the one that fits, name it, save it, and it's available in your editor going forward.
For model selection: the Speech 2.5 HD Preview gives the best quality. The 2.5 Turbo is about 40% cheaper and still solid. One character costs roughly 0.6 credits in Turbo mode, so a full scene narration of around 500 characters runs around 300 credits.
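To check a script against your monthly credits before generating, the arithmetic is simple enough to sketch. The 0.6 credits-per-character rate is the rough Turbo figure quoted above and may not match current pricing; the function names are hypothetical, and the sketch assumes every character (including spaces and punctuation) is billed.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

# Rough Turbo rate quoted in the text; verify against current pricing.
TURBO_CREDITS_PER_CHAR = Decimal("0.6")


def estimate_credits(text: str, credits_per_char: Decimal = TURBO_CREDITS_PER_CHAR) -> int:
    """Estimated credits to narrate `text`, rounded up.

    Assumes every character, including spaces, is billed.
    """
    cost = len(text) * credits_per_char
    return int(cost.to_integral_value(rounding=ROUND_CEILING))


def max_chars(credit_budget: int, credits_per_char: Decimal = TURBO_CREDITS_PER_CHAR) -> int:
    """Largest character count that fits within `credit_budget`."""
    chars = Decimal(credit_budget) / credits_per_char
    return int(chars.to_integral_value(rounding=ROUND_FLOOR))
```

At that rate, `max_chars(300)` comes out to 500 characters, which is where the 300-credit scene estimate comes from.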
Download each audio segment as you go. You'll need them labeled and organized for the editing step.
Step 3: Generate Cinematic Scenes with Hailuo AI Agent
Hailuo AI recently released an Agent mode, and it changes the batch generation workflow entirely. Instead of prompting each clip individually, you paste one structured document (a global style prompt at the top, followed by each scene description), and the agent processes all of them in sequence.
In the model settings, select Nano Banana, Seedream, and Flux Kontext for image generation, and Hailuo 02 for video. Each video clip costs between 25 and 80 credits depending on complexity.
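Because the per-clip cost varies, it helps to work out a low and high bound before committing a storyboard. A minimal sketch, assuming the 25 to 80 credit range quoted above still holds; the function name is hypothetical.

```python
# Per-clip range quoted in the text; actual costs depend on complexity.
MIN_CREDITS_PER_CLIP = 25
MAX_CREDITS_PER_CLIP = 80


def clip_credit_range(scene_count: int) -> tuple[int, int]:
    """Return (lowest, highest) estimated credits for `scene_count` clips."""
    if scene_count < 0:
        raise ValueError("scene_count must be non-negative")
    return scene_count * MIN_CREDITS_PER_CLIP, scene_count * MAX_CREDITS_PER_CLIP
```

For a 10-scene storyboard, `clip_credit_range(10)` gives a range of 250 to 800 credits.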
The format of your input matters. Structure it as:
- Global style note (e.g., "cinematic lighting, muted earth tones, slow push-in camera moves")
- Scene 1 description
- Scene 2 description
- ... and so on
Paste the whole thing into Agent mode, confirm your model selections, and hit enter. The agent parses the global style, splits out individual scenes, and generates each clip one by one. For a 10-scene video, it runs through all of them without you touching anything.
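If your scene descriptions already live in a list or spreadsheet, the document to paste can be assembled with a short helper. This is only a sketch: the "Global style:" and "Scene N:" labels are an assumed layout, not a format Hailuo requires, and `build_agent_prompt` is a hypothetical name.

```python
def build_agent_prompt(global_style: str, scenes: list[str]) -> str:
    """Assemble one paste-ready document: style note first, then numbered scenes."""
    style = global_style.strip()
    if not style:
        raise ValueError("global_style must not be empty")

    # Drop blank entries so scene numbering stays contiguous.
    cleaned = [s.strip() for s in scenes if s.strip()]
    if not cleaned:
        raise ValueError("at least one scene description is required")

    lines = [f"Global style: {style}", ""]
    for number, scene in enumerate(cleaned, start=1):
        lines.append(f"Scene {number}: {scene}")
    return "\n".join(lines)
```

For example, `build_agent_prompt("cinematic lighting, muted earth tones", ["A man under an apple tree", "An apple falls"])` returns the style line, a blank line, and two numbered scene lines, ready to paste into Agent mode.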
Run the same process for each story you're producing. If you're building two videos in a session, say, an Isaac Newton piece and an animated animal story, you can queue both agents while you work on something else.
This is meaningfully different from how most people use AI video tools. The manual approach, where you prompt one clip, download it, prompt the next, is slow enough that most people give up or cut corners. Batch generation through the agent removes that friction. If you want to see how this kind of automation thinking applies across a full content workflow, this breakdown of automating a YouTube production stack is worth reading alongside this one.
Step 4: Assemble in CapCut
CapCut handles the final edit. Import your video clips and match them to your audio segments. The narration drives the cut: each audio paragraph maps to one scene clip.
A few things that speed this up:
- Label your audio files by scene number before you import anything
- Trim clips to match audio length rather than cutting audio to match clips
- Add captions using CapCut's auto-caption feature; on short-form content especially, captions increase watch time
The whole assembly step, once your assets are organized, takes under five minutes for a short video.
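Labeling by scene number pays off because you can check for gaps before opening the editor. The sketch below pairs audio and video files by the number in their names. It assumes a naming convention like `scene_01.mp3` and `scene_01.mp4`, which is an illustration rather than anything CapCut requires, and the function names are hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional

# Assumed naming convention: "scene_01.mp3", "scene-1.mp4", "Scene01.wav", etc.
SCENE_RE = re.compile(r"scene[_-]?(\d+)", re.IGNORECASE)
AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".mov"}


def scene_number(filename: str) -> Optional[int]:
    """Extract the scene number from a filename, or None if there is none."""
    match = SCENE_RE.search(PurePath(filename).stem)
    return int(match.group(1)) if match else None


def pair_assets(filenames: list[str]) -> tuple[list[tuple[int, str, str]], list[int]]:
    """Return (scene, audio, video) triples plus scene numbers missing one half."""
    audio: dict[int, str] = {}
    video: dict[int, str] = {}
    for name in filenames:
        number = scene_number(name)
        if number is None:
            continue  # ignore files that do not follow the convention
        ext = PurePath(name).suffix.lower()
        if ext in AUDIO_EXTS:
            audio[number] = name
        elif ext in VIDEO_EXTS:
            video[number] = name
    paired = [(n, audio[n], video[n]) for n in sorted(audio.keys() & video.keys())]
    unmatched = sorted(audio.keys() ^ video.keys())
    return paired, unmatched
```

Any scene number in the second list is missing either its narration or its clip, so you can regenerate that asset before importing.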
What the Full Pipeline Actually Looks Like
Research: 10–15 minutes in incognito YouTube to identify a proven title structure and emotional hook.
Story and storyboard: Write narration segments and scene descriptions in parallel in a doc. If you're using an AI writing workflow, this can go fast because the storyboard format is simple and repeatable.
Voiceover: Select or design a voice in Minimax Audio, paste in your narration segments, generate and download. Budget 10–15 minutes for tweaking speed and punctuation.
Video generation: Paste your storyboard into Hailuo Agent, set your models, let it run. This is mostly wait time.
Final edit: Import everything into CapCut, sync audio to clips, add captions, export.
With practice, the full cycle, from blank doc to finished video, can run under 20 minutes, though the research and voiceover budgets above already add up to 20–30 minutes on a first pass. All three tools have free tiers. The format is proven at scale. The only variable is whether you do the research first or skip it and wonder why the video doesn't perform.
For a different take on faceless video production using a single integrated platform, this walkthrough of AI stick figure animation for finance channels shows how much you can compress the workflow when you're not chasing cinematic quality.
Watch the full video on YouTube: https://youtu.be/bqO0J0L2znk
This post contains affiliate links. I only recommend tools I actually use.