Use case · 8 min read · June 3, 2026

How to Make an Educational Video with AI

The exact workflow for making educational videos with AI in 2026: instructor avatars, visual aids, on-screen diagrams, and captions. About $17 to $25 in compute, not $1,500.

You can make an educational video with AI by combining an instructor avatar (Higgsfield Soul 2.0), generated visual aids and diagrams (Nano Banana Pro), and supporting motion clips (Kling 3.0), then assembling with auto-captions. A 10-minute Python tutorial broken into six segments runs about $17 to $25 in model credits versus $1,200 to $1,500 for a traditional production day.

TL;DR

Use Higgsfield Soul 2.0 for the instructor avatar. One reference portrait, all modules.
Generate diagrams and visual aids with Nano Banana Pro. Verify every factual claim before publishing.
Route by audience: STEM needs diagram accuracy first, language needs instructor naturalness, business needs clean slide-style visuals.
A 10-minute Python tutorial: 6 segments at roughly $3 to $4 each, exported with captions, ready for Teachable, Udemy, or YouTube.

Where this workflow fits

Online course module. Identity consistency across 12 or 20 modules is essential, and most platforms require ADA-compliant captions. Higgsfield Soul 2.0 is the anchor model.

YouTube tutorial. Viewers tolerate a slightly less polished look if the explanation is fast. Generate fewer variants per segment and put the savings into a stronger thumbnail (Nano Banana Pro, 15 to 20 seconds per image).

K-12 explainer. Short segments (2 to 4 minutes), bright diagram colors, clear labels. Kling 3.0 for animated concept clips where the visual needs to move but doesn't need a human face: water cycles, cell division, basic physics.

Corporate L&D. Neutral tone, slide-style Nano Banana Pro visuals, no cinematic b-roll. Identity consistency matters because employees see the same training face across modules over months.

Museum or exhibit content. Kling 3.0 motion clips for historical scenes, geological processes, or scientific phenomena, with a voiceover rather than an on-screen presenter.

The 4-step workflow

Step 1: Generate the instructor avatar

Model: Higgsfield Soul 2.0

Start with a single reference portrait: front-facing, neutral or slightly warm expression, clean lighting. Generate it with Nano Banana Pro if you don't have a photo you own the rights to. Every module uses this same image, which is what keeps viewers from noticing they're watching different generation sessions.

Upload the reference to Higgsfield and use this prompt structure per segment:

[Instructor description] seated at a desk or standing in front of a clean background,
looks directly at camera, explains [specific concept] in a calm teaching voice.
Slight natural head movement. Warm soft lighting, neutral background. 16:9.
Clean audio. No music. Professional but approachable. 30 to 60 seconds.

For the Python tutorial, segment 1 (variables and data types):

Mid-30s man with short dark hair and glasses, seated at a desk with a blurred bookshelf background,
looks at camera, explains what a variable is in Python in plain terms a beginner would understand.
Calm pacing. Slight head nods when making key points. Warm soft overhead lighting. 16:9.
No music. Clean audio. 45 seconds.

Generation time for a 45-second clip: 90 to 120 seconds. Generate 2 variants per segment and pick the one with cleaner eye contact. The reference portrait keeps face, skin tone, and glasses consistent across all six segments. Cost for 6 segments (2 variants each, 12 clips): about $14.40.
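If you write six or more of these prompts, filling the structure from a small script keeps the wording identical between segments. This is a minimal sketch, not a Higgsfield API: the function name, field names, and the second concept are illustrative assumptions, and the output is plain text you paste into the prompt box.

```python
# Hypothetical helper: the wording follows the article's prompt structure,
# but the function name and field names are illustrative assumptions.
PROMPT_TEMPLATE = (
    "{instructor} {setting}, "
    "looks directly at camera, explains {concept} in a calm teaching voice. "
    "Slight natural head movement. Warm soft lighting, neutral background. 16:9. "
    "Clean audio. No music. Professional but approachable. {seconds} seconds."
)


def build_prompt(instructor: str, setting: str, concept: str, seconds: int) -> str:
    """Fill the per-segment prompt, rejecting lengths outside 30 to 60 seconds."""
    if not 30 <= seconds <= 60:
        raise ValueError(f"segment length {seconds}s is outside the 30-60s range")
    return PROMPT_TEMPLATE.format(
        instructor=instructor, setting=setting, concept=concept, seconds=seconds
    )


# The instructor and setting stay fixed; only the concept changes per segment.
INSTRUCTOR = "Mid-30s man with short dark hair and glasses"
SETTING = "seated at a desk with a blurred bookshelf background"
concepts = [
    "what a variable is in Python",
    "how strings, integers, and floats differ",  # illustrative second segment
]
prompts = [build_prompt(INSTRUCTOR, SETTING, c, 45) for c in concepts]
print(prompts[0])
```

Keeping the instructor description in one constant serves the same purpose as reusing one reference portrait: nothing about the presenter changes by accident between segments.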

Step 2: Generate visual aids and diagrams

Model: Nano Banana Pro

Nano Banana Pro generates diagram-style images at 2K in 15 to 20 seconds each. One critical rule: verify every diagram before it reaches learners. A Python syntax diagram with a missing colon, a cell diagram with mislabeled organelles, or a chart with inverted axes destroys learner trust faster than a rough cut ever would.

Prompts that worked for the Python tutorial:

Clean educational diagram showing a Python variable as a labeled box. The box has the label "name"
on the outside and the string "Alice" inside. Light background, simple sans-serif font labels,
thin border. Looks like a textbook illustration, not a stock image.

Clean, accurate result on first generation. Cost: $0.04.

Side-by-side of Python string, integer, and float types. Each as a labeled container
with an example value. Blue for string, green for integer, orange for float.
White background, minimal, educational style. Muted colors.

Two generations. First attempt had oversaturated colors and an ambiguous float. Second pass specified "muted colors" and added "3.14" as the explicit float value. $0.08 total.
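One cheap way to verify a data-type diagram is to ask the interpreter. The sketch below compares the label on each container with the type Python reports for the example value. "Alice" and 3.14 come from the prompts above; the integer 42 and the helper name are assumptions for illustration.

```python
# Each pair is (value shown in the diagram, type label printed on the diagram).
# "Alice" and 3.14 come from the article; 42 is an assumed integer example.
diagram_claims = [
    ("Alice", "str"),
    (42, "int"),
    (3.14, "float"),
]


def mislabeled(claims):
    """Return the claims whose label does not match Python's own type name."""
    return [(value, label) for value, label in claims
            if type(value).__name__ != label]


print(mislabeled(diagram_claims))       # an empty list means every label matches
print(mislabeled([("3.14", "float")]))  # a quoted number is a str, so it is flagged
```

This only checks the values you type in by hand from the image. It does not read the diagram, so the visual review still has to happen.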

Full tutorial: 9 diagram generations, 8 used, $0.36.

Step 3: Generate on-screen motion clips

Model: Kling 3.0

Some concepts need animated motion, not a static diagram: a variable receiving a value, a loop counting up. Kling generates a clip in about 60 seconds. For the Python tutorial:

Abstract animation showing a labeled box labeled "count" on a white background. The number inside
starts at 0 and counts up to 5, one step per second, as if a loop is running. Clean, minimal,
educational. 6 seconds. No text on screen except the label and the number.

Clean on first generation. We used 3 Kling clips total (loop animation, list visualization, function call flow) at $2.10. For K-12, use Kling for physical processes: molecule bonding, pendulum motion, tectonic plates. Specify timing in the prompt ("over 4 seconds, slowly") because the default pacing is often too fast for educational viewing.
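To check the "count" clip against what a real loop does, it helps to write down the values the viewer should see at each second. This is a sketch with a hypothetical function name; it assumes the counter shows its starting value at second 0 and advances once per second.

```python
def count_frames(start: int = 0, stop: int = 5):
    """Yield (second, value) pairs for a counter that advances once per second."""
    for second, value in enumerate(range(start, stop + 1)):
        yield second, value


frames = list(count_frames())
for second, value in frames:
    print(f"t={second}s  count={value}")
```

Counting from 0 to 5 inclusive gives six values, which lines up with the 6-second clip length in the prompt. If the generated clip skips a number or ends on 4, regenerate it.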

Step 4: Assemble with captions

Tools: 8frame Studio or any NLE

Assembly order: instructor clip, cut to diagram or motion clip at the moment you reference it, cut back. Explain, show, return. That pattern matches how learners encode information.

Segment	Length	Structure
Introduction	60 sec	Instructor only, overview of what the lesson covers
Concept explanation	45 sec	Instructor (30 sec) + diagram cut-in (15 sec)
Code example	60 sec	Instructor sets up the example, cut to screen recording or Kling animation
Common mistakes	45 sec	Instructor only, or diagram showing wrong vs right
Practice prompt	30 sec	Instructor + text overlay of the exercise
Summary	30 sec	Instructor only, recap of 3 key points

Captions are required. Udemy, Coursera, and most LMSes mandate them. On YouTube they're the difference between a 65% and a 45% average view duration for technical content. Use auto-captions, then review any segment with technical terms: auto-caption systems routinely render "str type" as "stir type."

Font: white text, thin dark outline, lower third, sized for mobile. 8frame's workflow library includes an educational video template with segment timing, caption style, and diagram cut-in positioning configured.

Routing by audience and subject

The workflow stays constant. What changes is the emphasis at each step.

STEM (math, science, programming). Generate and verify diagrams before instructor segments. If a diagram turns out wrong after the instructor clip is done, you may need to redo the explanation. Verify every equation and code snippet before publishing.
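For programming courses, part of the code-snippet check can be automated. The sketch below uses Python's standard `ast.parse` to flag snippets that would not even parse, such as a loop with a missing colon. The snippet names and the helper function are assumptions for illustration, and a snippet that parses can still teach the wrong thing, so this supplements the human review rather than replacing it.

```python
import ast


def syntax_errors(snippets):
    """Map snippet name to an error description for every snippet that fails to parse."""
    problems = {}
    for name, source in snippets.items():
        try:
            ast.parse(source)
        except SyntaxError as exc:
            problems[name] = f"line {exc.lineno}: {exc.msg}"
    return problems


# Code as it will appear on screen, typed in from the diagram or the script.
snippets = {
    "variable": 'name = "Alice"',
    "loop_missing_colon": "for count in range(6)\n    print(count)",
}
problems = syntax_errors(snippets)
for name, message in problems.items():
    print(f"{name}: {message}")
```

Run it on every snippet before generating the instructor clip that explains it, since a late fix may mean redoing the explanation.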

Language learning. Instructor naturalness matters more than diagram precision. Generate 4 to 5 Higgsfield variants per segment and pick the one with the most natural pacing. Visual aids are mostly vocabulary text overlays, not complex diagrams.

Business and professional skills. Slide-style Nano Banana Pro visuals, no cinematic Kling clips. Process flows and framework diagrams, nothing that reads consumer-facing.

Art and design. Visual quality gets scrutinized most here. Prompt explicitly for color accuracy: "exact CMYK values in the label," "Bauhaus primary colors, not approximations." Verify against a reference before publishing.

Walkthrough: 10-minute Python tutorial

A beginner Python tutorial (variables, data types, basic operations). Six segments, one instructor, eight diagrams, three motion clips.

Compute costs on 8frame, June 2026:

Item	Model	Quantity	Cost
Instructor segments (2 variants each)	Higgsfield Soul 2.0	12 clips	$14.40
Diagrams	Nano Banana Pro	9 generations (8 used)	$0.36
Motion clips	Kling 3.0	3 clips	$2.10
Reference portrait generation	Nano Banana Pro	3 generations (1 used)	$0.12
Total			$16.98

Assembly including caption review: about 45 minutes. A comparable production day with a human instructor and studio runs $1,200 to $1,500. This workflow produces comparable output for about $17 and scales to a 10-module course for under $200. Identity stays consistent because every session uses the same reference portrait.
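The arithmetic behind the table and the 10-module estimate can be reproduced in a few lines. This is a sketch under stated assumptions: the $0.04 per image is given earlier in the article, while $1.20 per instructor clip and $0.70 per motion clip are derived by dividing the table's totals by its quantities. It also assumes later modules reuse the portrait, so that cost is paid once.

```python
from decimal import Decimal

# (quantity, unit price in USD). $0.04 per image is stated in the article;
# $1.20 and $0.70 are derived by dividing the table's totals by its quantities.
line_items = {
    "instructor clips": (12, Decimal("1.20")),
    "diagrams": (9, Decimal("0.04")),
    "motion clips": (3, Decimal("0.70")),
    "reference portraits": (3, Decimal("0.04")),
}

module_total = sum((qty * price for qty, price in line_items.values()), Decimal("0"))

# Assumption: modules 2 to 10 reuse the portrait, so its cost is paid once.
portrait_qty, portrait_price = line_items["reference portraits"]
portrait_cost = portrait_qty * portrait_price
course_total = module_total + 9 * (module_total - portrait_cost)

print(f"one module: ${module_total}")
print(f"ten modules: ${course_total}")
```

`Decimal` is used instead of floats so the cents add up exactly. Swap in your own quantities to estimate a course with more variants or more motion clips.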

Pitfalls

Factual accuracy of generated diagrams. A wrong diagram leaves learners with incorrect information they may carry for years. Every diagram with a formula, labeled process, or verifiable fact needs a check against a primary source before publish. It's a permanent review step, not a prompt fix.

Instructor identity drift across modules. The failure mode is switching reference images between modules or using a portrait with different crop or lighting. Save the exact file, label it, reload it every session. If a module looks off, re-generate with the original reference.
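One way to confirm you are reloading the exact same file is to record its checksum once and compare it at the start of every session. This is a minimal sketch using Python's standard `hashlib`; the file name and function names are hypothetical, and the demo writes stand-in bytes to a temporary folder rather than touching a real portrait.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_same_reference(path: Path, expected_hex: str) -> bool:
    """True when the file on disk still matches the recorded digest."""
    return file_sha256(path) == expected_hex


# Demo with stand-in bytes; in practice, point this at the saved portrait.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "instructor_reference.png"  # hypothetical file name
    ref.write_bytes(b"stand-in bytes for the portrait")
    recorded = file_sha256(ref)          # store this next to the file
    unchanged = is_same_reference(ref, recorded)
    ref.write_bytes(b"re-cropped or re-exported portrait")
    changed = is_same_reference(ref, recorded)

print(unchanged, changed)
```

Any re-crop, re-export, or recompression changes the digest, which is the point: a file that only looks the same is not the same reference.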

Accessibility and captions. "NumPy" becomes "new pie." "kwargs" generates as "k-wargs" or worse. For any technical course, do a full caption review pass before upload. Budget about 10 minutes for a 10-minute video. Some LMSes require SRT files, so export those separately.
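A find-and-replace pass can clear the known mis-transcriptions before the manual review. The sketch below applies the three examples mentioned in this article to caption text using Python's standard `re.subn`. The function name and the sample captions are assumptions for illustration, and you would extend the dictionary for each course.

```python
import re

# Mis-transcriptions named in the article; extend this mapping per course.
CAPTION_FIXES = {
    r"\bstir type\b": "str type",
    r"\bnew pie\b": "NumPy",
    r"\bk-wargs\b": "kwargs",
}


def fix_captions(text: str) -> tuple[str, int]:
    """Apply every known fix and return the new text plus the replacement count."""
    total = 0
    for pattern, replacement in CAPTION_FIXES.items():
        text, n = re.subn(pattern, replacement, text, flags=re.IGNORECASE)
        total += n
    return text, total


# Illustrative SRT content, not output from a real captioning tool.
sample = """1
00:00:01,000 --> 00:00:04,000
A stir type holds text.

2
00:00:04,000 --> 00:00:07,000
Import new pie, then pass k-wargs.
"""
fixed, count = fix_captions(sample)
print(fixed)
print(f"{count} replacements")
```

The script only catches errors you have already seen. A phrase like "new pie" can also be legitimate in some lessons, so read the result instead of trusting the count.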

FAQ

Can AI teach accurately?

The video can be accurate, but accuracy is the instructor's responsibility. Higgsfield controls how the instructor says something, not whether it's correct. Nano Banana Pro generates diagrams that look authoritative whether or not they are right. Neither model checks facts. Every diagram and code example needs a human review pass before learners see it.

Can I clone myself as the instructor?

Yes. Upload a front-facing photo and the model uses your face across all segments. You don't need to speak on camera. Some instructors scale one reference session into dozens of modules without a studio. The output won't match a direct camera recording, but for most platforms learners don't notice once they've accepted the instructor as the instructor.

Best aspect ratio for course platforms?

16:9 for Udemy, Coursera, LinkedIn Learning, and most LMSes. Their players don't handle vertical well. For YouTube mobile or TikTok education content, generate at 9:16. If unsure, generate 16:9. Every model in this workflow supports it natively. Specify it in the prompt.
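If you generate for several platforms, a small lookup keeps the ratio choice out of your head. The platform names and ratios below come from the answer above; the function name is hypothetical, and the fallback to 16:9 follows the "if unsure" advice.

```python
# Platform names and ratios come from the article; the lookup itself is a sketch.
ASPECT_RATIOS = {
    "udemy": "16:9",
    "coursera": "16:9",
    "linkedin learning": "16:9",
    "youtube mobile": "9:16",
    "tiktok": "9:16",
}


def aspect_ratio_for(platform: str) -> str:
    """Return the ratio for a platform, falling back to 16:9 when unsure."""
    return ASPECT_RATIOS.get(platform.strip().lower(), "16:9")


for platform in ["Udemy", "TikTok", "Some internal LMS"]:
    print(f"{platform}: {aspect_ratio_for(platform)}")
```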

Run the workflow

Reference portrait, instructor segments, diagrams, motion clips, captions. Each step has a model, a prompt structure, and a cost under $5. Iterate individual segments without re-shooting anything.

Clone the template from 8frame's workflow library. For the broader AI content strategy, see 10 AI workflows every brand should have.
