How to Make a Coffee Ad with AI

A 4-step AI workflow for coffee ads: hero stills via Nano Banana Pro, pour and steam via Seedance 2.0, cafe lifestyle via Kling 3.0. Full cold brew DTC example for $5.

You can make a coffee ad with AI using three models on 8frame: Nano Banana Pro for hero product stills, Seedance 2.0 for pour and steam motion, and Kling 3.0 for cafe lifestyle footage. A full 15-second cold brew DTC spot costs about $3 in model credits, stays under $5 once you add a licensed ambient track, and takes about 45 minutes the first time through. A studio food photographer quotes $800 to $2,000 for comparable stills alone, before any video.

Coffee ad categories and what each one needs

Coffee advertising splits into five formats. Pick yours before writing a single prompt.

Cafe atmosphere. Kling 3.0 is the primary model. You're generating mood, not product accuracy, and you need 6 to 10 clips to fill 30 seconds of atmosphere-forward content.

Packaged-bag DTC. Nano Banana Pro for hero stills, Seedance 2.0 for motion clips where the bag needs to stay on-brand and legible. Label rendering is the hard part (more on that in pitfalls).

Espresso machine demo. Seedance 2.0 with a reference photo of the actual machine. Photo-to-video conditioning keeps the hardware recognizable across clips.

Cold brew lifestyle. Kling 3.0 for person-in-context shots, Seedance 2.0 for the pour-over-ice close-up, Nano Banana Pro for the bottle hero. Along with the seasonal latte launch, this format uses all three models.

Seasonal latte launch. Nano Banana Pro for the drink-in-hand still, Seedance 2.0 for steam and swirl, Kling 3.0 for cozy seasonal b-roll (rainy window, knit sweater, first snow).
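
The routing above can be written down as a small lookup table, which is handy when you're scripting a shot list or briefing a team. This is only a sketch: the dictionary keys, the role names, and the `models_for` helper are made up for illustration and are not part of any 8frame API.

```python
# Hypothetical lookup table: ad format -> which model covers which kind of shot.
# Keys and role names are illustrative labels, not 8frame settings.
FORMAT_ROUTING = {
    "cafe_atmosphere": {"atmosphere": "Kling 3.0"},
    "packaged_bag_dtc": {
        "hero_still": "Nano Banana Pro",
        "bag_motion": "Seedance 2.0",
    },
    "espresso_machine_demo": {"machine_motion": "Seedance 2.0"},
    "cold_brew_lifestyle": {
        "person_in_context": "Kling 3.0",
        "pour_close_up": "Seedance 2.0",
        "bottle_hero": "Nano Banana Pro",
    },
    "seasonal_latte_launch": {
        "drink_in_hand_still": "Nano Banana Pro",
        "steam_and_swirl": "Seedance 2.0",
        "seasonal_b_roll": "Kling 3.0",
    },
}


def models_for(ad_format: str) -> list[str]:
    """Return the distinct models a format needs, sorted by name."""
    return sorted(set(FORMAT_ROUTING[ad_format].values()))


print(models_for("cold_brew_lifestyle"))
# ['Kling 3.0', 'Nano Banana Pro', 'Seedance 2.0']
```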

The 4-step workflow

Step 1: Hero still via Nano Banana Pro

Start with a hero product still before generating any video. It anchors the visual identity every motion clip needs to match. Nano Banana Pro generates at 2K in about 15 to 20 seconds and handles reflective surfaces, liquid texture, and dark packaging cleanly. Upload your actual product reference if you have one; describe the packaging if you're generating spec work.

Tested prompt, cold brew bottle:

A sleek matte-black cold brew bottle with a minimalist white label, centered on a wet slate surface. Single beam of natural light from camera left, dark moody background. A few condensation droplets on the bottle exterior. No reflections in label text. No lens flare. Hero product shot, clean and premium.

Generated in 17 seconds. The condensation micro-detail was sharp and the matte label had zero blown-out highlights. The slate surface picked up a natural-looking wet sheen without becoming distracting. This still works as an end-card background, a pause-frame insert, and a static ad unit.

Tested prompt, espresso shot in demitasse:

An espresso shot being poured into a white ceramic demitasse cup, tiger-stripe crema forming at the surface. Overhead angle, 45 degrees. Warm wood surface underneath, soft diffused overhead light. Crema texture is thick and intact, no bubbles. Sharp focus on the crema surface. Product photography, clean.

The crema texture rendered with the striated pattern that espresso professionals recognize as correct. That specificity matters for premium espresso brand ads where the audience is discerning. Generation time was 19 seconds. The second variant had a bubble artifact near the rim; the first was clean.

Step 2: Pour and steam motion via Seedance 2.0

Upload your hero still as the reference image and switch to Seedance 2.0's photo-to-video mode. The model builds motion from your reference rather than inventing a new scene, keeping packaging accurate across the clip. If the label looks different in the video than in your still, the ad falls apart.

Tested prompt, cold brew pour over ice:

Cold brew coffee being poured from a matte-black bottle over a glass of ice. Dark amber liquid in slow motion, ice cubes shifting gently as liquid fills. The bottle label stays readable and in-frame throughout. Minimal steam, room-temperature atmosphere. Dark moody background, single backlight catching the pour stream. Vertical 9:16, 5 seconds.

Generated in 74 seconds on 8frame. The pour stream had realistic fluid dynamics, the ice shifted naturally, and the bottle label stayed flat and legible throughout the clip. The backlight caught the liquid stream in a way that made it look genuinely cinematic rather than stock-footage generic. Cost: $0.58.

Tested prompt, latte steam and surface:

A latte in a ceramic keep-cup, steam rising slowly from the surface. Camera is at table level looking across the rim, slight upward tilt. Cozy warm cafe background, soft bokeh. The milk-foam latte art on the surface stays intact, not disturbed by steam movement. Golden morning light. Vertical 9:16, 5 seconds.

The latte art held its shape through the full clip, which is one of the harder generation challenges in coffee content. Steam motion was smooth and directional rather than chaotic. One variant had the foam surface distort slightly at 3 seconds; the second was clean. Generate 2 variants minimum on any clip with liquid surface detail. Cost: $0.55.

Step 3: Cafe lifestyle via Kling 3.0

Kling 3.0 handles the human-in-environment footage: hand wrapping around a cup, someone discovering latte art, the ambient energy of a busy espresso bar, a person stepping into morning light with a to-go cup. These shots don't require product accuracy in the same way as Steps 1 and 2. They're providing emotional context and lifestyle signaling.

Tested prompt, hand-on-cup morning ritual:

A pair of hands wrapping around a warm ceramic mug. Morning light from a window, steam rising. The hands are relaxed, no nail polish, wearing a simple ring on one finger. Wooden table surface. Camera slow push-in from medium to close-up. Warm morning color temperature, soft focus on background. Vertical 9:16, 5 seconds.

The hand motion was natural and the steam direction was consistent with the lighting. Kling held the ring detail across the push-in without distortion, which is a detail that typically causes problems in close-up motion clips. Generated in 61 seconds. Cost: $0.32.

Tested prompt, third-wave cafe atmosphere:

Interior of a small specialty coffee shop. A barista in a dark apron pulls an espresso shot at a sleek black espresso machine. Morning light through large windows. A few customers at tables in soft focus background. Camera holds steady on the barista's hands and the machine. Warm amber tones, clean and modern space. 16:9, 5 seconds.

The espresso machine geometry stayed proportionally correct. The barista's hand motion during the pull looked practiced, not generic. Good establishing shot for a cafe brand's awareness campaign. Generation time: 63 seconds. Cost: $0.32.

Step 4: Audio

For DTC coffee ads running on Reels and TikTok, ambient texture beats voiceover most of the time. A light lo-fi track or a licensed cafe ambience loop keeps viewers in the mood you're setting. If your format requires a spoken benefit claim (a cold brew DTC ad that converts on product claims, for example), write the voiceover to the clip timing before you start generating, so you know how long each shot needs to be.
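
One way to plan voiceover against clip timing is to budget words per shot before generating anything. The sketch below assumes a speaking rate of about 2.5 words per second (roughly 150 words per minute), which is an illustrative figure and not something measured for this workflow. The shot names and durations are hypothetical too.

```python
# Assumed speaking rate; adjust after timing a real read of your script.
WORDS_PER_SECOND = 2.5

# Hypothetical shot list for a 15-second spot: (shot name, seconds on screen).
SHOT_LIST = [
    ("pour over ice", 5),
    ("hand on glass", 4),
    ("bottle rotate", 3),
    ("end card", 3),
]


def word_budget(seconds: float, rate: float = WORDS_PER_SECOND) -> int:
    """Maximum whole words that fit in a shot at the given speaking rate."""
    return int(seconds * rate)


total_seconds = sum(seconds for _, seconds in SHOT_LIST)
for name, seconds in SHOT_LIST:
    print(f"{name}: {seconds}s, up to {word_budget(seconds)} words")
print(f"total: {total_seconds}s")
```

If a line of script runs over its budget, either trim the line or lengthen the shot before you spend credits on it.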

Export 9:16 for Reels and TikTok, 1:1 for Instagram feed, 16:9 for YouTube pre-roll. Most of the footage in this workflow was specified as 9:16. For a 16:9 master, adjust the framing prompts to "16:9 horizontal" in Steps 2 and 3.

Routing by coffee sub-niche

Cafe chain (local or regional). Volume is the constraint. Use Kling 3.0 as your primary model for speed. Generate 10 to 15 atmosphere and lifestyle clips per session, pick the 6 to 8 that feel cohesive. Reserve Seedance for your flagship seasonal launch when product accuracy matters. Budget $1 to $2 per final clip.

DTC roastery (bags and subscriptions). Your bag design is your brand. Nano Banana Pro for every hero still, Seedance 2.0 for every motion clip where the bag is in frame. Kling handles lifestyle shots. Budget $5 to $10 per finished 15-second ad.

Ready-to-drink (RTD) cold brew. Packaging-in-motion is the constraint. Seedance 2.0 with your product photo as reference is the core. Supplement with Kling for lifestyle and Nano Banana Pro for display ad stills. Budget $4 to $8 per ad.

Espresso equipment brand. Machine demo accuracy is the constraint. Upload your actual product photo as the Seedance reference for extraction and machine-interaction clips. Nano Banana Pro for catalog hero shots. Kling for lifestyle shots where the machine is background. Budget $6 to $12 per ad since machine detail needs more generation variants.
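
For rough planning, the per-ad budgets above can be multiplied out to a campaign range. The figures are copied from the sub-niche notes; the function names and dictionary keys are hypothetical, and the cafe chain number is handled separately because it is quoted per final clip, not per finished ad.

```python
# Per-ad budget ranges in US dollars, copied from the sub-niche notes above.
BUDGET_PER_AD = {
    "dtc_roastery": (5, 10),
    "rtd_cold_brew": (4, 8),
    "espresso_equipment": (6, 12),
}
# The cafe chain budget is quoted per final clip, so it is kept separate.
CAFE_CHAIN_PER_CLIP = (1, 2)


def campaign_range(sub_niche: str, ads: int) -> tuple[int, int]:
    """Low and high model-credit estimate for a number of finished ads."""
    low, high = BUDGET_PER_AD[sub_niche]
    return low * ads, high * ads


def cafe_chain_range(final_clips: int) -> tuple[int, int]:
    """Low and high estimate for a cafe chain, counted in final clips."""
    low, high = CAFE_CHAIN_PER_CLIP
    return low * final_clips, high * final_clips


print(campaign_range("rtd_cold_brew", 4))  # (16, 32)
print(cafe_chain_range(8))                 # (8, 16)
```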

Walkthrough example: cold brew DTC ad for $5

Here's the exact run for a 15-second cold brew Instagram Reels ad for a DTC brand launching a new single-origin cold brew concentrate.

Inputs: one product reference photo of the 16 oz bottle, plus a brief with the brand tone (clean, minimal, understated premium).

Generations run on 8frame:

| Clip | Model | Prompt result | Cost |
| --- | --- | --- | --- |
| Bottle hero still | Nano Banana Pro | Matte label, condensation, slate surface | $0.08 |
| Cold brew pour over ice | Seedance 2.0 | Fluid dynamics clean, label legible throughout | $0.58 |
| Bottle rotate (dark bg) | Seedance 2.0 | Slow rotation, label stable, backlight rim | $0.55 |
| Hand on cold brew glass, morning | Kling 3.0 | Natural hand motion, warm window light | $0.32 |
| Lifestyle: person at window with RTD bottle | Kling 3.0 | Clean silhouette, product visible | $0.32 |
| Bottle hero still #2 (end card) | Nano Banana Pro | White background, brand colors, no text | $0.08 |

Total: $1.93 in model credits. Generating a second variant of each Seedance clip adds roughly $1.10, which brings the final batch with selects and alternates to about $3.03. Add a licensed ambient track and the total comes in under $5. Edit time: 35 minutes including color grade and caption burn.
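
The arithmetic is easy to re-run when you swap clips in or out. Here is a minimal tally using the costs from the table, stored as integer cents so the sums stay exact. It assumes "two variants" means one extra generation of each Seedance clip; on that reading the exact extra is $1.13, which the text rounds to $1.10.

```python
# Costs from the table above, in cents to avoid floating-point rounding.
CLIPS = [
    ("Bottle hero still", "Nano Banana Pro", 8),
    ("Cold brew pour over ice", "Seedance 2.0", 58),
    ("Bottle rotate (dark bg)", "Seedance 2.0", 55),
    ("Hand on cold brew glass, morning", "Kling 3.0", 32),
    ("Lifestyle: person at window with RTD bottle", "Kling 3.0", 32),
    ("Bottle hero still #2 (end card)", "Nano Banana Pro", 8),
]

# One generation of every clip.
base_cents = sum(cost for _, _, cost in CLIPS)

# Assumption: one extra variant of each Seedance clip.
seedance_extra_cents = sum(
    cost for _, model, cost in CLIPS if model == "Seedance 2.0"
)

print(f"base: ${base_cents / 100:.2f}")                               # base: $1.93
print(f"extra Seedance variants: ${seedance_extra_cents / 100:.2f}")  # $1.13
```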

Pitfalls

Crema realism. Generic coffee prompts produce a beige foam that looks wrong to anyone who drinks specialty coffee. Fix: specify "tiger-stripe crema, honey-amber color, intact surface, espresso extraction" in any espresso prompt. Vague prompts get vague crema.

Label text drift. Seedance 2.0 holds packaging shape but sometimes warps label text on clips longer than 5 seconds or on motion paths that angle the label away from camera. Fix: add "label text stays sharp and legible throughout" to every prompt where the label is visible. If both variants drift, reduce the camera motion intensity.

Steam authenticity. Badly prompted steam looks like a smoke machine. Authentic coffee steam is wispy, low-rising, and directional. Fix: use "thin wisp of steam rising gently, slight drift in one direction from ambient air, not dense or smoke-like." Specify the light source so the steam has translucency.

Hand-on-cup ergonomics. AI hands occasionally produce anatomically off grips that every coffee drinker clocks immediately. Fix: add "natural relaxed grip, fingers wrapped loosely, correct hand ergonomics" to any hand-with-cup prompt. Generate 3 to 4 variants and discard the bad ones; don't try to word-fix it.
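
If you generate in batches, it can help to append these fix phrases automatically so none get forgotten. The sketch below is one way to do that: the phrases are quoted from the pitfalls above, while the `PITFALL_FIXES` keys and the `harden_prompt` helper are hypothetical names, not an 8frame feature.

```python
# Fix phrases quoted from the pitfalls above, keyed by what appears in the shot.
PITFALL_FIXES = {
    "espresso": "tiger-stripe crema, honey-amber color, intact surface, espresso extraction",
    "label": "label text stays sharp and legible throughout",
    "steam": (
        "thin wisp of steam rising gently, slight drift in one direction "
        "from ambient air, not dense or smoke-like"
    ),
    "hands": "natural relaxed grip, fingers wrapped loosely, correct hand ergonomics",
}


def harden_prompt(prompt: str, elements: list[str]) -> str:
    """Append the fix phrase for each element unless the prompt already has it."""
    parts = [prompt.rstrip()]
    for element in elements:
        fix = PITFALL_FIXES[element]
        # Case-insensitive check so re-running the helper adds nothing twice.
        if fix.lower() not in prompt.lower():
            parts.append(fix[0].upper() + fix[1:] + ".")
    return " ".join(parts)


print(harden_prompt("A latte on a wooden table, held in two hands.", ["steam", "hands"]))
```

This only handles the wording. For hands in particular, generating extra variants and discarding the bad ones still matters more than the phrase.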

FAQ

Can the footage pass as a real coffee shop shoot?

At social-ad scale, yes for most viewers. In a 6-second pre-roll or an organic Reels post, footage from Kling 3.0 and Seedance 2.0 holds up for the few seconds most viewers watch before they disengage. For a CTV brand spot or any campaign where the footage runs at full broadcast quality, the difference is noticeable. This workflow is for social performance ads and organic content.

What aspect ratio should I use for Reels?

9:16 throughout. Specify it in every Seedance and Kling prompt ("vertical 9:16") so the composition is designed for the format. For YouTube pre-roll or a display ad, run a separate batch with "16:9 horizontal" in the prompt. Cropping a 9:16 clip to 16:9 cuts off the product in most coffee shots.
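
The cropping problem comes down to simple geometry. A short calculation of how much of a vertical frame survives a crop to horizontal (the function name is just illustrative):

```python
from fractions import Fraction


def crop_kept_fraction(src_w: int, src_h: int, target_w: int, target_h: int) -> Fraction:
    """Fraction of the source frame area that survives a crop to the target ratio."""
    src_ratio = Fraction(src_w, src_h)
    target_ratio = Fraction(target_w, target_h)
    if target_ratio >= src_ratio:
        # Target is wider than the source: keep full width, lose height.
        return src_ratio / target_ratio
    # Target is narrower than the source: keep full height, lose width.
    return target_ratio / src_ratio


kept = crop_kept_fraction(9, 16, 16, 9)
print(f"{float(kept):.1%} of the vertical frame survives")  # 31.6%
```

Less than a third of the 9:16 frame is left after the crop, which is why a bottle or cup composed for vertical rarely fits.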

How many variants should I generate for testing?

At least 3 distinct creative concepts and 2 to 3 clip variants per concept. That's 6 to 9 ad units from one 8frame session at $15 to $30 in model credits. Run them in Meta's creative ranking or TikTok's smart performance campaigns for 5 to 7 days. A 9-variant test is far more likely to surface a winner than a single "best guess" creative.
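
As a sanity check on those numbers, the unit count and the implied cost per ad unit can be worked out directly. Pairing the low end of the cost range with the low unit count (and high with high) is an assumption made here for illustration.

```python
CONCEPTS = 3
VARIANTS_PER_CONCEPT = (2, 3)  # low and high, from the answer above
SESSION_COST_USD = (15, 30)    # model-credit range quoted above

units_low = CONCEPTS * VARIANTS_PER_CONCEPT[0]
units_high = CONCEPTS * VARIANTS_PER_CONCEPT[1]

# Assumption: the cheap end of the range goes with the smaller test.
per_unit_low = SESSION_COST_USD[0] / units_low
per_unit_high = SESSION_COST_USD[1] / units_high

print(f"{units_low} to {units_high} ad units")                          # 6 to 9 ad units
print(f"${per_unit_low:.2f} to ${per_unit_high:.2f} per unit")          # $2.50 to $3.33
```

That per-unit range is in line with the roughly $3 batch cost from the cold brew walkthrough.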


For the broader UGC and video ad workflow that this fits into, the guide "How to Make a UGC Ad with AI" covers talking heads, hook formulas, and assembly structure in detail. The 8frame workflow library also has a food and beverage ad template with the full timeline, color grade, and export settings pre-configured for Reels and TikTok.
