Hailuo AI prompts guide: how to create expressive videos and keep one character consistent

When I first started testing Hailuo, what stood out was not just motion. It was performance.

A lot of AI video tools can animate a scene. Hailuo often feels more interesting when the shot depends on facial expression, body language, and that small in-between moment where a character stops looking like an asset and starts feeling like a person.

That makes it especially useful for ads, short narrative scenes, social content, and visual experiments where continuity matters. Not just one nice clip. The same character, kept alive across multiple generations.

In this guide I want to focus on four practical things:

  • which Hailuo version to use and why
  • how to prompt Hailuo without overloading the model
  • when to use image-to-video, first-and-last-frame, and subject reference
  • how to build a consistency workflow in Phygital+ so one character survives more than one generation

Hailuo versions compared: 02 vs 2.3 vs 2.3 Fast

The current Hailuo lineup makes much more sense once you stop looking for one universal best version.

According to MiniMax’s official docs, MiniMax-Hailuo-2.3 is the newer general video generation model with improvements in body movement, facial expressions, physical realism, and prompt adherence. If the shot depends on acting, motion clarity, or subtle performance, this is the version I would look at first.

MiniMax-Hailuo-2.3-Fast is officially positioned as an image-to-video model focused on value and efficiency. That makes it useful when I am still exploring options and need multiple quick branches rather than a final polished result.

MiniMax-Hailuo-02 still matters because it offers the most flexible combinations of duration and resolution, and it is the model officially documented for the First-and-Last-Frame workflow. If I need a more controlled transformation from one defined image to another, Hailuo 02 becomes the practical choice.

Version | Best use | Why I would use it
MiniMax-Hailuo-2.3 | Expression-heavy scenes, polished clips | Best fit for body movement, facial expression, realism, and stronger prompt response.
MiniMax-Hailuo-2.3-Fast | Quick image-to-video iteration | Useful for faster, cheaper branching when testing ideas.
MiniMax-Hailuo-02 | First-and-last-frame control, flexible duration and resolution | Best choice when the shot needs a defined visual destination.

As of the current official docs, both Hailuo 2.3 and Hailuo 02 support text-to-video, while Hailuo 2.3 Fast is documented only for image-to-video. The First-and-Last-Frame workflow is specifically documented for Hailuo 02, and subject-reference generation is documented separately as S2V-01.

My practical rule: use 2.3 Fast for exploration, 2.3 for expression-heavy shots, and 02 when the start-to-end visual path matters more than speed.
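If you script your generations, this rule of thumb fits in a small decision function. This is a sketch of the guide's rule, not part of MiniMax's API. `Goal` and `pick_model` are hypothetical names. Falling back to 2.3 when there is no start image is an inference from 2.3 Fast being documented only for image-to-video.

```python
from enum import Enum

class Goal(Enum):
    EXPLORE = "explore"            # many quick branches, not a final clip
    EXPRESSION = "expression"      # acting, facial expression, polished result
    START_TO_END = "start_to_end"  # the shot must reach a defined last frame

def pick_model(goal: Goal, has_start_image: bool) -> str:
    """Encode the rule of thumb: 2.3 Fast to explore, 2.3 for expression, 02 for start-to-end."""
    if goal is Goal.START_TO_END:
        # First-and-Last-Frame is documented for Hailuo 02.
        return "MiniMax-Hailuo-02"
    if goal is Goal.EXPLORE and has_start_image:
        # 2.3 Fast is documented for image-to-video, so it needs a start image.
        return "MiniMax-Hailuo-2.3-Fast"
    # Expression-heavy shots, and exploration from text alone, go to 2.3.
    return "MiniMax-Hailuo-2.3"
```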

How to use Hailuo without fighting the model

Hailuo becomes easier once you stop treating it like a chatbot and start treating it like a performance engine.

With many video generators, people start by describing cinematic camera language. With Hailuo, I would reverse the order. Start with the character, define the emotional state, then describe the physical action, and only after that tell the model how the camera should observe the moment.

That sequence usually gives more believable results, especially in close-ups, portrait-led scenes, product storytelling, and short ad shots where the viewer needs to feel a human presence quickly.

A useful order is:

  • who is on screen
  • what they feel internally
  • what they do physically
  • where the moment happens
  • how the camera moves
  • what kind of light shapes the scene

What Hailuo officially supports

MiniMax’s official video documentation currently describes four practical workflow types in the Hailuo ecosystem.

  • Text-to-video: useful when the scene starts from language alone. Officially documented for MiniMax-Hailuo-2.3 and MiniMax-Hailuo-02.
  • Image-to-video: the most practical mode for real workflows. Officially documented for MiniMax-Hailuo-2.3, MiniMax-Hailuo-2.3-Fast, and MiniMax-Hailuo-02.
  • First-and-last-frame video: documented specifically for MiniMax-Hailuo-02.
  • Subject-reference video: documented separately as S2V-01, which makes it a dedicated identity-preservation workflow rather than a simple prompt trick.
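For your own tooling, you can keep this support list as a lookup table. The model identifiers below are the ones named in MiniMax's docs as summarized above. The mode labels and the `models_for` helper are illustrative assumptions. Support can change, so recheck the table against the current docs.

```python
# Workflow support as summarized in this guide; mode labels are our own shorthand.
SUPPORT: dict[str, set[str]] = {
    "MiniMax-Hailuo-2.3": {"text-to-video", "image-to-video"},
    "MiniMax-Hailuo-2.3-Fast": {"image-to-video"},
    "MiniMax-Hailuo-02": {"text-to-video", "image-to-video", "first-and-last-frame"},
    "S2V-01": {"subject-reference"},
}

def models_for(mode: str) -> list[str]:
    """Return every model listed as supporting `mode`, sorted by name."""
    return sorted(name for name, modes in SUPPORT.items() if mode in modes)
```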

Camera command syntax: one feature worth using directly

One of the most useful officially documented Hailuo features is camera control with bracket syntax. MiniMax explicitly documents commands such as [Push in], [Pull out], [Pan left], [Pan right], [Tilt up], [Tilt down], [Truck left], [Truck right], [Tracking shot], and [Static shot].

Natural language still works, but the docs note that explicit commands usually give more accurate results. If I want the camera behavior to be read clearly, I prefer to use the bracket syntax.

Camera syntax example
A ceramic perfume bottle stands on a dark stone surface [Push in]. A soft reflection moves across the glass, then the shot settles into a clean hero frame [Static shot]. Premium studio lighting, high-end beauty campaign mood.
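Because explicit commands are read more reliably, it can help to check a prompt's bracket commands before generating. The sketch below allows only the commands named in this guide, although MiniMax's docs list more. `camera_commands` is a hypothetical helper. The exact-case match is a deliberately strict choice, not documented model behavior.

```python
import re

# Only the commands named in this guide; extend with others from MiniMax's docs.
KNOWN_CAMERA_COMMANDS = {
    "Push in", "Pull out", "Pan left", "Pan right", "Tilt up", "Tilt down",
    "Truck left", "Truck right", "Tracking shot", "Static shot",
}

# Text between a single pair of square brackets, e.g. "[Push in]".
BRACKET = re.compile(r"\[([^\[\]]+)\]")

def camera_commands(prompt: str) -> tuple[list[str], list[str]]:
    """Split bracketed commands into (known, unknown), preserving their order."""
    found = [match.strip() for match in BRACKET.findall(prompt)]
    known = [c for c in found if c in KNOWN_CAMERA_COMMANDS]
    unknown = [c for c in found if c not in KNOWN_CAMERA_COMMANDS]
    return known, unknown
```

On the perfume prompt above, it returns `(['Push in', 'Static shot'], [])`. A misspelled or unlisted command such as `[Zoom sideways]` lands in the unknown list, so you can fix it before spending a generation.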

Prompt structure for Hailuo

For Hailuo, I would use a prompt structure that treats emotion as its own layer.

Subject | Expression / inner state | Action | Scene | Camera | Lighting
Who is on screen | What they feel | What they do | Where it happens | How the shot is observed | What shapes the mood

This matters because Hailuo often responds better when the character is not just described visually, but emotionally.

Structure example
A young man in round glasses and a blue jacket, visibly strained but determined. He leans forward and pushes a massive stone sphere uphill with both hands. Dust rises around his shoes as he struggles for balance. A sunlit ochre hillside, dry air, cinematic realism. Medium shot from a slight side angle [Tracking shot]. Warm late-afternoon light.

The point is simple: give the model more to perform, not just more to illustrate.
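The six layers map naturally onto a small data structure, which keeps each layer editable on its own. This sketch assembles the layers in the recommended order and does not call any Hailuo API. `ShotPrompt` and `render` are hypothetical names.

```python
from dataclasses import dataclass

def _sentence(text: str) -> str:
    """Trim a layer and end it with a period unless it already ends a sentence."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."

@dataclass(frozen=True)
class ShotPrompt:
    subject: str     # who is on screen
    expression: str  # what they feel internally
    action: str      # what they do physically
    scene: str       # where the moment happens
    camera: str      # how the shot is observed, e.g. "Medium shot [Push in]"
    lighting: str    # what shapes the mood

    def render(self) -> str:
        # Character and emotion share the opening sentence; camera and light come last.
        head = ", ".join(p.strip() for p in (self.subject, self.expression) if p.strip())
        layers = [head, self.action, self.scene, self.camera, self.lighting]
        return " ".join(_sentence(layer) for layer in layers if layer.strip())
```

If you fill the fields with the phrases from the structure example, the output reproduces that prompt word for word. You can then swap the camera or lighting layer without retyping the character and emotion.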

How to build a consistency pipeline in Phygital+

This is where Hailuo becomes much more useful in practice.

The easiest way to lose consistency in AI video is to regenerate the same character from scratch every time. The first clip may be strong, but the second clip already starts drifting. The face changes a little. The proportions shift. The styling slides away from the original idea.

Inside Phygital+, I would not treat Hailuo as a one-shot tool. I would treat it as a branching engine built around one anchor image.

The workflow is simple:

  1. Create or choose one strong anchor image of the character.
  2. Send that image into several Hailuo branches.
  3. Keep the identity description stable across those branches.
  4. Change only one layer at a time: action, camera, or atmosphere.
  5. Reuse the best output frame as the next starting point if needed.

That is much more reliable than rewriting the whole prompt every time.
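You can sketch the five steps as data. One anchor image and one identity sentence never change. Each branch may change one variable layer at a time. `Branch`, `make_branches`, and `rebase` are hypothetical names, and `anchor_image` is only a file-path placeholder. Sending a branch to Hailuo through Phygital+ or the MiniMax API is outside this sketch.

```python
from dataclasses import dataclass, replace

# Layers a branch may change; identity and anchor image stay fixed (steps 3 and 4).
VARIABLE_LAYERS = ("action", "camera", "atmosphere")

@dataclass(frozen=True)
class Branch:
    anchor_image: str  # hypothetical path to the approved anchor image (step 1)
    identity: str      # identity sentence, reused word for word (step 3)
    action: str
    camera: str
    atmosphere: str

    def prompt(self) -> str:
        # Identity always leads, so every branch describes the same character first.
        return " ".join([self.identity, self.action, self.atmosphere, self.camera])

def make_branches(base: Branch, changes: list[tuple[str, str]]) -> list[Branch]:
    """Steps 2 and 4: one branch per (layer, new_value), changing exactly one layer."""
    branches = []
    for layer, value in changes:
        if layer not in VARIABLE_LAYERS:
            raise ValueError(f"{layer!r} is fixed; only {VARIABLE_LAYERS} may vary")
        branches.append(replace(base, **{layer: value}))
    return branches

def rebase(branch: Branch, best_frame: str) -> Branch:
    """Step 5: reuse a chosen output frame as the next anchor, keeping identity."""
    return replace(branch, anchor_image=best_frame)
```

Each branch differs from the base in a single field, which mirrors the "change only one layer at a time" rule. Asking to vary the identity raises an error instead of silently reinventing the character.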

The real benefit of Phygital+ here is workflow visibility. You can keep the base image, compare generations side by side, and build a repeatable chain where the same character returns instead of being reinvented in every clip.

In this setup, the same visual base feeds multiple Hailuo branches, and that is exactly what helps with consistency. Instead of asking the model to rediscover the character from text alone every time, you guide motion and variation from a shared visual starting point.

Branch 1

A first Hailuo branch generated from the same visual base, useful for comparing motion and character stability side by side.

Branch 2

A second branch from the same anchor image, showing how the workflow helps preserve the same character while testing a different variation.

If consistency matters more than novelty, do not regenerate the character from zero. Preserve identity first, then iterate on motion, angle, or atmosphere.

15 Hailuo prompt examples

These examples are inspired by the official structure and syntax in MiniMax’s docs, but adapted into new scenes, new characters, and more practical use cases.

Cinematic acting moment

A young woman in a dark green coat, nervous but trying to stay composed. She stands under a train station clock, then slowly lifts her eyes as if she has finally made a decision. Pale morning fog, quiet platform, realistic cinematic tone. Medium shot [Push in]. Cool grey-blue light.

Dialogue tension

Two friends stand in an empty parking structure after sunset. One tries to smile, the other keeps their jaw tense and hands still. The silence feels heavier than the scene itself. Medium two-shot [Truck right], then [Static shot]. Harsh overhead industrial lighting.

Portrait social clip

A young man with silver earrings and a cream bomber jacket looks into the lens with a confident half-smile. He adjusts the collar once, turns slightly, and holds eye contact. Clean outdoor wall, soft daylight, modern lifestyle mood. Waist-up shot [Push in]. Bright but natural lighting.

Fashion video

A woman in a long black dress walks slowly through a narrow stone corridor, calm and self-possessed. The fabric moves lightly as she turns her head toward the light. Editorial mood, minimal architecture, soft shadow play. Full-body frame [Tracking shot]. Cool luxury lighting.

Beauty ad

A ceramic perfume bottle stands on a dark stone surface. A narrow beam of light slides across the glass, then settles into a clean hero moment. Premium studio setting, restrained elegance. Close-up [Push in], then [Static shot]. Soft directional light, high-end beauty campaign style.

Skincare product shot

A frosted cream jar rests on pale travertine beside a ripple of water. The lid shifts slightly as reflected light moves across the surface. Minimal studio environment, quiet premium mood. Macro close-up [Tilt down]. Clean daylight with soft shadow edges.

Food commercial

A chilled citrus drink is placed on a sunlit outdoor table. Drops of water slide down the bottle while the background remains gently out of focus. Fresh summer mood, commercial realism. Close-up [Push in]. Warm natural light.

Character effort scene

A young man in round glasses and a blue jacket, exhausted but stubborn. He braces his body and pushes a huge stone sphere uphill with both hands. Dust lifts around his shoes and the strain is visible in his shoulders. Ochre hillside under warm sunlight. Medium shot from a slight side angle [Tracking shot]. Dry golden light.

Lifestyle rooftop scene

Two friends stand on a rooftop in the late afternoon. One laughs and leans on the rail, the other looks toward the skyline before answering. The mood is loose but intimate. Medium-wide shot [Pan right]. Golden sunset lighting.

Music video portrait

A performer stands still in a dim room while long ribbons move around them in slow air currents. They raise their chin and look straight at the camera as if entering the first note of a song. Medium shot [Push in]. Deep blue light with soft haze.

Fantasy scene

A rider in a dark cloak crosses a ridge above a valley filled with pale glowing flowers. The horse slows near the edge as the rider looks down into the mist. Wide cinematic frame [Truck left]. Moonlit silver-blue atmosphere.

Storytelling close-up

A phone vibrates on a kitchen table while someone stands blurred in the background. Their hand trembles slightly before they step forward into focus. Night interior, intimate tension. Start in close-up [Static shot], then [Push in]. Warm practical lighting.

Stylized ad moment

A red vinyl chair sits alone in a sunlit room with geometric shadows. A woman enters frame, pauses beside it, and runs her hand across the backrest as if introducing an object in a design film. Clean modern interior. Medium-wide shot [Pan left]. Warm editorial daylight.

Start-to-end transformation idea

A schoolboy stands alone in a courtyard at the beginning of autumn, uncertain and still. Across the clip, his posture matures and his expression settles into quiet confidence as the scene shifts toward a later stage of life. The emotional transition is restrained, not dramatic. Natural outdoor light, cinematic realism.

Character consistency test

A young woman with short dark hair, a cream trench coat, and a narrow silver ring, calm but slightly guarded. She walks toward the camera, then glances sideways as if she recognizes someone just outside the frame. Her face remains consistent and readable, with natural body movement and realistic pacing. City sidewalk after light rain. Medium shot [Tracking shot]. Soft overcast light.

FAQ

Which Hailuo version should I start with?

If you are exploring several image-to-video ideas quickly, start with Hailuo 2.3 Fast. If the shot depends on expression, body movement, and a stronger final result, move to Hailuo 2.3. If the workflow depends on a defined start and end frame, use Hailuo 02.

Is Hailuo better for text-to-video or image-to-video?

For real production work, image-to-video is usually more reliable. Text-to-video is great for exploration, but a strong start frame gives the model a much better anchor for character consistency and styling.

When should I use first-and-last-frame generation?

Use it when the shot must travel toward a specific visual destination. Transformations, short narrative beats, and connected storytelling are the clearest use cases.

How do I keep one face consistent across several clips?

Start from a strong anchor image, keep the identity sentence stable, and avoid changing everything at once. If identity is the real priority, use a subject-reference workflow rather than expecting prompt wording alone to solve it.

Does Hailuo understand camera direction well?

Yes. MiniMax officially documents bracket-based camera commands such as [Push in], [Pan right], and [Tracking shot]. Natural language also works, but explicit syntax is often more reliable.

Why do my generations look good but still feel like different characters?

Because visual style consistency is not the same as identity consistency. The scene may remain similar while the face, posture, or proportions drift. That is why start images, branching workflows, and subject control matter.

What is the safest way to iterate in Phygital+?

Branch from one approved image, compare multiple Hailuo outputs side by side, and only change one variable per branch. Small controlled changes preserve identity much better than full prompt rewrites.

