A dancing group poses in sync, their arms arched, standing in formation on a stage in a dark dance studio. In the background is a black backdrop with LED lights hanging on metal railing rigs.
Zion Harris, center, rehearses for Jeté, a monthly dance showcase at Heart WeHo in West Hollywood, Los Angeles County, on Sept. 19, 2024. Photo by Alisha Jucevic for CalMatters

In summary

None of the videos generated by the leading AI platforms showed the actual dances we requested.

Artificial intelligence models can produce lifelike video footage with a simple text prompt. But these tools still struggle with generating realistic videos of complex natural movements, like human dance. 

When CalMatters and The Markup asked dancers and choreographers whether AI could disrupt their industry, most concluded that human dancers could not be replaced.

For the most part, we found that they were right. We tested nine cultural, modern and popular dance styles across four commercially available generative AI video models, producing a total of 36 videos. The latest models generated convincingly lifelike videos of people dancing, but none produced a figure performing the prompted dance.

About a third of the generated videos exhibited inconsistencies in a subject’s appearance from frame to frame, along with abnormalities in movement and limbs. Still, these issues were less frequent and less severe than in our initial testing in late 2024.

Methodology

Define task

CalMatters and The Markup tested four commercial video generation models from major tech companies on their ability to create video clips of traditional and popular dance.

We limited our tests to consumer-facing, closed-source generative video tools because they are the most readily available for everyday users and tend to perform better than open-source models. We tested Sora 2 by OpenAI, Veo 3.1 by Google, Kling 2.5 by Kuaishou, and Hailuo 2.3 by MiniMax.

Prepare prompts

We drafted nine video prompts testing a variety of dances in different settings, such as dance floors, stages, bedrooms, studios, cultural events, public squares and classrooms. We tested for popular, modern and traditional cultural dance styles, including the Macarena, the Mashed Potato, folklorico and popular TikTok dances. See the Appendix for more details.

We varied the level of specificity to test whether identifying the dance by name was enough to generate a video of the desired motion, or whether explicitly specifying the exact physical movements improved output. 

Before finalizing the list of prompts, we submitted them to ChatGPT for edits based on the Sora 2 Prompting Guide. See Limitations: Prompt Optimization for more details.

Submit prompts for video generation

Each prompt was submitted once, using each model’s default settings for generating landscape-oriented videos. Three prompts submitted to Sora 2 were edited to remove words that triggered OpenAI’s filter, which blocks prompts that may violate “guardrails concerning similarity to third-party content.” For example, Sora 2 flagged prompts referencing specific years, popular music artists and banned words. One blocked prompt was for a video of a politician dancing the Macarena; replacing “politician in a suit” with “man in a suit” bypassed the guardrail. Veo 3.1 flagged similar prompts when we submitted them via Gemini or Flow, but not when we submitted them directly to the Veo 3.1 API.
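For readers who want to script a similar test, the sketch below shows what this submission step could look like in Python. It is a minimal illustration, not our actual tooling: `submit_video()` is a hypothetical stand-in for each vendor’s video-generation call, and the model names and guardrail fallback simply mirror the substitution described above.

```python
# Minimal sketch of the submission step. submit_video() is a hypothetical
# stand-in for each vendor's SDK or web interface; it is not a real API.

MODELS = ["sora-2", "veo-3.1", "kling-2.5", "hailuo-2.3"]

# Wording swap we used when Sora 2's filter blocked the Macarena prompt.
GUARDRAIL_SUBSTITUTIONS = {"politician in a suit": "man in a suit"}


class GuardrailError(Exception):
    """Raised when a platform's content filter rejects a prompt."""


def submit_video(model: str, prompt: str) -> str:
    """Hypothetical wrapper; swap the body for a real per-vendor SDK call."""
    return f"{model}-output.mp4"  # placeholder for the generated video's path


def generate_all(prompt: str) -> dict[str, str]:
    """Submit one prompt to every model, retrying with edited wording if blocked."""
    results: dict[str, str] = {}
    for model in MODELS:
        try:
            results[model] = submit_video(model, prompt)
        except GuardrailError:
            # Mirror the manual fix described above: swap the flagged
            # phrase and resubmit the edited prompt.
            edited = prompt
            for blocked, replacement in GUARDRAIL_SUBSTITUTIONS.items():
                edited = edited.replace(blocked, replacement)
            results[model] = submit_video(model, edited)
    return results
```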

Evaluate generated videos

We evaluated the generated videos on six different criteria related to prompt alignment and video consistency:

  1. Did the main subject dance in any way?
  2. Did the main subject perform the specific dance we prompted for?
  3. Did the main subject maintain the same physical appearance throughout the video?
  4. Did the main subject produce realistic motions based on human physiology?
  5. Did the scene and setting match the prompt?
  6. Did the camera match the prompted camera angle and position?

Each criterion was assessed as a pass or a fail by a single reviewer, with a second reviewer assisting when needed. The generated videos of cultural dances were reviewed for accuracy by dancers familiar with them.
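As an illustration, the six-part rubric could be recorded as a simple pass/fail structure like the Python sketch below. The field names are ours; this is not the worksheet our reviewers actually used.

```python
# Minimal sketch of recording the six pass/fail criteria for one video.
# Field names are illustrative assumptions, not the newsroom's actual schema.
from dataclasses import dataclass


@dataclass
class Evaluation:
    model: str                   # e.g. "veo-3.1"
    prompt_name: str             # e.g. "macarena"
    subject_danced: bool         # 1. danced in any way
    correct_dance: bool          # 2. performed the specific prompted dance
    consistent_appearance: bool  # 3. same appearance throughout the video
    realistic_motion: bool       # 4. physiologically plausible movement
    scene_matched: bool          # 5. scene and setting matched the prompt
    camera_matched: bool         # 6. camera angle and position matched


CRITERIA = [
    "subject_danced", "correct_dance", "consistent_appearance",
    "realistic_motion", "scene_matched", "camera_matched",
]


def failure_counts(evaluations: list[Evaluation]) -> dict[str, int]:
    """Tally how many reviewed videos failed each criterion."""
    return {
        criterion: sum(not getattr(e, criterion) for e in evaluations)
        for criterion in CRITERIA
    }
```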

Results

Of the 36 videos generated, all but one showed a figure dancing. The one video that did not show a dancing figure — produced by Kling 2.5 — instead presented the bottom half of a figure performing side lunges.

No video produced the actual dance we prompted for. For the Cahuilla Band of Indians bird dance, tribal member Emily Clarke said, “None of these depictions are anywhere close to bird dancing, in my opinion.” The videos for the Horton dance did not show the specific dance movement we prompted for, but choreographer Emma Andre said she found the depiction by Veo 3.1 to be “staggeringly lifelike.” 

For the remaining pop culture dances, we compared the generated videos to videos we found on YouTube to evaluate whether the dance was accurate.

Eleven of the 36 videos exhibited issues with motion or appearance consistency. These included sudden changes in clothing, hair or limb structure, heads rotating on a separate axis from their bodies, and limbs liquefying and reconstituting.

See the Appendix for full results and videos.

Limitations

Image-to-video generation

We did not use images to prompt the models. In image-to-video generation, a user uploads a static image along with a text prompt, and the model produces a dynamic video from both. Generating dance videos from user-submitted images is an advertised use case for some of these models.

Multi-subject dance videos

We did not prompt for videos with multiple dancers, even though some of the dances are often performed in groups. We limited our video prompts to showcase a single dancer to avoid ambiguity around whether a failed evaluation was due to issues with generating complex human movement or a realistic multi-subject video.

Prompt optimization

We did not optimize prompts on a per-model basis. Each company publishes its own prompt guide. (See the guidelines for Veo 3.1, Hailuo 2.3, Kling 2.5, and Sora 2.) Instead, we used ChatGPT 5 to standardize prompts across models to align with the Sora 2 Prompting Guide. It’s possible that optimizing prompts for each model according to its specific guide would have yielded more accurate results.
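A minimal sketch of that standardization step appears below, assuming the OpenAI Python SDK’s chat completions endpoint. The model name and the instruction text are our assumptions, not the exact setup we used.

```python
# Minimal sketch: asking a chat model to edit a draft prompt to follow the
# Sora 2 Prompting Guide. Model name and instructions are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EDIT_INSTRUCTIONS = (
    "Edit this video-generation prompt to follow the Sora 2 Prompting Guide. "
    "Keep the dance name, setting and camera direction unchanged."
)


def standardize(draft_prompt: str) -> str:
    """Return the chat model's edited version of a draft prompt."""
    response = client.chat.completions.create(
        model="gpt-5",  # assumption: the article says ChatGPT 5 was used
        messages=[
            {"role": "system", "content": EDIT_INSTRUCTIONS},
            {"role": "user", "content": draft_prompt},
        ],
    )
    return response.choices[0].message.content
```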

We also tried to improve the quality of the videos by giving detailed step-by-step instructions for each dance. However, these instructions did not produce videos that were any more accurate than those produced with simpler prompts.

Human-motion generation models

We did not test generative models focused on human-motion generation. These models are used in animation and video games to generate and capture natural human motion. Researchers train some state-of-the-art academic models in this space using large datasets, including footage of popular dances on TikTok. Although these models may perform better than the consumer-facing models we tested, they require technical expertise and substantial computational resources to run.  

Sample size

Our evaluation is limited to the videos generated for nine prompts; it is not a comprehensive assessment of the models tested. Some video generation benchmarks, such as those from Tencent’s AI Lab and other research groups, use several hundred prompts to test capabilities such as complex motion, multiple subjects and creative style.

Acknowledgements

We thank Yuhang Yang (University of Science and Technology of China) and Xiaodong Cun (Great Bay University) for reviewing an early draft of this methodology.

Appendix

View evaluations by prompt or evaluations by model.

Evaluations by prompt

Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “In a bright dance-studio, a woman grooves the ‘Apple’ dance from summer 2024 (the tune by Charli XCX plays). The camera holds still as she hits the signature moves—pastel outfit, energetic attitude, clean room. The scene feels fresh, fun, contemporary.” On Sora 2, we removed the reference to Charli XCX because the platform rejected the original prompt, citing possible violation of its guardrails for third-party likeness. (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “In a bright school gym, a teacher in casual clothes does the chicken dance—flapping arms, twisting hips, fun and silly. The camera doesn’t move, just watches the whole body doing the moves.” (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “A Cahuilla Band of Indians woman in colorful ribbon skirt dances the bird dance in slow, majestic movement. The camera holds still. The mood is respectful, ceremonial and visually rich.” (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “In a bright, white-walled dance studio, a dancer in tights performs the Horton fortification number 3 from the Lester Horton technique: strong lines, extended limbs, controlled movement. The camera stays still and watches the body’s form and transition.” (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “In a golden-sunlit public square somewhere in Jalisco, Mexico, a woman in a traditional Jalisco folklórico dress performs a Jalisco folklórico dance. The camera holds still, watching her full-body movement from a comfortable distance. The mood is festive and free-spirited, with natural sunlight and bright colours of the dress catching the light.” (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “On a bright dance floor, a smartly dressed politician in a suit dances the ‘Macarena’. The camera holds still, capturing his suit-clad moves, the retro fun of the moment, and the light shining on the floor.” On Sora 2, we replaced “politician in a suit” with “man in a suit” because the platform rejected the original prompt, citing possible violation of its guardrails for third-party likeness. (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “On a brightly lit stage, a man in a shiny satin shirt and flared trousers grooves the classic 1962 mashed-potato dance. The camera holds still, watching his feet swivel and his body rock in vintage style. The vibe is retro, fun and energetic.” On Sora 2, we removed the reference to the year because the platform rejected the original prompt, citing possible violation of its guardrails for third-party likeness. (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “In a cozy blue-walled bedroom, someone in fun pajamas does the Renegade dance (TikTok-famous from 2019). The camera doesn’t move—it just watches them hit the moves. The vibe is playful and relaxed.” (View reference video)


Videos generated using, clockwise from top-left, OpenAI’s Sora 2, Google’s Veo 3.1, Kuaishou’s Kling 2.5 and MiniMax’s Hailuo 2.3.

Prompt: “On a dark stage under a spotlight, a person in a sharp business suit does the classic sprinkler move — one hand behind the neck, the other sweeping wide-arc as they spin lightly. The camera stays still and captures the full body. The mood is retro, playful and fluid.” (View reference video)


Evaluations by model



