Tencent claims its model outperforms Runway Gen-3, Luma 1.6, and three prominent Chinese video generation tools in human evaluations.
While OpenAI teases Sora after months of delays, Tencent has quietly unveiled a model that already rivals top-tier video generators.
Tencent launched Hunyuan Video, a free and open-source AI video generator, just as OpenAI kicked off its 12-day announcement campaign, which is expected to feature the launch of Sora, its long-awaited video tool.
“We propose Hunyuan Video, a unique open-source video foundation model that demonstrates performance in video creation that is equivalent to, if not superior to, leading closed-source models,” Tencent said.
The Shenzhen, China-based tech behemoth says its model “outperforms” Runway Gen-3, Luma 1.6, and “three top-performing Chinese video generative models” in expert human evaluations.
Tencent has shipped comparable open models before: between the SDXL and Flux eras of open-source image generation, it released HunyuanDiT, an image generator with notably strong bilingual text comprehension that nevertheless saw little adoption. The Hunyuan family also includes large language models.
Unlike most AI video and image generators, which encode text with CLIP and T5-XXL, Hunyuan Video uses a decoder-only multimodal large language model as its text encoder.
Tencent claims this improves the model’s instruction following, image-detail comprehension, and ability to learn new tasks in context without additional training, while a token refiner helps it interpret prompts better than traditional encoders.
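To make the architectural difference concrete, here is a minimal sketch (not Tencent’s code; the model ID is a placeholder) of how a decoder-only LLM’s hidden states can replace CLIP/T5 encoder outputs as the text conditioning fed to a video diffusion model:

```python
# Minimal sketch: using a decoder-only LLM's hidden states as text
# conditioning, in place of CLIP/T5-XXL encoder outputs.
# NOTE: the model ID is a placeholder, not Hunyuan's actual encoder.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "meta-llama/Llama-3.1-8B"  # placeholder decoder-only LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

prompt = "A guy walking his dog at sunset, cinematic lighting"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = llm(**inputs, output_hidden_states=True)

# One embedding per token from the final layer; a video diffusion model
# would consume these the way it otherwise consumes T5-XXL outputs.
text_embeddings = out.hidden_states[-1]  # shape: (1, seq_len, hidden_dim)
```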
The model also rewrites prompts to enrich them and improve generations: a terse prompt like “A guy walking his dog” gets expanded with details about the subject, scene setup, lighting conditions, and quality descriptors.
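That rewriting step can be approximated with any instruction-following LLM; the sketch below is a hypothetical illustration, not Tencent’s actual pipeline:

```python
# Hypothetical prompt-rewrite step: an LLM expands a terse prompt with
# subject, scene, lighting, and quality details before video generation.
REWRITE_INSTRUCTION = (
    "Rewrite the following video prompt. Add details about the subject, "
    "scene setup, lighting conditions, and quality descriptors, while "
    "keeping the original intent.\n\nPrompt: {prompt}"
)

def rewrite_prompt(prompt: str, llm_generate) -> str:
    """llm_generate is any text-generation callable (local model or API)."""
    return llm_generate(REWRITE_INSTRUCTION.format(prompt=prompt))

# Usage: rewrite_prompt("A guy walking his dog", my_llm) might return
# something like "A man in a grey coat walks his golden retriever along
# a tree-lined path at golden hour, soft backlight, 4K, highly detailed".
```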
As with Meta’s Llama 3, Hunyuan is free to use and commercialize until a product reaches 100 million users, a threshold most developers will never come close to.
The catch? Running the 13-billion-parameter model locally requires a powerful machine with at least 60GB of GPU memory, such as an Nvidia H800 or H20 card, far more VRAM than a typical gaming PC offers.
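A quick way to check whether a local machine clears that bar (an illustrative snippet, using Tencent’s stated ~60GB floor):

```python
# Illustrative check against the ~60GB VRAM floor Tencent cites for
# running the 13B-parameter model locally.
import torch

REQUIRED_GB = 60

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)} ({total_gb:.0f} GB)")
    if total_gb < REQUIRED_GB:
        print(f"Below the ~{REQUIRED_GB} GB needed for local inference.")
else:
    print("No CUDA GPU detected; consider a hosted endpoint instead.")
```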
FAL.ai, a developer-focused generative media platform, hosts Hunyuan and charges $0.50 per video. Cloud providers like Replicate and GoEnhance also offer the service. The official Hunyuan Video site provides 150 credits for $10, with each video generation costing 15 credits.
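For those without the hardware, a hosted endpoint is the simplest route. Here is a sketch using Replicate’s Python client; the model slug and input fields are assumptions, so check the provider’s listing for the exact schema:

```python
# Sketch: generating a video through a hosted endpoint instead of local
# hardware. The model slug and input fields are assumptions; consult the
# provider's listing for the exact schema and pricing.
import replicate

output = replicate.run(
    "tencent/hunyuan-video",  # assumed slug on Replicate
    input={"prompt": "A photorealistic golden retriever running on a beach"},
)
print(output)  # typically a URL to the rendered video file
```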
Runpod and Vast.ai let users run the model on rented GPUs. Early tests comparing Hunyuan to Luma Labs’ Dream Machine and Kling AI show it producing photorealistic videos of human and animal motion, though each generation takes around 15 minutes.
Testing suggests the model’s comprehension of English prompts still has room for improvement, but because it is open source, developers can tweak and enhance it.
Tencent claims its text encoder achieves a 68.5% alignment rate and a 96.4% visual quality score in internal testing.