
daVinci-MagiHuman
The Lens
daVinci-MagiHuman does it all in one model: no separate video generation, no separate voice synthesis, no stitching. A single 15-billion-parameter transformer takes text and a reference image and jointly produces video and audio.
The numbers are real: 5-second 1080p video in 38 seconds on a single H100. Supports Mandarin, Cantonese, English, Japanese, Korean, German, and French. Beats Ovi 1.1 (80% win rate) and LTX 2.3 (60.9% win rate) in human evaluation. The full model stack is released: base model, distilled model, super-resolution model, and inference code.
From Shanghai's GAIR Lab and Sand.ai.
The catch: you need serious hardware. An H100 for the fast inference numbers, and a 15B-parameter model isn't running on a consumer GPU. No license file listed; check before commercial use. And joint audio-video generation is still early: the 5-second clip limit means this is for avatars and short-form content, not video production.
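The consumer-GPU point is easy to sanity-check with dtype arithmetic. A minimal sketch, assuming bf16 weights; the fp8 line and the idea of extra activation headroom are illustrative assumptions, not figures from the release:

```python
# Why a 15B-parameter model won't fit on a consumer GPU: the weights
# alone, before activations or caches, already exceed a 24 GB card.
def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory footprint of the weights in GB (decimal gigabytes)."""
    return params * bytes_per_param / 1e9

PARAMS = 15e9  # 15-billion-parameter transformer (from the post)

bf16_gb = weights_gb(PARAMS, 2)  # bf16: 2 bytes/param -> 30.0 GB
fp8_gb = weights_gb(PARAMS, 1)   # hypothetical fp8 quant -> 15.0 GB

# An RTX 4090 tops out at 24 GB of VRAM, so bf16 weights alone don't
# fit, and inference still needs room for activations on top.
print(f"bf16 weights: {bf16_gb:.0f} GB, fp8 weights: {fp8_gb:.0f} GB")
```

Even an aggressive fp8 quantization only buys headroom for the weights; the 38-second H100 figure quoted above assumes no such compromise.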
Free vs Self-Hosted vs Paid
Fully free. Open-source research release: no paid tier, no hosted version. You need your own GPU infrastructure, meaning an H100 or equivalent for reasonable inference times. The model weights are on Hugging Face.
Free to use. You pay for GPU compute, and you'll need a lot of it.
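To put "a lot of it" in numbers: the post's 38 seconds of H100 time per 5-second clip fixes the throughput, and an assumed on-demand cloud rate (the $/hour price below is an assumption, not from the post) converts that into a cost per second of video:

```python
# Back-of-envelope GPU cost for self-hosting. The clip length and wall
# time come from the post; the H100 hourly rate is an assumed cloud
# price and will vary by provider.
CLIP_SECONDS = 5.0    # seconds of video per generation (from the post)
WALL_SECONDS = 38.0   # H100 wall time per clip (from the post)
H100_PER_HOUR = 3.00  # assumed on-demand $/hour for one H100

# Seconds of finished video one GPU-hour buys.
video_per_gpu_hour = CLIP_SECONDS * 3600 / WALL_SECONDS  # ~473.7 s

# Dollars per second of generated video.
cost_per_video_second = H100_PER_HOUR / video_per_gpu_hour  # ~$0.0063

print(f"{video_per_gpu_hour:.0f} s of video per GPU-hour, "
      f"${cost_per_video_second:.4f} per video-second")
```

At those assumed rates the compute itself is cheap per clip; the real cost is keeping an H100 provisioned at all.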
About
- Owner: SII - Generative Artificial Intelligence Research Lab (GAIR) (Organization)
- Stars: 1,991
- Forks: 202