
daVinci-MagiHuman
The Lens
daVinci-MagiHuman handles video and audio in one model. No separate video generation, no separate voice synthesis, no stitching. One 15-billion-parameter transformer takes text and a reference image and jointly produces both.
The numbers are real: 5-second 1080p video in 38 seconds on a single H100. Supports Mandarin, Cantonese, English, Japanese, Korean, German, and French. Beats Ovi 1.1 (80% win rate) and LTX 2.3 (60.9% win rate) in human evaluation. The full model stack is released: base model, distilled model, super-resolution model, and inference code.
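To put the speed claim in perspective, here is a back-of-envelope sketch that uses only the two figures quoted above (a 5-second clip, 38 seconds of compute on one H100). The function name is made up for illustration, and the clips-per-hour line assumes back-to-back generation with no model loading or other overhead, which a real deployment will not match exactly.

```python
def realtime_factor(compute_seconds: float, clip_seconds: float) -> float:
    """Seconds of compute needed per second of generated video."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return compute_seconds / clip_seconds


# Figures quoted in the write-up: 38 s of compute for a 5 s clip on one H100.
factor = realtime_factor(38, 5)
print(f"{factor:.1f}x slower than real time")

# Assumption: clips run back to back with zero overhead between them.
clips_per_hour = 3600 / 38
print(f"about {clips_per_hour:.0f} clips per GPU-hour")
```

In other words, fast for a 1080p audio-video model, but not real time: each second of output costs several seconds of H100 time.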
From Shanghai's GAIR Lab and Sand.ai.
The catch: you need serious hardware. An H100 for the fast inference numbers, and the 15B parameter model isn't running on a consumer GPU. No license file listed; check before commercial use. And 'joint audio-video generation' is still early. The 5-second clip limit means this is for avatars and short-form content, not video production.
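Why a consumer GPU struggles comes down to simple arithmetic on the weights alone. The sketch below is a rough estimate, not a measurement: the 15B parameter count comes from the write-up, while the numeric formats are scenarios, since the precision of the released checkpoints is not stated here. It also ignores activations, video and audio latents, and the separate super-resolution model, all of which add to the total.

```python
PARAMS = 15e9  # 15 billion parameters, as stated in the write-up

# Bytes per parameter for common numeric formats. Which format the released
# checkpoints actually use is an assumption, so treat these as scenarios.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(PARAMS, nbytes):.0f} GB for weights alone")
```

At 16-bit precision the weights alone come to roughly 30 GB, which already exceeds the 24 GB found on high-end consumer cards and fits comfortably only on data-center parts such as an 80 GB H100.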
Free vs Self-Hosted vs Paid
Fully free. Open-source research release. No paid tier, no hosted version. You need your own GPU infrastructure: an H100 or equivalent for reasonable inference times. The model weights are on Hugging Face.
Free to use. You pay for GPU compute, and you'll need a lot of it.
About
- Owner: SII - Generative Artificial Intelligence Research Lab (GAIR) (Organization)
- Stars: 2,066
- Forks: 212