Ex-Omni Talking Avatar
Text/speech to spoken response + 3D talking-avatar video
Speaker-conditioned TTS with emotion & energy control
Multi-image instruction-guided image editing
Word-level timestamp alignment from audio + transcript
Subject-driven text-to-video from reference images (Wan2.2)
Image matting with diverse prompts via SAM2Matting
Multi-modal generation with diffusion transformers
Anima depth-conditioned image generation via VACE ControlNet
Separate audio into vocals and instruments with BS-Roformer
Polish speech recognition with fine-tuned Whisper Small
Real-time zero-shot stereo disparity estimation
Phone-use GUI agent - screenshot + task to next action
Video verification & temporal grounding with VideoSearch-R1
GUI grounding with VISTA-9B — predict click coordinates
Multi-view visual reasoning VLM based on Qwen3-VL 4B
Keep identity from reference, follow lineart structure
2x latent super-resolution with FlowUpscaler in Flux.2 space
Object and Material Selection VLM
Document-parsing VLM (1.2B) by KoreaDeep
Vietnamese text-to-speech with Kokoro TTS
Interleaved text and image generation with SenseNova-U1
Parallel region captioning with multimodal diffusion LLM
Unified AR model for image understanding & generation