LocalVideoDub (LVD Studio) - Linux AI Video Dubbing Utility

A self-contained Python application for dubbing short video clips or photos. You upload a clip, enter new dialogue, and TTS, GFPGAN, and Wav2Lip work together to generate the speech and sync the subject's lips to it.

Features & Highlights

Zero System Bloat: Nothing is installed system-wide. Everything remains self-contained inside the folder.
Lightweight Download: The compressed download is only ~108MB because the required AI models auto-download on the first launch.
Full Setup Size: The fully installed environment expands to approximately 9GB.
Smart Loop: If the generated speech runs longer than the original video, the video automatically loops to match the audio length.
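
The looping arithmetic can be sketched as follows. This is only an illustration of the idea, not the application's actual code; the function name loops_needed and the example durations are assumptions made for this sketch.

```python
import math


def loops_needed(video_seconds: float, audio_seconds: float) -> int:
    """Return how many times a clip must play to cover the audio.

    Hypothetical helper: LVD Studio's internal logic may differ.
    """
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if audio_seconds <= video_seconds:
        return 1  # the clip already covers the audio, so no looping
    return math.ceil(audio_seconds / video_seconds)


# A 4-second clip paired with 10 seconds of speech needs 3 passes,
# which gives 12 seconds of footage to cover the audio.
print(loops_needed(4.0, 10.0))
```

Rounding up rather than down is the point of the sketch: the footage has to be at least as long as the speech.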

Setup and Installation Instructions

CRITICAL DRIVE NOTE: Make sure your installation directory or USB thumb drive uses a native Linux filesystem such as ext4. Do NOT use spaces in your folder or USB drive names (for example, use LVD_Studio, not LVD Studio), or the installer will crash.
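
If you want to check the no-spaces rule before installing, something like the sketch below would do it. It is not part of LVD Studio; the function name find_space_components is made up for this example, and it only inspects the path text, not the filesystem type.

```python
from pathlib import Path


def find_space_components(path: str) -> list[str]:
    """Return every folder or drive name in the path that contains a space."""
    return [part for part in Path(path).parts if " " in part]


# Check the folder this script is run from (assumed to be the project folder).
offenders = find_space_components(str(Path.cwd()))
if offenders:
    print("Rename these before installing:", offenders)
else:
    print("No spaces found in the current path.")
```

The filesystem type can be checked separately from the same folder with `df -T .`, which prints the type (for example ext4) in its second column.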

Download and Extract: Download LocalVideoDub.zip from the Files tab and extract it.
Open Terminal: Open your Linux terminal inside the extracted LocalVideoDub project folder.
Build the Isolated Environment: Run the installation script to build the local Python 3.10 sandbox and restore all core dependencies:
```
./install_environment.sh
```
(The first installation can take a while depending on your internet speed.)
Launch the Application: Start the program with:
```
./launch.sh
```
You will see a lot of terminal text while models and services load. This is normal. On the first launch, the application automatically downloads the required AI model weight files (GFPGANv1.4.pth, wav2lip.pth, and lipsync_expert.pth) to your local models directory.
Access the WebUI: When the server is ready, the terminal output includes the text "Debugger is active". Open your browser and navigate to:
```
http://127.0.0.1:5000
```
(Bookmark it for quick access! To stop the server at any time, press CTRL+C in your active terminal window.)
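
Instead of watching the terminal for the ready message, you could poll the port from a second terminal. The sketch below is an assumption-laden helper, not something shipped with LVD Studio: wait_for_port is a hypothetical name, and an open port only shows that something is listening on 127.0.0.1:5000, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 5000,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() after starting ./launch.sh returns True once the port accepts connections, or False if nothing is listening after 60 seconds. Because the first launch also downloads model files, a longer timeout may be needed then.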

Basic Usage

Load a short video or photo into "Target Video or Photo".
(Optional) Load external audio. This is required when the target is a photo or a video with no speech.
Enter your text into "Replacement Speech Text".
Click "Execute Video Generation".

The finished video appears on the left side of the WebUI. Click the video to play it. Generated videos are automatically saved in the outputs/ folder. You can use the program as many times as you want!
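
To grab the latest result from a script, the most recently modified file in outputs/ can be looked up as sketched here. The helper name newest_file is hypothetical, and the sketch assumes it is run from the project folder; it makes no assumption about the output file extension.

```python
from pathlib import Path
from typing import Optional


def newest_file(folder: str = "outputs") -> Optional[Path]:
    """Return the most recently modified file in folder, or None if none exist."""
    root = Path(folder)
    if not root.is_dir():
        return None  # folder missing, e.g. nothing has been generated yet
    files = [p for p in root.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


print(newest_file("outputs"))
```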

A 720x720 Mona Lisa sample image and matching audio clip are included in the clips/ folder for immediate testing.

Tips for Better Results

Punctuation: The less punctuation you use, the better the speech generation tends to work.
Voice Cloning: If you use a photo or silent video, you must provide external audio for voice cloning.
Video Dimensions: Best results are usually achieved with videos around 720x720 resolution.
Head Movements: Keep the subject facing forward whenever possible. Minimal head movement produces better lip sync. Side angles and extreme facial rotations will not work properly.
Mouth Restrictions: Avoid videos where the mouth opens extremely wide or moves erratically. Facial hair around the mouth can sometimes create texture artifacts.
Defaults: Leave "Seed" and "Acoustic Temperature" at their defaults unless you want to experiment.
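
For footage that is not already square, one way to get close to the suggested 720x720 is to crop the center square and then scale it down. The sketch below only computes the crop box. The function name is hypothetical, and a centered crop assumes the face sits near the middle of the frame.

```python
def center_square_crop(width: int, height: int) -> tuple[int, int, int]:
    """Return (x, y, side) for the largest square centered in the frame."""
    side = min(width, height)
    x = (width - side) // 2   # left edge of the square
    y = (height - side) // 2  # top edge of the square
    return x, y, side


# A 1920x1080 frame yields a 1080-pixel square starting 420 pixels from the left.
print(center_square_crop(1920, 1080))  # (420, 0, 1080)
```

If ffmpeg happens to be installed on your system, these numbers map onto a filter such as `crop=1080:1080:420:0,scale=720:720`.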

Troubleshooting & OOM (Out Of Memory) Errors

If you encounter an OOM error or the system lags:

Stop the server with CTRL+C in the terminal.
Restart the application using ./launch.sh.
Refresh your web browser page. Restarting this way usually frees the GPU VRAM and resolves the issue.
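
To see whether VRAM was actually released after a restart, you can ask nvidia-smi for the used and total memory. The sketch below is an optional helper, not part of LVD Studio; the function names are made up for this example, and it assumes an NVIDIA GPU with nvidia-smi on the PATH and numeric values in its output.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi and return (used, total) MiB for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() before and after the restart shows whether the used figure dropped. If it stays high with the server stopped, another process is probably holding the memory.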

System Tested On:

OS: Xubuntu Linux
GPU: NVIDIA RTX 3050 (8GB VRAM)
RAM: 32GB System RAM

This project is considered stable and feature-complete. Future updates are unlikely unless major issues are discovered. Have fun, and be responsible!
