HOT-Step CPP SuperSep — ONNX Stem Separation Models

Pre-converted ONNX models for multi-stem audio separation in HOT-Step CPP. These run natively via ONNX Runtime GPU — no Python required.

Models

File	Architecture	Size	Purpose
`bs_roformer_sw.onnx`	BS-Roformer	672 MB	Stage 1: Primary 6-stem split (vocals, drums, bass, guitar, piano, other)
`mel_band_roformer_karaoke.onnx`	Mel-Band RoFormer	875 MB	Stage 2: Vocal sub-separation (lead vs backing)
`mdx23c_drumsep.onnx`	MDX23C	418 MB	Stage 3: Drum sub-separation (kick, snare, toms, hi-hat, cymbals)
`htdemucs_6s.onnx`	HTDemucs	105 MB	Stage 4: "Other" stem refinement

Total: ~2.07 GB

Usage

These models are designed for use with the HOT-Step CPP Model Manager. In the app:

Open the Model Manager (click "Get More Models" in the Models dropdown)
Go to the Stem Separation tab
Click Download on each model (or use the Stem Separation starter pack)

Models are downloaded to models/supersep/ and loaded automatically by the SuperSep engine.

Technical Details

Format: ONNX (opset 18, legacy TorchScript export)
Precision: FP32
Input: Spectrogram representation (STFT performed in C++ engine)
Output: Separation masks (iSTFT performed in C++ engine)
Runtime: ONNX Runtime 1.25.1+ with CUDA Execution Provider

The models export only the neural network portion — STFT/iSTFT operations are handled natively in C++ for optimal performance.

Conversion

These were converted from PyTorch checkpoints using the MSS_ONNX_TensorRT toolset with dynamo=False (legacy TorchScript exporter) for compatibility with complex attention architectures.

Attribution & Licenses

Training & Checkpoints

BS-Roformer checkpoint by aufr33 — trained on the Music Source Separation framework
Mel-Band RoFormer Karaoke checkpoint by aufr33 & viperx — SDR 10.1956 on karaoke separation
MDX23C DrumSep checkpoint by aufr33 & jarredou — drum sub-component isolation
HTDemucs by Meta / Facebook AI Research — Hybrid Transformer architecture

Frameworks & Tools

Music-Source-Separation-Training by ZFTurbo — training framework for BS-Roformer, Mel-Band RoFormer, and MDX23C architectures
MSS_ONNX_TensorRT by ZFTurbo — ONNX conversion tooling with STFT extraction and model validation
Demucs by Meta Research — HTDemucs architecture and pre-trained weights (MIT License)

Architecture Papers

BS-Roformer: "Music Source Separation with Band-Split RoFormer" (arXiv:2309.02612)
Mel-Band RoFormer: Mel-frequency variant of Band-Split RoFormer
MDX23C: Based on TFC-TDF-UNet v3 architecture
HTDemucs: "Hybrid Transformers for Music Source Separation" (arXiv:2211.08553)

License

The conversion and packaging is released under MIT. Individual model weights are subject to their original training licenses — see the attribution links above for details.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Papers for scragnog/HOT-Step-CPP-SuperSep

Music Source Separation with Band-Split RoPE Transformer

Paper • 2309.02612 • Published Sep 5, 2023 • 1

Hybrid Transformers for Music Source Separation

Paper • 2211.08553 • Published Nov 15, 2022 • 1