Boson AI HuBERT Base

A general-purpose HuBERT-Base checkpoint released by Boson AI, used inside the Higgs Audio Tokenizer as the semantic teacher.

What it is

  • Standard HuBERT-Base architecture (12 transformer layers, hidden size 768, ~95M params)
  • 16 kHz audio input
  • Loadable via AutoModel with trust_remote_code=True
  • Returns 768-dim hidden states for every layer when called with output_hidden_states=True (a quick verification snippet follows this list)
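
These numbers are easy to check once the checkpoint is loaded. A minimal sketch, assuming the remote code exposes a standard HuBERT-style Hugging Face config with hidden_size and num_hidden_layers fields:

from transformers import AutoModel

model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True)

print(model.config.hidden_size)        # expect 768
print(model.config.num_hidden_layers)  # expect 12
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")  # ~95M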

How it is used in Higgs Audio

The Higgs Audio Tokenizer distills semantic features from this model into its semantic branch. In boson_multimodal/audio_processing/higgs_audio_tokenizer.py, this checkpoint is selected with semantic_techer="hubert_base_general":

from transformers import AutoModel

semantic_model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True)
# 16 kHz, 768-dim semantic features, all hidden layers consumed by the tokenizer
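
The tokenizer consumes every hidden layer rather than only the final one. As a rough illustration of what that can look like, here is a minimal sketch that stacks all 13 hidden states and averages them across layers into one 768-dim feature per frame; the uniform average is an assumption made for illustration, not the tokenizer's confirmed aggregation scheme:

import torch

waveform_16k = torch.randn(1, 16000)  # placeholder: 1 s of mono 16 kHz audio

with torch.no_grad():
    out = semantic_model(waveform_16k, output_hidden_states=True)

# hidden_states: 13 tensors of shape (B, T, 768) -> stack to (13, B, T, 768)
layers = torch.stack(out.hidden_states, dim=0)
semantic_target = layers.mean(dim=0)  # (B, T, 768) layer-averaged features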

Direct usage

import torch
import torchaudio
from transformers import AutoModel

model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True).eval()

waveform, sr = torchaudio.load("audio.wav")
if waveform.shape[0] > 1:  # downmix multi-channel audio to mono
    waveform = waveform.mean(dim=0, keepdim=True)
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

with torch.no_grad():
    out = model(waveform, output_hidden_states=True)

# out.last_hidden_state:  (B, T, 768)
# out.hidden_states:      tuple of (B, T, 768) for each of the 13 layers (embedding + 12 transformer blocks)
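
A standard HuBERT-Base convolutional front end strides the 16 kHz input by a total factor of 320, so hidden states arrive at roughly 50 frames per second (20 ms per frame). Continuing with waveform and out from the snippet above, a quick check (the layer index 9 is only an illustrative choice):

# ~320 input samples per output frame, i.e. ~50 Hz at 16 kHz
print(waveform.shape[-1] / out.last_hidden_state.shape[1])

# A single layer can be selected if only one feature set is needed
feats = out.hidden_states[9]  # (B, T, 768)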

License

Apache 2.0.
