Boson AI HuBERT Base

A general-purpose HuBERT-Base checkpoint released by Boson AI, used inside the Higgs Audio Tokenizer as the semantic teacher.

What it is

  • Standard HuBERT-Base architecture (12 transformer layers, hidden size 768, ~95M params)
  • 16 kHz audio input
  • Loadable via AutoModel with trust_remote_code=True
  • Returns 768-dim hidden states for every layer when called with output_hidden_states=True (a quick verification snippet follows this list)
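
These numbers are easy to check once the checkpoint is loaded. A minimal sketch, assuming the remote code exposes a standard HuBERT-style Hugging Face config with hidden_size and num_hidden_layers fields:

from transformers import AutoModel

model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True)

print(model.config.hidden_size)        # expect 768
print(model.config.num_hidden_layers)  # expect 12
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")  # ~95M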

How it is used in Higgs Audio

The Higgs Audio Tokenizer distills semantic features from this model into its semantic branch. In boson_multimodal/audio_processing/higgs_audio_tokenizer.py, this checkpoint is selected with semantic_techer="hubert_base_general":

from transformers import AutoModel

semantic_model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True)
# 16 kHz, 768-dim semantic features, all hidden layers consumed by the tokenizer
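
The tokenizer consumes every hidden layer rather than only the final one. As a rough illustration of what that can look like, here is a minimal sketch that stacks all 13 hidden states and averages them across layers into one 768-dim feature per frame; the uniform average is an assumption made for illustration, not the tokenizer's confirmed aggregation scheme:

import torch

waveform_16k = torch.randn(1, 16000)  # placeholder: 1 s of mono 16 kHz audio

with torch.no_grad():
    out = semantic_model(waveform_16k, output_hidden_states=True)

# hidden_states: 13 tensors of shape (B, T, 768) -> stack to (13, B, T, 768)
layers = torch.stack(out.hidden_states, dim=0)
semantic_target = layers.mean(dim=0)  # (B, T, 768) layer-averaged features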

Direct usage

import torch
import torchaudio
from transformers import AutoModel

model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True).eval()

waveform, sr = torchaudio.load("audio.wav")
if waveform.shape[0] > 1:  # downmix multi-channel audio to mono
    waveform = waveform.mean(dim=0, keepdim=True)
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

with torch.no_grad():
    out = model(waveform, output_hidden_states=True)

# out.last_hidden_state:  (B, T, 768)
# out.hidden_states:      tuple of (B, T, 768) for each of the 13 layers (embedding + 12 transformer blocks)
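
A standard HuBERT-Base convolutional front end strides the 16 kHz input by a total factor of 320, so hidden states arrive at roughly 50 frames per second (20 ms per frame). Continuing with waveform and out from the snippet above, a quick check (the layer index 9 is only an illustrative choice):

# ~320 input samples per output frame, i.e. ~50 Hz at 16 kHz
print(waveform.shape[-1] / out.last_hidden_state.shape[1])

# A single layer can be selected if only one feature set is needed
feats = out.hidden_states[9]  # (B, T, 768)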

License

Apache 2.0.
