# Boson AI HuBERT Base
A general-purpose HuBERT-Base checkpoint released by Boson AI, used inside the Higgs Audio Tokenizer as the semantic teacher.
## What it is
- Standard HuBERT-Base architecture (12 transformer layers, hidden size 768, ~95M parameters)
- 16 kHz audio input
- Loadable via `AutoModel` with `trust_remote_code=True`
- Outputs 768-dim per-layer hidden states (`output_hidden_states=True`)
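One practical consequence of the specs above: assuming the standard wav2vec2-style convolutional front end used by HuBERT-Base (strides 5, 2, 2, 2, 2, 2, 2), the model downsamples the 16 kHz waveform by a factor of 320, yielding one 768-dim frame every 20 ms. A quick sanity check of that arithmetic:

```python
# Expected frame rate of HuBERT-Base features, assuming the standard
# wav2vec2-style convolutional front end (strides 5, 2, 2, 2, 2, 2, 2).
strides = [5, 2, 2, 2, 2, 2, 2]
total_stride = 1
for s in strides:
    total_stride *= s  # overall downsampling factor

sample_rate = 16_000
frame_rate = sample_rate / total_stride

print(total_stride)  # 320
print(frame_rate)    # 50.0 frames per second
```

So a 10-second clip produces roughly 500 frames of 768-dim features.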
## How it is used in Higgs Audio
The Higgs Audio Tokenizer distills semantic features from this HuBERT into its semantic branch. From `boson_multimodal/audio_processing/higgs_audio_tokenizer.py` (`semantic_techer="hubert_base_general"`):
```python
from transformers import AutoModel

semantic_model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True)
# 16 kHz, 768-dim semantic features, all hidden layers consumed by the tokenizer
```
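The comment above says the tokenizer consumes all hidden layers of the teacher. The exact pooling recipe is not documented here, so the following is a hypothetical sketch only: it shows one common way a semantic branch might combine every layer (uniform averaging), with NumPy stand-ins shaped like HuBERT-Base outputs so no model download is needed.

```python
import numpy as np

# Hypothetical sketch: combining all 13 hidden layers of a HuBERT-Base
# teacher into one distillation target. Shapes mirror the real outputs
# (13 layers of (batch, frames, 768)); uniform layer averaging is an
# illustrative assumption, not Higgs Audio's actual recipe.
num_layers, batch, frames, dim = 13, 1, 50, 768
hidden_states = [np.random.randn(batch, frames, dim) for _ in range(num_layers)]

stacked = np.stack(hidden_states)      # (13, batch, frames, 768)
teacher_target = stacked.mean(axis=0)  # average over layers -> (batch, frames, 768)

print(teacher_target.shape)  # (1, 50, 768)
```

Other schemes (a learned weighted sum over layers, or picking a single intermediate layer) are equally plausible; the point is only that the per-layer 768-dim features are the distillation signal.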
## Direct usage
```python
import torch
import torchaudio
from transformers import AutoModel

model = AutoModel.from_pretrained("bosonai/hubert_base", trust_remote_code=True).eval()

waveform, sr = torchaudio.load("audio.wav")  # (channels, samples)
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

with torch.no_grad():
    out = model(waveform, output_hidden_states=True)

# out.last_hidden_state: (B, T, 768)
# out.hidden_states: tuple of (B, T, 768) for each of the 13 layers
# (embedding output + 12 transformer blocks)
```
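If you want a single fixed-size vector per clip rather than frame-level features, a common choice is mean pooling over the time axis. A minimal sketch, using a NumPy stand-in with the same `(batch, frames, 768)` shape as `out.last_hidden_state` so it runs without the model:

```python
import numpy as np

# Stand-in for out.last_hidden_state: (batch, frames, 768).
last_hidden_state = np.random.randn(1, 120, 768)

# Mean-pool over the time axis to get one 768-dim utterance embedding.
utterance_embedding = last_hidden_state.mean(axis=1)

print(utterance_embedding.shape)  # (1, 768)
```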
## License
Apache 2.0.