import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# ── Load ──────────────────────────────────────────────
# The tokenizer lives in the repo's tokenizer/ subfolder; the ONNX graph is fetched separately.
tokenizer = AutoTokenizer.from_pretrained("CloveAI/clov-embed-v2", subfolder="tokenizer")
onnx_path = hf_hub_download("CloveAI/clov-embed-v2", "onnx/biencoder_rope.onnx")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# ── Encode ────────────────────────────────────────────
def encode(texts):
    # Accept a single string or a list of strings.
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="np")
    return session.run(
        ["embeddings"],
        {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
    )[0]

# ── Test ──────────────────────────────────────────────
emb = encode("Hello world!")
print(emb.shape)  # (1, 256)
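Since the Architecture section lists L2 normalization as the final step, cosine similarity between two embeddings should reduce to a plain dot product. The following is a minimal sketch of a pairwise similarity helper; the name `similarity_matrix` is hypothetical, and it re-normalizes defensively so it also behaves on vectors that are not unit length. The demo uses stand-in vectors so it runs without downloading the model.

```python
import numpy as np

def similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an (n, d) array of embeddings."""
    emb = np.asarray(emb, dtype=np.float32)
    # Re-normalize defensively; effectively a no-op for unit-length vectors.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Stand-in vectors (an assumption for illustration);
# in practice pass the output of encode([...]) instead.
demo = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
sims = similarity_matrix(demo)
print(sims.shape)  # (3, 3)
```

With real embeddings, `similarity_matrix(encode(["query", "doc A", "doc B"]))[0, 1:]` would give the query's similarity to each document.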
BiEncoder RoPE: Sentence Embedding Model
A 34M-parameter sentence embedding model trained from scratch in PyTorch.
Architecture
- 6-layer Transformer encoder with RoPE positional embeddings
- Mean pooling + L2 normalization
- 256-dim output vectors
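The pooling step above can be sketched in NumPy. This is the conventional formulation of masked mean pooling followed by L2 normalization, not the model's actual code (which sits inside the ONNX graph); the function name `mean_pool_l2` and the assumption that padding tokens are excluded via the attention mask are illustrative.

```python
import numpy as np

def mean_pool_l2(token_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) tokens, then L2-normalize.

    token_states:   (batch, seq_len, dim) per-token encoder outputs
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_states.dtype)  # (batch, seq_len, 1)
    summed = (token_states * mask).sum(axis=1)                   # padding contributes 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)               # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 3 token slots, 4 dims; the second sequence has one padding slot.
states = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool_l2(states, mask)
print(pooled.shape)  # (2, 4)
```

Masking before averaging matters for batched inputs: without it, padding positions would pull the mean toward whatever the encoder emits for pad tokens.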
Training (Curriculum)
| Phase | Dataset | Loss |
|---|---|---|
| 1 | all-nli | MNRLoss |
| 2 | squad | MNRLoss |
| 3 | msmarco-bm25 | HardNegativeLoss |
| 4 | natural-questions | MNRLoss |
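"MNRLoss" in the table presumably refers to a multiple-negatives ranking loss, where each anchor's paired positive is the target and the other positives in the batch act as negatives. The training code is not shown here, so the sketch below is an assumption about the general technique rather than the author's implementation; `mnr_loss` and the `scale` value of 20 are illustrative.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss over L2-normalized (batch, dim) embeddings.

    Row i of `positives` is the match for row i of `anchors`; every other row
    in the batch serves as a negative for that anchor.
    """
    logits = scale * anchors @ positives.T                # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for anchor i is positive i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: orthonormal anchors, so each positive matches exactly one anchor.
anchors = np.eye(3, dtype=np.float32)
aligned = mnr_loss(anchors, anchors)                        # positives line up with anchors
shuffled = mnr_loss(anchors, np.roll(anchors, 1, axis=0))   # positives paired with the wrong anchors
print(aligned < shuffled)  # True
```

The loss is near zero when each anchor scores highest against its own positive and grows when the pairing is wrong, which is the behavior a retrieval-oriented embedding model is trained toward.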
Files
- tokenizer/: HuggingFace tokenizer (bert-base-uncased)
- pytorch/checkpoint_phase4_nq.pt: PyTorch weights
- onnx/biencoder_rope.onnx: ONNX FP32
- onnx/biencoder_rope_int8.onnx: ONNX INT8 (recommended for CPU)
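To switch between the two ONNX variants listed above, a small helper along these lines may be convenient. The repo id and file paths come from this card; `load_session` and `ONNX_FILES` are hypothetical names, and the sketch has not been verified against the published files.

```python
# Repo id and file paths are taken from the Files list; the helper itself is hypothetical.
REPO_ID = "CloveAI/clov-embed-v2"
ONNX_FILES = {
    "fp32": "onnx/biencoder_rope.onnx",
    "int8": "onnx/biencoder_rope_int8.onnx",
}

def load_session(precision: str = "int8"):
    """Download the chosen ONNX variant and open a CPU inference session."""
    if precision not in ONNX_FILES:
        raise ValueError(f"precision must be one of {sorted(ONNX_FILES)}, got {precision!r}")
    # Imported lazily so the file mapping can be inspected without these packages installed.
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(REPO_ID, ONNX_FILES[precision])
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```

Assuming the INT8 graph exposes the same input and output names as the FP32 one (`input_ids`, `attention_mask`, `embeddings`), the returned session could stand in for `session` in the `encode` helper.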
Performance
- FP32 ONNX size: 134.3 MB
- INT8 ONNX size: 34.6 MB
- Throughput: ~247 sentences/sec on CPU
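The throughput figure does not state the hardware, batch size, or ONNX variant used, so results on another machine may differ. A rough way to measure sentences per second locally is sketched below; `sentences_per_second`, the batch size of 32, and the single warm-up batch are assumptions, and the demo uses a stand-in encoder so it runs without the model.

```python
import time
from typing import Callable, Sequence

def sentences_per_second(encode_fn: Callable[[Sequence[str]], object],
                         sentences: Sequence[str],
                         batch_size: int = 32,
                         warmup_batches: int = 1) -> float:
    """Time encode_fn over `sentences` in batches and return sentences/sec."""
    batches = [sentences[i:i + batch_size] for i in range(0, len(sentences), batch_size)]
    # Warm-up runs are excluded from timing (first calls often pay one-off setup costs).
    for batch in batches[:warmup_batches]:
        encode_fn(list(batch))
    start = time.perf_counter()
    for batch in batches:
        encode_fn(list(batch))
    elapsed = time.perf_counter() - start
    return len(sentences) / max(elapsed, 1e-9)

# Stand-in encoder so the snippet runs without the model; pass the real `encode` instead.
calls = []
def fake_encode(texts):
    calls.append(len(texts))
    return [[0.0] * 256 for _ in texts]

rate = sentences_per_second(fake_encode, ["hello world"] * 100, batch_size=32)
print(f"{rate:.0f} sentences/sec")
```

Sentence length affects the result as well, since padding to the longest item in a batch changes how many tokens the encoder processes.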