Sentence Similarity
sentence-transformers
PyTorch
ONNX
Safetensors
OpenVINO
xlm-roberta
mteb
Sentence Transformers
Eval Results
text-embeddings-inference
Instructions to use intfloat/multilingual-e5-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use intfloat/multilingual-e5-base with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
- Inference
- Notebooks
- Google Colab
- Kaggle
Semantic similarity
#29
by ZijieAsus - opened
I am trying to use this model for multilingual semantic search.
model = SentenceTransformer('intfloat/multilingual-e5-base')
prefix = "query: "
en_emb = model.encode(prefix + "how do i change my google profile photo?", normalize_embeddings=True)
zh_emb = model.encode(prefix + "我如何更改我的Google個人照片?", normalize_embeddings=True)  # zh: "How do I change my Google profile photo?"
from sentence_transformers.util import cos_sim
print(cos_sim(en_emb, zh_emb)) # 0.9223
# With a single word as input, the gap is even more noticeable.
en_emb = model.encode(prefix + "Apple", normalize_embeddings=True)
jp_emb = model.encode(prefix + "リンゴ", normalize_embeddings=True)  # ja: "apple"
print(cos_sim(en_emb, jp_emb)) # 0.7541
In the first case, I expected the cosine similarity to be very close to 1.0 (e.g., 0.99 or 0.98), but the result was 0.9223. Is this within expectations, or is there a reason for this?
Thanks!