Sentence Similarity
sentence-transformers
PyTorch
ONNX
Safetensors
OpenVINO
xlm-roberta
mteb
Sentence Transformers
Eval Results
text-embeddings-inference
Instructions to use intfloat/multilingual-e5-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use intfloat/multilingual-e5-base with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
- Inference
- Notebooks
- Google Colab
- Kaggle
Semantic similarity
#29
by ZijieAsus - opened
I am trying to use this model for multilingual semantic search.
model = SentenceTransformer('intfloat/multilingual-e5-base')
prefix = "query: "
en_emb = model.encode(prefix + "how do i change my google profile photo?", normalize_embeddings=True)
zh_emb = model.encode(prefix + "我如何更改我的Google個人照片?", normalize_embeddings=True)  # zh: "How do I change my Google profile photo?"
from sentence_transformers.util import cos_sim
print(cos_sim(en_emb, zh_emb)) # 0.9223
# With a single word as input, the gap is even more noticeable.
en_emb = model.encode(prefix + "Apple", normalize_embeddings=True)
jp_emb = model.encode(prefix + "リンゴ", normalize_embeddings=True)  # ja: "apple"
print(cos_sim(en_emb, jp_emb)) # 0.7541
In the first case, I expected the cosine similarity to be very close to 1.0 (e.g., 0.99 or 0.98), but the result was 0.9223. Is this within expectations, or is there a reason for this?
Thanks!