Tags: Sentence Similarity, sentence-transformers, Safetensors, Transformers, Russian, English, bert, feature-extraction, pretraining, embeddings, text-embeddings-inference
Instructions to use sergeyzh/BERTA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use sergeyzh/BERTA with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sergeyzh/BERTA")

sentences = [
    "Это счастливый человек",
    "Это счастливая собака",
    "Это очень счастливый человек",
    "Сегодня солнечный день",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```

- Transformers
How to use sergeyzh/BERTA with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sergeyzh/BERTA")
model = AutoModel.from_pretrained("sergeyzh/BERTA")
```

- Inference
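The raw Transformers snippet only loads the tokenizer and model; to get sentence embeddings you still have to run a forward pass and pool the token states. Below is a minimal sketch of attention-mask-aware mean pooling, a common choice for sentence-transformers models. Whether BERTA actually uses mean pooling (rather than, say, CLS pooling) is an assumption here; check the model's pooling config to confirm. The `mean_pool` helper is hypothetical, and toy tensors stand in for real model outputs so the sketch runs without downloading weights.

```python
import torch

# Hypothetical helper (not part of the BERTA repo): mean-pool token states,
# ignoring padding positions via the attention mask. Mean pooling is an
# assumption -- confirm against the model's pooling configuration.
def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # [B, T, 1]
    summed = (last_hidden_state * mask).sum(dim=1)                   # [B, H]
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # [B, 1]
    return summed / counts

# Toy tensors in place of tokenizer/model outputs: batch of 2, 4 tokens, hidden size 8.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 8])
```

With the real model, `hidden` would be `model(**tokenizer(sentences, padding=True, return_tensors="pt")).last_hidden_state` and `mask` the tokenizer's `attention_mask`.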
- Notebooks
- Google Colab
- Kaggle
Does BERTA support matryoshka dimensions?
#2
by dantetemplar - opened
Hello, I can't find information on this: FRIDA, BERTA, and other models do not declare Matryoshka representation support, but it looks like I see no loss when truncating the dimension down to 384.
This is indeed an interesting observation. Although BERTA was not trained with Matryoshka Representation Learning (MRL), truncating the vector by a factor of 2–3 shows almost no drop in accuracy on most tasks. In my own testing I tried various truncation methods (such as [:384], [384:], [0::2], and [1::2]) and did not observe any significant degradation either.
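One way to check this for your own data is to compare the cosine-similarity matrix from full vectors against the one from truncated vectors. The sketch below shows the measurement procedure only; it uses random stand-in vectors instead of real BERTA embeddings (which are 768-dimensional), so the drift it prints says nothing about the model itself. Substitute `model.encode(sentences)` output for `emb` to run the actual check.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Normalize rows, then a matrix product gives pairwise cosine similarities.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Stand-in for model.encode(sentences) -- replace with real embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 768))

full = cosine_sim(emb, emb)
for dim in (384, 256, 128):
    # emb[:, :dim] is the [:384]-style prefix truncation from the discussion.
    truncated = cosine_sim(emb[:, :dim], emb[:, :dim])
    drift = np.abs(full - truncated).max()
    print(f"dim={dim}: max |sim drift| = {drift:.3f}")
```

The same loop can swap in the other slicings mentioned above (`[384:]`, `[0::2]`, `[1::2]`) to compare truncation schemes.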
sergeyzh changed discussion status to closed