Sentence Similarity
sentence-transformers
Safetensors
qwen2
feature-extraction
Generated from Trainer
dataset_size:99000
loss:MultipleNegativesSymmetricRankingLoss
custom_code
text-embeddings-inference
How to use FINGU-AI/Fingu-instruct-1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("FINGU-AI/Fingu-instruct-1", trust_remote_code=True)
sentences = [
    "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Glay",
    "The Theory of Good and Evil is a 1907 book about ethics by the English philosopher Hastings Rashdall, in which the author expounds a theory he calls \"ideal utilitarianism\". It has been seen as Rashdall's most important philosophical work.",
    "GLAY is a Japanese rock band , formed in Hakodate in 1988 . Glay primarily composes songs in the rock and pop genres , but they have also arranged songs using elements from a wide variety of genres , including punk , electronic , R&B , progressive rock , folk , reggae , gospel , and ska . Originally a visual kei band , the group slowly shifted to less dramatic attire through the years . As of 2008 , Glay had sold an estimated 51 million records ; 28 million singles and 23 million albums , making them one of the top ten best-selling artists of all time in Japan .",
    "Aashirwad is a 1968 Bollywood film , directed by Hrishikesh Mukherjee . The film stars Ashok Kumar and Sanjeev Kumar . The film is notable for its inclusion of a rap-like song performed by Ashok Kumar , `` Rail Gaadi '' ."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
SentenceTransformer based on dunzhang/stella_en_1.5B_v5
This is a sentence-transformers model finetuned from dunzhang/stella_en_1.5B_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: dunzhang/stella_en_1.5B_v5
- Maximum Sequence Length: 8096 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
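The cosine similarity listed above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: `cosine_similarity_matrix` is a hypothetical helper, and the random vectors only stand in for real model output of the documented size (1024).

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Toy stand-ins for model output: 3 vectors with 1024 dimensions each.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(3, 1024))

scores = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(scores.shape)
```

Each vector has similarity 1 with itself, and the matrix is symmetric, so only the off-diagonal entries carry ranking information.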
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8096, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
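The last two modules can be read as mean pooling over token vectors followed by a linear projection from 1536 to 1024 dimensions. The NumPy sketch below illustrates that data flow under assumptions: the token embeddings and the dense weights are random placeholders rather than the trained parameters, and `mean_pool` and `dense` are hypothetical helper names.

```python
import numpy as np

HIDDEN_DIM = 1536  # 'word_embedding_dimension' of the Pooling module
OUTPUT_DIM = 1024  # 'out_features' of the Dense module

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def dense(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear layer with bias and identity activation."""
    return x @ weight.T + bias

rng = np.random.default_rng(0)
batch, seq_len = 2, 5
token_embeddings = rng.normal(size=(batch, seq_len, HIDDEN_DIM))  # placeholder transformer output
attention_mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])     # first sequence has 2 padded tokens
weight = rng.normal(size=(OUTPUT_DIM, HIDDEN_DIM))                # placeholder, not the trained weights
bias = np.zeros(OUTPUT_DIM)

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embeddings = dense(pooled, weight, bias)
print(pooled.shape, sentence_embeddings.shape)
```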
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("FINGU-AI/Fingu-instruct-1", trust_remote_code=True)
# Run inference
sentences = [
'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Ahu A Umi Heiau',
'Ahu A ʻ Umi Heiau means "shrine at the temple of ʻ Umi" in the Hawaiian Language.',
'The digit ratio is the ratio of the lengths of different digits or fingers typically measured from the midpoint of bottom crease ( where the finger joins the hand ) to the tip of the finger . It has been suggested by some scientists that the ratio of two digits in particular , the 2nd ( index finger ) and 4th ( ring finger ) , is affected by exposure to androgens , e.g. , testosterone while in the uterus and that this 2D :4 D ratio can be considered a crude measure for prenatal androgen exposure , with lower 2D :4 D ratios pointing to higher prenatal androgen exposure . The 2D :4 D ratio is calculated by dividing the length of the index finger of a given hand by the length of the ring finger of the same hand . A longer index finger will result in a ratio higher than 1 , while a longer ring finger will result in a ratio lower than 1 . The 2D :4 D digit ratio is sexually dimorphic : although the second digit is typically shorter in both females and males , the difference between the lengths of the two digits is greater in males than in females . A number of studies have shown a correlation between the 2D :4 D digit ratio and various physical and behavioral traits .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
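In the examples in this card, only the query carries the "Instruct: ... Query: ..." prefix, while passages are passed as plain text. The sketch below shows one way to build that prefix and rank passages by cosine similarity. `format_query` and `rank_passages` are hypothetical helpers, and the toy 2-D vectors stand in for `model.encode(...)` output so the example runs without downloading the model.

```python
import numpy as np

# Instruction text copied from the example queries in this card.
TASK = "Given a web search query, retrieve relevant passages that answer the query."

def format_query(query: str, task: str = TASK) -> str:
    """Prefix a raw query with the instruction template used in the examples."""
    return f"Instruct: {task}\nQuery: {query}"

def rank_passages(query_embedding: np.ndarray, passage_embeddings: np.ndarray) -> np.ndarray:
    """Return passage indices sorted by descending cosine similarity to the query."""
    q = query_embedding / np.linalg.norm(query_embedding)
    p = passage_embeddings / np.linalg.norm(passage_embeddings, axis=1, keepdims=True)
    return np.argsort(-(p @ q))

prompt = format_query("Ahu A Umi Heiau")
print(prompt)

# Toy vectors standing in for real embeddings of one query and three passages.
toy_query = np.array([1.0, 0.0])
toy_passages = np.array([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
order = rank_passages(toy_query, toy_passages)
print(order)
```

With real embeddings, the query string returned by `format_query` would be encoded alongside the unmodified passages, and the first index in `order` would point at the best-matching passage.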
Training Logs
| Epoch | Step | Training Loss | Retrieval Loss |
|---|---|---|---|
| 0.6466 | 500 | 0.0424 | 0.0060 |
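The card's tags name MultipleNegativesSymmetricRankingLoss as the training loss. As a rough illustration of what a symmetric in-batch-negatives objective computes, here is a NumPy sketch. It is not the library's code: `symmetric_in_batch_loss` is a hypothetical helper, the `scale` of 20.0 is an assumed value that the card does not state, and the inputs are random toy vectors.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_in_batch_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over in-batch candidates, averaged over both directions."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # scores[i, j]: scaled cosine of anchor i and positive j
    diag = np.arange(len(scores))
    # Direction 1: each anchor must pick its own positive among all positives.
    anchor_to_positive = -log_softmax(scores, axis=1)[diag, diag].mean()
    # Direction 2: each positive must pick its own anchor among all anchors.
    positive_to_anchor = -log_softmax(scores, axis=0)[diag, diag].mean()
    return float((anchor_to_positive + positive_to_anchor) / 2)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))

aligned_loss = symmetric_in_batch_loss(anchors, anchors)         # every pair matches
shuffled_loss = symmetric_in_batch_loss(anchors, anchors[::-1])  # pairs deliberately mismatched
print(aligned_loss < shuffled_loss)
```

Correctly paired batches give a loss close to zero, while mismatched pairs are penalized in both directions, which is why the objective suits query-passage retrieval data.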
Model tree for FINGU-AI/Fingu-instruct-1
Base model
NovaSearch/stella_en_1.5B_v5