zerank-2-reranker-seq

A Qwen3ForSequenceClassification reranker derived from zeroentropy/zerank-2-reranker.

The original model is a Qwen3ForCausalLM reranker. It scores a (query, document) pair by the next-token logit of a single relevance token (true_token_id = 9454, taken from its 1_LogitScore sentence-transformers head). Because the model uses tied embeddings, that logit equals hidden_state · embed_tokens.weight[9454], where hidden_state is the final normalized hidden state at the last token position. This conversion copies that single embedding row into the score head of a standard Qwen3ForSequenceClassification model. The result is a num_labels=1 reranker whose output logit is identical, by construction, to the original relevance score.
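A toy numerical check of this equivalence, as a minimal sketch. The sizes (hidden_size=16, a 10,000-token vocabulary) and the random weights are assumptions standing in for the real checkpoint; only the token id 9454 comes from this card. The sketch ties a toy LM head to a toy embedding table, copies row 9454 into a Linear(hidden_size, 1, bias=False) head, and computes the relevance logit three ways.

```python
import torch

torch.manual_seed(0)

# Toy sizes (assumption): the real checkpoint is far larger.
hidden_size, vocab_size = 16, 10_000
true_token_id = 9454  # relevance token id from this model card

# Shared embedding table.
embed_tokens = torch.nn.Embedding(vocab_size, hidden_size)

# Tied LM head: reuses the embedding matrix as its weight.
lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False)
lm_head.weight = embed_tokens.weight

# Single-output score head, initialized from the relevance token's row.
score = torch.nn.Linear(hidden_size, 1, bias=False)
with torch.no_grad():
    score.weight.copy_(embed_tokens.weight[true_token_id].unsqueeze(0))

# Random stand-in for the final normalized hidden state at the last position.
hidden_state = torch.randn(hidden_size)

with torch.no_grad():
    causal_logit = lm_head(hidden_state)[true_token_id]  # causal-LM path
    dot_logit = hidden_state @ embed_tokens.weight[true_token_id]  # explicit dot product
    seq_cls_logit = score(hidden_state)[0]  # converted score-head path

print(causal_logit.item(), dot_logit.item(), seq_cls_logit.item())
```

The three printed values agree up to float rounding, since each is the dot product of the same hidden state with the same embedding row.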

This makes the model loadable directly via AutoModelForSequenceClassification and servable as a cross-encoder reranker (e.g. by infinity), without the causal-LM + logit-extraction path.

Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py

Details

architectures: ["Qwen3ForSequenceClassification"]
num_labels: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
dtype: bfloat16 (matches the source; not downcast to fp16)
score head: Linear(hidden_size, 1, bias=False), weight = embed_tokens.weight[9454] (stored as a 1 × hidden_size matrix)
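Because num_labels is 1, the head emits one raw logit per pair. The logistic sigmoid maps it into (0, 1) and is monotonic, so it never changes the ranking. The following is a minimal pure-Python sketch. The logits and document names are made up for illustration. The two branches keep math.exp from overflowing on large negative logits.

```python
import math


def relevance_probability(logit: float) -> float:
    """Map a raw relevance logit to a 0-1 score with a numerically stable sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # logit < 0, so z lies in (0, 1) and cannot overflow
    return z / (1.0 + z)


# Hypothetical logits for three documents (illustration only).
logits = {"doc_a": 2.0, "doc_b": -1.0, "doc_c": 0.0}
scores = {doc: relevance_probability(x) for doc, x in logits.items()}

# Sigmoid is monotonic, so sorting by score is the same as sorting by logit.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
print(scores["doc_c"])  # 0.5
```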

Note on prompt formatting

The original model was trained with a chat template that places the query in a system turn and the document in a user turn, followed by an assistant generation prefix. Generic sequence-classification servers typically tokenize the raw (query, document) pair without applying this template, which can shift scores relative to the native sentence-transformers usage. For best fidelity, format each input as:

<|im_start|>system
{query}<|im_end|>
<|im_start|>user
{document}<|im_end|>
<|im_start|>assistant
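A small helper keeps this formatting consistent across requests. The function name format_pair is hypothetical. The string it builds reproduces the template from this card character for character, including the newline after the assistant prefix.

```python
def format_pair(query: str, document: str) -> str:
    """Hypothetical helper: query in the system turn, document in the user
    turn, then the assistant generation prefix (per this model card)."""
    return (
        f"<|im_start|>system\n{query}<|im_end|>\n"
        f"<|im_start|>user\n{document}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


text = format_pair("What is the capital of France?", "The capital of France is Paris.")
print(text)
```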

Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "baseten-admin/zerank-2-reranker-seq"
tok = AutoTokenizer.from_pretrained(name)
# Load in bfloat16 to match the stored weights; eval() disables dropout.
model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

query, document = "What is the capital of France?", "The capital of France is Paris."
# Apply the training chat template: query in the system turn, document in the user turn.
text = (
    f"<|im_start|>system\n{query}<|im_end|>\n"
    f"<|im_start|>user\n{document}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
with torch.no_grad():
    # num_labels=1, so logits has shape (1, 1); take the single value.
    logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]
    # Cast to float32, then map the raw logit to a 0-1 relevance score.
    score = torch.sigmoid(logit.float())
print(score.item())
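To rerank several documents for one query, each pair can be formatted the same way and scored in padded batches. This is a sketch under stated assumptions. The helper names (rerank, make_hf_score_batch) are hypothetical. Batching assumes that the tokenizer has a pad token and that the model config's pad_token_id is set, so that Qwen3ForSequenceClassification reads each sequence's logit at its last non-padding token. The scoring function is passed in, so the ranking logic can be exercised without downloading the checkpoint.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Maps a batch of formatted prompts to one raw relevance logit each.
ScoreBatch = Callable[[List[str]], List[float]]


def rerank(
    query: str,
    documents: Sequence[str],
    score_batch: ScoreBatch,
    batch_size: int = 8,
) -> List[Tuple[str, float]]:
    """Return (document, sigmoid score) pairs, most relevant first."""
    texts = [
        f"<|im_start|>system\n{query}<|im_end|>\n"
        f"<|im_start|>user\n{doc}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        for doc in documents
    ]
    logits: List[float] = []
    for start in range(0, len(texts), batch_size):
        logits.extend(score_batch(texts[start : start + batch_size]))
    # sigmoid(x) == (1 + tanh(x / 2)) / 2, which cannot overflow.
    scored = [(doc, 0.5 * (1.0 + math.tanh(x / 2.0))) for doc, x in zip(documents, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_hf_score_batch(model, tok) -> ScoreBatch:
    """Hypothetical wrapper around the model and tokenizer from the usage example."""
    import torch

    def score_batch(texts: List[str]) -> List[float]:
        enc = tok(texts, return_tensors="pt", padding=True)
        with torch.no_grad():
            # One logit per sequence; cast from bfloat16 before converting to floats.
            return model(**enc).logits.reshape(-1).float().tolist()

    return score_batch
```

With the objects from the usage example, a call would look like rerank(query, [document, "Berlin is in Germany."], make_hf_score_batch(model, tok)).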


Model size: 4B params (Safetensors)
Tensor type: BF16

Model tree for baseten-admin/zerank-2-reranker-seq

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Finetuned: zeroentropy/zerank-2-reranker
Finetuned: this model