TeCoD SQL Template Matcher

A fine-tune of Qwen/Qwen3-Reranker-4B that serves as the template-matching reranker in TeCoD, a template-guided constrained decoding system for text-to-SQL.

The model scores whether a user question matches a retrieved masked question (template), which helps TeCoD select a recurring SQL template before generation.

Project page: https://sslab-cse-iitb.github.io/tecod/
Source repository: https://github.com/SSLab-CSE-IITB/tecod
Base model: https://huggingface.co/Qwen/Qwen3-Reranker-4B
Training data source: BIRD train split.

Intended Use

This model is intended as an internal component of TeCoD and related template-based text-to-SQL systems. It is not a standalone SQL generator. In TeCoD, it is used after vector retrieval and before SQL generation to rerank candidate SQL templates.
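
The position of the reranker in the pipeline can be sketched as follows. This is a minimal illustration, not TeCoD's actual code: `rerank_templates` and `toy_score` are hypothetical names, and the token-overlap scorer is only a stand-in for the model so that the example runs without downloading weights.

```python
from typing import Callable, List, Tuple


def rerank_templates(
    user_question: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Score each retrieved masked candidate against the user question
    and return the top_k (candidate, score) pairs, best first."""
    # Candidate first, user question second, matching the documented input order.
    scored = [(cand, score_fn(cand, user_question)) for cand in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_score(masked_candidate: str, user_question: str) -> float:
    """Stand-in scorer (token-set overlap). ASSUMPTION: in a real pipeline
    this would be replaced by the reranker's match score."""
    a = set(masked_candidate.lower().split())
    b = set(user_question.lower().split())
    return len(a & b) / max(len(a | b), 1)


# Candidates as they might come back from the vector-retrieval step.
candidates = [
    "List customers from _ ordered by name",
    "Show movies released in _ sorted by popularity desc",
]
best = rerank_templates(
    "Show movies released in 2010 sorted by rating", candidates, toy_score
)
print(best[0][0])
```

The selected template would then be handed to the SQL generation step; how TeCoD handles the case where no candidate matches is not covered by this sketch.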

Input Format

The model is used as a cross-encoder over a question pair. Order matters: the first sequence (the premise) should be the masked candidate/template question, and the second sequence (the hypothesis) should be the raw user question.

Premise:    "Show movies released in _ sorted by popularity desc"
Hypothesis: "What are the top films from 2010 by viewer count?"

Entity values in the candidate question are masked with a space-padded underscore _. The same mask token is used for strings, numbers, dates, and other literal values. Swapping the input order or changing the masking convention can degrade reranking quality.
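
A minimal sketch of the masking convention and pair ordering, under stated assumptions: the helper names `mask_literals` and `build_pair` are hypothetical, and the literal values are assumed to be supplied by the caller, since this card does not describe how TeCoD detects them.

```python
import re
from typing import Iterable, Tuple

MASK = "_"


def mask_literals(question: str, literals: Iterable[str]) -> str:
    """Replace each given literal value with a space-padded underscore."""
    masked = question
    # Longest literals first, so a literal containing a shorter one is masked whole.
    for lit in sorted(literals, key=len, reverse=True):
        if lit:  # skip empty strings, which would match everywhere
            masked = re.sub(re.escape(lit), f" {MASK} ", masked)
    # Collapse the extra whitespace introduced by the padding.
    return re.sub(r"\s+", " ", masked).strip()


def build_pair(masked_candidate: str, user_question: str) -> Tuple[str, str]:
    """Return (premise, hypothesis): masked candidate first, raw question second."""
    return (masked_candidate, user_question)


candidate = mask_literals(
    "Show movies released in 2010 sorted by popularity desc", ["2010"]
)
pair = build_pair(candidate, "What are the top films from 2010 by viewer count?")
print(pair)
```

The same mask is applied regardless of literal type, so a string, number, or date would all be passed through `mask_literals` in the same way.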

Training Summary

Base model: Qwen/Qwen3-Reranker-4B
Architecture: Qwen3ForSequenceClassification
Data: approximately 1.48M NLI pairs derived from BIRD questions.
Positive pairs: template-paired questions, self paraphrases, and partner paraphrases that preserve the SQL template.
Negative pairs: hard negatives mined using nearest-neighbor retrieval over masked questions, with both masked and unmasked query variants used during pair construction.
Labels: entailment, neutral, contradiction.
The neutral label is retained for compatibility with a 3-class NLI head but was not used as a training target.
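
The hard-negative idea can be illustrated with a toy example. This is a sketch under assumptions, not the training pipeline: the function names are hypothetical, token-set Jaccard similarity stands in for the nearest-neighbor retriever (which this card does not name), and a hard negative is assumed to be a near neighbor whose SQL template differs from the anchor's.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set similarity. ASSUMPTION: stand-in for the real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def mine_hard_negatives(
    examples: List[Tuple[str, str]],  # (masked_question, template_id)
    k: int = 1,
) -> List[Tuple[str, str, str]]:
    """For each anchor, take the k most similar questions that have a
    different template and emit (candidate, query, label) pairs."""
    pairs = []
    for i, (anchor, anchor_tpl) in enumerate(examples):
        # Rank every other example by similarity to the anchor, nearest first.
        neighbors = sorted(
            (ex for j, ex in enumerate(examples) if j != i),
            key=lambda ex: jaccard(anchor, ex[0]),
            reverse=True,
        )
        negatives = [q for q, tpl in neighbors if tpl != anchor_tpl][:k]
        pairs.extend((neg, anchor, "contradiction") for neg in negatives)
    return pairs


examples = [
    ("Show movies released in _ sorted by popularity desc", "T1"),
    ("List movies released in _ sorted by title", "T2"),
    ("How many customers live in _", "T3"),
]
pairs = mine_hard_negatives(examples)
for candidate, query, label in pairs:
    print(label, "|", candidate, "|", query)
```

Near neighbors with a different template are the pairs a reranker is most likely to confuse, which is why they are more informative than random negatives.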

Limitations

Specialized for masked text-to-SQL question/template matching.
Not intended for general NLI, semantic similarity, or SQL generation.
Assumes the same masking convention and candidate-template construction used by TeCoD.
The neutral label is untrained; inference should use entailment vs. contradiction or renormalize over labels {0, 2}.
Very long question pairs and non-English inputs are not validated.
The reranking score is one signal in a larger text-to-SQL pipeline; it does not guarantee final SQL correctness.
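
Renormalizing over labels {0, 2} can be sketched as a two-way softmax that ignores the neutral logit. The function name `match_probability` is hypothetical, and the mapping of index 0 to entailment and index 2 to contradiction is an assumption: this card lists the label indices but not which is which, so the mapping should be checked against the model's label configuration before use.

```python
import math
from typing import Sequence


def match_probability(
    logits: Sequence[float],
    entail_idx: int = 0,  # ASSUMPTION: index of the entailment label
    contra_idx: int = 2,  # ASSUMPTION: index of the contradiction label
) -> float:
    """Softmax over the entailment and contradiction logits only,
    ignoring the untrained neutral logit."""
    e, c = logits[entail_idx], logits[contra_idx]
    m = max(e, c)  # subtract the max for numerical stability
    exp_e, exp_c = math.exp(e - m), math.exp(c - m)
    return exp_e / (exp_e + exp_c)


# The neutral logit (middle value) has no effect on the result.
print(match_probability([3.0, 100.0, 1.0]))
print(match_probability([3.0, -100.0, 1.0]))
```

Because the neutral logit is dropped before normalization, an arbitrarily large or small value on the untrained class cannot distort the match score.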

References

TeCoD project page: https://sslab-cse-iitb.github.io/tecod/
TeCoD source repo: https://github.com/SSLab-CSE-IITB/tecod
Base model: https://huggingface.co/Qwen/Qwen3-Reranker-4B
Training data (BIRD train split): https://bird-bench.github.io/

If you use this model as part of TeCoD, please cite:

@article{10.1145/3769822,
  author = {Jivani, Smit and Maheshwari, Saravam and Sarawagi, Sunita},
  title = {Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding},
  journal = {Proceedings of the ACM on Management of Data},
  volume = {3},
  number = {6},
  pages = {1--26},
  year = {2025},
  month = dec,
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3769822},
  url = {https://doi.org/10.1145/3769822}
}

License

Apache 2.0

Model Details

Model ID: smitxxiv/Qwen3-Re4B-SQL-TeCoD-TMM
Model size: 4B params
Tensor type: BF16
Weights format: Safetensors
Model tree: Qwen/Qwen3-4B-Base (base model), then Qwen/Qwen3-Reranker-4B (fine-tuned), then this model (fine-tuned).