TeCoD SQL Template Matcher

Fine-tune of Qwen/Qwen3-Reranker-4B used by TeCoD, a template-guided constrained decoding system for text-to-SQL.

This model is the TeCoD template-matching reranker. It scores whether a user question matches a retrieved masked question/template, helping TeCoD select recurring SQL templates before generation.

Intended Use

This model is intended as an internal component of TeCoD and related template-based text-to-SQL systems. It is not a standalone SQL generator. In TeCoD, it is used after vector retrieval and before SQL generation to rerank candidate SQL templates.
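The reranking step described above can be sketched as follows. This is a minimal illustration, not TeCoD's actual code: `rerank_templates` and the `score_pair` callable are hypothetical names, and `score_pair` stands in for whatever function wraps this model and returns a match score for a (masked template, user question) pair.

```python
from typing import Callable, List, Tuple


def rerank_templates(
    user_question: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rerank retrieved masked templates against a raw user question.

    `candidates` are masked template questions returned by vector retrieval.
    `score_pair(template, question)` is a hypothetical wrapper around the
    reranker; the template is passed first, the user question second.
    """
    scored = [(template, score_pair(template, user_question)) for template in candidates]
    # Highest score first; the sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Passing the scorer as a parameter keeps the sketch independent of how the model is loaded or served; the selected template would then be handed to the SQL generation stage.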

Input Format

The model is used as a cross-encoder over a question pair. Order matters: the first sequence should be the masked candidate/template question, and the second sequence should be the raw user question.

Premise:    "Show movies released in _ sorted by popularity desc"
Hypothesis: "What are the top films from 2010 by viewer count?"

Entity values in the candidate question are masked with a single underscore padded by one space on each side (" _ "). The same mask token is used for strings, numbers, dates, and other literal values. Swapping the input order or changing the masking convention can degrade reranking quality.
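As an illustration of the masking convention and input order, here is a minimal sketch. It assumes the literal values to mask are already known; how TeCoD identifies them is not described here, so `mask_values` and `build_pair` are hypothetical helpers rather than part of the released system.

```python
import re
from typing import Iterable, Tuple

MASK = "_"  # same mask token for strings, numbers, dates, and other literals


def mask_values(question: str, values: Iterable[str]) -> str:
    """Replace each known literal value with a space-padded underscore."""
    masked = question
    # Mask longer values first so a short value cannot split a longer one.
    for value in sorted(values, key=len, reverse=True):
        if not value:
            continue  # an empty pattern would match at every position
        masked = re.sub(re.escape(value), f" {MASK} ", masked)
    # Collapse the extra whitespace introduced by the padding.
    return re.sub(r"\s+", " ", masked).strip()


def build_pair(masked_template: str, user_question: str) -> Tuple[str, str]:
    """Return (premise, hypothesis): masked template first, raw question second."""
    return masked_template, user_question


premise = mask_values("Show movies released in 2010 sorted by popularity desc", ["2010"])
pair = build_pair(premise, "What are the top films from 2010 by viewer count?")
```

The resulting pair would typically be passed to a tokenizer as two sequences in this order.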

Training Summary

  • Base model: Qwen/Qwen3-Reranker-4B
  • Architecture: Qwen3ForSequenceClassification
  • Data: approximately 1.48M NLI pairs derived from BIRD questions.
  • Positive pairs: template-paired questions, self paraphrases, and partner paraphrases that preserve the SQL template.
  • Negative pairs: hard negatives mined using nearest-neighbor retrieval over masked questions, with both masked and unmasked query variants used during pair construction.
  • Labels: entailment, neutral, contradiction.
  • The neutral label is retained for compatibility with a 3-class NLI head but was not used as a training target.

Limitations

  • Specialized for masked text-to-SQL question/template matching.
  • Not intended for general NLI, semantic similarity, or SQL generation.
  • Assumes the same masking convention and candidate-template construction used by TeCoD.
  • The neutral label is untrained; at inference, compare only the entailment and contradiction scores, or renormalize over labels {0, 2}.
  • Very long question pairs and non-English inputs are not validated.
  • The reranking score is one signal in a larger text-to-SQL pipeline; it does not guarantee final SQL correctness.
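The renormalization over labels {0, 2} mentioned in the list above can be sketched as a two-way softmax that ignores the neutral logit. The index assignment (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption inferred from the label order listed in the training summary; check the model's `id2label` config before relying on it. `match_probability` is a hypothetical helper name.

```python
import math
from typing import Sequence

# Assumed label indices; verify against the model config before use.
ENTAILMENT, NEUTRAL, CONTRADICTION = 0, 1, 2


def match_probability(logits: Sequence[float]) -> float:
    """Probability of entailment after renormalizing over labels {0, 2}.

    The neutral logit is ignored because that label was not a training target.
    """
    e = logits[ENTAILMENT]
    c = logits[CONTRADICTION]
    # Subtract the max before exponentiating for numerical stability.
    m = max(e, c)
    exp_e = math.exp(e - m)
    exp_c = math.exp(c - m)
    return exp_e / (exp_e + exp_c)
```

Because only two logits take part, the result equals a sigmoid of the entailment logit minus the contradiction logit, so a large neutral logit cannot affect the score.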

References

If you use this model as part of TeCoD, please cite:

@article{10.1145/3769822,
  author = {Jivani, Smit and Maheshwari, Saravam and Sarawagi, Sunita},
  title = {Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding},
  journal = {Proceedings of the ACM on Management of Data},
  volume = {3},
  number = {6},
  pages = {1--26},
  year = {2025},
  month = dec,
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3769822},
  url = {https://doi.org/10.1145/3769822}
}

License

Apache 2.0
