TelecomGPT-R1: The Best Telecom-Specific Large Language Model

A 27B open model that ranks #1 on the GSMA Open Telco Leaderboard across all models (open or closed, general-purpose or telecom-specialized), with an average score of 89.6%, demonstrating that an open telecom reasoning model can match top performance on telecom benchmarks.

1 — A New State of the Art for Telecom LLMs

TelecomGPT-R1 (27B) reaches state-of-the-art (SOTA) performance on the GSMA Open Telco Leaderboard at 89.6% average, matching or leading every open-source and closed-source entrant across both general-purpose and telecom-specialized categories. The leaderboard aggregates 7 benchmarks spanning 4 reasoning axes — protocol understanding (3GPP/O-RAN normative prose), knowledge QA (vendor and operator facts), modeling & computation (RF/queueing derivations), and fault analysis (RAN drive-test logs) — as reported in Figure 1.

Among open-source models, TelecomGPT-R1 leads DeepSeek-V3-0324 (685B) by +30.3, LLaMA-3.3-70B by +34.9, and Qwen2.5-72B by +35.6, while using roughly 25× fewer total parameters than the next-best open entrant (27B versus 685B).
Among closed-source models, TelecomGPT-R1 reaches SOTA performance across both the general-purpose frontier tier and the telecom-specialized tier, as detailed in the two bullets below.
Among general-purpose frontier models, TelecomGPT-R1 leads Gemini-3.1-Pro by +14.0, Claude-Opus-4.6 by +16.3, and GPT-5 by +17.7. These systems sit at the trillion-parameter-class frontier (active-parameter counts are not publicly disclosed but are widely reported as orders of magnitude larger than 27B), making the margin a parameter-efficiency result as much as an accuracy result.
Among telecom-specialized models, TelecomGPT-R1 is on par with the leading closed operator-internal telecom model, AT&T's OTel-LLM-8.3B-QnA, and leads SoftBank LTM by +16.0. This shows that an open telecom reasoning model can reach SOTA performance alongside top operator-internal baselines on the GSMA Open Telco Leaderboard.^†

In one line: TelecomGPT-R1 demonstrates that an open 27B telecom reasoning model can reach SOTA performance across the full breadth of the GSMA Open Telco Leaderboard.

Figure 1 | TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard. Each spoke is one benchmark (plus the overall average), normalized by its per-axis leaderboard best so that 1.0 = best score on that benchmark. Our 27B open-source policy reaches 1.0 on six of eight axes (3GPP-TSG, srsRANBench, TeleLogs, TeleQnA, TeleTables, Average) and stays at or above 0.94 on every other axis, visibly tracing the outer edge of the radar where no other model, open or closed, matches it on all axes simultaneously.
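The per-axis normalization described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, assuming scores are stored as percentages keyed by model and benchmark. The function name, model names, and numbers are hypothetical and are not leaderboard results.

```python
def normalize_by_axis_best(scores):
    """Divide every score by the best score on the same benchmark (axis).

    `scores` maps model name -> {benchmark name -> raw score}.
    After normalization, 1.0 marks the best model on each benchmark.
    """
    axes = {axis for per_model in scores.values() for axis in per_model}
    best = {
        axis: max(per_model[axis] for per_model in scores.values() if axis in per_model)
        for axis in axes
    }
    return {
        model: {axis: value / best[axis] for axis, value in per_model.items()}
        for model, per_model in scores.items()
    }


# Hypothetical scores, for illustration only (not taken from the leaderboard).
example_scores = {
    "model_a": {"bench_x": 80.0, "bench_y": 45.0},
    "model_b": {"bench_x": 64.0, "bench_y": 50.0},
}

normalized = normalize_by_axis_best(example_scores)
for model, per_axis in normalized.items():
    print(model, {axis: round(value, 2) for axis, value in sorted(per_axis.items())})
```

Under this scheme a model that traces the outer edge of the radar is one whose normalized value stays close to 1.0 on every spoke.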

Quickstart

Requirements (verified): transformers >= 5.0, torch >= 2.4, and — for vLLM serving — vllm >= 0.19. The end-to-end stack we verified is transformers 5.3.0.dev0 + torch 2.10.0+cu128 + vllm 0.19.1.

Here is a code snippet demonstrating how to load TelecomGPT-R1 with transformers and generate a telecom-grounded response:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "KU-DFI/TelecomGPT-R1"

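model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",       # load in the checkpoint's native dtype; supersedes the deprecated torch_dtype
    device_map="auto",  # place or shard the weights across the available GPUs
)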
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = (
    "A 5G NR cell is observing repeated random-access failures from cell-edge UEs. "
    "Drive-test capture shows: average RSRP = -108 dBm, average RSRQ = -16 dB, "
    "PRACH preamble attempts averaging 8 with no Msg2 (RAR) received within "
    "ra-ResponseWindow, UE timing-advance range 4-7 km, and PRACH configuration "
    "uses preamble format A1 with zeroCorrelationZoneConfig = 8. "
    "Diagnose the most likely root cause and propose a configuration change."
)
messages = [
    {
        "role": "system",
        "content": (
            "You are TelecomGPT-R1, an open 27B telecom reasoning model from "
            "KU/DFI. Reason step-by-step over 3GPP standards, RAN logs, RF and "
            "network derivations, and telecom code."
        ),
    },
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
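model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the answer; reasoning traces can be long, so allow generous headroom.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]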

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Expected timing & fast-path libraries. Loading the bf16 weights takes roughly 30–60 s on a single 80 GB GPU (we measured 22 s sharded across 8× H200). Out of the box, generation runs in the slow torch fallback at ~10–20 tok/s on a single sequence (we measured 15.8 tok/s on 8× H200, bf16). For faster inference, install Qwen3.5's optional fast-path kernels:

pip install flash-linear-attention causal-conv1d

(See flash-linear-attention and causal-conv1d for build details.)
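As a rough planning aid, the fallback throughput figures quoted above translate into wall-clock time per response as sketched below. The sketch assumes the model emits the full max_new_tokens=2048 budget from the quickstart and ignores prompt prefill time. The helper name is ours.

```python
def generation_seconds(new_tokens, tokens_per_second):
    """Estimate decode time for one sequence at a steady token rate."""
    return new_tokens / tokens_per_second


MAX_NEW_TOKENS = 2048  # same budget as the quickstart snippet

# Throughput points quoted in the text for the slow torch fallback path.
for label, tps in [("10 tok/s", 10.0), ("15.8 tok/s (measured)", 15.8), ("20 tok/s", 20.0)]:
    print(f"{label}: about {generation_seconds(MAX_NEW_TOKENS, tps):.0f} s")
```

At these rates a full-length answer takes on the order of two to three minutes, which is the main reason to install the fast-path kernels or to serve with vLLM.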

For production / batch serving on operator-confidential data, host with vLLM:

vllm serve KU-DFI/TelecomGPT-R1 \
    --tensor-parallel-size 1 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.85

(Scale --tensor-parallel-size, --max-model-len, and --gpu-memory-utilization up as needed for multi-GPU nodes or higher-throughput serving.)
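Once the server is up, it can be queried through vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server listens on vLLM's default address (localhost, port 8000), and the helper names and example question are ours.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default host and port
MODEL_NAME = "KU-DFI/TelecomGPT-R1"    # must match the name passed to `vllm serve`


def build_chat_request(user_prompt, max_tokens=1024):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_prompt):
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(user_prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the server from the command above running:
#   print(ask("What is the role of ra-ResponseWindow in 5G NR random access?"))
```

Because the request never leaves the host running vLLM, operator-confidential prompts stay inside the operator's own network.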

Hardware: Following the official Qwen3.5-27B deployment guidance, TelecomGPT-R1 (27B, bf16) runs on a single A100 80GB (or equivalent H100 80GB / MI300X) with the default settings above. At --gpu-memory-utilization 0.85, vLLM may claim about 68 GB of an 80 GB card; the bf16 weights occupy ~54 GB, leaving roughly 14 GB for KV cache, which is enough for context lengths up to ~8K tokens on a single GPU. Longer contexts (16K, 32K) or larger batches require multi-GPU sharding (e.g. --tensor-parallel-size 2 or more), which can still run behind an operator firewall.
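The single-GPU memory budget above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes decimal gigabytes, 2 bytes per bf16 parameter, and an even weight split under tensor parallelism, and it ignores activations and framework overhead. The helper name is ours.

```python
def kv_cache_budget_gb(gpu_memory_gb, utilization, weights_gb_per_gpu):
    """Memory left for KV cache after the weights, inside the utilization cap."""
    return gpu_memory_gb * utilization - weights_gb_per_gpu


PARAMS_BILLION = 27
BYTES_PER_PARAM_BF16 = 2
weights_gb = PARAMS_BILLION * BYTES_PER_PARAM_BF16  # 27e9 params * 2 bytes = 54 GB

# Single 80 GB GPU at --gpu-memory-utilization 0.85
single_gpu = kv_cache_budget_gb(80, 0.85, weights_gb)
print(f"weights: {weights_gb} GB, KV-cache budget on 1 GPU: {single_gpu:.0f} GB")

# Assumption: with --tensor-parallel-size 2 the weights split evenly across GPUs.
per_gpu_tp2 = kv_cache_budget_gb(80, 0.85, weights_gb / 2)
print(f"KV-cache budget per GPU with 2-way sharding: {per_gpu_tp2:.0f} GB")
```

The jump from roughly 14 GB to roughly 41 GB per GPU under two-way sharding is what makes the longer context lengths practical.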

Smaller TelecomGPT-R1 variants — coming soon. Lighter checkpoints better suited to edge / device-side inference are currently in training, extending the family from data-center GPU deployment down toward on-device telecom intelligence.

2 — Toward Universal Telecom Reasoning

2.1 — Why telecom needs specialized reasoning models

The telecommunications sector does not communicate in a single data language. A practical telecom workflow has to read 3GPP specification clauses written in stilted normative prose, parse RAN logs and PCAPs at the byte level, interpret KPI dashboards as time-series, walk fault trees across multi-vendor subsystems, and close RF/network derivations symbolically. Moreover, many such questions route through specification text, structured telemetry, and physical-layer math in a single chain.

Therefore, these tasks demand complex multi-step reasoning across heterogeneous modalities, which cannot be reduced to surface retrieval, MCQ classification, or single-axis fact lookup.

2.2 — Why existing general-purpose LLMs are not enough

Yet general-purpose frontier models, despite strong native reasoning abilities, have so far stumbled when confronted with these diverse, domain-specific data landscapes. A strong general reasoner can produce well-formed reasoning chains that operate on wrong telecom facts, and reinforcement learning (RL) cannot manufacture knowledge that was never in the model.

Therefore, the path forward is to construct dense telecom-specific domain knowledge that anchors general reasoning ability onto concrete telecom tasks.

2.3 — Why open-source matters compared with closed proprietary models

Building a real telecom LLM requires substantial compute, carefully curated multi-modal telecom data, and engineering investment beyond what most academic groups can muster. A handful of operators with the resources to absorb that cost have made attempts (such as AT&T's OTel-LLM-8.3B-QnA and SoftBank's LTM), yet their models remain inaccessible to anyone outside the issuing organization. Most publicly released "telecom AI" stops at narrow extractive baselines (log classifiers, MCQ taggers, RAG retrieval) rather than full-stack reasoning systems.

Therefore, the industry needs an open-source telecom reasoner that can be:

Self-hosted behind an operator's firewall.
Run directly on operator-confidential data: RAN logs, PCAP captures, KPI dashboards, customer traffic.
Fine-tuned on each operator's proprietary subsystem data.
Audited line-by-line for 3GPP / GSMA / O-RAN compliance.
Transferred across carriers and equipment vendors without renegotiating an API contract.

2.4 — What TelecomGPT-R1 improves

TelecomGPT-R1 represents a definitive leap forward: a 27B open-weights base trained to perform universal reasoning across protocol understanding, knowledge QA, modeling & computation, and fault analysis under a single unified policy. Rather than stitching together specialized heads per task, one model handles the full four-axis surface evaluated by the GSMA Open Telco Leaderboard (producing the leaderboard result reported in §1), while remaining small enough to self-host, fine-tune, and audit inside an operator environment.

Figure 2 | The four kinds of reasoning a telecom engineer juggles. Each scope shows one axis of telecom work (protocol understanding 22.7%, knowledge QA 15.3%, modeling & computation 43.5%, fault analysis 18.5%) and the share of the 158,915-example TelecomGPT-R1 training corpus that targets it. The cross-axis distribution explains why we train one unified policy rather than four specialists: a real workflow mixes all four in the same session.
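The Figure 2 shares can be turned into approximate example counts per axis. The counts below are derived from the rounded percentages in the caption, so they are estimates rather than exact figures from the corpus.

```python
CORPUS_SIZE = 158_915  # total training examples, as stated in the Figure 2 caption

# Share of the corpus per reasoning axis, in percent (from the Figure 2 caption).
AXIS_SHARE_PERCENT = {
    "protocol understanding": 22.7,
    "knowledge QA": 15.3,
    "modeling & computation": 43.5,
    "fault analysis": 18.5,
}

# Approximate counts: the published shares are rounded to one decimal place.
approx_counts = {
    axis: round(CORPUS_SIZE * percent / 100)
    for axis, percent in AXIS_SHARE_PERCENT.items()
}

for axis, count in approx_counts.items():
    print(f"{axis:<24} ~{count:>6,} examples")
print(f"{'total':<24} ~{sum(approx_counts.values()):>6,} examples")
```

Modeling & computation accounts for roughly 69 thousand examples, more than the protocol and knowledge axes combined.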

3 — How We Built TelecomGPT-R1

The challenges in §2 (heterogeneous modalities, missing telecom domain knowledge in general LLMs, and the scarcity of open vertical reasoners) required an end-to-end recipe rather than a single training trick. TelecomGPT-R1 is built on two design pillars.

A single unified telecom-reasoning corpus, not a stack of per-task datasets. Telecom concepts do not stay in one format: a scheduling rule can appear as prose in a standard, a row in a configuration table, a constraint in an equation, a pattern in a log, or logic inside code. We curate material in all five of these forms into one 158,915-example corpus indexed by reasoning axis and train one policy over the whole space, so that cross-modal reasoning is learned jointly rather than glued together at inference time.

A multi-stage post-training procedure that grounds general reasoning in telecom facts. Supervised fine-tuning installs the telecom "language" (how to read standards, follow protocol constraints, walk a log, close a derivation) that subsequent reinforcement learning then sharpens. Without this grounding step, RL amplifies fluent wrong reasoning: well-formed chains that happen to operate on hallucinated 3GPP clauses, mis-read log features, or unit-dropped derivations. The RL stage targets the three failure modes that naïve outcome-reward training suffers on heterogeneous telecom data (sparse final-answer signal, uneven learning progress across axes, and reward gaming via shortcut answers), with the full algorithmic details described in the accompanying paper.

The combined effect is what §1 reports: a single 27B open policy that reaches 89.6% average on the GSMA Open Telco Leaderboard, matching or leading every open-source, frontier-closed, and operator-internal entrant.

Figure 3 | The end-to-end TelecomGPT-R1 recipe. Frame ① distills four families of heterogeneous telecom material — standards documents, network telemetry and drive-test logs, math papers and code, and Q&A seeds and glossaries — into a single axis-balanced curated corpus of 158,915 examples across four reasoning axes (protocol, knowledge, modeling, fault). Frame ② then drives the corpus through a three-stage post-training progression — domain grounding, policy stabilization, and verifiable reasoning refinement under axis-aligned signals — yielding TelecomGPT-R1.

4 — KU/DFI's Open Telecom-AI Program

TelecomGPT-R1 is the latest milestone in KU/DFI's open telecom-AI program: a focused effort to build auditable, reproducible, and domain-grounded foundation models for the telecom industry. The program started from telecom-language modeling, expanded into RF perception and network world modeling, and now moves toward standards-grounded reasoning for real telecom workflows.

Why KU/DFI

KU/DFI is positioned to lead open telecom AI because it combines three assets that are rarely found together: world-class wireless research leadership, a dedicated applied-AI institute, and direct engagement with telecom operators, vendors, and standards ecosystems.

The program is led by Prof. Mérouane Debbah, a leading figure in modern wireless communications whose work spans 4G small cells, 5G Massive MIMO, 6G intelligent surfaces, semantic communications, distributed AI, and foundation models for networks. This gives the program a critical advantage: KU/DFI is not adapting generic AI to telecom from the outside; it is building telecom AI from inside the discipline.

The Digital Future Institute (DFI) gives this long-running research trajectory an institutional home. Formally launched in January 2026, DFI was created as Khalifa University's applied AI and ICT institute to turn domain-specific foundation models, benchmarks, validation pilots, and deployable AI systems into real operational infrastructure.

In less than six months, that mandate has already become visible: KU/DFI has moved from prior telecom-AI research foundations to a coordinated open program spanning telecom-language modeling, RF understanding, network-world modeling, and standards-grounded reasoning. This speed is the central point: DFI did not start from zero; it concentrated years of wireless-AI expertise into an execution platform for open telecom AI.

What the program has already built

Large Generative AI Models for Telecom [Bariah et al., 2023]. Established the original vision that large generative models could become a foundation for self-evolving wireless networks, instead of remaining task-specific optimization tools.
Understanding Telecom Language Through Large Language Models [Bariah et al., 2023]. Demonstrated that LLMs can learn telecom standards language, using 3GPP technical documents as an early test case for telecom-domain adaptation.
TelecomGPT [Zou et al., 2025]. Built the first major telecom-specific LLM framework from the group, covering telecom standards, RAN logs, mathematical modeling, code tasks, and domain evaluation.
Seeing Radio [Zou et al., 2026]. Opened the RF-perception direction by showing that wireless signals can be converted into interpretable visual representations for multimodal AI models.
RF-GPT [Zou et al., 2026]. Delivered the program's first open RF foundation model, enabling LLM-style reasoning over RF spectrograms and wireless-spectrum scenes.
Telecom World Models [Zou et al., 2026]. Proposed a world-model architecture that unifies digital twins, foundation models, uncertainty-aware prediction, and action-conditioned planning for 6G networks.
RF-Analyzer [Bara et al., 2026]. Built an SDR-to-AI evaluation platform to test whether VLMs trained on synthetic RF spectrograms can generalize to real over-the-air wireless environments.
TelecomGPT-R1 [this work, 2026]. Extends the program from telecom knowledge and RF perception to standards-grounded reasoning, producing an open telecom reasoning model for verifiable decision support.

The open-program thesis

The core logic is simple: telecom AI cannot be led by closed models alone. Operators, vendors, regulators, and standards bodies need systems that can be inspected, benchmarked, reproduced, adapted, and deployed under real telecom constraints.

KU/DFI's role is to build that open commons. The program now spans the key layers of the future telecom-AI stack: telecom language, RF perception, network-world modeling, and reasoning. TelecomGPT-R1 is therefore a starting point, not an endpoint: the beginning of an open, full-stack telecom-AI foundation that the wider industry can audit, improve, and build upon.

Resources

Paper. [Coming soon!]
Model weights. KU-DFI/TelecomGPT-R1
Unified benchmark. GSMA Open Telco Leaderboard

Citation

@inproceedings{wang2026telecomgptr1,
  title     = {TelecomGPT-R1: Post-Training Recipes for Universal Reasoning in Telecom},
  author    = {Wang, Bohao and Wu, Chenwei and Li, Haoyu and Zou, Hang and Tian, Yu
               and Bariah, Lina and Huang, Chongwen and Shen, Yongliang and Zhang, Zhaoyang and Debbah, M\'{e}rouane},
  booktitle = {[Venue coming soon!]},
  year      = {2026}
}

@article{zou2025telecomgpt,
  title     = {{TelecomGPT}: A Framework to Build Telecom-Specific Large Language Models},
  author    = {Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, M\'{e}rouane},
  journal   = {IEEE Transactions on Machine Learning in Communications and Networking},
  year      = {2025},
  publisher = {IEEE}
}

Acknowledgements

This work was supported by the Digital Future Institute of Khalifa University; the College of Information Science and Electronic Engineering, Zhejiang University; the College of Computer Science and Technology, Zhejiang University; and the Research Computing team of Khalifa University.

† On TeleTables, we follow the original paper's evaluation protocol and attach the table content directly to the prompt. This is a table-grounded reasoning setup, as opposed to a retrieval setup in which the model receives neither the table id nor its content.