Instructions for using MultiverseComputingCAI/Hypernova-60B-2602 with libraries and local apps.

Libraries

How to use MultiverseComputingCAI/Hypernova-60B-2602 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MultiverseComputingCAI/Hypernova-60B-2602")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2602")
model = AutoModelForCausalLM.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2602")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MultiverseComputingCAI/Hypernova-60B-2602 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MultiverseComputingCAI/Hypernova-60B-2602"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2602",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
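
The same request can also be sent from Python. Below is a minimal sketch that uses only the standard library (`urllib.request`, `json`). It assumes the server started above is listening on `localhost:8000` and that responses follow the usual OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_chat_request`, `extract_reply`, and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumed address of the local server started with `vllm serve`
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "MultiverseComputingCAI/Hypernova-60B-2602"


def build_chat_request(prompt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat-completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat-completions response."""
    return response["choices"][0]["message"]["content"]


def ask(prompt: str, url: str = BASE_URL) -> str:
    """Send one user prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, url)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(ask("What is the capital of France?"))` sends the request and prints the answer. Passing `url="http://localhost:30000/v1/chat/completions"` targets the SGLang server below instead.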

Use Docker Model Runner

docker model run hf.co/MultiverseComputingCAI/Hypernova-60B-2602

SGLang

How to use MultiverseComputingCAI/Hypernova-60B-2602 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MultiverseComputingCAI/Hypernova-60B-2602" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2602",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MultiverseComputingCAI/Hypernova-60B-2602" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MultiverseComputingCAI/Hypernova-60B-2602",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'



	---
	base_model:
	- openai/gpt-oss-120b
	- MultiverseComputingCAI/HyperNova-60B
	library_name: transformers
	license: apache-2.0
	---
	<div align="center">

	# HyperNova 60B 2602

	### Powered by CompactifAI

	[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
	[![HuggingFace](https://img.shields.io/badge/🤗-Model_Hub-yellow.svg)](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602)
	[![Discord](https://img.shields.io/badge/Discord-Community-5865F2?logo=discord&logoColor=white)](https://discord.gg/cGas9uStqp)

	Optimized for Efficient Inference · Reduced Memory Footprint · Native Tool Calling Support

	</div>

	---

	## Table of Contents

	- [Model Overview](#model-overview)
	- [Technical Deep Dive](#technical-deep-dive)
	- [Key Characteristics](#key-characteristics)
	- [Quick Start](#quick-start)
	- [What's New in HyperNova 60B 2602](#whats-new-in-hypernova-60b-2602)
	- [Tool Calling](#tool-calling)
	- [Training & Fine-Tuning](#training--fine-tuning)
	- [Architecture](#architecture)
	- [Evaluation & Benchmarks](#evaluation--benchmarks)
	- [Languages](#languages)
	- [Intended Use](#intended-use)
	- [Safety & Limitations](#safety--limitations)
	- [Model Information](#model-information)
	- [Citation](#citation)

	---

	## Model Overview

	HyperNova 60B 2602 is a model developed by Multiverse Computing and based on [OpenAI’s gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b). The original gpt-oss-120b is an open-weight mixture-of-experts (MoE) model with 117B parameters (5.1B active per token), designed for powerful reasoning, agentic tasks, and versatile developer use. HyperNova is compressed with CompactifAI, Multiverse Computing’s proprietary technology, which reduces the parameter count and memory requirements while aiming to preserve strong reasoning.

	The model is instruction-tuned and supports native tool calling (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 targets the same broad use cases as gpt-oss-120b (reasoning, code generation, RAG, and tool-augmented applications) with a lower memory footprint and greater deployment flexibility.

	## Technical Deep Dive

	For a detailed explanation of the compression architecture, the compression process, and the benchmark results behind HyperNova 60B 2602, read [the full technical article](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability) by Johanna Angulo, Evaluation Manager at Multiverse Computing.

	---

	## Key Characteristics

	| Characteristic | Description |
	|-----------------------|-------------|
	| Base model | [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B params, MoE; open-weight, Apache 2.0) |
	| 🛠️ Tool calling | Native support; OpenAI-style function / tool calling schemas; agentic use (e.g. function calling, structured outputs) |
	| 🧠 Parameters | 60B total parameters after CompactifAI compression (reduced vs. base 117B) |
	| 📐 Architecture | Decoder-only Transformer (from gpt-oss lineage) |
	| 🗜️ Compression | CompactifAI (proprietary compression technology) |
	| Primary language | English |
	| Other languages | Not formally evaluated |

	---

	## Quick Start

	This model can be loaded with the Transformers API. The recommended approach is `AutoModelForCausalLM` with `apply_chat_template`. The examples pass `trust_remote_code=True`, although recent Transformers releases also support the gpt-oss architecture natively:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "MultiverseComputingCAI/HyperNova-60B-2602"
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",   # place layers across the available GPUs
	    torch_dtype="auto",  # use the dtype recorded in the model config
	    trust_remote_code=True,
	)

	messages = [{"role": "user", "content": "What is a Hypernova?"}]
	# return_dict=True returns input_ids together with the matching attention_mask
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    tokenize=True,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    do_sample=True,
	    temperature=0.7,
	)
	# Decode only the tokens generated after the prompt
	prompt_length = inputs["input_ids"].shape[-1]
	reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
	print(reply)
	```
	Alternatively, you can use the `pipeline` API with `trust_remote_code=True`. For chat input, the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]`.
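
For chat-style input, the `text-generation` pipeline returns a list with one dict per input, and its `"generated_text"` entry holds the whole conversation with the model's answer appended as the last message. A minimal sketch of pulling out that reply follows; the helper `last_assistant_message` is hypothetical, and `sample_outputs` is hand-written to mimic the structure, not real model output.

```python
from typing import Any


def last_assistant_message(outputs: list[dict[str, Any]]) -> str:
    """Return the content of the final assistant turn from pipeline output."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in pipeline output")


# Hand-written example mimicking the pipeline's return structure
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "What is a Hypernova?"},
            {"role": "assistant", "content": "A hypernova is ..."},
        ]
    }
]
print(last_assistant_message(sample_outputs))
```

With a real pipeline, the call would look like `last_assistant_message(pipe(messages, max_new_tokens=256))`.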

	---

	## What’s New in HyperNova 60B 2602

	HyperNova 60B 2602 is based on gpt-oss-120b. It retains the base model’s strengths while reducing memory requirements and improving deployment flexibility.

	### Summary

	- Base model: built on [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), with the same Apache 2.0 license and design goals (reasoning, agentic tasks, tool use) and a smaller footprint via CompactifAI.
	- Tool use: retains support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas).
	- Reasoning: supports configurable reasoning effort (low / medium / high, set in the system prompt) as long as the base model’s prompt format is preserved; the full chain of thought is available for debugging and analysis.
	- Evaluation: tested on tool-focused benchmarks (e.g. BFCL v4, Tau2-bench) and general benchmarks alongside other CompactifAI and gpt-oss variants.
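
The gpt-oss model card describes setting the reasoning level in the system prompt, for example with the line `Reasoning: high`. Below is a minimal sketch of a helper that adds that directive to a message list before it is passed to `apply_chat_template` or an OpenAI-compatible server. The helper name is hypothetical, and chat templates and serving stacks differ in how they map a `system` message (some expose a separate reasoning-effort setting instead), so check the behavior for your setup.

```python
VALID_EFFORTS = ("low", "medium", "high")


def with_reasoning_effort(messages: list[dict], effort: str = "medium") -> list[dict]:
    """Return a copy of `messages` whose system prompt requests `effort`."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    directive = f"Reasoning: {effort}"
    if messages and messages[0].get("role") == "system":
        # Keep the caller's system prompt and put the directive on its first line
        first = {**messages[0], "content": f"{directive}\n{messages[0]['content']}"}
        return [first, *messages[1:]]
    return [{"role": "system", "content": directive}, *messages]


messages = with_reasoning_effort(
    [{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    effort="high",
)
```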

	---

	## Tool Calling

	HyperNova 60B 2602 supports native tool use and is well-suited for:

	- Function calling with defined schemas
	- Structured outputs
	- Agentic operations (e.g. browser tasks, code execution where supported)

	The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows OpenAI-style schemas; this compatibility covers format and structure only, and exact parity with the base model or other models is not guaranteed.
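
In OpenAI-style tool calling, each tool is declared with a name, a description, and a JSON Schema for its parameters, and the list is sent alongside the messages. The sketch below builds such a request body for the `get_weather` example used in this card; the tool, its description, and its parameter schema are illustrative assumptions, and the body targets an OpenAI-compatible server such as the vLLM endpoint described earlier.

```python
import json

# Hypothetical tool declaration matching the get_weather example in this card
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
            },
            "required": ["city", "date"],
        },
    },
}


def chat_payload_with_tools(prompt: str, tools: list[dict]) -> dict:
    """Build an OpenAI-style chat-completions body that offers tools to the model."""
    return {
        "model": "MultiverseComputingCAI/HyperNova-60B-2602",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }


payload = chat_payload_with_tools(
    "What's the weather in Paris on 2026-02-10?", [GET_WEATHER_TOOL]
)
print(json.dumps(payload, indent=2))
```

With vLLM, `"tool_choice": "auto"` generally requires starting the server with `--enable-auto-tool-choice` and a `--tool-call-parser` (the evaluation setup in this card uses the `openai` parser).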

	### Example Tool Call

	```json
	{
	  "name": "get_weather",
	  "arguments": {
	    "city": "Paris",
	    "date": "2026-02-10"
	  }
	}
	```
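
After the model emits a call like the one above, the application runs the function and feeds the result back as a tool message so generation can continue. Below is a minimal dispatch sketch under these assumptions: the call arrives as the JSON object shown (OpenAI-compatible servers usually deliver `arguments` as a JSON-encoded string, which the code also accepts), `get_weather` is a stub, and the `TOOLS` registry is hypothetical.

```python
import json
from typing import Any, Callable


def get_weather(city: str, date: str) -> dict[str, Any]:
    """Stub tool; a real implementation would query a weather service."""
    return {"city": city, "date": date, "forecast": "unknown (stub)"}


# Hypothetical registry mapping tool names to Python callables
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Execute one tool call and wrap the result as a tool message."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool {name!r}")
    args = call["arguments"]
    if isinstance(args, str):  # OpenAI-compatible APIs send arguments as a JSON string
        args = json.loads(args)
    result = TOOLS[name](**args)
    return {"role": "tool", "name": name, "content": json.dumps(result)}


call = json.loads(
    '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
)
tool_message = run_tool_call(call)
```

To continue the conversation, append the assistant turn that contained the call and then `tool_message` before generating again; OpenAI-compatible servers additionally expect the tool message to carry the `tool_call_id` of the originating call.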

	---

	## Training & Fine-Tuning

	### Base Model: gpt-oss-120b

	The base model [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was trained on OpenAI’s harmony response format and is intended for use with that format for correct behavior. It supports configurable reasoning levels (low / medium / high) and native tool use. See the [original model card](https://huggingface.co/openai/gpt-oss-120b) and [arXiv:2508.10925](https://arxiv.org/abs/2508.10925) for details.

	### CompactifAI Compression & Optional Fine-Tuning

	- Compression: CompactifAI was applied to produce a smaller, efficient model (60B parameters) while aiming to preserve reasoning and tool-use capabilities.
	- Optional fine-tuning: This variant may include additional fine-tuning for tool calling and structured outputs; exact training details are model-specific.

	---

	## Architecture

	### Model Specifications

	| Specification | Value |
	|-------------------|--------------------|
	| Base model | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B total, 5.1B active; MoE) |
	| Parameters | 60B total, 4.8B active (MoE) |
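
The table implies the scale of the compression. A quick calculation from the listed figures shows total parameters falling by about 48.7% (60B vs. 117B) and active parameters per token by about 5.9% (4.8B vs. 5.1B). The weight-memory helper in this sketch assumes a single bits-per-parameter value chosen by the reader. Real checkpoints mix precisions (gpt-oss, for example, ships its MoE weights in the 4-bit MXFP4 format), so the 16-bit figure is an illustrative bound, not a measured footprint.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before


def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only size in GB (1e9 bytes); ignores KV cache, activations, and overhead."""
    return params_billion * bits_per_param / 8


total_cut = reduction(117, 60)    # total parameters, from the table
active_cut = reduction(5.1, 4.8)  # active parameters per token, from the table
print(f"total parameters: -{total_cut:.1%}")    # total parameters: -48.7%
print(f"active parameters: -{active_cut:.1%}")  # active parameters: -5.9%

# Illustrative assumption: every weight stored at 16 bits
print(f"{weight_memory_gb(117, 16):.0f} GB -> {weight_memory_gb(60, 16):.0f} GB")
```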

	---

	## Evaluation & Benchmarks

	### Evaluation Methodology

	Benchmark scores were obtained with the following setups. Methodology varies by benchmark family.

	#### MMLU-Pro, AIME25, GPQA:d, LiveCodeBench

	- Evaluation framework: [Lighteval](https://github.com/huggingface/lighteval)
	- Inference library: vLLM 0.14.0
	- Reasoning effort: medium
	- Decoding: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0
	- Batch size: 64

	#### IFBench, AA-LCR, SciCode

	- Evaluation framework: [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills)
	- Inference library: vLLM 0.14.0
	- Reasoning effort: medium
	- Decoding: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0
	- Batch size: 64

	#### BFCL v4 (17 splits)

	- Evaluation framework: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
	- Inference library: vLLM 0.14.0
	- Reasoning effort: high
	- Decoding: temperature = 0.6, max_tokens = 16384, parallel_tool_calls = true, tool-call parser openai
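
As an illustration of how these decoding settings translate into a request, here is a sketch that merges the BFCL v4 values into an OpenAI-style chat-completions body. The `DecodingConfig` type and `request_body` helper are hypothetical (EvalScope builds its own requests). Sending `reasoning_effort` as a body field is an assumption about the serving stack; the Tau2-bench setup passes it through `extra_body`, which OpenAI-style clients merge into the JSON body.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecodingConfig:
    """Sampling settings for one benchmark family, copied from this card."""

    temperature: float
    max_tokens: int
    reasoning_effort: str
    parallel_tool_calls: bool


# BFCL v4 settings as listed above
BFCL_V4 = DecodingConfig(
    temperature=0.6, max_tokens=16384, reasoning_effort="high", parallel_tool_calls=True
)


def request_body(config: DecodingConfig, messages: list[dict], tools: list[dict]) -> dict:
    """Merge the decoding settings into an OpenAI-style chat-completions body."""
    return {
        "model": "MultiverseComputingCAI/HyperNova-60B-2602",
        "messages": messages,
        "tools": tools,
        **asdict(config),
    }


body = request_body(BFCL_V4, [{"role": "user", "content": "What is 2 + 2?"}], tools=[])
```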

	#### Tau2-bench (Telecom)

	- Evaluation framework: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1
	- Inference library: vLLM 0.14.0
	- Reasoning effort: high (agent `extra_body.reasoning_effort`)
	- Decoding (agent): temperature = 1.0, top_p = 1.0, min_tokens = 1
	- Decoding (judge / user simulator): temperature = 0.7, timeout = 600
	- Reproducibility: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge)

	#### Terminal-Bench Hard (Artificial Analysis subset)

	- Evaluation framework: laude-institute/harbor 0.1.43
	- Inference library: vLLM 0.15.0
	- Reasoning effort: high
	- Decoding: temperature = 1.0, top_p = 1.0, max-model-len = 131072
	- Reproducibility: Artificial Analysis subset (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard)
	- Agent: terminus-2; max episodes 100; repeats 3

	### Quantitative Results (Reported & Planned)

	Scores are accuracy or benchmark-specific metrics; `—` or TBD marks evaluations not yet run. Reported numbers follow the methodology described above (reasoning: cai-eval + NeMo-Skills; BFCL v4 and Tau2-bench: cai-eval + EvalScope); other entries will be documented later.

	| Benchmark | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 |
	|-----------------------|-----------------------|------------------------|--------------------------|
	| MMLU-Pro | 74 | 78 | 74 |
	| BFCL v4 | 61 | 64 | 62 |
	| Tau2-bench (Telecom) | 59 | 68 | 61 |
	| AIME25 | 72 | 80 | 76 |
	| GPQA:d | 63 | 69 | 69 |
	| IFBench | 55 | 63 | 60 |
	| SciCode | 34 | 38 | 32 |
	| LiveCodeBench | 64 | 66 | 64 |
	| Terminal Bench | 9 | 22 | 16 |
	| AA-LCR | 37 | 50 | 36 |
	| AA-Omnis. Index | -40 | -36 | -41 |
	| AA-Omnis. Accuracy | 16 | 21 | 15 |
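
To read the table as retention relative to the base model, divide each HyperNova score by the gpt-oss-120b score. The sketch below does this with the numbers above. The AA-Omnis. Index row is left out because a ratio of negative index values is not meaningful, and the retention metric itself is an illustrative choice, not one reported by the authors.

```python
# (gpt-oss-120b, HyperNova 60B 2602) scores copied from the table above
SCORES = {
    "MMLU-Pro": (78, 74),
    "BFCL v4": (64, 62),
    "Tau2-bench (Telecom)": (68, 61),
    "AIME25": (80, 76),
    "GPQA:d": (69, 69),
    "IFBench": (63, 60),
    "SciCode": (38, 32),
    "LiveCodeBench": (66, 64),
    "Terminal Bench": (22, 16),
    "AA-LCR": (50, 36),
    "AA-Omnis. Accuracy": (21, 15),
}


def retention(base: float, compressed: float) -> float:
    """Compressed-model score as a fraction of the base-model score."""
    return compressed / base


for name, (base, compressed) in SCORES.items():
    print(f"{name:<22} {retention(base, compressed):6.1%}")
```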

	![Intelligence](assets/intelligence.png)
	![Tool-calling](assets/tool-calling.png)

	### Quantitative Results (Inference Performance)

	Representative throughput, latency, and memory measurements, compared against gpt-oss-120b on the same hardware under the conditions below.

	#### Performance evaluation conditions

	- Inference library: vLLM 0.14.0
	- Hardware: 1× NVIDIA H200 Tensor Core GPU
	- Conditions: concurrency=128

	Summary of improvements over gpt-oss-120b:

	- Throughput (tok/s): 39.5% higher with HyperNova
	- Median TTFT (ms): 50.8% faster with HyperNova
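
The card does not state how these percentages are computed. A common convention, assumed here, treats a throughput improvement as the relative increase (new - base) / base and a latency improvement as the relative decrease (base - new) / base. The measurements in this sketch are round placeholders, not the values behind the reported figures.

```python
def throughput_gain(base_tok_s: float, new_tok_s: float) -> float:
    """Relative throughput increase: (new - base) / base."""
    return (new_tok_s - base_tok_s) / base_tok_s


def latency_reduction(base_ms: float, new_ms: float) -> float:
    """Relative latency decrease: (base - new) / base."""
    return (base_ms - new_ms) / base_ms


# Placeholder measurements, not the numbers behind this card's figures
print(f"throughput: +{throughput_gain(1000.0, 1400.0):.1%}")     # throughput: +40.0%
print(f"median TTFT: -{latency_reduction(2000.0, 1000.0):.1%}")  # median TTFT: -50.0%
```

Under this convention, a 50.8% reduction in median TTFT would put HyperNova's median TTFT at about 49.2% of gpt-oss-120b's.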


	![Performance](assets/performance.png)

	---

	## Languages

	- Primary language: English
	- Other languages: Not formally evaluated

	The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured.

	---

	## Intended Use

	### Recommended Use Cases

	Aligned with [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) use cases, with the benefit of a smaller footprint:

	- Reasoning and analysis (with configurable reasoning effort where supported)
	- Tool-augmented and agentic applications (function calling, web browsing, code execution, structured outputs)
	- Code generation and reasoning
	- Chatbots and virtual assistants
	- Retrieval-augmented generation (RAG)
	- Deployments where gpt-oss-120b is desirable but memory or latency is constrained

	### Out-of-Scope Uses

	- Harmful, illegal, or deceptive content generation
	- Impersonation of real individuals without consent
	- High-risk decision-making without human oversight
	- Surveillance or tracking of individuals
	- Any use that violates applicable laws or regulations

	---

	## Safety & Limitations

	### Known Limitations

	- English-centric training data (inherited from base model).
	- Format: For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise.
	- Tool calling depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed.
	- Compression may affect some behaviors; evaluate for your use case.

	### Recommendations

	- Validate tool outputs before execution
	- Use human oversight for critical applications
	- Perform task-specific evaluation prior to deployment
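
The first recommendation can be sketched as a check of a model-emitted tool call against the declared JSON Schema before anything runs. This minimal version covers only required keys, unexpected keys, and primitive types; a production system would more likely use a full JSON Schema validator (for example the `jsonschema` package). The `WEATHER_SCHEMA` below is a hypothetical example.

```python
from typing import Any

# Map JSON Schema primitive type names to Python types
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_arguments(arguments: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call passed these checks."""
    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument {key!r}")
    for key, value in arguments.items():
        if key not in properties:
            problems.append(f"unexpected argument {key!r}")
            continue
        expected = properties[key].get("type")
        python_type = JSON_TYPES.get(expected)
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if python_type and (not isinstance(value, python_type) or bool_as_number):
            problems.append(f"argument {key!r} should be of type {expected}")
    return problems


WEATHER_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
    "required": ["city", "date"],
}
```

Only calls for which `validate_arguments` returns an empty list would be passed on for execution; any reported problems can be sent back to the model or surfaced to a human reviewer.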

	---

	## Model Information

	| Field | Value |
	|--------------|---------------------|
	| Model name | HyperNova 60B 2602 |
	| Based on | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) |
	| Version | 2602 |
	| Release date | 2026-02-26 |
	| Developed by | Multiverse Computing |
	| License | Apache 2.0 |
	| Contact | business@multiversecomputing.com |

	---

	## Citation

	If you use this model, please cite the base model and this variant:

	```bibtex
	@misc{openai2025gptoss120b,
	  title = {gpt-oss-120b \& gpt-oss-20b Model Card},
	  author = {OpenAI},
	  year = {2025},
	  eprint = {2508.10925},
	  archivePrefix = {arXiv},
	  primaryClass = {cs.CL},
	  url = {https://arxiv.org/abs/2508.10925}
	}
	@misc{hypernova60b2602,
	  title = {HyperNova 60B 2602: Model developed based on gpt-oss-120b},
	  author = {Multiverse Computing},
	  year = {2026},
	  url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602},
	  note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology}
	}
	```

	Built by [Multiverse Computing](https://www.multiversecomputing.com) · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602/discussions) · [Discord](https://discord.gg/8mT9FveN)