| base_model: | |
| - openai/gpt-oss-120b | |
| - MultiverseComputingCAI/HyperNova-60B | |
| library_name: transformers | |
| license: apache-2.0 | |
| <div align="center"> | |
| # HyperNova 60B 2602 | |
| ### Powered by CompactifAI | |
| [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) | |
| [Hugging Face](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602) | |
| [Discord](https://discord.gg/cGas9uStqp) | |
| **Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support** | |
| </div> | |
| --- | |
| ## Table of Contents | |
| - [Model Overview](#model-overview) | |
| - [Technical Deep Dive](#technical-deep-dive) | |
| - [Key Characteristics](#key-characteristics) | |
| - [Quick Start](#quick-start) | |
| - [What's New in HyperNova 60B 2602](#whats-new-in-hypernova-60b-2602) | |
| - [Tool Calling](#tool-calling) | |
| - [Training & Fine-Tuning](#training--fine-tuning) | |
| - [Architecture](#architecture) | |
| - [Evaluation & Benchmarks](#evaluation--benchmarks) | |
| - [Languages](#languages) | |
| - [Intended Use](#intended-use) | |
| - [Safety & Limitations](#safety--limitations) | |
| - [Model Information](#model-information) | |
| - [Citation](#citation) | |
| --- | |
| ## Model Overview | |
| **HyperNova 60B 2602** is a **compressed model derived from [OpenAI’s gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)**, developed by **Multiverse Computing**. The original gpt-oss-120b is an open-weight Mixture-of-Experts (MoE) model (117B total parameters, 5.1B active) designed for powerful reasoning, agentic tasks, and versatile developer use. This version is compressed with **CompactifAI**, Multiverse Computing’s proprietary technology, which reduces parameter count and memory requirements while aiming to preserve strong reasoning. | |
| The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2602 is intended for the same broad use cases as gpt-oss-120b (reasoning, code generation, RAG, and tool-augmented applications), with a **lower memory footprint** and greater deployment flexibility. | |
| ## Technical Deep Dive | |
| For a detailed explanation of the compression architecture, the compression process, and the benchmark results behind HyperNova 60B 2602, read [this technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability). | |
| --- | |
| ## Key Characteristics | |
| | Characteristic | Description | | |
| |-----------------------|-------------| | |
| | Base model | [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B params, MoE; open-weight, Apache 2.0) | | |
| | 🛠️ **Tool calling** | Native support; OpenAI-style function / tool calling schemas; agentic use (e.g. function calling, structured outputs) | | |
| | 🧠 **Parameters** | 60B total parameters after CompactifAI compression (reduced vs. base 117B) | | |
| | 📐 **Architecture** | Decoder-only Transformer (from gpt-oss lineage) | | |
| | 🗜️ **Compression** | CompactifAI (proprietary compression technology) | | |
| | Primary language | English | | |
| | Other languages | Not formally evaluated | | |
| --- | |
| ## Quick Start | |
| This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). Recommended approach: `AutoModelForCausalLM` with `apply_chat_template`: | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_id = "MultiverseComputingCAI/HyperNova-60B-2602" | |
| # Load the tokenizer and model; device_map="auto" places weights on the available devices. | |
| tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_id, | |
|     device_map="auto", | |
|     torch_dtype="auto", | |
|     trust_remote_code=True, | |
| ) | |
| messages = [{"role": "user", "content": "What is a Hypernova?"}] | |
| # return_dict=True yields input_ids together with the matching attention_mask. | |
| inputs = tokenizer.apply_chat_template( | |
|     messages, | |
|     add_generation_prompt=True, | |
|     tokenize=True, | |
|     return_dict=True, | |
|     return_tensors="pt", | |
| ).to(model.device) | |
| outputs = model.generate( | |
|     **inputs, | |
|     max_new_tokens=512, | |
|     do_sample=True, | |
|     temperature=0.7, | |
| ) | |
| # Decode only the newly generated tokens, skipping the prompt. | |
| prompt_length = inputs["input_ids"].shape[-1] | |
| reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True) | |
| print(reply) | |
| ``` | |
| Alternatively you can use the `pipeline` API with `trust_remote_code=True`; the pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed. | |
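A minimal sketch of the pipeline route described above. The helper names `build_pipeline` and `last_assistant_message` are illustrative and not part of Transformers; the output layout assumed here (a list holding one dict whose `generated_text` is the full message list) is what the text-generation pipeline returns for chat-style input.

```python
def build_pipeline(model_id="MultiverseComputingCAI/HyperNova-60B-2602"):
    """Create a text-generation pipeline (downloads the weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline(
        "text-generation",
        model=model_id,
        device_map="auto",
        torch_dtype="auto",
        trust_remote_code=True,
    )


def last_assistant_message(outputs):
    """Return the content of the final assistant turn from pipeline output."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Usage (requires enough accelerator memory for the model):
#   pipe = build_pipeline()
#   outputs = pipe([{"role": "user", "content": "What is a Hypernova?"}],
#                  max_new_tokens=256)
#   print(last_assistant_message(outputs))
```

Keeping the extraction in its own function makes it easy to check against a hand-written conversation before loading the model.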
| --- | |
| ## What’s New in HyperNova 60B 2602 | |
| **HyperNova 60B 2602** is derived from **gpt-oss-120b** and retains the base model’s strengths while reducing memory use and improving deployment flexibility. | |
| ### Summary | |
| - **Derived from [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b):** Same Apache 2.0 license and design goals (reasoning, agentic tasks, tool use), with a smaller footprint via CompactifAI. | |
| - **Tool use:** Retains support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas). | |
| - **Reasoning:** Compatible with configurable reasoning effort (e.g. low / medium / high in system prompt) where the format is preserved; full chain-of-thought available for debugging and analysis. | |
| - **Evaluated** on tool-focused benchmarks (e.g. BFCL v4, Tau2-bench) and general benchmarks alongside other CompactifAI and gpt-oss variants. | |
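The configurable reasoning effort mentioned in the list above can be sketched as a helper that adds a directive to the system message. The `Reasoning: <level>` wording follows the gpt-oss model card; that HyperNova 60B 2602 honors it in the same way is an assumption to verify, and `with_reasoning_effort` is a hypothetical helper name.

```python
VALID_EFFORTS = ("low", "medium", "high")


def with_reasoning_effort(messages, effort="medium"):
    """Return a new message list whose system prompt requests a reasoning level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    directive = f"Reasoning: {effort}"
    # Extend an existing system message rather than adding a second one.
    if messages and messages[0]["role"] == "system":
        merged = dict(messages[0])
        merged["content"] = f"{merged['content']}\n{directive}"
        return [merged] + list(messages[1:])
    return [{"role": "system", "content": directive}] + list(messages)


chat = with_reasoning_effort(
    [{"role": "user", "content": "Prove that 17 is prime."}], effort="high"
)
print(chat[0]["content"])
```

The returned list can be passed to `apply_chat_template` or to an OpenAI-compatible endpoint in place of the original messages.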
| --- | |
| ## Tool Calling | |
| HyperNova 60B 2602 supports **native tool use** and is well-suited for: | |
| - **Function calling** with defined schemas | |
| - **Structured outputs** | |
| - **Agentic operations** (e.g. browser tasks, code execution where supported) | |
| The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**; compatibility refers to format and structure—exact parity with the base or other models is not guaranteed. | |
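As an illustration of the OpenAI-style schema mentioned above, the sketch below builds a chat-completions request body that advertises one tool. The `get_weather` tool and its parameters are hypothetical, and the payload shape is the OpenAI chat-completions format, assumed here to be accepted by whichever OpenAI-compatible server hosts the model.

```python
import json

# Hypothetical tool definition in the OpenAI "function" tool format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
            },
            "required": ["city", "date"],
        },
    },
}


def build_request(user_prompt, tools,
                  model="MultiverseComputingCAI/HyperNova-60B-2602"):
    """Assemble the JSON body for a chat-completions call that offers tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }


body = build_request("What is the weather in Paris on 2026-02-10?",
                     [GET_WEATHER_TOOL])
print(json.dumps(body, indent=2))
```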
| ### Example Tool Call | |
| ```json | |
| { | |
| "name": "get_weather", | |
| "arguments": { | |
| "city": "Paris", | |
| "date": "2026-02-10" | |
| } | |
| } | |
| ``` | |
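One way an application might consume a tool call like the one above: parse the JSON, check the name against a registry of allowed functions, and run the match. Here `get_weather` is a stand-in that returns canned data, and `dispatch_tool_call` is a hypothetical helper rather than part of any library.

```python
import json


def get_weather(city, date):
    """Stand-in tool: a real implementation would query a weather service."""
    return {"city": city, "date": date, "forecast": "unknown (stub)"}


# Only functions listed here may be invoked on the model's behalf.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(raw_call):
    """Parse a JSON tool call and execute the matching registered function."""
    call = json.loads(raw_call)
    name = call.get("name")
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name!r}")
    arguments = call.get("arguments", {})
    if not isinstance(arguments, dict):
        raise TypeError("tool arguments must be a JSON object")
    return TOOL_REGISTRY[name](**arguments)


raw = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
result = dispatch_tool_call(raw)
print(json.dumps(result))
```

Rejecting unregistered names and non-object arguments before execution keeps a malformed or unexpected call from reaching application code.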
| --- | |
| ## Training & Fine-Tuning | |
| ### Base Model: gpt-oss-120b | |
| The base model [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) was trained on OpenAI’s **harmony response format** and is intended for use with that format for correct behavior. It supports configurable reasoning levels (low / medium / high) and native tool use. See the [original model card](https://huggingface.co/openai/gpt-oss-120b) and [arXiv:2508.10925](https://arxiv.org/abs/2508.10925) for details. | |
| ### CompactifAI Compression & Optional Fine-Tuning | |
| - **Compression:** CompactifAI was applied to produce a smaller, efficient model (60B parameters) while aiming to preserve reasoning and tool-use capabilities. | |
| - **Optional fine-tuning:** This variant may include additional fine-tuning for tool calling and structured outputs; exact training details are model-specific. | |
| --- | |
| ## Architecture | |
| ### Model Specifications | |
| | Specification | Value | | |
| |-------------------|--------------------| | |
| | Base model | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (117B params, 5.1B active MoE) | | |
| | Total parameters | 60B, 4.8B active MoE | | |
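The parameter counts in the table imply the following reductions. This is plain arithmetic on the reported figures, not an additional measurement.

```python
# Billions of parameters, as reported in the specification table.
BASE_TOTAL_B, BASE_ACTIVE_B = 117.0, 5.1              # gpt-oss-120b
COMPRESSED_TOTAL_B, COMPRESSED_ACTIVE_B = 60.0, 4.8   # HyperNova 60B 2602


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


total_cut = reduction_percent(BASE_TOTAL_B, COMPRESSED_TOTAL_B)
active_cut = reduction_percent(BASE_ACTIVE_B, COMPRESSED_ACTIVE_B)
print(f"total parameters reduced by {total_cut:.1f}%")    # 48.7%
print(f"active parameters reduced by {active_cut:.1f}%")  # 5.9%
```

Total parameters drop by roughly half, while the active parameter count changes by only a few percent.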
| --- | |
| ## Evaluation & Benchmarks | |
| ### Evaluation Methodology | |
| Benchmark scores were obtained with the following setups. Methodology varies by benchmark family. | |
| #### MMLU-Pro, AIME25, GPQA:d, LiveCodeBench | |
| - **Evaluation framework**: [Lighteval](https://github.com/huggingface/lighteval) | |
| - **Inference library**: vLLM 0.14.0 | |
| - **Reasoning effort**: medium | |
| - **Decoding**: temperature = 0.6, max_tokens = 131072, top_p = 1.0, top_k = 0 | |
| - **Batch size**: 64 | |
| #### IFBench, AA-LCR, SciCode | |
| - **Evaluation framework**: [Nemo-skills](https://github.com/NVIDIA/NeMo-Skills) | |
| - **Inference library**: vLLM 0.14.0 | |
| - **Reasoning effort**: medium | |
| - **Decoding**: temperature = 1.0, max_tokens = 131072, top_p = 1.0, top_k = 0 | |
| - **Batch size**: 64 | |
| #### BFCL v4 (17 splits) | |
| - **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1 | |
| - **Inference library**: vLLM 0.14.0 | |
| - **Reasoning effort**: high | |
| - **Decoding**: temperature = 0.6, max_tokens = 16384, parallel_tool_calls = true, tool-call parser openai | |
| #### Tau2-bench (Telecom) | |
| - **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1 | |
| - **Inference library**: vLLM 0.14.0 | |
| - **Reasoning effort**: high (agent `extra_body.reasoning_effort`) | |
| - **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1 | |
| - **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600 | |
| - **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge) | |
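For readers reproducing the agent settings, the sketch below collects them as keyword arguments for an OpenAI-compatible chat-completions call. Only `extra_body.reasoning_effort` is stated in the list above; placing `min_tokens` in `extra_body` is an assumption based on how vLLM's OpenAI-compatible server accepts extra sampling parameters, and `agent_request_kwargs` and the sample prompt are hypothetical.

```python
def agent_request_kwargs(model, messages):
    """Keyword arguments mirroring the Tau2-bench agent decoding settings."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 1.0,
        "top_p": 1.0,
        # Fields outside the standard OpenAI schema travel in extra_body.
        "extra_body": {"reasoning_effort": "high", "min_tokens": 1},
    }


kwargs = agent_request_kwargs(
    "MultiverseComputingCAI/HyperNova-60B-2602",
    [{"role": "user", "content": "My mobile data stopped working."}],
)
# With the `openai` client installed and a server running, this could be sent as:
#   client.chat.completions.create(**kwargs)
print(kwargs["extra_body"])
```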
| #### Terminal-Bench Hard (Artificial Analysis subset) | |
| - **Evaluation framework**: laude-institute/harbor == 0.1.43 | |
| - **Inference library**: vLLM == 0.15.0 | |
| - **Reasoning effort**: high | |
| - **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072 | |
| - **Reproducibility**: subset from AA (https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard) | |
| - **Agent**: terminus-2; max episodes 100; repeats 3 | |
| ### Quantitative Results (Reported & Planned) | |
| Scores are accuracy or benchmark-specific metrics. Reported numbers use the methodology described above (reasoning: cai-eval + Nemo-skills; BFCL v4 and Tau2-bench: cai-eval + EvalScope); the setup for the remaining entries is not yet documented here. | |
| | Benchmark | gpt-oss-20b | gpt-oss-120b | HyperNova 60B 2602 | | |
| |-----------------------|-----------------------|------------------------|--------------------------| | |
| | MMLU-Pro | 74 | 78 | 74 | | |
| | BFCL v4 | 61 | 64 | 62 | | |
| | Tau2-bench (Telecom) | 59 | 68 | 61 | | |
| | AIME25 | 72 | 80 | 76 | | |
| | GPQA:d | 63 | 69 | 69 | | |
| | IFBench | 55 | 63 | 60 | | |
| | SciCode | 34 | 38 | 32 | | |
| | LiveCodeBench | 64 | 66 | 64 | | |
| | Terminal Bench | 9 | 22 | 16 | | |
| | AA-LCR | 37 | 50 | 36 | | |
| | AA-Omnis. Index | -40 | -36 | -41 | | |
| | AA-Omnis. Accuracy | 16 | 21 | 15 | | |
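To read the table at a glance, the sketch below recomputes the gap between HyperNova 60B 2602 and gpt-oss-120b for each row. The numbers are copied from the table; the `SCORES` layout and the `gap_to_base` helper are illustrative only.

```python
# (gpt-oss-20b, gpt-oss-120b, HyperNova 60B 2602), copied from the table above.
SCORES = {
    "MMLU-Pro": (74, 78, 74),
    "BFCL v4": (61, 64, 62),
    "Tau2-bench (Telecom)": (59, 68, 61),
    "AIME25": (72, 80, 76),
    "GPQA:d": (63, 69, 69),
    "IFBench": (55, 63, 60),
    "SciCode": (34, 38, 32),
    "LiveCodeBench": (64, 66, 64),
    "Terminal Bench": (9, 22, 16),
    "AA-LCR": (37, 50, 36),
    "AA-Omnis. Index": (-40, -36, -41),
    "AA-Omnis. Accuracy": (16, 21, 15),
}


def gap_to_base(scores):
    """Score difference (HyperNova minus gpt-oss-120b) for every benchmark."""
    return {name: hyper - base for name, (_small, base, hyper) in scores.items()}


gaps = gap_to_base(SCORES)
# List benchmarks from the largest drop to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<22} {gap:+d}")
```

Sorting by gap puts the benchmarks most affected by compression at the top of the listing.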
| ### Quantitative Results (Inference Performance) | |
| Representative throughput and memory under the evaluation setup above. Comparison against **gpt-oss-120b** on the same hardware. | |
| #### Performance evaluation conditions | |
| - **Inference library**: vLLM 0.14.0 | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Conditions**: concurrency=128 | |
| **Summary of Improvements:** | |
| - **Throughput (tok/s)**: HyperNova 60B 2602 is 39.5% faster | |
| - **Median TTFT (ms)**: HyperNova 60B 2602 is 50.8% faster | |
| --- | |
| ## Languages | |
| - **Primary language**: English | |
| - **Other languages**: Not formally evaluated | |
| The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured. | |
| --- | |
| ## Intended Use | |
| ### Recommended Use Cases | |
| Aligned with [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) use cases, with the benefit of a smaller footprint: | |
| - **Reasoning and analysis** (with configurable reasoning effort where supported) | |
| - **Tool-augmented and agentic applications** (function calling, web browsing, code execution, structured outputs) | |
| - **Code generation and reasoning** | |
| - **Chatbots and virtual assistants** | |
| - **Retrieval-augmented generation (RAG)** | |
| - **Deployments** where gpt-oss-120b is desirable but memory or latency is constrained | |
| ### Out-of-Scope Uses | |
| - Harmful, illegal, or deceptive content generation | |
| - Impersonation of real individuals without consent | |
| - High-risk decision-making without human oversight | |
| - Surveillance or tracking of individuals | |
| - Any use that violates applicable laws or regulations | |
| --- | |
| ## Safety & Limitations | |
| ### Known Limitations | |
| - **English-centric** training data (inherited from base model). | |
| - **Format:** For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise. | |
| - **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed. | |
| - **Compression** may affect some behaviors; evaluate for your use case. | |
| ### Recommendations | |
| - Validate tool outputs before execution | |
| - Use human oversight for critical applications | |
| - Perform task-specific evaluation prior to deployment | |
| --- | |
| ## Model Information | |
| | Field | Value | | |
| |--------------|--------------------- | | |
| | Model name | HyperNova 60B 2602 | | |
| | Based on | [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | | |
| | Version | 2602 | | |
| | Release date | 2026-02-26 | |
| | Developed by | Multiverse Computing | | |
| | License | Apache 2.0 | | |
| | Contact | business@multiversecomputing.com | | |
| --- | |
| ## Citation | |
| If you use this model, please cite the base model and this variant: | |
| ```bibtex | |
| @misc{openai2025gptoss120b, | |
| title = {gpt-oss-120b \& gpt-oss-20b Model Card}, | |
| author = {OpenAI}, | |
| year = {2025}, | |
| eprint = {2508.10925}, | |
| archivePrefix = {arXiv}, | |
| primaryClass = {cs.CL}, | |
| url = {https://arxiv.org/abs/2508.10925} | |
| } | |
| @misc{hypernova60b2602, | |
| title = {HyperNova 60B 2602: Model developed based on gpt-oss-120b}, | |
| author = {Multiverse Computing}, | |
| year = {2026}, | |
| url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602}, | |
| note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology} | |
| } | |
| ``` | |
| **Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2602/discussions) · [Discord](https://discord.gg/8mT9FveN) |