Ollama support - io.yaml files and gguf weights
Summary
Adds Ollama support for all 4 GPT-OSS-20B LoRA adapters (answerability, citations, hallucination_detection, query_rewrite).
What's included
- `io.yaml` config files for each LoRA adapter (converted from the original Python `io.yaml` format to an Ollama-compatible format)
- `Lora-q8_0.gguf` – pre-converted GGUF LoRA weights (Q8_0 quantization) for each adapter
- `Modelfile` – an Ollama Modelfile for each adapter (references the `gpt-oss:20b` base model)
- `run_ollama.sh` – script that loads all LoRA adapters into a running Ollama instance
- `_ollama/convert_to_gguf.sh` – conversion script that downloads the base model, clones llama.cpp, and converts all LoRA adapters from safetensors to GGUF
- `_ollama/convert_io_yaml_files.py` – helper that converts the `io.yaml` files
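For reference, an Ollama Modelfile for a LoRA adapter is essentially a base-model reference plus an `ADAPTER` directive pointing at the GGUF weights. A minimal sketch (the adapter path shown is illustrative, not necessarily this PR's exact layout):

```
FROM gpt-oss:20b
ADAPTER ./answerability/Lora-q8_0.gguf
```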
llama.cpp patches required
The `convert_to_gguf.sh` script clones llama.cpp from master, but two local patches are needed for the `answerability` and `query_rewrite` adapters (which target MoE expert layers via PEFT `target_parameters`):
- `convert_lora_to_gguf.py` – remap PEFT's `base_layer` naming convention to the actual Hugging Face tensor names; split the interleaved `gate_up_proj` LoRA into separate gate and up LoRAs; bypass the MXFP4 codepath for LoRA tensors
- `gguf-py/gguf/gguf_writer.py` – handle 2D expert LoRA tensors in parameter counting
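To illustrate the first patch, here is a rough, hypothetical Python sketch of the two tensor transformations it performs. The function names are invented, and the even/odd interleaving convention is an assumption based on gpt-oss's fused `gate_up_proj` layout; the real changes live inside llama.cpp's `convert_lora_to_gguf.py`:

```python
import numpy as np

def remap_peft_name(name: str) -> str:
    """Drop PEFT's '.base_layer' wrapper so the tensor name matches the
    Hugging Face checkpoint's naming (illustrative of the patch's intent,
    not the actual llama.cpp code)."""
    return name.replace(".base_layer", "")

def split_gate_up_lora_b(lora_b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a LoRA B matrix targeting the fused gate_up_proj into separate
    gate and up matrices, assuming gate/up rows are interleaved even/odd
    along the output dimension."""
    return lora_b[0::2], lora_b[1::2]
```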
These patches are not yet upstreamed; no existing issue or PR on ggml-org/llama.cpp addresses this. The GGUF files in this PR were generated with these patches applied.
Usage
# 1. Have Ollama running with gpt-oss:20b loaded
# 2. From the repo root:
bash run_ollama.sh
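Once an adapter model is registered, it can be called like any other Ollama model, for example via the REST API's `/api/generate` endpoint. A sketch, assuming the adapter was registered under the name `answerability` (the actual name depends on what `run_ollama.sh` registers):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    # Ollama's /api/generate endpoint takes a JSON body with the model name
    # and prompt; stream=False asks for a single complete JSON response.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(f"{host}/api/generate", data=body,
                                  headers={"Content-Type": "application/json"})

req = build_generate_request("answerability",
                             "Can the question be answered from the context?")
# With Ollama running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```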
From https://github.com/ibm-granite/granite-common/pull/134#issuecomment-3994228106
OK, it turns out Ollama has a new engine that gpt-oss requires but that does not yet support LoRAs. Claude found the issue in the Ollama code:
Ollama has two inference runners: the older llamarunner (C++ based, supports LoRA) and the newer ollamarunner (Go based, where LoRA support is still a TODO). Models like gpt-oss, deepseek2, gemma3, qwen3, llama4, etc. are hardcoded to require the ollamarunner via OllamaEngineRequired(), so LoRA adapters cannot be used with any of these models. The code contains a `TODO(jessegross): LoRA loading` comment but no issue or PR tracking it.
Closing this PR until support is available.