Instructions to use MayankLad31/invoice_schema with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use MayankLad31/invoice_schema with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="MayankLad31/invoice_schema",
    filename="inv.Q8_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Extract the data in JSON format using the schema: {\"invoice_id\": \"string\"}"},
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use MayankLad31/invoice_schema with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MayankLad31/invoice_schema:Q8_0

# Run inference directly in the terminal:
llama-cli -hf MayankLad31/invoice_schema:Q8_0
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MayankLad31/invoice_schema:Q8_0

# Run inference directly in the terminal:
llama-cli -hf MayankLad31/invoice_schema:Q8_0
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MayankLad31/invoice_schema:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf MayankLad31/invoice_schema:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MayankLad31/invoice_schema:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf MayankLad31/invoice_schema:Q8_0
Use Docker
docker model run hf.co/MayankLad31/invoice_schema:Q8_0
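Whichever install route you choose, llama-server exposes an OpenAI-compatible HTTP endpoint (port 8080 by default). A minimal, hedged sketch of querying it from Python with only the standard library, assuming a server started with one of the commands above is running locally:

```python
# Hedged sketch: query a local llama-server through its OpenAI-compatible
# /v1/chat/completions endpoint. Assumes the default port 8080.
import json
from urllib import request

def build_payload(prompt: str) -> dict:
    """Assemble an OpenAI-style chat request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
    }

def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST the prompt and return the assistant's reply text."""
    req = request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the server to be running):
#   chat('Extract the data in JSON format using the schema: {"invoice_id": "string"}')
```

The same endpoint also accepts the official `openai` Python client, as shown further down in this card.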
- LM Studio
- Jan
- Ollama
How to use MayankLad31/invoice_schema with Ollama:
ollama run hf.co/MayankLad31/invoice_schema:Q8_0
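Ollama also serves a local REST API (port 11434 by default), so the pulled model can be called programmatically. A hedged sketch using only the Python standard library; the endpoint and field names follow Ollama's documented `/api/chat` interface, not anything specific to this model card:

```python
# Hedged sketch: call a locally running Ollama instance via its REST API.
# Assumes `ollama run hf.co/MayankLad31/invoice_schema:Q8_0` has pulled the model.
import json
from urllib import request

MODEL = "hf.co/MayankLad31/invoice_schema:Q8_0"

def ollama_chat_body(model: str, prompt: str) -> dict:
    """Non-streaming chat request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ollama_chat(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the prompt and return the reply text."""
    req = request.Request(
        host + "/api/chat",
        data=json.dumps(ollama_chat_body(MODEL, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (requires Ollama to be running with the model pulled):
#   ollama_chat('Extract the data in JSON format using the schema: {"invoice_id": "string"}')
```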
- Unsloth Studio
How to use MayankLad31/invoice_schema with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for MayankLad31/invoice_schema to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for MayankLad31/invoice_schema to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MayankLad31/invoice_schema to start chatting
- Pi
How to use MayankLad31/invoice_schema with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf MayankLad31/invoice_schema:Q8_0
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "MayankLad31/invoice_schema:Q8_0" }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi
- Hermes Agent
How to use MayankLad31/invoice_schema with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf MayankLad31/invoice_schema:Q8_0
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MayankLad31/invoice_schema:Q8_0
Run Hermes
hermes
- Docker Model Runner
How to use MayankLad31/invoice_schema with Docker Model Runner:
docker model run hf.co/MayankLad31/invoice_schema:Q8_0
- Lemonade
How to use MayankLad31/invoice_schema with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull MayankLad31/invoice_schema:Q8_0
Run and chat with the model
lemonade run user.invoice_schema-Q8_0
List all available models
lemonade list
Model Description
Extracts invoice data into any user-specified schema.
Update: I fine-tuned qwen3.5 0.8B a bit.
We now have qwen_finetune.Q8_0.gguf and qwen_finetune.F16-mmproj.gguf, which you can use.
The base model was already decent and needed only a little fine-tuning. I used Unsloth.
Previously, I took a small 1.5B model fine-tuned with RL (GRPO on Qwen2.5-Coder) and trained it to extract structured JSON from OCR text based on any user-defined schema. You can find that model and its GGUF here (100% local). I'm not completely happy with the training and it still needs more work, but it works! You can ignore it now.
How to Get Started with the Model
Use qwen_finetune.Q8_0.gguf and qwen_finetune.F16-mmproj.gguf. Ignore other files.
Start the llama.cpp server (CPU-only):
llama-server \
-m qwen_finetune.Q8_0.gguf \
--mmproj qwen_finetune.F16-mmproj.gguf \
--host 0.0.0.0 \
--port 8000 \
--jinja \
--reasoning off \
-ngl 0 \
-t 4 \
-n 1024
You can then use the OpenAI SDK as follows; specify any schema you like for your invoice, as in the example below.
from openai import OpenAI
import base64

# Point the client at the local llama-server started above
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key-required"
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("out.jpeg")

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": """Extract the data in JSON format using the schema: `{ "date": "string", "invoice_id": "string","all_items":[//list of items {"description":"string","quantity":"number","unit_price":"number","line_total":"number"}]}`""",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    temperature=0.1,
    # 'min_p' is passed via 'extra_body' for OpenAI-compatible local servers
    extra_body={"min_p": 0.1},
)
print(response.choices[0].message.content)
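The reply arrives as plain text, and small local models sometimes wrap the JSON in code fences. A small helper for normalizing the reply before parsing (my own addition, not part of the original card):

````python
# Hedged helper: strip optional ```json fences from a model reply and
# parse it; raises json.JSONDecodeError if the payload still isn't JSON.
import json
import re

def parse_model_json(reply: str):
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)

print(parse_model_json('```json\n{"invoice_id": "INV1048"}\n```'))  # {'invoice_id': 'INV1048'}
````

If the model already emits bare JSON, the helper simply passes the reply through to `json.loads`.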
In the run below we asked: Extract the data in JSON format using the schema: `{ "date": "string", "invoice_id": "string","bill_to":"string" // name and address,"ship_to":"string","all_items":[//list of items {"description":"string","quantity":"number","unit_price":"number","line_total":"number"}],"total":"number"}`
Example response for the image below (a random invoice used only for testing; I am not its owner), produced with the code above:

{'date': 'August 20, 2006', 'invoice_id': 'INV1048', 'bill_to': 'C1003, Test Customer Two, 88 WILLIAM Square, Sydney 12345, Australia', 'ship_to': '', 'all_items': [{'description': 'Very long product description that occupies more than 1 line - in fact, it occupies 2 lines', 'quantity': 1, 'unit_price': 199.99, 'line_total': 199.99}, {'description': 'One line product description', 'quantity': 2, 'unit_price': 420.0, 'line_total': 840.0}], 'total': 1140.87}
Connect with me on LinkedIn if you have an interesting project:
https://www.linkedin.com/in/mayankladdha31/
Previous model (in case you still want to try it, though I would not recommend it):
inv.Q8_0.gguf.
Use it in combination with PaddleOCR. Define any schema and hopefully you get the JSON back. It needs some more work, but it still works!
from llama_cpp import Llama
from paddleocr import PaddleOCR

# OCR the invoice image and collect the recognized text lines
text = ""
ocr = PaddleOCR(use_angle_cls=True, lang="en")
result = ocr.ocr("test_image.jpg", cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        text = text + line[-1][0] + "\n"

llm = Llama(model_path="inv.Q8_0.gguf", n_ctx=2048)
import re

def extract_largest_json_block(text):
    pattern = r"```json\s*(.*?)\s*```"
    blocks = re.findall(pattern, text, re.DOTALL)
    if not blocks:
        return None
    return max(blocks, key=len)

def extract_xml_answer(text: str) -> str:
    answer = text.split("<answer>")[-1]
    answer = answer.split("</answer>")[0]
    return extract_largest_json_block(answer.strip())
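To see what these helpers do, here is a self-contained run on a made-up model reply (the helpers are repeated so the snippet runs on its own; the sample text is invented for illustration):

````python
import re

def extract_largest_json_block(text):
    # Same helper as above: pick the longest ```json ... ``` block.
    blocks = re.findall(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    return max(blocks, key=len) if blocks else None

def extract_xml_answer(text: str):
    # Keep only what sits between the <answer> tags, then pull out the JSON.
    answer = text.split("<answer>")[-1].split("</answer>")[0]
    return extract_largest_json_block(answer.strip())

sample = (
    "<reasoning>\nThe invoice number is printed at the top.\n</reasoning>\n"
    "<answer>\n```json\n{\"invoice_no\": \"INV-001\"}\n```\n</answer>"
)
print(extract_xml_answer(sample))  # {"invoice_no": "INV-001"}
````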
messages = [
    {"role": "system", "content": """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
```json
```
</answer>"""},
    {"role": "user", "content": f"{text}\n" + """
Extract the data in JSON format using the schema:
{
  "invoice_no": "string",
  "issued_to": {
    "name": "string",
    "address": "string" // Address of the client
  },
  "pay_to": {
    "bank_name": "string", // Name of the bank
    "name": "string", // Name
    "account_no": "number"
  },
  "items": [
    {
      "description": "string",
      "quantity": "number",
      "unit_price": "number",
      "total": "number"
    }
  ],
  "subtotal": "number",
  "total": "number"
} """},
]
output = llm.create_chat_completion(messages,max_tokens=1000)
print(extract_xml_answer(output['choices'][0]['message']['content']))
llm._sampler.close()
llm.close()
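Once you have the parsed dict, it can be worth checking it against the schema you asked for, since small models occasionally drop fields. A minimal sketch (my own addition, not part of the card) for the simple `"field": "type"` schema style used above:

```python
# Hedged sketch: recursively check a parsed dict against a schema written
# in the card's informal style, where leaf values are "string" or "number".
import json

TYPE_MAP = {"string": str, "number": (int, float)}

def matches_schema(data: dict, schema: dict) -> bool:
    for key, expected in schema.items():
        if key not in data:
            return False
        if isinstance(expected, dict):
            # Nested object: recurse.
            if not (isinstance(data[key], dict) and matches_schema(data[key], expected)):
                return False
        elif isinstance(expected, list):
            # List of objects: every item must match the element schema.
            if not (isinstance(data[key], list) and
                    all(matches_schema(item, expected[0]) for item in data[key])):
                return False
        elif not isinstance(data[key], TYPE_MAP[expected]):
            return False
    return True

doc = json.loads('{"invoice_no": "INV-001", "total": 42.5}')
print(matches_schema(doc, {"invoice_no": "string", "total": "number"}))  # True
```

This only validates presence and rough types; anything stricter (required vs. optional fields, formats) would call for a real JSON Schema validator.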