
Fuse

Pull any GGUF model from HuggingFace. Extract structured data on CPU. No Pydantic boilerplate.

# one-shot extraction — no install needed
$ uvx fusellm extract "Sarah Chen, 34, architect at Stripe" \
  --model bartowski/Llama-3.2-1B-Instruct-GGUF \
  --fields "name:str,age:int,company:str"

{"name": "Sarah Chen", "age": 34, "company": "Stripe"}
Python 3.11+ · CPU inference · GGUF models · Zero-shot · JSON Schema · LoRA fine-tuning

What is Fuse?

Fuse lets you pull any GGUF model from HuggingFace, run zero-shot structured extraction with dynamic schemas, fine-tune with LoRA, and export to GGUF for fast CPU inference.

Define what to extract — as a Python dict, JSON schema, or natural language description — and Fuse handles prompt construction, constrained generation, and JSON parsing.
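Conceptually, a field dict like {"name": str, "age": int} corresponds to a JSON Schema object. A minimal sketch of that mapping (the helper below is illustrative only, not Fuse's internal code):

```python
# Illustrative only: how a Python field dict maps to JSON Schema.
# This is NOT Fuse's actual implementation.
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def fields_to_json_schema(fields: dict) -> dict:
    """Turn {"name": str, "age": int} into a JSON Schema object."""
    return {
        "type": "object",
        "properties": {name: {"type": TYPE_MAP[t]} for name, t in fields.items()},
        "required": list(fields),
    }

schema = fields_to_json_schema({"name": str, "age": int})
# schema["properties"]["age"] is {"type": "integer"}
```

Either form (the dict or the resulting JSON Schema) describes the same extraction target; Fuse accepts both.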


How it works

flowchart LR
    S[Schema\ndict · JSON · description] --> E[Extractor]
    M[GGUF Model\nlocal or HuggingFace] --> B[LlamaCppBackend]
    B --> E
    E -->|constrained generation\nvia outlines| R[Structured JSON]
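"Constrained generation" means the model can only emit text that conforms to the schema, so the output always parses. A rough stdlib-only illustration of the idea, checking a candidate output against a schema-derived pattern (outlines applies constraints per token during decoding; this sketch is not its API):

```python
import json
import re

# Illustrative: a crude pattern for output matching {"name": str, "age": int}.
# outlines compiles a far more precise pattern and enforces it token by token.
pattern = re.compile(r'\{"name":\s*"[^"]*",\s*"age":\s*\d+\}')

candidate = '{"name": "Sarah Chen", "age": 34}'
assert pattern.fullmatch(candidate)        # schema-conformant output passes
assert json.loads(candidate)["age"] == 34  # and parses straight into typed JSON
```

Because malformed JSON can never be produced under the constraint, there is no retry/repair loop on the parsing side.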

Key features

Zero-shot extraction

No training data needed. Pass a dict of fields and get structured JSON back from any instruction-tuned model.


Dynamic schemas

Build schemas from Python dicts, JSON Schema, or natural language. No Pydantic boilerplate required.


HuggingFace auto-download

Pass a repo name and Fuse downloads the best Q4 GGUF automatically. Models are cached locally.

Fine-tune with LoRA

Train on your domain data with Unsloth or HuggingFace Transformers, then export to GGUF for deployment.


Quick example

import fuse

backend = fuse.LlamaCppBackend(model_name="bartowski/Llama-3.2-1B-Instruct-GGUF")
extractor = fuse.Extractor(backend)

result = extractor.extract_from_fields(
    "Sarah Chen is a 34-year-old software architect at Stripe.",
    {"name": str, "age": int, "job_title": str, "company": str}
)
# {'name': 'Sarah Chen', 'age': 34, 'job_title': 'software architect', 'company': 'Stripe'}

Supported models

Any GGUF model on HuggingFace works. A few small instruction-tuned models that work well for CPU extraction:

Model                    Size       HuggingFace Repo
Llama 3.2 1B Instruct    ~1GB Q4    bartowski/Llama-3.2-1B-Instruct-GGUF
Llama 3.2 3B Instruct    ~2GB Q4    bartowski/Llama-3.2-3B-Instruct-GGUF
Qwen 2.5 1.5B Instruct   ~1GB Q4    bartowski/Qwen2.5-1.5B-Instruct-GGUF
Phi-4 Mini Instruct      ~2.5GB Q4  bartowski/Phi-4-mini-instruct-GGUF

See all supported models