
Fuse

Pull any GGUF model from HuggingFace. Extract structured data on CPU. No Pydantic boilerplate.

# one-shot extraction — no install needed
$ uvx fusellm extract "Sarah Chen, 34, architect at Stripe" \
  --model bartowski/Llama-3.2-1B-Instruct-GGUF \
  --fields "name:str,age:int,company:str"

{"name": "Sarah Chen", "age": 34, "company": "Stripe"}
Python 3.11+ · CPU inference · GGUF models · Zero-shot · JSON Schema · LoRA fine-tuning

What is Fuse?

Fuse lets you pull any GGUF model from HuggingFace, run zero-shot structured extraction with dynamic schemas, fine-tune with LoRA, and export to GGUF for fast CPU inference.

Define what to extract — as a Python dict, JSON schema, or natural language description — and Fuse handles prompt construction, constrained generation, and JSON parsing.
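Conceptually, a field dict like {"name": str, "age": int} corresponds to a JSON Schema object. A minimal sketch of that mapping (the helper below is illustrative only, not Fuse's internal code):

```python
# Illustrative only: how a Python field dict maps to JSON Schema.
# This is NOT Fuse's actual implementation.
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def fields_to_json_schema(fields: dict) -> dict:
    """Turn {"name": str, "age": int} into a JSON Schema object."""
    return {
        "type": "object",
        "properties": {name: {"type": TYPE_MAP[t]} for name, t in fields.items()},
        "required": list(fields),
    }

schema = fields_to_json_schema({"name": str, "age": int})
# schema["properties"]["age"] is {"type": "integer"}
```

Either form (the dict or the resulting JSON Schema) describes the same extraction target; Fuse accepts both.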


How it works

flowchart LR
    S[Schema\ndict · JSON · description] --> E[Extractor]
    M[GGUF Model\nlocal or HuggingFace] --> B[LlamaCppBackend]
    B --> E
    E -->|constrained generation\nvia outlines| R[Structured JSON]
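"Constrained generation" means the model can only emit text that conforms to the schema, so the output always parses. A rough stdlib-only illustration of the idea, checking a candidate output against a schema-derived pattern (outlines applies constraints per token during decoding; this sketch is not its API):

```python
import json
import re

# Illustrative: a crude pattern for output matching {"name": str, "age": int}.
# outlines compiles a far more precise pattern and enforces it token by token.
pattern = re.compile(r'\{"name":\s*"[^"]*",\s*"age":\s*\d+\}')

candidate = '{"name": "Sarah Chen", "age": 34}'
assert pattern.fullmatch(candidate)        # schema-conformant output passes
assert json.loads(candidate)["age"] == 34  # and parses straight into typed JSON
```

Because malformed JSON can never be produced under the constraint, there is no retry/repair loop on the parsing side.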

Key features

Zero-shot extraction

No training data needed. Pass a dict of fields and get structured JSON back from any instruction-tuned model.


Dynamic schemas

Build schemas from Python dicts, JSON Schema, or natural language. No Pydantic boilerplate required.


HuggingFace auto-download

Pass a repo name and Fuse downloads the best Q4 GGUF automatically. Models are cached locally.

Fine-tune with LoRA

Train on your domain data with Unsloth or HuggingFace Transformers, then export to GGUF for deployment.


Quick example

import fuse

backend = fuse.LlamaCppBackend(model_name="bartowski/Llama-3.2-1B-Instruct-GGUF")
extractor = fuse.Extractor(backend)

result = extractor.extract_from_fields(
    "Sarah Chen is a 34-year-old software architect at Stripe.",
    {"name": str, "age": int, "job_title": str, "company": str}
)
# {'name': 'Sarah Chen', 'age': 34, 'job_title': 'software architect', 'company': 'Stripe'}

Supported models

Any GGUF model on HuggingFace works. A few small instruction-tuned models that work well for CPU extraction:

Model                    Size       HuggingFace Repo
Llama 3.2 1B Instruct    ~1GB Q4    bartowski/Llama-3.2-1B-Instruct-GGUF
Llama 3.2 3B Instruct    ~2GB Q4    bartowski/Llama-3.2-3B-Instruct-GGUF
Qwen 2.5 1.5B Instruct   ~1GB Q4    bartowski/Qwen2.5-1.5B-Instruct-GGUF
Phi-4 Mini Instruct      ~2.5GB Q4  bartowski/Phi-4-mini-instruct-GGUF

See all supported models