Concepts¶
Architecture¶
Fuse has three layers that are cleanly separated:
┌─────────────────────────────────┐
│ Extraction Layer │
│ Extractor · SchemaBuilder │
│ Prompts · JSON parsing │
├─────────────────────────────────┤
│ Inference Backend (Protocol) │
│ LlamaCppBackend (v0.1) │
│ RustBackend (future) │
├─────────────────────────────────┤
│ Model Resolution │
│ Local GGUF · HuggingFace Hub │
└─────────────────────────────────┘
The extraction layer never imports a concrete backend directly — it works against the InferenceBackend protocol. This means the backend can be swapped (e.g., from Python to Rust) without touching extraction code.
Inference backends¶
All backends implement the InferenceBackend protocol:
from typing import Protocol

class InferenceBackend(Protocol):
    def load(self, model_path: str, **kwargs) -> None: ...
    def generate(self, prompt: str, *, max_tokens: int = 512, **kwargs) -> str: ...
    def generate_structured(
        self, prompt: str, json_schema: dict, *, max_tokens: int = 512, **kwargs
    ) -> dict: ...
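Because this is a structural (duck-typed) protocol, any class with matching methods qualifies; no inheritance is needed. A minimal sketch, restating the protocol with `@runtime_checkable` so the structural check can run standalone (`EchoBackend` is a made-up stand-in for testing, not part of fuse):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class InferenceBackend(Protocol):
    def load(self, model_path: str, **kwargs) -> None: ...
    def generate(self, prompt: str, *, max_tokens: int = 512, **kwargs) -> str: ...
    def generate_structured(
        self, prompt: str, json_schema: dict, *, max_tokens: int = 512, **kwargs
    ) -> dict: ...

class EchoBackend:
    """Toy backend: satisfies the protocol without loading any model."""

    def load(self, model_path: str, **kwargs) -> None:
        self.model_path = model_path

    def generate(self, prompt: str, *, max_tokens: int = 512, **kwargs) -> str:
        # Echo back the prompt, truncated to the token budget (characters here)
        return prompt[:max_tokens]

    def generate_structured(
        self, prompt: str, json_schema: dict, *, max_tokens: int = 512, **kwargs
    ) -> dict:
        # Return a dict containing every required key from the schema
        return {key: "" for key in json_schema.get("required", [])}

backend: InferenceBackend = EchoBackend()  # type-checks structurally
```

This is why the extraction layer can stay backend-agnostic: it only ever calls these three methods.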
LlamaCppBackend¶
The v0.1 backend uses llama-cpp-python for GGUF inference on CPU. Structured generation is powered by outlines for JSON schema constrained decoding.
import fuse
# From HuggingFace (auto-downloads best Q4 GGUF)
backend = fuse.LlamaCppBackend(model_name="bartowski/Llama-3.2-1B-Instruct-GGUF")
# From a local file
backend = fuse.LlamaCppBackend(model_path="./model.gguf")
# From config
config = fuse.InferenceConfig(model_name="bartowski/Phi-4-mini-instruct-GGUF", n_ctx=4096)
backend = fuse.LlamaCppBackend.from_config(config)
Schema builder¶
Fuse does not require predefined Pydantic models. Schemas are built dynamically using SchemaBuilder:
From a fields dict¶
The simplest approach is to pass Python types directly as a dict of field names to types. You can also provide defaults for optional fields.
From a JSON schema¶
Load a standard JSON Schema object:
schema = fuse.SchemaBuilder.from_json_schema({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
})
From a description¶
Let the LLM infer the schema from natural language:
result = extractor.extract_from_description(
    "The Series A raised $15M from Sequoia.",
    "Extract monetary amounts, round type, and investors",
)
Extraction modes¶
The Extractor class supports three extraction modes:
| Mode | Method | Input | Output |
|---|---|---|---|
| Pydantic model | `extract()` | `type[BaseModel]` | `BaseModel` instance |
| Fields dict | `extract_from_fields()` | `dict[str, type]` | `dict` |
| Description | `extract_from_description()` | `str` | `dict` |
All modes use constrained generation under the hood — the model is forced to output valid JSON matching the schema.
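The guarantee provided by constrained generation (the actual enforcement happens token-by-token inside outlines during decoding) is that the output always parses and conforms to the schema. A minimal after-the-fact validity check illustrating what "matching the schema" means here:

```python
import json

def matches_schema(text: str, schema: dict) -> bool:
    """Illustrative check: output parses as JSON and satisfies the schema.

    Constrained decoding enforces this at generation time; this function
    only demonstrates the property being guaranteed.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    # Every required key must be present
    for key in schema.get("required", []):
        if key not in obj:
            return False
    # Present keys must have the declared JSON type
    for key, spec in schema.get("properties", {}).items():
        if key in obj and not isinstance(obj[key], type_map[spec["type"]]):
            return False
    return True

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
```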
Extraction with spans¶
Every extraction mode has a _with_spans variant that returns source text localization:
| Mode | Method | Output |
|---|---|---|
| Pydantic model | `extract_with_spans()` | `SpannedResult` |
| Fields dict | `extract_from_fields_with_spans()` | `SpannedResult` |
Each field in a SpannedResult includes:
- value — the extracted value
- evidence — a verbatim quote from the source text supporting the value
- is_explicit — whether the value appears word-for-word in the source
- span — character-offset (start, end) in the source text
Fuse distinguishes between two types of extraction:
| Type | Example | Span points to |
|---|---|---|
| Explicit | Name: "Sarah Chen" (verbatim in text) | The value itself |
| Implicit | Sentiment: "negative" (inferred) | The evidence passage |
For explicit extractions, the value is a direct substring of the source — localization uses exact substring matching. For implicit extractions (e.g., sentiment, category), the model quotes the evidence passage verbatim, and the span points to that passage instead.
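The localization rule above can be sketched with plain substring search (a simplification; fuse's actual matching may apply additional normalization):

```python
def locate(source: str, value: str, evidence: str):
    """Return (is_explicit, (start, end)) using exact substring matching."""
    start = source.find(value)
    if start != -1:
        # Explicit: the value itself appears verbatim; span covers the value
        return True, (start, start + len(value))
    start = source.find(evidence)
    if start != -1:
        # Implicit: span covers the verbatim evidence passage instead
        return False, (start, start + len(evidence))
    return False, None  # quote could not be located in the source

text = "Sarah Chen said the launch was a complete disaster."
explicit = locate(text, "Sarah Chen", "Sarah Chen")
implicit = locate(text, "negative", "the launch was a complete disaster")
```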
result = extractor.extract_with_spans(text, PersonSchema)
for field in result.fields:
    print(f"{field.name}: {field.value}")
    print(f"  evidence: {field.evidence!r}")
    print(f"  type: {'explicit' if field.is_explicit else 'implicit'}")
    if field.span:
        print(f"  source[{field.span.start}:{field.span.end}]")
HTML visualization¶
Generate an HTML page with color-coded highlighted spans:
from pathlib import Path

from fuse.extraction.visualize import render_html

html = render_html(source_text, result)
Path("result.html").write_text(html)
The same visualization is available through the CLI.
The visualization uses solid outlines for explicit extractions and dashed outlines for implicit ones, with a legend showing all fields, their values, and character offsets.
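At its core, the highlighting amounts to wrapping each span in a styled tag. A toy sketch of that idea (fuse's `render_html` is more elaborate; the `highlight` helper below is hypothetical):

```python
import html

def highlight(source: str, start: int, end: int, explicit: bool) -> str:
    """Wrap one span in a <mark>, solid outline for explicit, dashed for implicit."""
    style = "outline: 1px solid" if explicit else "outline: 1px dashed"
    return (
        html.escape(source[:start])
        + f'<mark style="{style}">'
        + html.escape(source[start:end])
        + "</mark>"
        + html.escape(source[end:])
    )

page = highlight("Name: Sarah Chen", 6, 16, explicit=True)
```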
Prompt formats¶
Fuse includes prompt templates for common model families:
| Format | Models | Template style |
|---|---|---|
| `llama` | Llama 3.x | `<\|start_header_id\|>...<\|eot_id\|>` |
| `chatml` | Qwen, Phi | `<\|im_start\|>...<\|im_end\|>` |
| `generic` | Fallback | `<\|system\|>...<\|end\|>` |
The prompt format is set via prompt_format in config files and defaults to llama.
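As an illustration of the chatml template style, each turn is wrapped in role markers and the prompt ends with an open assistant turn for the model to complete. A minimal sketch (not fuse's actual template code):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen/Phi chat templates."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # left open: the model generates the reply
    )

prompt = chatml_prompt("Extract JSON.", "Name: Sarah Chen")
```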
Tip
llama-cpp-python automatically adds BOS tokens. Fuse's templates do not include <|begin_of_text|> to avoid duplication.
Model resolution¶
When you pass a HuggingFace repo name (e.g., bartowski/Llama-3.2-1B-Instruct-GGUF), Fuse:
- Lists all GGUF files in the repo
- Picks the best quantization (preference: Q4_K_M > Q4_K_S > Q4_0 > Q5_K_M > Q8_0)
- Downloads to ~/.cache/fuse/models/
- Returns the local path
You can override the filename with gguf_filename in config.
Training¶
Fuse supports fine-tuning with LoRA via two paths:
- Unsloth (preferred) — faster training with optimized kernels
- HuggingFace Transformers + PEFT — fallback when Unsloth is unavailable
Training produces a HuggingFace-format model that can be exported to GGUF for CPU deployment.
See Training Configuration for details.