Extraction Configuration¶
ExtractConfig¶
The ExtractConfig class bundles model settings with schema and prompt options for extraction.
YAML format¶
model:
  model_name: "bartowski/Llama-3.2-1B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0
fields:
  name: str
  age: int
  job_title: str
  company: str
prompt_format: llama
max_tokens: 256
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | InferenceConfig | required | Model configuration (see Inference) |
| schema_file | Path \| None | None | Path to a JSON schema file |
| fields | dict \| None | None | Field name to type mapping |
| description | str \| None | None | Natural language description of what to extract |
| prompt_format | str | "llama" | Prompt template format: llama, chatml, or generic |
| max_tokens | int | 512 | Maximum tokens to generate |
Tip
You must provide exactly one of fields, schema_file, or description.
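The defaults in the table and the exactly-one rule from the tip can be sketched as a plain dataclass. This is an illustrative stand-in, not the library's ExtractConfig: the class name `ExtractConfigSketch`, the `model` field typed as a plain dict in place of InferenceConfig, and the error message are all assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ExtractConfigSketch:
    """Hypothetical mirror of the parameter table; not the real ExtractConfig."""

    model: dict  # assumption: a plain dict stands in for InferenceConfig
    schema_file: Optional[Path] = None
    fields: Optional[dict] = None
    description: Optional[str] = None
    prompt_format: str = "llama"
    max_tokens: int = 512

    def __post_init__(self) -> None:
        # Collect which of the three schema sources were supplied.
        provided = [
            name
            for name in ("schema_file", "fields", "description")
            if getattr(self, name) is not None
        ]
        if len(provided) != 1:
            raise ValueError(
                "provide exactly one of fields, schema_file, or description; "
                f"got {provided or 'none'}"
            )


cfg = ExtractConfigSketch(
    model={"model_name": "bartowski/Llama-3.2-1B-Instruct-GGUF"},
    fields={"name": "str", "age": "int"},
)
```

Passing two schema sources, or none, raises `ValueError` in this sketch; how the real class reports the problem may differ.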
Schema input methods¶
Fields dict¶
Define fields inline with Python type names:
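As a rough illustration of what a fields mapping expresses, the sketch below turns Python type names into an equivalent flat JSON Schema. The helper `fields_to_json_schema` is hypothetical, and both the set of supported type names (only `str` and `int` appear in this document; `float` and `bool` are added here) and the choice to mark every field as required are assumptions rather than documented library behavior.

```python
import json

# Assumed mapping from Python type names to JSON Schema types.
TYPE_NAMES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def fields_to_json_schema(fields: dict) -> dict:
    """Build a flat object schema from a {field_name: type_name} mapping."""
    properties = {}
    for name, type_name in fields.items():
        if type_name not in TYPE_NAMES:
            raise ValueError(f"unsupported type name for {name!r}: {type_name!r}")
        properties[name] = {"type": TYPE_NAMES[type_name]}
    # Assumption: every listed field is treated as required.
    return {"type": "object", "properties": properties, "required": list(fields)}


fields = {"name": "str", "age": "int", "job_title": "str", "company": "str"}
print(json.dumps(fields_to_json_schema(fields), indent=2))
```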
CLI equivalent:
JSON schema file¶
Point to a standard JSON Schema:
person.json:
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "skills": {
      "type": "array",
      "items": {"type": "string"}
    }
  },
  "required": ["name", "age"]
}
CLI equivalent:
Description¶
Let the LLM design the schema from a natural language description:
Prompt formats¶
Choose the format matching your model family:
| Format | Use with | Example models |
|---|---|---|
| llama (default) | Llama 3.x family | Llama 3.2 1B/3B |
| chatml | ChatML-based models | Qwen 2.5, Phi-4 |
| generic | Other instruction-tuned models | Fallback |
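If configs are generated programmatically, the table can be encoded as a small lookup. The helper below is hypothetical and not part of the library; matching on substrings of the model name is an assumption, and the patterns cover only the example models listed above.

```python
def guess_prompt_format(model_name: str) -> str:
    """Pick a prompt_format value from a model name, following the table above."""
    lowered = model_name.lower()
    if "llama-3" in lowered:
        return "llama"
    # Qwen 2.5 and Phi-4 are listed as ChatML-based models.
    if "qwen2.5" in lowered or "phi-4" in lowered:
        return "chatml"
    # Anything unrecognized falls back to the generic template.
    return "generic"


print(guess_prompt_format("bartowski/Phi-4-mini-instruct-GGUF"))
```

When in doubt, setting prompt_format explicitly in the config is safer than relying on a name-based guess like this one.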
Evidence spans and visualization¶
Any extraction mode supports evidence spans via the CLI flags --spans and --html:
# Table output with evidence, type, and character offsets
fuse extract "..." --config extract_person.yaml --spans
# HTML file with color-coded highlighted spans
fuse extract "..." --config extract_person.yaml --html result.html
In Python, use the _with_spans extraction methods:
result = extractor.extract_with_spans(text, schema)
result = extractor.extract_from_fields_with_spans(text, fields)
Each field in the result includes:
- value: the extracted value
- evidence: verbatim quote from the source text
- is_explicit: true if the value appears word-for-word in the source
- span: character offsets (start, end) in the source text
See Concepts — Extraction with spans for details on explicit vs. implicit extractions.
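To make the relationship between evidence, is_explicit, and span concrete, here is a minimal sketch that locates a verbatim quote in the source text. It is not the library's implementation: the helper `locate_evidence` is hypothetical, and treating the end offset as exclusive (so that slicing with the span returns the quote) is an assumption.

```python
from typing import Optional, Tuple


def locate_evidence(source: str, evidence: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first verbatim match, or None."""
    start = source.find(evidence)
    if start == -1 or not evidence:
        return None
    # Assumption: end is exclusive, so source[start:end] == evidence.
    return start, start + len(evidence)


text = "Alice Chen, 34, is a staff engineer at Initech."
span = locate_evidence(text, "staff engineer")
# In this sketch a quote counts as explicit when it is found word-for-word.
is_explicit = span is not None
print(span, is_explicit)
```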
Example configs¶
Person extraction¶
model:
  model_name: "bartowski/Llama-3.2-1B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0
fields:
  name: str
  age: int
  job_title: str
  company: str
prompt_format: llama
max_tokens: 256
Company extraction with Phi-4¶
model:
  model_name: "bartowski/Phi-4-mini-instruct-GGUF"
  n_ctx: 4096
  temperature: 0.0
fields:
  company_name: str
  founded_year: int
  industry: str
  headquarters: str
prompt_format: chatml
max_tokens: 256