Skip to content

Extraction Configuration

ExtractConfig

The ExtractConfig class bundles model settings with schema and prompt options for extraction.

YAML format

model:
  model_name: "bartowski/Llama-3.2-1B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0

fields:
  name: str
  age: int
  job_title: str
  company: str

prompt_format: llama
max_tokens: 256

Parameters

Parameter Type Default Description
model InferenceConfig required Model configuration (see Inference)
schema_file Path \| None None Path to a JSON schema file
fields dict \| None None Field name to type mapping
description str \| None None Natural language description of what to extract
prompt_format str "llama" Prompt template format: llama, chatml, or generic
max_tokens int 512 Maximum tokens to generate

Tip

You must provide exactly one of fields, schema_file, or description.


Schema input methods

Fields dict

Define fields inline with Python type names:

fields:
  name: str
  age: int
  skills: list[str]
  active: bool

CLI equivalent:

fuse extract "some text" --model repo/name --fields "name:str,age:int,skills:list[str]"

JSON schema file

Point to a standard JSON Schema:

schema_file: "./schemas/person.json"

person.json:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "skills": {
      "type": "array",
      "items": {"type": "string"}
    }
  },
  "required": ["name", "age"]
}

CLI equivalent:

fuse extract "some text" --model repo/name --schema person.json

Description

Let the LLM design the schema from a natural language description:

description: "Extract person name, age, and list of programming skills"

Prompt formats

Choose the format matching your model family:

Format Use with Example models
llama (default) Llama 3.x family Llama 3.2 1B/3B
chatml ChatML-based models Qwen 2.5, Phi-4
generic Other instruction-tuned models Fallback
prompt_format: chatml  # for Qwen or Phi models

Evidence spans and visualization

Any extraction mode supports evidence spans via the CLI flags --spans and --html:

# Table output with evidence, type, and character offsets
fuse extract "..." --config extract_person.yaml --spans

# HTML file with color-coded highlighted spans
fuse extract "..." --config extract_person.yaml --html result.html

In Python, use the _with_spans extraction methods:

result = extractor.extract_with_spans(text, schema)
result = extractor.extract_from_fields_with_spans(text, fields)

Each field in the result includes:

  • value — the extracted value
  • evidence — verbatim quote from the source text
  • is_explicittrue if the value appears word-for-word in the source
  • span — character-offset (start, end) in the source text

See Concepts — Extraction with spans for details on explicit vs. implicit extractions.


Example configs

Person extraction

model:
  model_name: "bartowski/Llama-3.2-1B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0

fields:
  name: str
  age: int
  job_title: str
  company: str

prompt_format: llama
max_tokens: 256

Company extraction with Phi-4

model:
  model_name: "bartowski/Phi-4-mini-instruct-GGUF"
  n_ctx: 4096
  temperature: 0.0

fields:
  company_name: str
  founded_year: int
  industry: str
  headquarters: str

prompt_format: chatml
max_tokens: 256

Schema-based extraction

model:
  model_name: "bartowski/Llama-3.2-3B-Instruct-GGUF"
  n_ctx: 2048
  temperature: 0.0

schema_file: "./schemas/person.json"
prompt_format: llama
max_tokens: 512