How does Outlines work?
Language models generate text token by token: the model takes the previous token sequence as input and outputs logits, from which the next token is sampled. This document explains structured generation, in which only legal tokens are considered at each step, as determined by a predefined automaton such as a regex-defined finite-state machine (FSM) or a Lark grammar.
Worked Example
Let's consider a worked example with a pattern for whole and decimal numbers:
^\d*(\.\d+)?$
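To make the pattern concrete, here is a minimal sketch that checks a few strings against it with Python's standard `re` module. The helper name `is_number` and the sample strings are illustrative assumptions, not part of Outlines.

```python
import re

# The pattern from the text. fullmatch requires the whole string to match.
NUMBER = re.compile(r"^\d*(\.\d+)?$")


def is_number(text: str) -> bool:
    """Return True if text is a whole or decimal number under the pattern."""
    return NUMBER.fullmatch(text) is not None


# Sample inputs (made up for illustration).
for candidate in ["748", "3.14", ".5", "748.", "1.2.3", "abc"]:
    print(f"{candidate!r}: {is_number(candidate)}")
```

Because the integer part is `\d*`, the pattern also accepts strings with no leading digits, such as ".5", and the empty string. A trailing dot without digits, as in "748.", is rejected.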
Creating Automata
The pattern is first converted into an automaton. The diagram below is a simplified representation of it.
Automaton diagram:
graph LR
node0("0-9") --> node1("0-9")
node1 --> node1
node1 --> nodeEND{{END}}
node1 --> nodePeriod(".")
nodePeriod --> node2("0-9")
node2 --> node2
node2 --> nodeEND{{END}}
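One way to picture the automaton in code is as a transition table. The sketch below is hand-written for this pattern and is not how Outlines builds or stores its automata. The state names (`integer`, `period`, `fraction`) are hypothetical, and digits are taken to be 0-9, as `\d` matches for ASCII input.

```python
import string

DIGITS = string.digits  # "0123456789"

# Each state maps an input character to the next state.
# A missing entry means the character is illegal in that state.
TRANSITIONS = {
    "integer": {**{d: "integer" for d in DIGITS}, ".": "period"},
    "period": {d: "fraction" for d in DIGITS},
    "fraction": {d: "fraction" for d in DIGITS},
}
START = "integer"
# States from which the sequence is allowed to end (the END node).
ACCEPTING = {"integer", "fraction"}


def accepts(text: str) -> bool:
    """Walk the table character by character and report whether text is accepted."""
    state = START
    for ch in text:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            return False
    return state in ACCEPTING


for candidate in ["748", "748.5", ".5", "748.", "7a"]:
    print(f"{candidate!r}: {accepts(candidate)}")
```

Since the pattern allows an empty integer part, the start state and the "digits so far" state behave identically here and are merged into a single `integer` state.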
Generating a Token
Let's assume that we're in the middle of generation, and so far "748" has been generated. Below is the automaton with the current state highlighted in green. The legal next characters are another digit (0-9), a dot (.), or the end of the sequence.
graph LR
node0("0-9") --> node1("0-9")
node1 --> node1
node1 --> nodeEND{{END}}
node1 --> nodePeriod(".")
nodePeriod --> node2("0-9")
node2 --> node2
node2 --> nodeEND{{END}}
style node1 fill:#090
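Model vocabularies contain tokens that may span several characters, so legality is decided per token rather than per character. The sketch below, under the same assumptions as a hand-written transition table for this pattern, advances the automaton over "748" and then filters a toy vocabulary. The vocabulary, the `<eos>` token name, and the state names are all made up for illustration.

```python
import string

DIGITS = string.digits
# Hand-written transition table for the pattern (hypothetical state names).
TRANSITIONS = {
    "integer": {**{d: "integer" for d in DIGITS}, ".": "period"},
    "period": {d: "fraction" for d in DIGITS},
    "fraction": {d: "fraction" for d in DIGITS},
}
ACCEPTING = {"integer", "fraction"}
EOS = "<eos>"  # hypothetical end-of-sequence token


def advance(state, text):
    """Return the state reached after reading text, or None if a character is illegal."""
    for ch in text:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            return None
    return state


def legal_tokens(state, vocabulary):
    """Keep tokens whose every character is a legal move; EOS only in accepting states."""
    legal = []
    for token in vocabulary:
        if token == EOS:
            if state in ACCEPTING:
                legal.append(token)
        elif advance(state, token) is not None:
            legal.append(token)
    return legal


VOCABULARY = ["7", "48", ".", ".5", "1.2", "..", "a", " ", EOS]
state = advance("integer", "748")  # the state highlighted in the diagram
print(state)
print(legal_tokens(state, VOCABULARY))
```

In this toy setup, tokens such as ".." and "a" are filtered out because at least one of their characters has no transition from the current state.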
Generating a token requires the following steps:
- Feed the previous input sequence ("748") into the language model.
- The language model runs a forward pass and produces token logits.
- The Outlines logits processor sets the probability of illegal tokens to 0%.
- A token is sampled from the set of legal tokens.
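The last two steps can be sketched with the standard library alone. This is an illustration of the general masking technique, not the Outlines logits processor itself: the function names, the toy vocabulary, the logit values, and the set of legal token ids are all assumptions. Setting a logit to negative infinity is a common way to make its probability exactly 0 after the softmax.

```python
import math
import random


def mask_logits(logits, legal_ids):
    """Set the logit of every illegal token to -inf so its probability becomes 0."""
    return [x if i in legal_ids else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    """Convert logits to probabilities. Assumes at least one finite logit."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, legal_ids, rng):
    """Mask illegal tokens, then sample a token id from the remaining distribution."""
    probs = softmax(mask_logits(logits, legal_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


VOCABULARY = ["7", "48", ".", "a", " ", "<eos>"]  # toy vocabulary
logits = [1.0, 0.5, 2.0, 3.5, 3.0, 0.2]           # made-up model output
legal_ids = {0, 1, 2, 5}                          # assumed to come from the automaton

rng = random.Random(0)
token_id = sample_token(logits, legal_ids, rng)
print(VOCABULARY[token_id])
```

Note that the unconstrained model would favor "a" and " " here, since they have the highest logits. After masking, the probability mass is redistributed over the legal tokens only.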