# Models
Outlines supports generation using a number of inference engines (`outlines.models`). Loading a model follows a similar interface across inference engines:
```python
import os

import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
model = outlines.models.transformers_vision("llava-hf/llava-v1.6-mistral-7b-hf")
model = outlines.models.vllm("microsoft/Phi-3-mini-128k-instruct")
model = outlines.models.llamacpp(
    "microsoft/Phi-3-mini-4k-instruct-gguf", "Phi-3-mini-4k-instruct-q4.gguf"
)
model = outlines.models.exllamav2("bartowski/Phi-3-mini-128k-instruct-exl2")
model = outlines.models.mlxlm("mlx-community/Phi-3-mini-4k-instruct-4bit")
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)
```
## Feature Matrix
| | Transformers | Transformers Vision | vLLM | llama.cpp | ExLlamaV2 | MLXLM | OpenAI* |
|---|---|---|---|---|---|---|---|
| **Device** | | | | | | | |
| CUDA | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | N/A |
| Apple Silicon | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | N/A |
| x86 / AMD64 | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | N/A |
| **Sampling** | | | | | | | |
| Greedy | ✅ | ✅ | ✅ | ✅* | ✅ | ✅ | ❌ |
| Multinomial | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multiple Samples | ✅ | ✅ | ❌ | ❌ | ✅ | | |
| Beam Search | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| **Generation** | | | | | | | |
| Batch | ✅ | ✅ | ✅ | ❌ | ? | ❌ | ❌ |
| Stream | ✅ | ❌ | ❌ | ✅ | ? | ✅ | ❌ |
| **`outlines.generate`** | | | | | | | |
| Text | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| JSON Schema | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Choice | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Regex | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Grammar | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
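In the matrix above, "Structured" means constrained generation: the output is guaranteed to satisfy a constraint such as a regular expression, a JSON schema, or a grammar. As a library-free illustration of the regex case (the pattern and strings below are hypothetical examples, not from Outlines):

```python
import re

# A hypothetical pattern a user might pass to outlines.generate.regex:
# a price such as "$19.99".
PRICE = r"\$\d+\.\d{2}"

# A regex-constrained generator only ever emits strings that match:
print(bool(re.fullmatch(PRICE, "$19.99")))                # True
# Unconstrained text generation offers no such guarantee:
print(bool(re.fullmatch(PRICE, "about twenty dollars")))  # False
```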
## Caveats
- OpenAI doesn't support structured generation due to limitations in their API and server implementation.
- In `outlines.generate`, "Structured" includes methods such as `outlines.generate.regex`, `outlines.generate.json`, `outlines.generate.cfg`, etc.
- MLXLM only supports Apple Silicon.
- llama.cpp greedy sampling is available via multinomial sampling with `temperature = 0.0`.
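The last caveat works because, as temperature approaches zero, the multinomial sampling distribution collapses onto the highest-logit token, making sampling equivalent to greedy decoding. A minimal, library-free sketch (the logit values are illustrative; real implementations special-case `temperature == 0.0` as argmax to avoid dividing by zero):

```python
import math

def temperature_softmax(logits, temperature):
    """Multinomial sampling distribution over tokens at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 3.0, 2.0]
# Near zero temperature, essentially all probability mass sits on the
# argmax token, so multinomial sampling always picks it (i.e. greedy).
probs = temperature_softmax(logits, 0.01)
print(probs.index(max(probs)))  # 1, the argmax of the logits
print(probs[1] > 0.999)         # True
```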