Quickstart
After installing Outlines, the fastest way to get up to speed with the library is to get acquainted with its core elements. We advise you to take a quick look at this page to see everything Outlines has to offer before diving into the documentation.
Core elements
Models
The first step when writing a program with Outlines is to initialize a model. Weights will be loaded on the device at this step:
import outlines

model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",  # optional device argument, default is cpu
)
Outlines supports a wide variety of inference engines and model weight types. More details can be found on the Outlines models documentation page.
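For instance, the same loading pattern applies to other backends. A minimal sketch, assuming the relevant optional dependencies are installed (the specific model identifiers below are illustrative):

import outlines

# Remote OpenAI model (requires an API key in the environment).
model = outlines.models.openai("gpt-4o-mini")

# Local GGUF weights via llama.cpp (requires llama-cpp-python).
model = outlines.models.llamacpp(
    "TheBloke/phi-2-GGUF",  # Hugging Face repository
    "phi-2.Q4_K_M.gguf",    # weights file inside the repository
)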
Generation
Once the model is initialized, you can build an outlines.generate generator. This generator can be called with a prompt directly (see the Outlines structured generation documentation for full details).
generator = outlines.generate.text(model)

result = generator("Question: What's 2+2? Answer:", max_tokens=100)
print(result)
# The answer is 4

# Outlines also supports streaming output
stream = generator.stream("What's 2+2?", max_tokens=5)
for token in stream:
    print(repr(token))
# '2'
# '+'
# '2'
# ' equals'
# '4'
Along with standard text generation via outlines.generate.text, Outlines supports structured generation, which guarantees that the tokens generated by the model follow a predefined structure. Structures can be defined by a regex pattern, a JSON schema, a Python object type, or a Lark grammar defining a parsable language such as SQL or Python.
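For instance, a regex pattern can constrain the output to a fixed format. A minimal sketch reusing the model initialized above (the prompt and pattern are illustrative):

# Constrain the output to an IP-address-shaped string.
generator = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
result = generator("What is the IP address of the loopback interface? ")
print(result)  # e.g. '127.0.0.1'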
Example: using pydantic to enforce a JSON schema
from enum import Enum

from pydantic import BaseModel, conint, constr

class Armor(Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"

class Character(BaseModel):
    name: constr(max_length=10)
    age: conint(gt=18, lt=99)
    armor: Armor
    strength: conint(gt=1, lt=100)

generator = outlines.generate.json(model, Character)
character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
)
print(character)
# Character(name='Zara', age=25, armor=<Armor.leather: 'leather'>, strength=85)
Deploy using vLLM and FastAPI
Outlines can be deployed as an LLM service using vLLM and FastAPI. The server supports asynchronous processing of incoming requests and benefits from the performance of vLLM.
First start the server:
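A minimal sketch of the command, assuming the outlines.serve module shipped with the vLLM integration (the model name is illustrative):

python -m outlines.serve.serve --model="microsoft/Phi-3-mini-4k-instruct"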
Or you can start the server with Outlines' official Docker image:
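A sketch, assuming the outlinesdev/outlines image name used in the project's documentation:

docker run -p 8000:8000 outlinesdev/outlines --model="microsoft/Phi-3-mini-4k-instruct"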
This will by default start a server at http://127.0.0.1:8000 (check what the console says, though). Without the --model argument set, the OPT-125M model is used.
You can then query the model from the shell by passing a prompt and a JSON Schema specification for the structure of the output:
curl http://127.0.0.1:8000/generate \
    -d '{
        "prompt": "Question: What is a language model? Answer:",
        "schema": {"type": "string"}
        }'
Or use the requests library from another Python program. You can read the vLLM documentation for more details.
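A minimal sketch of the same request from Python (the payload mirrors the curl example above; treat the exact response shape as an assumption):

import requests

response = requests.post(
    "http://127.0.0.1:8000/generate",
    json={
        "prompt": "Question: What is a language model? Answer:",
        "schema": {"type": "string"},
    },
)
print(response.json())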
Utilities
Prompt templates
Prompting can lead to messy code. Outlines' prompt functions are Python functions that contain a template for the prompt in their docstring. We use the Jinja2 templating engine, which lets you loop over lists and dictionaries, add conditionals, etc. directly from the prompt. When called, a prompt function returns the rendered template:
import outlines

@outlines.prompt
def few_shots(instructions, examples, question):
    """{{ instructions }}

    Examples
    --------
    {% for example in examples %}
    Q: {{ example.question }}
    A: {{ example.answer }}
    {% endfor %}
    Question
    --------
    Q: {{ question }}
    A:
    """
instructions = "Please answer the following question following the examples"
examples = [
    {"question": "2+2=?", "answer": 4},
    {"question": "3+3=?", "answer": 6},
]
question = "4+4 = ?"

prompt = few_shots(instructions, examples, question)
print(prompt)
# Please answer the following question following the examples
# Examples
# --------
# Q: 2+2=?
# A: 4
# Q: 3+3=?
# A: 6
# Question
# --------
# Q: 4+4 = ?
# A:
Outlines functions
Once you are done experimenting with a prompt and an output structure, it is useful to encapsulate them in a single function that can be called from other parts of the program. This is what outlines.Function allows you to do:
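A minimal sketch, reusing the prompt-template and Pydantic patterns from above. The constructor arguments shown here (prompt template, output structure, model name) follow the project's documentation; the model name itself is illustrative:

from pydantic import BaseModel

import outlines

@outlines.prompt
def tell_a_joke(topic):
    """Tell me a joke about {{ topic }}."""

class Joke(BaseModel):
    setup: str
    punchline: str

# Bundle the prompt, the output structure and the model into one callable.
fn = outlines.Function(
    tell_a_joke,
    Joke,
    "microsoft/Phi-3-mini-4k-instruct",  # illustrative model name
)

response = fn("baseball")  # returns a Joke instance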
You can also load a function stored in a GitHub repository directly from Outlines. Say Someone stores a function in joke.py at the root of the TheirRepo repository:
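A sketch of what loading it could look like, assuming the from_github helper documented alongside outlines.Function (the "user/repo/module" path format is an assumption here):

import outlines

# Hypothetical: fetch the function defined in joke.py from GitHub.
fn = outlines.Function.from_github("Someone/TheirRepo/joke")
response = fn("baseball")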
Going further
If you need more inspiration you can take a look at the cookbook or watch Remi Louf's AI Engineer World's Fair presentation on Outlines. If you have any questions or requests for documentation, please reach out to us on GitHub, Twitter or Discord.