Quickstart
After installing Outlines, the fastest way to get up to speed with the library is to get acquainted with its core elements. We advise you to take a quick look at this page to see everything Outlines has to offer before diving into the documentation.
Core elements
Models
The first step when writing a program with Outlines is to initialize a model. Weights will be loaded on the device at this step:
import outlines

model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",  # optional device argument, default is cpu
)
Outlines supports a wide variety of inference engines and model weight types. More details can be found on the Outlines models documentation page.
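For instance, the same loading pattern applies to other backends. A minimal sketch, assuming the relevant optional dependencies are installed (the specific model identifiers below are illustrative):

import outlines

# Remote OpenAI model (requires an API key in the environment).
model = outlines.models.openai("gpt-4o-mini")

# Local GGUF weights via llama.cpp (requires llama-cpp-python).
model = outlines.models.llamacpp(
    "TheBloke/phi-2-GGUF",  # Hugging Face repository
    "phi-2.Q4_K_M.gguf",    # weights file inside the repository
)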
Generation
Once the model is initialized, you can build an outlines.generate generator. This generator can be called with a prompt directly (see the Outlines structured generation documentation for full details).
generator = outlines.generate.text(model)

result = generator("Question: What's 2+2? Answer:", max_tokens=100)
print(result)
# The answer is 4

# Outlines also supports streaming output
stream = generator.stream("What's 2+2?", max_tokens=5)
for token in stream:
    print(repr(token))
# '2'
# '+'
# '2'
# ' equals'
# '4'
Along with standard text generation via outlines.generate.text, Outlines supports structured generation, which guarantees that the tokens generated by the model follow a predefined structure. Structures can be defined by a regex pattern, a JSON schema, a Python object type, or a Lark grammar defining a parsable language such as SQL or Python.
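For instance, a regex pattern can constrain the output to a fixed format. A minimal sketch reusing the model initialized above (the prompt and pattern are illustrative):

# Constrain the output to an IP-address-shaped string.
generator = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
result = generator("What is the IP address of the loopback interface? ")
print(result)  # e.g. '127.0.0.1'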
Example: using pydantic to enforce a JSON schema
from enum import Enum

from pydantic import BaseModel, conint, constr

class Armor(Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"

class Character(BaseModel):
    name: constr(max_length=10)
    age: conint(gt=18, lt=99)
    armor: Armor
    strength: conint(gt=1, lt=100)

generator = outlines.generate.json(model, Character)
character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
)
print(character)
# Character(name='Zara', age=25, armor=<Armor.leather: 'leather'>, strength=85)
Deploy using vLLM and FastAPI
Outlines can be deployed as an LLM service using vLLM and FastAPI. The server supports asynchronous processing of incoming requests and benefits from the performance of vLLM.
First start the server:
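A minimal sketch of the command, assuming the outlines.serve module shipped with the vLLM integration (the model name is illustrative):

python -m outlines.serve.serve --model="microsoft/Phi-3-mini-4k-instruct"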
Or you can start the server with Outlines' official Docker image:
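A sketch, assuming the outlinesdev/outlines image name used in the project's documentation:

docker run -p 8000:8000 outlinesdev/outlines --model="microsoft/Phi-3-mini-4k-instruct"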
This will by default start a server at http://127.0.0.1:8000 (check what the console says, though). Without the --model argument set, the OPT-125M model is used.
You can then query the model from the shell by passing a prompt and a JSON Schema specification for the structure of the output:
curl http://127.0.0.1:8000/generate \
    -d '{
        "prompt": "Question: What is a language model? Answer:",
        "schema": {"type": "string"}
        }'
Or use the requests library from another Python program. You can read the vLLM documentation for more details.
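A minimal sketch of the same request from Python (the payload mirrors the curl example above; treat the exact response shape as an assumption):

import requests

response = requests.post(
    "http://127.0.0.1:8000/generate",
    json={
        "prompt": "Question: What is a language model? Answer:",
        "schema": {"type": "string"},
    },
)
print(response.json())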
Utilities
Prompt templates
Prompting can lead to messy code. Outlines' prompt functions are Python functions that contain a template for the prompt in their docstring. We use the Jinja2 templating engine, which lets you loop over lists and dictionaries, add conditionals, etc. directly from the prompt. When called, a prompt function returns the rendered template:
import outlines

@outlines.prompt
def few_shots(instructions, examples, question):
    """{{ instructions }}

    Examples
    --------
    {% for example in examples %}
    Q: {{ example.question }}
    A: {{ example.answer }}
    {% endfor %}
    Question
    --------
    Q: {{ question }}
    A:
    """
instructions = "Please answer the following question following the examples"
examples = [
    {"question": "2+2=?", "answer": 4},
    {"question": "3+3=?", "answer": 6},
]
question = "4+4 = ?"

prompt = few_shots(instructions, examples, question)
print(prompt)
# Please answer the following question following the examples
# Examples
# --------
# Q: 2+2=?
# A: 4
# Q: 3+3=?
# A: 6
# Question
# --------
# Q: 4+4 = ?
# A:
Outlines functions
Once you are done experimenting with a prompt and an output structure, it is useful to encapsulate them in a single function that can be called from other parts of the program. This is what outlines.Function allows you to do:
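A minimal sketch, reusing the prompt-template and Pydantic patterns from above. The constructor arguments shown here (prompt template, output structure, model name) follow the project's documentation; the model name itself is illustrative:

from pydantic import BaseModel

import outlines

@outlines.prompt
def tell_a_joke(topic):
    """Tell me a joke about {{ topic }}."""

class Joke(BaseModel):
    setup: str
    punchline: str

# Bundle the prompt, the output structure and the model into one callable.
fn = outlines.Function(
    tell_a_joke,
    Joke,
    "microsoft/Phi-3-mini-4k-instruct",  # illustrative model name
)

response = fn("baseball")  # returns a Joke instance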
You can also load a function stored in a GitHub repository directly from Outlines. Say Someone stores a function in joke.py at the root of the TheirRepo repository:
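A sketch of what loading it could look like, assuming the from_github helper documented alongside outlines.Function (the "user/repo/module" path format is an assumption here):

import outlines

# Hypothetical: fetch the function defined in joke.py from GitHub.
fn = outlines.Function.from_github("Someone/TheirRepo/joke")
response = fn("baseball")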
Going further
If you need more inspiration you can take a look at the cookbook or watch Remi Louf's AI Engineer World's Fair presentation on Outlines. If you have any questions or requests for documentation, please reach out to us on GitHub, Twitter or Discord.