Gemini
Installation
You need to install the google.genai
libray to be able to use the Gemini API in Outlines. Install all optional dependencies of the Gemini
model with: pip install outlines[gemini]
.
You also need to have a Gemini API key. This API key must either be set as an environment variable called GEMINI_API_KEY
or be provided to the google.genai.Client
class when instantiating it.
Model Initialization
To create a Gemini model instance, you can use the from_gemini
function. It takes 2 arguments:
client
: agoogle.genai.Client
instancemodel_name
: the name of the model you want to use in subsequent model calls (optional)
For instance:
import outlines
from google import genai
# Create the client
client = genai.Client()
# Create the model
model = outlines.from_gemini(
client,
"gemini-1.5-flash-latest"
)
Check the Gemini documentation for an up-to-date list of available models.
Text Generation
Once you've created your Outlines Gemini
model instance, you're all set to generate text with this provider. You can simply call the model with a prompt.
For instance:
import outlines
from google.genai import Client
# Create the model
model = outlines.from_gemini(
Client(),
"gemini-1.5-flash-latest"
)
# Call it to generate text
result = model("What's the capital of Latvia?", max_output_tokens=20)
print(result) # 'Riga'
Vision
Some Gemini models support vision input. To use this feature, provide a list containing a text prompt and Image
instances.
For instance:
import io
import requests
import PIL
import outlines
from google.genai import Client
from outlines.inputs import Image
# Create the model
model = outlines.from_gemini(
Client(),
"gemini-1.5-flash-latest"
)
# Function to get an image
def get_image(url):
r = requests.get(url)
return PIL.Image.open(io.BytesIO(r.content))
# Create the prompt containing the text and the image
prompt = [
"Describe the image",
Image(get_image("https://picsum.photos/id/237/400/300"))
]
# Call the model to generate a response
response = model(prompt, max_output_tokens=50)
print(response) # 'This is a picture of a black dog.'
Chat
You can also use chat inputs with the Gemini
model. To do so, call the model with a Chat
instance. The content of messsage within the chat can be vision inputs as described above.
For instance:
import io
import requests
import PIL
import outlines
from google.genai import Client
from outlines.inputs import Chat, Image
# Create the model
model = outlines.from_gemini(
Client(),
"gemini-1.5-flash-latest"
)
# Function to get an image
def get_image(url):
r = requests.get(url)
return PIL.Image.open(io.BytesIO(r.content))
# Create the chat input
prompt = Chat([
{"role": "user", "content": "You are a helpful assistant that helps me described pictures."},
{"role": "assistant", "content": "I'd be happy to help you describe pictures! Please go ahead and share an image"},
{
"role": "user",
"content": ["Describe the image", Image(get_image("https://picsum.photos/id/237/400/300"))]
},
])
# Call the model to generate a response
response = model(prompt, max_output_tokens=50)
print(response) # 'This is a picture of a black dog.'
Streaming
Finally, the Gemini
model supports streaming through the stream
method.
For instance:
import outlines
from google.genai import Client
# Create the model
model = outlines.from_gemini(
Client(),
"gemini-1.5-flash-latest"
)
# Stream text
for chunk in model.stream("Write a short story about a cat.", max_output_tokens=20):
print(chunk) # 'In...'
Structured Generation
Gemini provides supports for some forms of structured output: multiple choice, JSON schema (with caveats) and lists of structured objects. To use it, call the model with an output_type
on top of your prompt.
Multiple Choice
import outlines
from google import genai
from enum import Enum
class PizzaOrBurger(Enum):
pizza = "pizza"
burger = "burger"
# Create the model
model = outlines.from_gemini(genai.Client(), "gemini-1.5-flash-latest")
# Call it with the ouput type to generate structured text
result = model("Pizza or burger?", PizzaOrBurger, max_output_tokens=20)
print(result) # 'pizza'
JSON Schema
Gemini supports only three types of objects used to define a JSON Schema:
- Pydantic classes
- Dataclasses
- TypedDicts
from typing import List
from pydantic import BaseModel
from google import genai
import outlines
class Character(BaseModel):
name: str
age: int
skills: List[str]
# Create the model
model = outlines.from_gemini(genai.Client(), "gemini-1.5-flash-latest")
# Call it with the ouput type to generate structured text
result = model("Create a character", Character)
print(result) # '{"name": "Evelyn", "age": 34, "skills": ["archery", "stealth", "alchemy"]}'
print(Character.model_validate_json(result)) # name=Evelyn, age=34, skills=['archery', 'stealth', 'alchemy']
Lists of Structured Objects
A specificity of Gemini is that, despite not supporting regex, it does support a list of structured objects as an output type. To use it, put any of three available types described above in the typing List
class
from dataclasses import dataclass
from google import genai
import outlines
@dataclass
class Character:
name: str
age: int
skills: List[str]
# Create the model
model = outlines.from_gemini(genai.Client(), "gemini-1.5-flash-latest")
# Call it with the ouput type to generate structured text
result = model("Create a character", list[Character])
print(result) # '[{"name": "Evelyn", "age": 34, "skills": ["archery", "stealth", "alchemy"]}, {["name":...'
Attention
The structured objects must be in a built-in list
, not a List
from the typing
library
Inference arguments
You can provide the same optional parameters you would pass to the google.genai.Client
client both during the initialization of the Gemini model and when generating text. Some of the common inference arguments include max_output_tokens
, temperature
, and other generation parameters.
Consult the Google Generative AI documentation for the full list of parameters.