Classification
Classification is a classic problem in NLP with many applications: spam detection, sentiment analysis, triaging of incoming requests, etc. We will use the example of a company that wants to sort support requests into those that require immediate attention (URGENT) and those that can wait a little (STANDARD). You could easily extend the example by adding new labels.
This tutorial shows how one can implement multi-label classification using Outlines.
As always, we start by initializing the model. Since we are GPU-poor, we will be using a quantized version of Mistral-7B-v0.1:
import outlines
import transformers
MODEL_NAME = "TheBloke/Mistral-7B-OpenOrca-AWQ"
model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(MODEL_NAME),
    transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
)
We will use a prompt template stored in a text file:
from outlines import Template
customer_support = Template.from_file("prompt_templates/classification.txt")
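The contents of prompt_templates/classification.txt are not reproduced here. As a rough sketch, the snippet below writes a hypothetical template to that path. The prompt wording and the helper name write_classification_template are assumptions made for illustration; only the {{ request }} placeholder (Jinja2 syntax, matching the request= argument used later) is implied by the tutorial.

```python
from pathlib import Path

# Hypothetical prompt wording; adapt it to your own use case.
TEMPLATE_TEXT = """You are an experienced customer success manager.

Given a request from a client, decide whether it is urgent, using the
label "URGENT", or whether it can wait a little, using the label "STANDARD".

Request: {{ request }}
Label: """


def write_classification_template(root: Path = Path(".")) -> Path:
    """Write the hypothetical template under root/prompt_templates/ and return its path."""
    path = root / "prompt_templates" / "classification.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(TEMPLATE_TEXT, encoding="utf-8")
    return path
```

Calling write_classification_template() once from the project root would create the file that Template.from_file expects.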
Choosing between multiple choices
Outlines provides a convenient way to do multi-label classification by passing a Literal type hint to outlines.Generator:
from typing import Literal
import outlines
generator = outlines.Generator(model, Literal["URGENT", "STANDARD"])
requests = [
    "My hair is on fire! Please help me!!!",
    "Just wanted to say hi"
]
prompts = [customer_support(request=request) for request in requests]
We can now ask the model to classify the requests:
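One way to run the classification is to loop over the prompts. This sketch assumes the generator can be called with a single prompt string and returns the selected label as a string. The helper name classify_requests and the label check are illustrative additions, not part of Outlines.

```python
from typing import Callable, Iterable, List, Literal, get_args

Priority = Literal["URGENT", "STANDARD"]
ALLOWED_LABELS = get_args(Priority)  # the two string labels, as a tuple


def classify_requests(
    generator: Callable[[str], str], prompts: Iterable[str]
) -> List[str]:
    """Call the generator once per prompt and collect the returned labels."""
    labels = []
    for prompt in prompts:
        label = generator(prompt)
        # Constrained generation should only ever yield an allowed label;
        # this check is a defensive assumption, not an Outlines requirement.
        if label not in ALLOWED_LABELS:
            raise ValueError(f"Unexpected label: {label!r}")
        labels.append(label)
    return labels
```

With the objects defined above, labels = classify_requests(generator, prompts) would return one label per request, in the same order as the requests.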
Using JSON-structured generation
Another (more convoluted) way to do multi-label classification is to use JSON-structured generation in Outlines. We first need to define a Pydantic schema that contains the labels:
from enum import Enum
from pydantic import BaseModel
class Label(str, Enum):
    urgent = "URGENT"
    standard = "STANDARD"

class Classification(BaseModel):
    label: Label
We can then create a generator with the Pydantic model we just defined and call it:
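A minimal sketch of the post-processing for this step. It assumes the generator is built as outlines.Generator(model, Classification), following the pattern used earlier, and returns one JSON string per prompt, such as {"label": "URGENT"}. The raw strings below are hypothetical stand-ins for model output, and parse_label is an illustrative helper that relies only on the standard library.

```python
import json
from enum import Enum


class Label(str, Enum):
    urgent = "URGENT"
    standard = "STANDARD"


def parse_label(raw: str) -> Label:
    """Decode one JSON string and map its "label" field onto the Label enum."""
    payload = json.loads(raw)
    return Label(payload["label"])  # raises ValueError for unknown labels


# Hypothetical raw outputs, standing in for what the generator might return.
raw_outputs = ['{"label": "URGENT"}', '{"label": "STANDARD"}']
labels = [parse_label(raw) for raw in raw_outputs]
print([label.value for label in labels])
```

With Pydantic v2, Classification.model_validate_json(raw) performs the same decoding and also validates the payload against the schema.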