Knowledge Graph Extraction

In this guide, we use outlines to extract a knowledge graph from unstructured text.

We will run the model with llama.cpp through the llama-cpp-python library. Outlines supports llama-cpp-python, but we need to install it ourselves:

pip install llama-cpp-python

We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):

import llama_cpp
from outlines import generate, models

model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",

(Optional) Store the model weights in a custom folder

By default, the model weights are downloaded to the hub cache. If we want to store the weights in a custom folder instead, we pull a quantized GGUF build of Hermes-2-Pro-Llama-3-8B by NousResearch from HuggingFace:


We initialize the model:

import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(

Knowledge Graph Extraction

We first define one Pydantic class for the nodes and one for the edges of the knowledge graph:

from pydantic import BaseModel, Field

class Node(BaseModel):
    """Node of the Knowledge Graph"""

    id: int = Field(..., description="Unique identifier of the node")
    label: str = Field(..., description="Label of the node")
    property: str = Field(..., description="Property of the node")

class Edge(BaseModel):
    """Edge of the Knowledge Graph"""

    source: int = Field(..., description="Unique source of the edge")
    target: int = Field(..., description="Unique target of the edge")
    label: str = Field(..., description="Label of the edge")
    property: str = Field(..., description="Property of the edge")

We then define the Pydantic class for the knowledge graph and get its JSON schema:

from typing import List

class KnowledgeGraph(BaseModel):
    """Generated Knowledge Graph"""

    nodes: List[Node] = Field(..., description="List of nodes of the knowledge graph")
    edges: List[Edge] = Field(..., description="List of edges of the knowledge graph")

schema = KnowledgeGraph.model_json_schema()

We then need to adapt our prompt to the Hermes prompt format for JSON schema:

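def generate_hermes_prompt(user_prompt):
    # ChatML layout used by Hermes: a system turn carrying the schema,
    # the user turn, then an open assistant turn for the model to complete.
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
    )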

For a given user prompt, for example:

user_prompt = "Alice loves Bob and she hates Charlie."
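Note that `schema` is a Python dict, so interpolating it into the f-string embeds its `repr` (single-quoted keys) rather than strict JSON text. The following is a minimal sketch of a variant that serializes the schema with `json.dumps` first. The name `build_hermes_prompt` and the `toy_schema` stand-in are hypothetical, and the `<|im_start|>system` and `<|im_start|>user` markers are assumed from the ChatML layout that the `<|im_end|>` and `<|im_start|>assistant` markers imply.

```python
import json


def build_hermes_prompt(schema: dict, user_prompt: str) -> str:
    """Hypothetical variant of generate_hermes_prompt that embeds the schema as JSON text."""
    # json.dumps yields double-quoted, valid JSON instead of the dict's repr.
    schema_text = json.dumps(schema, indent=2)
    return (
        "<|im_start|>system\n"  # assumed ChatML system marker
        "You are a world class AI model who answers questions in JSON "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema_text}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"  # assumed ChatML user marker
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
    )


# Stand-in schema for illustration; in this guide it would be
# KnowledgeGraph.model_json_schema().
toy_schema = {"type": "object", "properties": {"nodes": {"type": "array"}}}
example_prompt = build_hermes_prompt(toy_schema, "Alice loves Bob and she hates Charlie.")
print(example_prompt)
```

Whether the model behaves differently with one form or the other is not something this guide measures; the variant only makes the embedded schema parseable as JSON.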

We can use generate.json by passing the Pydantic class we previously defined, and call the generator with the Hermes prompt:

from outlines import generate, models

model = models.LlamaCpp(llm)
generator = generate.json(model, KnowledgeGraph)
prompt = generate_hermes_prompt(user_prompt)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)

We obtain the nodes and edges of the knowledge graph:

print(response.nodes)
print(response.edges)
# [Node(id=1, label='Alice', property='Person'),
# Node(id=2, label='Bob', property='Person'),
# Node(id=3, label='Charlie', property='Person')]
# [Edge(source=1, target=2, label='love', property='Relationship'),
# Edge(source=1, target=3, label='hate', property='Relationship')]
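The JSON schema constrains the structure and types of the output, but it does not state that an edge's `source` and `target` must match the `id` of an existing node. A minimal sketch of a post-hoc check follows. `find_dangling_edges` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the `response` returned by the generator (any object with `nodes` and `edges` attributes of the same shape would work).

```python
from types import SimpleNamespace


def find_dangling_edges(graph):
    """Return the edges whose source or target is not the id of any node."""
    node_ids = {node.id for node in graph.nodes}
    return [
        edge
        for edge in graph.edges
        if edge.source not in node_ids or edge.target not in node_ids
    ]


# Stand-in graph for illustration: the second edge points at a missing node.
toy_graph = SimpleNamespace(
    nodes=[
        SimpleNamespace(id=1, label="Alice"),
        SimpleNamespace(id=2, label="Bob"),
    ],
    edges=[
        SimpleNamespace(source=1, target=2, label="love"),
        SimpleNamespace(source=1, target=3, label="hate"),  # 3 is not a node id
    ],
)

dangling = find_dangling_edges(toy_graph)
print([(edge.source, edge.target) for edge in dangling])  # [(1, 3)]
```

With the generated graph, the call would be `find_dangling_edges(response)`; an empty list means every edge connects two known nodes.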

(Optional) Visualizing the Knowledge Graph

We can use the Graphviz library to visualize the generated knowledge graph. For detailed installation instructions, see the Graphviz documentation.

from graphviz import Digraph

dot = Digraph()
for node in response.nodes:
    dot.node(str(node.id), node.label, shape='circle', width='1', height='1')
for edge in response.edges:
    dot.edge(str(edge.source), str(edge.target), label=edge.label)

dot.render('knowledge-graph.gv', view=True)
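If the graphviz Python package is not installed, the same graph can be written out as DOT text by hand and rendered later with any DOT-compatible tool. The sketch below is not part of the original example: `to_dot` is a hypothetical helper, and the literal tuples stand in for values taken from `response.nodes` and `response.edges`.

```python
def escape_label(label: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted DOT string."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(nodes, edges) -> str:
    """Build DOT source for a directed graph.

    nodes: iterable of (id, label) pairs
    edges: iterable of (source, target, label) triples
    """
    lines = ["digraph {"]
    for node_id, label in nodes:
        lines.append(f'    {node_id} [label="{escape_label(label)}", shape=circle]')
    for source, target, label in edges:
        lines.append(f'    {source} -> {target} [label="{escape_label(label)}"]')
    lines.append("}")
    return "\n".join(lines)


# Stand-in data mirroring the example output shown earlier in this guide.
dot_source = to_dot(
    nodes=[(1, "Alice"), (2, "Bob"), (3, "Charlie")],
    edges=[(1, 2, "love"), (1, 3, "hate")],
)
print(dot_source)
```

With the generated graph, the inputs would be `[(n.id, n.label) for n in response.nodes]` and `[(e.source, e.target, e.label) for e in response.edges]`.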

Image of the Extracted Knowledge Graph

This example was originally contributed by Alonso Silva.