# mlx-lm
Outlines provides an integration with mlx-lm, allowing models to be run quickly on Apple Silicon via the mlx library.
## Installation
You need to install the `mlx` and `mlx-lm` libraries on a device which supports Metal to use the mlx-lm integration.
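Both libraries are published on PyPI, so on a supported machine you can typically install them with pip:

```shell
pip install mlx mlx-lm
```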
## Load the model
You can initialize the model by passing the name of a repository on the Hugging Face Hub. The official repository for mlx-lm supported models is mlx-community.
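For example, to load a quantized Llama model from mlx-community (the same model used in the examples below):

```python
from outlines import models

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit")
```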
This will download the model files to the hub cache folder and load the weights in memory.
The arguments `model_config` and `tokenizer_config` are available to modify loading behavior. For example, per the mlx-lm documentation, you must set an `eos_token` for `qwen/Qwen-7B`. In Outlines you may do so via:
```python
from outlines import models

model = models.mlxlm(
    "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
    tokenizer_config={"eos_token": "<|endoftext|>", "trust_remote_code": True},
)
```
**Main parameters:**

*(Subject to change. Table based on the `mlx-lm.load` docstring.)*
| Parameters | Type | Description | Default |
|---|---|---|---|
| `tokenizer_config` | `dict` | Configuration parameters specifically for the tokenizer. Defaults to an empty dictionary. | `{}` |
| `model_config` | `dict` | Configuration parameters specifically for the model. Defaults to an empty dictionary. | `{}` |
| `adapter_path` | `str` | Path to the LoRA adapters. If provided, applies LoRA layers to the model. | `None` |
| `lazy` | `bool` | If `False`, evaluate the model parameters to make sure they are loaded in memory before returning. | `False` |
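As a sketch of how these parameters compose, assuming the keyword arguments are forwarded to `mlx_lm.load` as the table describes (the adapter path below is illustrative):

```python
from outlines import models

model = models.mlxlm(
    "mlx-community/Meta-Llama-3.1-8B-Instruct-8bit",
    adapter_path="path/to/lora-adapters",  # illustrative path to LoRA adapter weights
    lazy=False,  # load all weights into memory before returning
)
```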
## Generate text
You may generate text using the parameters described in the text generation documentation.
With the loaded model, you can generate text or perform structured generation, for example:
```python
from outlines import models, generate

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit")
generator = generate.text(model)
answer = generator("A prompt", temperature=2.0)
```
## Streaming
You may create a streaming iterator with minimal changes:
```python
from outlines import models, generate

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit")
generator = generate.text(model)

for token_str in generator.stream("A prompt", temperature=2.0):
    print(token_str)
```
## Structured
You may perform structured generation with mlxlm to guarantee your output will match a regex pattern, JSON schema, or Lark grammar.
Example: phone number generation with the pattern `"\\+?[1-9][0-9]{7,14}"`:
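A minimal sketch using `generate.regex` (the prompt string below is illustrative):

```python
from outlines import models, generate

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit")

# Constrain generation so the output always matches the phone-number pattern
phone_generator = generate.regex(model, "\\+?[1-9][0-9]{7,14}")
phone_number = phone_generator("Give me a phone number: ")
print(phone_number)
```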