# vllm_offline

Integration with the `vllm` library (offline mode).
## VLLMOffline

Bases: `Model`

Thin wrapper around a `vllm.LLM` model.

This wrapper is used to convert the input and output types specified by the
user at a higher level into arguments for the `vllm.LLM` model.

Source code in `outlines/models/vllm_offline.py`
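A minimal sketch of direct construction (the model id below is an arbitrary example; the usual entry point is the `from_vllm_offline` factory documented at the bottom of this page):

```python
from vllm import LLM

from outlines.models.vllm_offline import VLLMOffline

# Load a vLLM engine in offline mode, then wrap it for Outlines.
llm = LLM("microsoft/Phi-3-mini-4k-instruct")  # arbitrary example model id
model = VLLMOffline(llm)
```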
### __init__(model)

Create a VLLM model instance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `LLM` | A `vllm.LLM` model instance. | required |

Source code in `outlines/models/vllm_offline.py`
### generate(model_input, output_type=None, **inference_kwargs)

Generate text using vLLM.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_input` | | The prompt based on which the model will generate a response. | required |
| `output_type` | `Optional[Any]` | The logits processor the model will use to constrain the format of the generated text. | `None` |
| `inference_kwargs` | `Any` | Additional keyword arguments to pass to the `generate` method of the `vllm.LLM` model. | `{}` |

Returns:

| Type | Description |
|---|---|
| `Union[str, List[str], List[List[str]]]` | The text generated by the model. |

Source code in `outlines/models/vllm_offline.py`
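A usage sketch: the Pydantic schema and model id are illustrative assumptions, not part of this API, and generation with a Pydantic `output_type` is assumed to return a JSON string matching the schema:

```python
from pydantic import BaseModel
from vllm import LLM

import outlines


class Character(BaseModel):
    name: str
    age: int


model = outlines.from_vllm_offline(LLM("microsoft/Phi-3-mini-4k-instruct"))

# output_type constrains generation; the result is a JSON string
# that can then be validated against the schema.
result = model.generate("Create a fantasy character.", Character)
character = Character.model_validate_json(result)
```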
### generate_stream(model_input, output_type, **inference_kwargs)

Not available for `vllm.LLM`.

TODO: Implement the streaming functionality ourselves.

Source code in `outlines/models/vllm_offline.py`
### load_lora(adapter_path)

Load a LoRA adapter. Deprecated since v1.0.0.

Use the `lora_request` argument when calling the model or generator instead.

Source code in `outlines/models/vllm_offline.py`
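A sketch of the recommended replacement, passing a vLLM `LoRARequest` through the inference kwargs, which are forwarded to `vllm.LLM.generate` (the adapter name and path are placeholders):

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

import outlines

# LoRA support must be enabled when the vLLM engine is created.
llm = LLM("microsoft/Phi-3-mini-4k-instruct", enable_lora=True)
model = outlines.from_vllm_offline(llm)

result = model.generate(
    "Write a haiku about autumn.",
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
```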
## VLLMOfflineTypeAdapter

Bases: `ModelTypeAdapter`

Type adapter for the `VLLMOffline` model.

Source code in `outlines/models/vllm_offline.py`
### format_input(model_input)

Generate the `prompt` argument to pass to the model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_input` | | The input passed by the user. | required |

Source code in `outlines/models/vllm_offline.py`
### format_output_type(output_type=None)

Generate the structured output argument to pass to the model.

For vLLM, the structured output definition is set in the
`GuidedDecodingParams` constructor, which is provided as the value of the
`guided_decoding` parameter of the `SamplingParams` constructor, itself
provided as the value of the `sampling_params` parameter of the `generate`
method.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `output_type` | `Optional[Any]` | The structured output type provided. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict` | The arguments to provide to the `generate` method of the model. |

Source code in `outlines/models/vllm_offline.py`
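To make that plumbing concrete, here is a sketch of the raw vLLM call that the adapter's output ultimately feeds; the JSON schema is an illustrative assumption:

```python
from vllm import LLM
from vllm.sampling_params import GuidedDecodingParams, SamplingParams

llm = LLM("microsoft/Phi-3-mini-4k-instruct")

# The structured output definition goes into GuidedDecodingParams...
guided = GuidedDecodingParams(
    json={"type": "object", "properties": {"name": {"type": "string"}}}
)
# ...which is the guided_decoding value of SamplingParams...
params = SamplingParams(guided_decoding=guided, max_tokens=100)
# ...which is the sampling_params value of the generate method.
outputs = llm.generate(["Create a character."], sampling_params=params)
print(outputs[0].outputs[0].text)
```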
## from_vllm_offline(model)

Create an Outlines `VLLMOffline` model instance from a `vllm.LLM` instance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `LLM` | A `vllm.LLM` model instance. | required |

Returns:

| Type | Description |
|---|---|
| `VLLMOffline` | An Outlines `VLLMOffline` model instance. |
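A minimal end-to-end sketch, assuming the factory is re-exported as `outlines.from_vllm_offline` (the model id is an arbitrary example):

```python
from vllm import LLM

import outlines

llm = LLM("microsoft/Phi-3-mini-4k-instruct")
model = outlines.from_vllm_offline(llm)

result = model.generate("What is the capital of France?")
print(result)
```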