tokenizer
Tokenizer
Bases: Hashable
, Protocol
Source code in outlines/models/tokenizer.py
convert_token_to_string(token)
Convert a token to its equivalent string.
This is for instance useful for BPE tokenizers where whitespaces are
represented by the special characted Ġ
. This prevents matching a raw
token that includes Ġ
with a string.
Source code in outlines/models/tokenizer.py
decode(token_ids)
encode(prompt)
Translate the input prompts into arrays of token ids and attention mask.