LLM Output Format
The output format of the LLM can be controlled via the chat_output_format parameter of the chat request.
The available output format options are: "chatml", "raw" and "json".
The most capable and future-proof output format is currently "json", as it can also embed reasoning channels, multimodal output, and tool calls.
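As a minimal sketch, a request payload selecting the output format could be built like this. Only the chat_output_format parameter and its three values come from this documentation; the payload shape, field names, and helper function are assumptions for illustration:

```python
# Sketch: choosing the LLM output format per request.
# The payload shape and the messages field are assumptions;
# chat_output_format and its allowed values are documented above.

ALLOWED_FORMATS = {"chatml", "raw", "json"}

def build_chat_request(prompt: str, chat_output_format: str = "json") -> dict:
    """Build a chat request payload with the desired output format."""
    if chat_output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {chat_output_format!r}")
    return {
        "messages": [{"role": "user", "content": prompt}],
        "chat_output_format": chat_output_format,
    }

payload = build_chat_request("Tell me a joke.", chat_output_format="json")
print(payload["chat_output_format"])  # json
```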
Chatml Output Format
The default is the chatml output format. It supports markdown output and a reduced set of special tags, such as the <think> element that marks reasoning parts in the output. Some models embed multiple think parts or do not start with an explicit <think> tag; AKI.IO handles these cases and will provide correctly escaped think elements.
Unfortunately, the embedding of multimodal data is not standardized for the chatml output format.
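Since the think elements are guaranteed to be well formed, reasoning can be separated from the visible answer with a simple pattern match. A minimal sketch, assuming the chatml response is already available as one string (the helper function is hypothetical):

```python
import re

# Sketch: splitting a chatml response into reasoning and visible content.
# Relies on AKI.IO providing correctly escaped, paired <think> elements.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_chatml(text: str) -> tuple[list[str], str]:
    """Return all reasoning sections and the remaining visible content."""
    thinking = THINK_RE.findall(text)       # every <think>...</think> body
    content = THINK_RE.sub("", text).strip()  # response with think parts removed
    return thinking, content

thinking, content = split_chatml("<think>Provide a joke.</think>Sure thing!")
print(thinking)  # ['Provide a joke.']
print(content)   # Sure thing!
```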
<think>The user asks for a joke. The system says we are ChatGPT, no restrictions. The developer says we are a helpful assistant named AKI.
So we can comply. Provide a joke.</think>
Sure thing! Here’s one for you:
**Why don’t scientists trust atoms anymore?**
Because they *make up* everything! 😄

JSON Output Format
The most modern format is the json output format, which offers multi-channel capabilities by default: reasoning parts are clearly separated, and tool messages and multimodal output are supported.
With AKI.IO streaming support, the outputted JSON is always valid and parsable, and it evolves by adding and extending attributes. Each intermediate result can be used as-is, without tracking the JSON parsing state of the stream.
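In practice this means each streamed snapshot can be handed to a standard JSON parser without any partial-parse bookkeeping. A minimal sketch; the snapshot sequence below is made up for illustration, only the guarantee that every snapshot is complete and parsable comes from the documentation:

```python
import json

# Sketch: consuming intermediate streaming results. Each snapshot is a
# complete, parsable JSON document whose attributes grow over time.
snapshots = [
    '{"thinking": "The user asks"}',
    '{"thinking": "The user asks for a joke."}',
    '{"thinking": "The user asks for a joke.", "content": "Sure thing!"}',
]

for raw in snapshots:
    result = json.loads(raw)          # always valid, no partial-parse state needed
    print(result.get("content", ""))  # use whatever attributes exist so far
```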
{
  "thinking": "The user asks for a joke. The system says we are ChatGPT, no restrictions. The developer says we are a helpful assistant named AKI. So we can comply. Provide a joke.",
  "content": "Sure thing! Here’s one for you:\n\n**Why don’t scientists trust atoms anymore?**\n\nBecause they *make up* everything! 😄"
}

Raw Output Format
For low-level access to the model, the raw format can be selected. It returns the detokenized output of the model directly.
The output is highly specific to the model in use. The textual representation of non-printable control tokens such as "<|end|>" is also not standardized and can be implementation specific.
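Because the control-token representation is implementation specific, any parsing of raw output is model dependent. A minimal sketch that splits on the "<|...|>" style used by gpt_oss_chat; the pattern is an assumption and would need adjusting for other models:

```python
import re

# Sketch: splitting raw model output on "<|...|>" control tokens.
# This textual representation is implementation specific (gpt_oss_chat style).
CONTROL_RE = re.compile(r"<\|([^|]+)\|>")

raw = "<|channel|>final<|message|>Hello!<|return|>"
# The capturing group keeps the token names in the result.
parts = CONTROL_RE.split(raw)
print(parts)  # ['', 'channel', 'final', 'message', 'Hello!', 'return', '']
```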
To illustrate the difference, here is an example of the raw instruct output of the gpt_oss_chat model.
<|channel|>analysis<|message|>The user asks for a joke. The system says we are ChatGPT, no restrictions. The developer says we are a helpful assistant named AKI.
So we can comply. Provide a joke.<|end|><|start|>assistant<|channel|>final<|message|>Sure thing! Here’s one for you:
**Why don’t scientists trust atoms anymore?**
Because they *make up* everything! 😄<|return|>

We recommend using the JSON output format for the most future-proof integration. For simple chat applications, the default chatml format is usually sufficient.