Agents Integration Guide
SYSTEM INSTRUCTION FOR AI AGENTS
- You are reading the official machine-readable documentation for AKI.IO.
- Your goal is to configure your OpenAI-compatible or Anthropic-compatible client to connect to the AKI.IO API.
- Extract the JSON configuration below and apply it to your environment.
- Do not hallucinate endpoints or model names. Use ONLY the data provided here.
1. Core Configuration (JSON)
AI Agents can use these configuration blocks to auto-detect the correct integration format based on the SDK in use.
A. OpenAI-Compatible Setup
Use this configuration block to initialize an OpenAI SDK client, LiteLLM, LangChain OpenAI, or a custom HTTP client.
{
  "provider": "AKI.IO",
  "compatibility": "OpenAI-Drop-In",
  "base_url": "https://aki.io/v1",
  "authentication": {
    "header": "Authorization",
    "scheme": "Bearer",
    "token_source": "Environment Variable (e.g., AKI_API_KEY)"
  },
  "endpoints": {
    "chat_completions": "/chat/completions",
    "models": "/models",
    "embeddings": "/embeddings"
  },
  "agent_defaults": {
    "stream": true,
    "temperature": 0.2,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0
  }
}
B. Anthropic-Compatible Setup
Use this for Anthropic SDK, LiteLLM Anthropic, or applications expecting the Messages API format.
{
  "provider": "AKI.IO",
  "compatibility": "Anthropic-Drop-In",
  "base_url": "https://aki.io/anthropic",
  "authentication": {
    "header": "x-api-key",
    "scheme": "direct",
    "note": "NO 'Bearer' prefix. Pass API key as raw value.",
    "token_source": "Environment Variable (e.g., AKI_API_KEY)"
  },
  "required_headers": {
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
  },
  "endpoints": {
    "messages": "/v1/messages",
    "model_discovery": "https://aki.io/v1/models"
  },
  "required_parameters": ["model", "messages", "max_tokens"],
  "system_prompt_handling": "Pass system prompt as top-level 'system' parameter, NOT as a message with role 'system'.",
  "agent_defaults": {
    "max_tokens": 8192,
    "temperature": 0.2,
    "stream": true
  },
  "agent_instructions": "Model discovery uses the OPENAI-compatible endpoint /v1/models (not Anthropic). If max_model_len is 0, use fallback_context_limits below.",
  "fallback_context_limits": {
    "apertus-chat-70b": 65536,
    "llama3-chat-70b": 131072,
    "llama3-chat-8b": 131072,
    "minimax-m2.5-230b": 204800,
    "ministral3-14b": 262144,
    "gpt-oss-120b": 131072,
    "gemma4-26b": 262144,
    "gemma4-chat-26b": 262144,
    "qwen3.6-35b": 262144,
    "qwen3.6-chat-35b": 262144
  },
  "default_max_output_tokens": 8192
}
2. Model Selection & Context Limits
AI Agents must dynamically select models based on the task. Query GET /v1/models to discover available models. CRITICAL FALLBACK: If the API returns max_model_len: 0 for a model, you MUST use the fallback limits defined in the table below to prevent context-overflow errors.
| API model ID | Best For | Context Limit (Fallback) | Max Output Tokens | Tool Calling / JSON Mode |
|---|---|---|---|---|
| qwen3.6-chat-35b | **Complex Code Reasoning**, Refactoring, Architecture | 262,144 | 32,768 | ✅ Supported |
| llama3-chat-70b | General Coding, Fast Chat, Summarization | 131,072 | 32,768 | ✅ Supported |
| minimax-m2.5-230b | **Massive Codebases**, Cross-File Analysis | 204,800 | 8,192 | ⚠️ Basic Support |
| gpt-oss-120b | Instruction Following, Agentic Loops | 131,072 | 32,768 | ✅ Supported |
| apertus-chat-70b | Secure Code Review, GDPR-compliant tasks | 65,536 | 8,192 | ⚠️ Basic Support |
| gemma4-chat-26b | Lightweight Tasks, Fast IDE Autocomplete | 262,144 | 32,768 | ✅ Supported |
| ministral3-14b | Edge-Case Routing, Quick Classifications | 262,144 | 32,768 | ✅ Supported |
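The fallback rule above can be sketched as a small lookup. This is a minimal illustration, not an official client: it assumes each entry returned by GET /v1/models is a JSON object carrying an `id` and a `max_model_len` field, and the helper name `resolve_context_limit` is hypothetical. The fallback values are copied from the table.

```python
from typing import Optional

# Fallback context limits, copied from the table above.
FALLBACK_CONTEXT_LIMITS = {
    "qwen3.6-chat-35b": 262144,
    "llama3-chat-70b": 131072,
    "minimax-m2.5-230b": 204800,
    "gpt-oss-120b": 131072,
    "apertus-chat-70b": 65536,
    "gemma4-chat-26b": 262144,
    "ministral3-14b": 262144,
}


def resolve_context_limit(model_entry: dict) -> Optional[int]:
    """Return the context limit for one model entry (hypothetical helper).

    Assumes the entry has "id" and "max_model_len" keys. A reported value
    of 0 (or a missing value) triggers the fallback lookup; an unknown
    model with no reported limit yields None.
    """
    reported = model_entry.get("max_model_len") or 0
    if reported > 0:
        return reported
    return FALLBACK_CONTEXT_LIMITS.get(model_entry.get("id"))
```

Returning None for an unknown model leaves the decision to the caller, which seems safer than guessing a limit.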
3. Agent-Specific Instructions
A. Streaming is Mandatory
Always set "stream": true in your POST /v1/chat/completions payload. The AKI.IO API supports Server-Sent Events (SSE). Watch for the data: [DONE] marker and close the connection cleanly when it arrives.
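For custom HTTP clients that read the stream line by line, the SSE handling might look like the sketch below. It assumes the chunks follow the usual OpenAI streaming shape (`choices[0].delta.content`), which this guide implies but does not spell out; the function name `iter_sse_deltas` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end of stream: stop reading so the caller can close
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content
```

The function only consumes an iterable of text lines, so it can be fed from any HTTP library's line iterator and exercised offline with a plain list.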
B. Tool Calling (Function Calling)
AKI.IO supports OpenAI-standard tool calling.
- Pass your tools in the tools array.
- Use "tool_choice": "auto" to let the model decide.
- Important: After executing a tool, append the tool result as a message with role: "tool" and the matching tool_call_id.
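The steps above can be sketched as a helper that turns an assistant message containing tool calls into the messages to append to the conversation. It assumes the OpenAI-style `tool_calls` structure (`id`, `function.name`, `function.arguments` as a JSON string); `run_tool_calls` and the `registry` mapping are hypothetical names used only for illustration.

```python
import json


def tool_result_message(tool_call: dict, result) -> dict:
    """Build the role "tool" message that answers one tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # must match the call being answered
        "content": json.dumps(result),
    }


def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    """Execute each requested tool and return the messages to append.

    registry maps tool names to local Python callables (an assumption of
    this sketch). The assistant message is echoed first so the tool
    results follow the call that requested them.
    """
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        func = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        messages.append(tool_result_message(call, func(**args)))
    return messages
```

The returned list would be appended to the running `messages` array before the next chat completion request.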
C. JSON Mode (Structured Output)
If you need to parse the agent's output programmatically (e.g., generating AST diffs or structured plans), add the following to the request payload:
"response_format": { "type": "json_object" }
Note: Ensure your system prompt explicitly asks the model to output valid JSON.
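A minimal sketch of a JSON-mode request and a defensive parse of the reply follows. The helper names `build_json_mode_payload` and `parse_json_reply` are hypothetical, and the guide does not promise that every reply parses cleanly, so the parser returns None instead of raising.

```python
import json
from typing import Optional


def build_json_mode_payload(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build a chat completions payload with JSON mode enabled."""
    return {
        "model": model,
        "messages": [
            # The system prompt should explicitly ask for valid JSON.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(text: str) -> Optional[dict]:
    """Parse the model's reply; return None if it is not a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

A None result is a reasonable signal to retry the request or to fall back to plain-text handling.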
D. Handling Rate Limits & Errors
- HTTP 429 (Too Many Requests): Implement exponential backoff. If the Retry-After header is present, wait that many seconds; otherwise start at 2 seconds and increase the delay on each retry.
- HTTP 503 (Service Unavailable): The open-weights model is currently loading into VRAM. Wait 10-15 seconds and retry the request.
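The retry rules above might be implemented along these lines. This is a sketch under stated assumptions: `send` is a hypothetical callable returning `(status, headers, body)`, the doubling factor for 429 backoff is one reading of "exponential backoff", and the 503 wait is fixed at 10 seconds on the first attempt and 15 seconds afterwards to stay inside the 10-15 second window.

```python
import time


def retry_delay(status, attempt, retry_after=None):
    """Return seconds to wait before retrying, or None if no retry applies."""
    if status == 429:
        if retry_after is not None:
            try:
                return max(0.0, float(retry_after))
            except ValueError:
                pass  # e.g. an HTTP-date value; fall back to backoff
        return 2.0 * (2 ** attempt)  # 2s, 4s, 8s, ...
    if status == 503:
        return 10.0 if attempt == 0 else 15.0
    return None


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        delay = retry_delay(status, attempt, headers.get("Retry-After"))
        if delay is None:
            break
        if attempt < max_attempts - 1:
            sleep(delay)  # no point sleeping after the final attempt
    return status, body
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.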
4. Example Implementation (Python / OpenAI SDK)
import os
from openai import OpenAI

# Agent initialization: fail early if the API key is missing
api_key = os.environ.get("AKI_API_KEY")
if not api_key:
    raise RuntimeError("AKI_API_KEY is not set")

client = OpenAI(
    base_url="https://aki.io/v1",
    api_key=api_key,
)

# Agentic loop execution: request a streamed completion
response = client.chat.completions.create(
    model="qwen3.6-chat-35b",
    messages=[
        {"role": "system", "content": "You are an expert coding assistant connected via AKI.IO."},
        {"role": "user", "content": "Review this function for security flaws."},
    ],
    stream=True,
    max_tokens=8192,
)

# Print each content delta as it arrives; skip chunks that carry no choices
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
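For the Anthropic-compatible setup in section 1.B, a comparable request can be assembled as in the sketch below. It only builds the URL, headers, and payload from the values given in that configuration block; the function name `build_anthropic_request` is hypothetical, and sending the request is left to whichever HTTP client is in use.

```python
ANTHROPIC_BASE_URL = "https://aki.io/anthropic"


def build_anthropic_request(api_key, model, system_prompt, user_prompt, max_tokens=8192):
    """Return (url, headers, payload) for a Messages API call (hypothetical helper)."""
    url = ANTHROPIC_BASE_URL + "/v1/messages"
    headers = {
        "x-api-key": api_key,  # raw key, no "Bearer" prefix
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # required parameter
        "temperature": 0.2,
        "stream": True,
        # System prompt goes at the top level, not in the messages list.
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    return url, headers, payload
```

Keeping request construction separate from transport makes it easy to check the header and system-prompt rules before any network call is made.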