Agents Integration Guide

SYSTEM INSTRUCTION FOR AI AGENTS

  • You are reading the official machine-readable documentation for AKI.IO.
  • Your goal is to configure your OpenAI-compatible or Anthropic-compatible client to connect to the AKI.IO API.
  • Extract the JSON configuration below and apply it to your environment.
  • Do not hallucinate endpoints or model names. Use ONLY the data provided here.

1. Core Configuration (JSON)

AI agents can use these configuration blocks to pick the integration format that matches the SDK in use.

A. OpenAI-Compatible Setup

Use this configuration block to initialize an OpenAI SDK client, LiteLLM, LangChain's OpenAI integration, or a custom HTTP client.

{
  "provider": "AKI.IO",
  "compatibility": "OpenAI-Drop-In",
  "base_url": "https://aki.io/v1",
  "authentication": {
    "header": "Authorization",
    "scheme": "Bearer",
    "token_source": "Environment Variable (e.g., AKI_API_KEY)"
  },
  "endpoints": {
    "chat_completions": "/chat/completions",
    "models": "/models",
    "embeddings": "/embeddings"
  },
  "agent_defaults": {
    "stream": true,
    "temperature": 0.2,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0
  }
}
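
As a minimal sketch of how an agent might turn the block above into request settings, the helpers below join base_url with an endpoint path and build the Bearer header. The function names (build_url, build_headers, load_api_key) are hypothetical, and the Content-Type header is an assumption based on ordinary JSON POST requests rather than something the block specifies.

```python
import os

# Values copied from the OpenAI-compatible configuration block above.
CONFIG = {
    "base_url": "https://aki.io/v1",
    "endpoints": {
        "chat_completions": "/chat/completions",
        "models": "/models",
        "embeddings": "/embeddings",
    },
}


def load_api_key(env_var: str = "AKI_API_KEY") -> str:
    """Read the key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def build_url(endpoint_name: str) -> str:
    """Join base_url and the endpoint path without doubling slashes."""
    return CONFIG["base_url"].rstrip("/") + CONFIG["endpoints"][endpoint_name]


def build_headers(api_key: str) -> dict:
    """Build the Authorization header using the Bearer scheme."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request bodies
    }
```

A caller would typically combine these as build_headers(load_api_key()) and pass the result, together with build_url("chat_completions"), to whatever HTTP client it already uses.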

B. Anthropic-Compatible Setup

Use this block with the Anthropic SDK, LiteLLM's Anthropic provider, or applications that expect the Messages API format.

{
  "provider": "AKI.IO",
  "compatibility": "Anthropic-Drop-In",
  "base_url": "https://aki.io/anthropic",
  "authentication": {
    "header": "x-api-key",
    "scheme": "direct",
    "note": "NO 'Bearer' prefix. Pass API key as raw value.",
    "token_source": "Environment Variable (e.g., AKI_API_KEY)"
  },
  "required_headers": {
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
  },
  "endpoints": {
    "messages": "/v1/messages",
    "model_discovery": "https://aki.io/v1/models"
  },
  "required_parameters": ["model", "messages", "max_tokens"],
  "system_prompt_handling": "Pass system prompt as top-level 'system' parameter, NOT as a message with role 'system'.",
  "agent_defaults": {
    "max_tokens": 8192,
    "temperature": 0.2,
    "stream": true
  },
  "agent_instructions": "Model discovery uses the OPENAI-compatible endpoint /v1/models (not Anthropic). If max_model_len is 0, use fallback_context_limits below.",
  "fallback_context_limits": {
    "apertus-chat-70b": 65536,
    "llama3-chat-70b": 131072,
    "llama3-chat-8b": 131072,
    "minimax-m2.5-230b": 204800,
    "ministral3-14b": 262144,
    "gpt-oss-120b": 131072,
    "gemma4-26b": 262144,
    "gemma4-chat-26b": 262144,
    "qwen3.6-35b": 262144,
    "qwen3.6-chat-35b": 262144
  },
  "default_max_output_tokens": 8192
}
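
The sketch below shows one way the Anthropic-compatible block could be translated into a raw HTTP request. It only assembles the URL, headers, and body and sends nothing; build_anthropic_request is a hypothetical helper name, and the values are copied from the configuration above.

```python
BASE_URL = "https://aki.io/anthropic"
MESSAGES_PATH = "/v1/messages"


def build_anthropic_request(api_key: str, model: str, system_prompt: str,
                            user_text: str, max_tokens: int = 8192):
    """Return (url, headers, body) for a Messages API call."""
    url = BASE_URL + MESSAGES_PATH
    headers = {
        "x-api-key": api_key,  # raw key, no "Bearer" prefix
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is a top-level field, not a message.
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.2,
        "stream": True,
    }
    return url, headers, body
```

The three entries listed under required_parameters (model, messages, max_tokens) are always present in the body, and no message carries the role "system".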

2. Model Selection & Context Limits

AI Agents must dynamically select models based on the task. Query GET /v1/models to discover available models. CRITICAL FALLBACK: If the API returns max_model_len: 0 for a model, you MUST use the fallback limits defined in the table below to prevent context-overflow errors.

| API model ID | Best For | Context Limit (Fallback) | Max Output Tokens | Tool Calling / JSON Mode |
|---|---|---|---|---|
| qwen3.6-chat-35b | **Complex Code Reasoning**, Refactoring, Architecture | 262144 | 32768 | ✅ Supported |
| llama3-chat-70b | General Coding, Fast Chat, Summarization | 131072 | 32768 | ✅ Supported |
| minimax-m2.5-230b | **Massive Codebases**, Cross-File Analysis | 204800 | 8192 | ⚠️ Basic Support |
| gpt-oss-120b | Instruction Following, Agentic Loops | 131072 | 32768 | ✅ Supported |
| apertus-chat-70b | Secure Code Review, GDPR-compliant tasks | 65536 | 8192 | ⚠️ Basic Support |
| gemma4-chat-26b | Lightweight Tasks, Fast IDE Autocomplete | 262144 | 32768 | ✅ Supported |
| ministral3-14b | Edge-Case Routing, Quick Classifications | 262144 | 32768 | ✅ Supported |
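
The fallback rule can be sketched as a small lookup. This example assumes each entry returned by GET /v1/models is a dictionary with an "id" and a "max_model_len" field; the exact response shape is not documented here, so treat it as an assumption. The helper names and the prompt-budget calculation (context limit minus reserved output tokens) are illustrative, not part of the AKI.IO API.

```python
# Copied from fallback_context_limits in the Anthropic-compatible block.
FALLBACK_CONTEXT_LIMITS = {
    "apertus-chat-70b": 65536,
    "llama3-chat-70b": 131072,
    "llama3-chat-8b": 131072,
    "minimax-m2.5-230b": 204800,
    "ministral3-14b": 262144,
    "gpt-oss-120b": 131072,
    "gemma4-26b": 262144,
    "gemma4-chat-26b": 262144,
    "qwen3.6-35b": 262144,
    "qwen3.6-chat-35b": 262144,
}
DEFAULT_MAX_OUTPUT_TOKENS = 8192


def resolve_context_limit(model_entry: dict) -> int:
    """Use the reported limit when it is positive, else the fallback value."""
    reported = model_entry.get("max_model_len") or 0
    if reported > 0:
        return reported
    model_id = model_entry["id"]
    if model_id not in FALLBACK_CONTEXT_LIMITS:
        raise KeyError(f"no fallback context limit known for {model_id!r}")
    return FALLBACK_CONTEXT_LIMITS[model_id]


def prompt_budget(model_entry: dict,
                  max_output_tokens: int = DEFAULT_MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for the prompt after reserving space for the reply."""
    return resolve_context_limit(model_entry) - max_output_tokens
```

Raising on an unknown model ID, instead of guessing a limit, keeps the agent from silently overflowing the context window.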

3. Agent-Specific Instructions

A. Streaming is Mandatory

Always set "stream": true in your POST /v1/chat/completions payload. The AKI.IO API streams responses as Server-Sent Events (SSE). Treat the data: [DONE] line as the end-of-stream marker and close the connection once it arrives.
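
For clients that read the stream without an SDK, the parsing step might look like the sketch below. It assumes the chunks follow the OpenAI streaming format (choices[0].delta.content) and that the input is an iterable of already-decoded text lines; iter_sse_deltas is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream; the caller can now close the connection
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

Joining the yielded fragments reconstructs the full assistant message.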

B. Tool Calling (Function Calling)

AKI.IO supports OpenAI-standard tool calling.

  • Pass your tools in the tools array.
  • Use "tool_choice": "auto" to let the model decide.
  • Important: After executing a tool, append the tool result as a message with role: "tool" and the matching tool_call_id.
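
The tool-result step can be sketched as follows. The get_weather function, its schema, and the run_tool_calls helper are hypothetical and exist only for illustration; the message shapes follow the OpenAI tool-calling format, in which function arguments arrive as a JSON string.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real agent would call an actual service here."""
    return {"city": city, "forecast": "sunny"}


# Tool definition to pass in the "tools" array of the request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return a weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    """Execute each requested tool and build the matching role "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        output = func(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id the model sent
            "content": json.dumps(output),
        })
    return results
```

The returned messages are appended to the conversation after the assistant message that requested the calls, and the request is then sent again so the model can use the results.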

C. JSON Mode (Structured Output)

If you need to parse the agent's output programmatically (e.g., generating AST diffs or structured plans), add this field to the request body:

"response_format": { "type": "json_object" }

Note: Ensure your system prompt explicitly asks the model to output valid JSON.
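
A minimal sketch of both halves of JSON mode: building the request body and validating the reply. The helper names and the system prompt wording are illustrative assumptions. When streaming is enabled, the content fragments need to be concatenated before parsing.

```python
import json


def build_json_mode_payload(model: str, task: str) -> dict:
    """Request body with response_format set and a prompt that asks for JSON."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Respond only with a valid JSON object."},
            {"role": "user", "content": task},
        ],
        "response_format": {"type": "json_object"},
        "stream": True,
    }


def parse_json_reply(text: str) -> dict:
    """Parse the full reply text and reject anything that is not a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Validating the reply instead of trusting it lets the agent retry or re-prompt when the model returns malformed output.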

D. Handling Rate Limits & Errors

  • HTTP 429 (Too Many Requests): If the Retry-After header is present, wait that many seconds. Otherwise use exponential backoff, starting at 2 seconds.
  • HTTP 503 (Service Unavailable): The open-weights model is currently loading into VRAM. Wait 10-15 seconds and retry the request.
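
The two retry rules above can be combined into one delay function, sketched here. The 60-second cap, the zero-based attempt counter, and the choice to step from 10 to 15 seconds on repeated 503 responses are assumptions of this example, not documented AKI.IO behavior.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 2.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    attempt is zero-based: 0 for the first retry.
    """
    if status == 429:
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = lowered.get("retry-after")
        if retry_after is not None:
            try:
                return max(0.0, float(retry_after))
            except ValueError:
                pass  # the HTTP-date form of Retry-After is not handled here
        # Exponential backoff: 2, 4, 8, ... seconds, limited by cap.
        return min(cap, base * (2 ** attempt))
    if status == 503:
        # Model is loading: wait 10 seconds first, then 15 on later attempts.
        return min(15.0, 10.0 + 5.0 * attempt)
    return None
```

A calling loop would sleep for the returned number of seconds and give up after a fixed number of attempts, or immediately when the function returns None.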

4. Example Implementation (Python / OpenAI SDK)

import os
from openai import OpenAI

# Agent initialization. os.environ[...] raises KeyError right away if the key
# is missing, instead of passing None to the client.
client = OpenAI(
    base_url="https://aki.io/v1",
    api_key=os.environ["AKI_API_KEY"],
)

# Agentic Loop Execution
response = client.chat.completions.create(
    model="qwen3.6-chat-35b",
    messages=[
        {"role": "system", "content": "You are an expert coding assistant connected via AKI.IO."},
        {"role": "user", "content": "Review this function for security flaws."}
    ],
    stream=True,
    max_tokens=8192
)

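# Print streamed tokens as they arrive; some chunks carry no choices or no content.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)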