AKI.IO Python Interface | AKI.IO Docs

We provide an official AKI.IO Python client pip package that wraps all available AKI.IO models and streaming functionality in an easy-to-use yet powerful Python interface.

Install the aki-io pip package

Install the aki-io pip package with the following command:

pip install "git+https://github.com/aki-io-labs/aki-io.git#subdirectory=python"

The pip package requires Python version 3.8 or higher. The aki-io package depends only on requests and aiohttp, which handle synchronous and asynchronous HTTPS requests respectively, depending on the interface you prefer. These additional packages are installed automatically if they are not already available in your Python environment.
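Before installing, you can quickly confirm that your interpreter meets the version requirement (standard library only):

```python
import sys

# The aki-io pip package requires Python 3.8 or higher
assert sys.version_info >= (3, 8), "aki-io requires Python 3.8 or higher"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} detected - OK")
```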

Note: the package is currently installed from a GitHub repository; an official AKI.IO package release on PyPI is coming soon.

Simple LLM Chat Example

This is the simplest way to call the AKI.IO API: fill in the input parameters in a dict, call the API, and get the result. The example shows a chat request with an instruct chat context to the Llama3 8B Chat endpoint.

from aki_io import Aki

aki = Aki('llama3_8b_chat', 'fc3a8c50-b12b-4d6a-ba07-c9f6a6c32c37')

chat_context = [
    {"role": "system", "content": "You are a helpful assistant named AKI."},
    {"role": "assistant", "content": "How can I help you today?"},
    {"role": "user", "content": "Tell me a joke"},
]

params = {
    "chat_context": chat_context,
    "top_k": 40,
    "top_p": 0.9,
    "temperature": 0.8,
    "max_gen_tokens": 1000,
}

result = aki.do_api_request(params) # Do the API call and wait for result
if result['success']:
    print("API JSON response:\n", result)
    print("\nChat response:\n", result['text'])
    print("\nGenerated Tokens:", result['num_generated_tokens'])
else:
    print("API Error:", result.get('error_code'), "-", result.get('error'))

LLM Streaming Chat Example with Callbacks

A simple way to use the streaming functionality is the aki-io pip package's callback mechanism. The progress_callback(...) function is called whenever new chat output is received. The main aki.do_api_request(...) call blocks until all data has been received and then returns the final result. This is an easy way to show progress to the user without having to use Python's asyncio framework.

from aki_io import Aki

aki = Aki('llama3_8b_chat', 'fc3a8c50-b12b-4d6a-ba07-c9f6a6c32c37')

chat_context = [
    {"role": "system", "content": "You are a helpful assistant named AKI."},
    {"role": "assistant", "content": "How can I help you today?"},
    {"role": "user", "content": "Tell me a funny story with more than 100 words"},
]

params = {
    "chat_context": chat_context,
    "top_k": 40,
    "top_p": 0.9,
    "temperature": 0.8,
    "max_gen_tokens": 1000,
}

output_position = 0

def progress_callback(progress, progress_data):
    global output_position
    if progress_data and 'text' in progress_data:
        text = progress_data.get('text')
        print(text[output_position:], end='', flush=True)
        output_position = len(text)

result = aki.do_api_request(params, progress_callback) # Do the API call and stream the output
if result['success']:
    print("\nAPI JSON response:\n", result)
    print("\nFinal Chat response:\n", result['text'])
    print("\nGenerated Tokens:", result['num_generated_tokens'])
else:
    print("API Error:", result.get('error_code'), "-", result.get('error'))

LLM Chat with Async Callbacks

To run a chat stream in the background while continuing to process other requests, or to run multiple chat requests concurrently, the aki-io pip package supports Python's asyncio interface. The API request can be started with the asyncio.create_task(…) function. All result processing and error handling is done through the callback functions.

import asyncio
from aki_io import Aki

aki = Aki('llama3_8b_chat', 'fc3a8c50-b12b-4d6a-ba07-c9f6a6c32c37')

chat_context = [
    {"role": "system", "content": "You are a helpful assistant named AKI."},
    {"role": "assistant", "content": "How can I help you today?"},
    {"role": "user", "content": "Tell me a funny story with more than 100 words"},
]

params = {
    "chat_context": chat_context,
    "top_k": 40,
    "top_p": 0.9,
    "temperature": 0.8,
    "max_gen_tokens": 4000,
}

output_position = 0

def progress_callback(progress, progress_data):
    global output_position
    if progress_data and 'text' in progress_data:
        text = progress_data.get('text')
        print(text[output_position:], end='', flush=True)
        output_position = len(text)

def result_callback(result):
    if result['success']:
        print(result.get('text')[output_position:], end='', flush=True)
        print("\n\nGenerated Tokens:", result['num_generated_tokens'])
    else:
        print("API Error:", result.get('error_code'), "-", result.get('error'))


# use create_task() instead of run() to fire in background
asyncio.run(aki.do_api_request_async(
        params,
        result_callback,
        progress_callback # optional
    )) 


asyncio.run(aki.close_session())
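As the comment above hints, asyncio.create_task(...) lets the request run in the background while your coroutine keeps working. A self-contained sketch of that pattern, with a stub coroutine standing in for aki.do_api_request_async(...) (the stub is illustrative, not part of the aki-io package):

```python
import asyncio

async def fake_api_request(result_callback):
    # Stub standing in for aki.do_api_request_async(params, result_callback)
    await asyncio.sleep(0.1)  # simulate network latency
    result_callback({"success": True, "text": "Hello from the background task!"})

async def main():
    # Start the request in the background ...
    task = asyncio.create_task(fake_api_request(lambda r: print(r["text"])))
    # ... and keep doing other work while it is in flight
    print("Doing other work while the request streams ...")
    await task  # wait for the background request to finish

asyncio.run(main())
```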

Most Pythonic LLM Streaming with Generator Interface

The most "Pythonic" way to stream with the aki-io pip package is the Python generator pattern. Results are read from the generator like a file stream, and the type yielded by the iterator varies with the state of the stream: tuples of progress_info and progress_data objects while streaming, followed by the final result dictionary.

All processing logic is neatly kept within a single loop. The generator can, of course, be run in an async function to run multiple generators in parallel.

import asyncio
from aki_io import Aki

aki = Aki('llama3_8b_chat', 'fc3a8c50-b12b-4d6a-ba07-c9f6a6c32c37')

chat_context = [
    {"role": "system", "content": "You are a helpful assistant named AKI."},
    {"role": "assistant", "content": "How can I help you today?"},
    {"role": "user", "content": "Tell me a funny story with more than 100 words"},
]

params = {
    "chat_context": chat_context,
    "top_k": 40,
    "top_p": 0.9,
    "temperature": 0.8,
    "max_gen_tokens": 1000
}


async def do_example_request():
    output_generator = aki.get_api_request_generator(params)
    
    try:
        async for result in output_generator:
            if isinstance(result, tuple) and len(result) == 2:
                progress_info, progress_data = result
                print(f"Progress: {progress_info} - {progress_data}")
            else:
                print(f"Result: {result}")
    except Exception as e:
        print(f"Error occurred: {e}")


# use create_task() instead of run() to fire in background
asyncio.run(do_example_request()) 

asyncio.run(aki.close_session())
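To illustrate running multiple generators in parallel, here is a self-contained sketch using asyncio.gather, with stub async generators standing in for aki.get_api_request_generator(params) (the stubs mimic the yielded types described above but are not part of the aki-io package):

```python
import asyncio

async def fake_request_generator(name, chunks):
    # Stub mimicking aki.get_api_request_generator(params):
    # yields (progress_info, progress_data) tuples, then the final result dict
    for i in range(chunks):
        await asyncio.sleep(0.01)
        yield (i + 1, {"text": f"{name}: chunk {i + 1}"})
    yield {"success": True, "text": f"{name}: done"}

async def consume(generator):
    async for result in generator:
        if isinstance(result, tuple) and len(result) == 2:
            progress_info, progress_data = result
            print("Progress:", progress_data["text"])
        else:
            print("Result:", result["text"])

async def main():
    # Run two streams concurrently; gather waits for both to finish
    await asyncio.gather(
        consume(fake_request_generator("stream-A", 3)),
        consume(fake_request_generator("stream-B", 2)),
    )

asyncio.run(main())
```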

The examples above demonstrate the available patterns for using the aki-io pip package. Use the pattern that best suits your use case.

The same patterns can be used with all types of AKI.IO endpoints; only the input params sent and the progress_data and output data returned differ depending on the endpoint type.

The expected input parameters and resulting output parameters are described in detail for LLMs here and for Image Generators here. More endpoint types will be available soon.