# API Reference
MerlionOS Inference exposes an OpenAI-compatible API, so any client that works with the OpenAI API works with MerlionOS Inference.
## Start the Server

```sh
# In the MerlionOS shell:
merlion> ai-serve 8080

# Or configure the port:
merlion> ai-serve 3000
```
## Endpoints

### Chat Completions

`POST /v1/chat/completions`

Generate a chat response from a conversation.
```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smollm-135m-q4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Singapore?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

Response:
```json
{
  "id": "chatcmpl-merlion",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Singapore is a city-state..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 64,
    "total_tokens": 88
  }
}
```
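The response follows the standard OpenAI chat-completion shape, so the useful fields can be pulled out with nothing but the standard library. A minimal sketch, using the sample response above as a literal:

```python
import json

# Response body in the shape shown above.
raw = """{
  "id": "chatcmpl-merlion",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Singapore is a city-state..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 24, "completion_tokens": 64, "total_tokens": 88}
}"""

resp = json.loads(raw)

# The assistant's reply lives under choices[0].message.content;
# token accounting lives under usage.
answer = resp["choices"][0]["message"]["content"]
usage = resp["usage"]

print(answer)                 # Singapore is a city-state...
print(usage["total_tokens"])  # 88
```

The same field paths apply to any OpenAI-compatible client, which is what makes the SDKs listed below work unmodified.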
### Text Completions

`POST /v1/completions`

Generate text from a prompt.
```sh
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smollm-135m-q4",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0.0
  }'
```
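The same request can be issued from Python without any SDK. A stdlib-only sketch mirroring the curl call above; `build_completion_request` is an illustrative helper, not part of MerlionOS:

```python
import json
import urllib.request

def build_completion_request(base_url, prompt, model="smollm-135m-q4",
                             max_tokens=32, temperature=0.0):
    """Build (but do not send) the same request as the curl call above."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0.0 for deterministic output
    }).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("http://localhost:8080",
                               "The capital of France is")
# With the server running, send it with:
#   urllib.request.urlopen(req).read()
```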
### List Models

`GET /v1/models`

```sh
curl http://localhost:8080/v1/models
```

```json
{
  "object": "list",
  "data": [{
    "id": "smollm-135m-q4",
    "object": "model",
    "owned_by": "merlionos"
  }]
}
```
### Health Check

`GET /health`

```sh
curl http://localhost:8080/health
```

```json
{
  "status": "healthy",
  "uptime_seconds": 42
}
```
### Prometheus Metrics

`GET /metrics`

```sh
curl http://localhost:8080/metrics
```

```
# HELP merlionos_uptime_seconds System uptime
merlionos_uptime_seconds 42
# HELP merlionos_heap_used_bytes Heap memory used
merlionos_heap_used_bytes 131072
# HELP merlionos_phys_allocated_bytes Physical memory allocated
merlionos_phys_allocated_bytes 4194304
```
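The output above is in the standard Prometheus text exposition format, so a Prometheus server can scrape it directly. A minimal scrape config sketch, assuming Prometheus can reach the MerlionOS host on port 8080; the job name is illustrative:

```yaml
scrape_configs:
  - job_name: merlionos          # illustrative job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```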
## Client Compatibility

The API is compatible with:

- OpenAI Python SDK (`client.chat.completions.create()`)
- LangChain (set `base_url` to the MerlionOS endpoint)
- LlamaIndex (OpenAI-compatible provider)
- curl / httpie / any HTTP client
## Python Example

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://merlionos-host:8080/v1",
    api_key="not-needed",  # MerlionOS doesn't require auth
)

response = client.chat.completions.create(
    model="smollm-135m-q4",
    messages=[{"role": "user", "content": "Hello from Python!"}],
)

print(response.choices[0].message.content)
```
## QEMU Network Setup

To access the API from the host when running in QEMU:
```sh
# Forward host port 8080 to QEMU guest port 8080
make run-net

# or manually:
qemu-system-x86_64 ... \
  -netdev user,id=n0,hostfwd=tcp::8080-:8080 \
  -device virtio-net-pci,netdev=n0
```

Then access the API from the host at `http://localhost:8080/v1/chat/completions`.