Gateway
Overview
Gateway provides OpenAI-compatible endpoints for chat completions, embeddings, and model listing. Point your existing OpenAI SDK (or Reactor’s client) at Gateway and route requests to OpenRouter, Amazon Bedrock, Azure Foundry, or any OpenAI-compatible API.
A built-in model registry ships sensible defaults. Define aliases like `reasoning/cheapest` to let Gateway pick a model by cost or availability. For each request, Gateway emits a usage event to Analytics so token counts can be tracked per authenticated user.
Gateway is in beta: the core API is stable, but provider coverage, quota enforcement, and regional routing continue to expand.
Key features
- OpenAI-compatible API: `/ai/v1/chat/completions`, `/ai/v1/embeddings`, `/ai/v1/models`.
- Multi-provider dispatch: OpenRouter, Bedrock (SigV4), Azure Foundry, generic OpenAI-compatible APIs.
- Model registry: Built-in defaults with per-project TOML overlays.
- Routing aliases: Strategies `first`, `random`, `round_robin`, `cheapest`, `fastest`.
- Streaming: Server-Sent Events with `data: {...}` chunks and a `[DONE]` terminator.
- Usage metering: Token counts emitted per request for billing and analytics.
- Extension seam: Cloud deployments can inject quota checks and regional routing without forking the OSS crate.
Quickstart
List models, send a chat completion, and stream the response.

CLI:

    reactor auth login user@example.com --password '...'
    reactor ai models list
    reactor ai test gpt-4o-mini --prompt "Explain Reactor.cloud in one sentence"
    reactor ai chat claude-3-5-sonnet --prompt "Write a haiku about backends" --stream

TypeScript (JavaScript SDK):

    import { createClient } from '@reactor/client';

    const reactor = createClient({ url: process.env.REACTOR_URL! });
    await reactor.auth.signInWithPassword({ email, password });

    // Non-streaming
    const response = await reactor.ai.chatCompletion({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'Hello!' }],
    });
    console.log(response.choices[0].message.content);

    // Streaming
    const stream = await reactor.ai.chatCompletionStream({
      model: 'reasoning/cheapest',
      messages: [{ role: 'user', content: 'Explain quantum computing' }],
    });
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
    }

curl:

    # List models
    curl -s "$REACTOR_URL/ai/v1/models" \
      -H "Authorization: Bearer $REACTOR_TOKEN"

    # Chat completion
    curl -s -X POST "$REACTOR_URL/ai/v1/chat/completions" \
      -H "Authorization: Bearer $REACTOR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role":"user","content":"Hello!"}],
        "max_tokens": 256
      }'

    # Streaming
    curl -N -X POST "$REACTOR_URL/ai/v1/chat/completions" \
      -H "Authorization: Bearer $REACTOR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role":"user","content":"Hello!"}],
        "stream": true
      }'

How-to guides
Use a routing alias
Aliases resolve to one or more models using a strategy defined in the registry.

    # ai-models.toml (project overlay)
    # Alias names contain "/", so the key is quoted to remain valid TOML.
    [aliases."reasoning/cheapest"]
    strategy = "cheapest"
    targets = ["gpt-4o-mini", "claude-3-haiku"]

    [aliases."reasoning/best"]
    strategy = "first"
    targets = ["gpt-4", "claude-3-5-sonnet"]

CLI:

    reactor ai aliases list
    reactor ai chat reasoning/cheapest --prompt "Summarize this API design"

TypeScript (JavaScript SDK):

    const response = await reactor.ai.chatCompletion({
      model: 'reasoning/cheapest',
      messages: [{ role: 'user', content: 'Draft a product announcement' }],
    });

Generate embeddings
CLI:

    reactor ai embed text-embedding-3-small "Hello world"

TypeScript (JavaScript SDK):

    const { data } = await reactor.ai.embed({
      model: 'text-embedding-3-small',
      input: ['Hello', 'World'],
    });
    console.log(data[0].embedding.length);

curl:

    curl -s -X POST "$REACTOR_URL/ai/v1/embeddings" \
      -H "Authorization: Bearer $REACTOR_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{ "model": "text-embedding-3-small", "input": "Hello world" }'

Drop-in OpenAI SDK replacement
Point the official OpenAI client at Gateway:

    import OpenAI from 'openai';

    const openai = new OpenAI({
      baseURL: `${process.env.REACTOR_URL}/ai/v1`,
      apiKey: process.env.REACTOR_TOKEN!, // JWT access token
    });

    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'Hello!' }],
    });

Configuration

    [gateway]
    # Optional registry overlay (merges with built-in defaults)
    registry_overlay = "./ai-models.toml"
    # Or fetch remotely:
    # registry_url = "https://example.com/models.toml"

    [gateway.providers.openrouter]
    enabled = true
    # api_key via OPENROUTER_API_KEY env

    [gateway.providers.bedrock]
    enabled = false
    region = "us-east-1"
    # AWS credentials via standard env vars

    [gateway.providers.foundry]
    enabled = false
    endpoint = "https://my-resource.openai.azure.com"
    # api_key via AZURE_FOUNDRY_API_KEY env

Environment variables:
| Variable | Required | Description |
|---|---|---|
| `REACTOR_AI_BIND` | No | Bind address; default `0.0.0.0:8004` (may vary by deployment) |
| `REACTOR_AI_AUTH_URL` | Yes (microservices) | Identity service URL |
| `OPENROUTER_API_KEY` | If OpenRouter enabled | Provider API key |
| `AWS_ACCESS_KEY_ID` | If Bedrock enabled | AWS credentials |
| `AWS_SECRET_ACCESS_KEY` | If Bedrock enabled | AWS credentials |
| `AWS_REGION` | No | Default `us-east-1` |
| `AZURE_FOUNDRY_API_KEY` | If Foundry enabled | Azure API key |
| `AZURE_FOUNDRY_ENDPOINT` | If Foundry enabled | Azure resource URL |
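
The "Required" column can be checked before starting the service. The sketch below is a hypothetical pre-flight helper, not part of Reactor: `ProviderFlags`, `REQUIRED_ENV`, and `missingEnv` are illustrative names, and it covers only the provider-specific variables in the table above.

```typescript
// Hypothetical helper: these names are illustrative and not part of Reactor.
type ProviderFlags = { openrouter: boolean; bedrock: boolean; foundry: boolean };

// Variables required per provider, copied from the table above.
const REQUIRED_ENV: Record<keyof ProviderFlags, string[]> = {
  openrouter: ["OPENROUTER_API_KEY"],
  bedrock: ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
  foundry: ["AZURE_FOUNDRY_API_KEY", "AZURE_FOUNDRY_ENDPOINT"],
};

// Returns the names of required variables that are unset or empty.
function missingEnv(
  enabled: ProviderFlags,
  env: Record<string, string | undefined>,
): string[] {
  const missing: string[] = [];
  for (const provider of Object.keys(REQUIRED_ENV) as (keyof ProviderFlags)[]) {
    if (!enabled[provider]) continue; // disabled providers need no credentials
    for (const variable of REQUIRED_ENV[provider]) {
      if (!env[variable]) missing.push(variable);
    }
  }
  return missing;
}

// Example with placeholder values: Bedrock is enabled but has no secret key.
const missingVars = missingEnv(
  { openrouter: true, bedrock: true, foundry: false },
  { OPENROUTER_API_KEY: "placeholder", AWS_ACCESS_KEY_ID: "placeholder" },
);
console.log(missingVars); // ["AWS_SECRET_ACCESS_KEY"]
```

In a real deployment the second argument would be the process environment; a plain object is used here so the check stays testable.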
In project config (`reactor.toml`), the section is named `[ai]`:

    [ai]
    registry_overlay = "./ai-models.toml"

Limits and quotas

| Limit | Value | Notes |
|---|---|---|
| Authentication | Required | No anonymous inference |
| Streaming | SSE | OpenAI-compatible chunk format |
| Provider errors | Passed through | Mapped to provider_error with upstream detail |
| Quota enforcement | Cloud only | OSS ships no-op extensions |
| Credit ledger / billing | Cloud only | Consumes Analytics usage events |
| Prompt caching | Not in v0 | Planned via reactor-cache integration |
| Function calling | Partial | Full tool support across providers in progress |
Built-in alias strategies:
| Strategy | Behavior |
|---|---|
| `first` | Use first available target |
| `random` | Random target per request |
| `round_robin` | Rotate across targets |
| `cheapest` | Lowest `input_price_per_mtok + output_price_per_mtok` |
| `fastest` | Provider latency heuristic (beta) |
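
To make the table concrete, here is a minimal sketch of how three of these strategies could pick a target. It is an illustration under assumptions, not Gateway's implementation: `ModelEntry`, `resolveAlias`, the `available` flag, and the in-memory counter are hypothetical, and the prices are placeholders rather than real pricing.

```typescript
// Hypothetical registry entry; price field names follow the table above.
interface ModelEntry {
  id: string;
  available: boolean;
  input_price_per_mtok: number;
  output_price_per_mtok: number;
}

type Strategy = "first" | "round_robin" | "cheapest";

// Per-alias request counter for round_robin (in-memory only in this sketch).
const counters = new Map<string, number>();

function resolveAlias(alias: string, strategy: Strategy, targets: ModelEntry[]): string {
  // Assumption: every strategy considers only targets that are available.
  const candidates = targets.filter((t) => t.available);
  if (candidates.length === 0) throw new Error(`no available target for ${alias}`);
  switch (strategy) {
    case "first":
      return candidates[0].id;
    case "round_robin": {
      const n = counters.get(alias) ?? 0;
      counters.set(alias, n + 1);
      return candidates[n % candidates.length].id;
    }
    case "cheapest":
      // Lowest combined input + output price wins; ties keep the earlier target.
      return candidates.reduce((best, t) =>
        t.input_price_per_mtok + t.output_price_per_mtok <
        best.input_price_per_mtok + best.output_price_per_mtok
          ? t
          : best,
      ).id;
  }
}

// Placeholder prices, chosen only to make the comparison visible.
const targets: ModelEntry[] = [
  { id: "gpt-4o-mini", available: true, input_price_per_mtok: 1, output_price_per_mtok: 4 },
  { id: "claude-3-haiku", available: true, input_price_per_mtok: 2, output_price_per_mtok: 2 },
];

const cheapestPick = resolveAlias("reasoning/cheapest", "cheapest", targets);
const rotation = [1, 2, 3].map(() => resolveAlias("demo/rotate", "round_robin", targets));
console.log(cheapestPick); // claude-3-haiku (2 + 2 is less than 1 + 4)
console.log(rotation); // ["gpt-4o-mini", "claude-3-haiku", "gpt-4o-mini"]
```

`random` and `fastest` are left out: the first is a one-line index pick, and the page describes the second only as a latency heuristic.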
Usage event shape (sent to Analytics):

    {
      "event": "ai.usage",
      "properties": {
        "model_id": "gpt-4o-mini",
        "user_id": "user_01HZ...",
        "tokens_in": 42,
        "tokens_out": 128
      }
    }

API and SDK links
- HTTP base path: `/ai/v1/`
- OpenAPI reference: Gateway API
- JavaScript SDK: `reactor.ai`, `@reactor/ai`
- Swift SDK: `ReactorAI`
- CLI: `reactor ai`
- Guide: AI chatbot backend

| Method | Path | Description |
|---|---|---|
| GET | `/ai/v1/health` | Service health |
| GET | `/ai/v1/models` | List available models |
| POST | `/ai/v1/chat/completions` | Chat (streaming or not) |
| POST | `/ai/v1/embeddings` | Generate embeddings |
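
For clients that use neither SDK, the paths above combine with the Bearer header shown in the curl examples. The sketch below only builds the request description; `GatewayRequest` and `chatCompletionRequest` are hypothetical names, and the host and token are placeholders.

```typescript
// Hypothetical types; only the path and headers come from this page.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Builds a POST to /ai/v1/chat/completions without sending it.
function chatCompletionRequest(
  baseUrl: string,
  token: string,
  model: string,
  messages: ChatMessage[],
  stream = false,
): GatewayRequest {
  return {
    // Trim trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/ai/v1/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, stream }),
  };
}

const req = chatCompletionRequest(
  "https://gateway.example.test/", // placeholder host
  "test-token", // placeholder JWT
  "gpt-4o-mini",
  [{ role: "user", content: "Hello!" }],
  true,
);
console.log(req.url); // https://gateway.example.test/ai/v1/chat/completions
```

The returned object can be passed to any HTTP client, for example as `fetch(req.url, req)`.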
Troubleshooting
model_not_found (404)
The model ID or alias is not in the registry. List models with `GET /ai/v1/models` or add an overlay in `ai-models.toml`.
provider_error (502)
The upstream provider rejected or failed the request. Check provider credentials, rate limits, and model availability in the provider’s dashboard. Error details may include the upstream message.

    reactor ai doctor
    reactor ai test gpt-4o-mini --prompt "ping"

unauthorized (401)
The JWT is missing or expired. Refresh your access token via Identity before calling Gateway.
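
One way to handle an expired token is to refresh once and retry. The sketch below assumes the caller supplies its own `getToken`, `refresh`, and `send` functions; `withTokenRefresh` and `GatewayResult` are hypothetical names, and the demo uses fakes instead of real Identity or Gateway calls.

```typescript
// Hypothetical result shape; a real client would wrap fetch or the SDK.
interface GatewayResult<T> {
  status: number;
  data?: T;
}

// Calls `send` with the current token; on a 401, refreshes once and retries.
async function withTokenRefresh<T>(
  getToken: () => string,
  refresh: () => Promise<string>,
  send: (token: string) => Promise<GatewayResult<T>>,
): Promise<GatewayResult<T>> {
  const first = await send(getToken());
  if (first.status !== 401) return first;
  const fresh = await refresh();
  return send(fresh); // a second 401 goes back to the caller, with no retry loop
}

// Demo with fakes: the first token is rejected, the refreshed one is accepted.
let currentToken = "expired";
const tokensTried: string[] = [];
const demo = withTokenRefresh<string>(
  () => currentToken,
  async () => {
    currentToken = "fresh";
    return currentToken;
  },
  async (token): Promise<GatewayResult<string>> => {
    tokensTried.push(token);
    return token === "fresh" ? { status: 200, data: "ok" } : { status: 401 };
  },
);
demo.then((result) => console.log(result.status, tokensTried));
```

Retrying only once avoids looping when the refresh itself yields a token that Gateway rejects.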
Streaming stalls or incomplete responses
Some proxies buffer SSE. Connect directly to the API origin or disable buffering for `/ai/v1/chat/completions` when `stream: true`. Ensure your client reads until the `[DONE]` line.
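
To check whether a response was cut short, it helps to see what reading until `[DONE]` involves. The sketch below parses an already-buffered SSE body; `parseSse` and `ChatChunk` are hypothetical names, and the chunk shape assumes the `choices[0].delta.content` field used in the Quickstart.

```typescript
// Assumed chunk shape, matching the delta field read in the Quickstart.
interface ChatChunk {
  choices: { delta?: { content?: string } }[];
}

// Parses a buffered SSE body into JSON chunks, stopping at the [DONE] terminator.
function parseSse(body: string): { chunks: ChatChunk[]; done: boolean } {
  const chunks: ChatChunk[] = [];
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") return { chunks, done: true };
    chunks.push(JSON.parse(payload) as ChatChunk);
  }
  return { chunks, done: false }; // body ended without the terminator
}

const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const parsed = parseSse(sample);
const streamedText = parsed.chunks.map((c) => c.choices[0]?.delta?.content ?? "").join("");
console.log(streamedText, parsed.done); // Hello true
```

A `done` value of `false` indicates the stream ended early, which is what a buffering or timing-out proxy tends to produce. A live client would also need to buffer partial lines, since network reads can split a `data:` line in two.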
quota_exceeded (429, Cloud only)
Your organization has exhausted its credit balance or hit its rate limit. Check usage in Analytics or your Cloud dashboard. Self-hosted OSS deployments do not enforce quotas by default.
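
When checking usage, events with the shape shown under Limits and quotas can be totaled per user. The sketch below assumes the events have already been exported from Analytics into an array; `UsageEvent` and `totalsByUser` are hypothetical names, and the user IDs and token counts are made up.

```typescript
// Event shape copied from the usage event example on this page.
interface UsageEvent {
  event: "ai.usage";
  properties: { model_id: string; user_id: string; tokens_in: number; tokens_out: number };
}

interface TokenTotals {
  tokens_in: number;
  tokens_out: number;
}

// Sums input and output tokens per user_id.
function totalsByUser(usageEvents: UsageEvent[]): Map<string, TokenTotals> {
  const totals = new Map<string, TokenTotals>();
  for (const { properties: p } of usageEvents) {
    const current = totals.get(p.user_id) ?? { tokens_in: 0, tokens_out: 0 };
    current.tokens_in += p.tokens_in;
    current.tokens_out += p.tokens_out;
    totals.set(p.user_id, current);
  }
  return totals;
}

// Made-up sample data for illustration.
const usageSample: UsageEvent[] = [
  { event: "ai.usage", properties: { model_id: "gpt-4o-mini", user_id: "user_a", tokens_in: 42, tokens_out: 128 } },
  { event: "ai.usage", properties: { model_id: "claude-3-haiku", user_id: "user_a", tokens_in: 10, tokens_out: 20 } },
  { event: "ai.usage", properties: { model_id: "gpt-4o-mini", user_id: "user_b", tokens_in: 5, tokens_out: 7 } },
];

const usageTotals = totalsByUser(usageSample);
console.log(usageTotals.get("user_a")); // { tokens_in: 52, tokens_out: 148 }
```

How these totals map to credits is decided by the Cloud billing layer and is not modeled here.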