Observability
Reactor exposes a unified observability surface across all capabilities. Whether you run a single reactor-server on a VPS or a fleet of reactor-shared instances, the same endpoints and metric names apply.
Overview
| Signal | Endpoint | Format |
|---|---|---|
| Metrics | GET /metrics | Prometheus text exposition |
| Health | GET /health | JSON aggregate |
| Diagnostics | GET /_admin/doctor | JSON per-capability probes |
| Logs | GET /_admin/logs | SSE stream (JSON lines) |
| Tracing | stdout / log aggregator | Structured JSON via tracing crate |

    flowchart TB
      RS[reactor-server] --> M[/metrics]
      RS --> H[/health]
      RS --> D[/_admin/doctor]
      RS --> L[/_admin/logs]
      M --> Prom[Prometheus]
      Prom --> Graf[Grafana]
      L --> CLI[reactor logs]
      RS --> Agg[Loki / Datadog / etc.]

Prometheus metrics
Scrape GET /metrics on your server’s bind address. The unified binary aggregates counters and histograms from every mounted capability into a single registry.
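To make the shape of the scraped data concrete, here is a minimal Python sketch that parses a Prometheus text exposition payload and sums reactor_http_requests_total per capability. The sample payload and its values are invented for illustration, and the parser is deliberately simplified (it skips lines with trailing timestamps); a real Prometheus server does this work for you.

```python
import re
from collections import defaultdict

# Hypothetical sample of a GET /metrics response body; the values are invented.
SAMPLE = """\
# HELP reactor_http_requests_total Requests by capability, method, status
# TYPE reactor_http_requests_total counter
reactor_http_requests_total{capability="auth",method="POST",status="200"} 120
reactor_http_requests_total{capability="auth",method="POST",status="500"} 3
reactor_http_requests_total{capability="data",method="GET",status="200"} 840
reactor_jobs_queue_depth 7
"""

# name, optional {labels}, then a single value (no timestamp support).
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)


def requests_by_capability(text):
    """Sum reactor_http_requests_total across methods and statuses."""
    totals = defaultdict(float)
    for name, labels, value in parse_metrics(text):
        if name == "reactor_http_requests_total":
            totals[labels.get("capability", "unknown")] += value
    return dict(totals)


print(requests_by_capability(SAMPLE))
```

A quick script like this can help confirm that the expected metric names and labels are present before wiring up a scrape job.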
Scrape configuration

    scrape_configs:
      - job_name: reactor
        scrape_interval: 15s
        static_configs:
          - targets: ["reactor.internal:8000"]
        metrics_path: /metrics

For Fly.io or Kubernetes, add a scrape annotation or use service discovery.
Key metrics
| Metric | Type | Description |
|---|---|---|
| reactor_http_requests_total | counter | Requests by capability, method, status |
| reactor_http_request_duration_seconds | histogram | Request latency |
| reactor_request_latency_p99 | gauge | P99 latency (shared cluster) |

| Metric | Type | Description |
|---|---|---|
| reactor_db_pool_size | gauge | Active pool connections |
| reactor_db_pool_idle | gauge | Idle connections |
| reactor_db_acquire_duration_seconds | histogram | Connection acquire time |

| Metric | Type | Description |
|---|---|---|
| reactor_jobs_queue_depth | gauge | Pending job count |
| reactor_jobs_runs_total | counter | Completed/failed runs |
| reactor_jobs_run_duration_seconds | histogram | Job execution time |

| Metric | Type | Description |
|---|---|---|
| reactor_fn_invocations_total | counter | Invocations by function, runtime |
| reactor_fn_invoke_duration_seconds | histogram | Cold/warm invoke latency |
| reactor_fn_warm_pool_size | gauge | Bun warm instances |

| Metric | Type | Description |
|---|---|---|
| reactor_tenant_cache_active | gauge | Cached tenant adapters |
| reactor_quota_exceeded_total | counter | Quota breaches by tenant |
| supavisor_db_pool_waiting | gauge | Queries waiting for pool connection |
| nats_consumer_pending | gauge | NATS consumer lag |
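Counters such as reactor_http_requests_total only ever increase, so they become useful once you compare two samples. The following Python sketch works through that arithmetic for a 5xx error ratio. The snapshot values are invented, and the dictionary shape (HTTP status string mapped to cumulative count) is an assumption made for illustration, not a Reactor API.

```python
def error_ratio(before, after):
    """Return the share of 5xx requests between two counter snapshots.

    Each snapshot maps an HTTP status string to the cumulative value of
    reactor_http_requests_total for that status (assumed shape).
    """
    total = 0.0
    errors = 0.0
    for status, value in after.items():
        delta = value - before.get(status, 0.0)
        if delta < 0:
            # The counter went backwards, e.g. after a restart: count from zero.
            delta = value
        total += delta
        if status.startswith("5"):
            errors += delta
    return errors / total if total else 0.0


# Invented snapshots taken some time apart.
before = {"200": 1000.0, "404": 20.0, "500": 5.0}
after = {"200": 1180.0, "404": 25.0, "500": 20.0}

# 15 new 5xx responses out of 200 new requests.
print(error_ratio(before, after))
```

In practice Prometheus performs this calculation with rate(), including counter-reset handling; the sketch only shows what the resulting ratio means.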
Example queries

    # Error rate by capability (5m window)
    sum(rate(reactor_http_requests_total{status=~"5.."}[5m])) by (capability)
      / sum(rate(reactor_http_requests_total[5m])) by (capability)

    # P95 request latency
    histogram_quantile(0.95, sum(rate(reactor_http_request_duration_seconds_bucket[5m])) by (le, capability))

    # Job queue backlog alert
    reactor_jobs_queue_depth > 100

    # Shared cluster tenant cache pressure
    reactor_tenant_cache_active > 4500

Alert rules

    groups:
      - name: reactor
        rules:
          - alert: ReactorHighErrorRate
            expr: |
              sum(rate(reactor_http_requests_total{status=~"5.."}[5m]))
              / sum(rate(reactor_http_requests_total[5m])) > 0.05
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Reactor error rate above 5%"

          - alert: ReactorHealthCheckFailing
            expr: up{job="reactor"} == 0
            for: 2m
            labels:
              severity: critical

          - alert: JobQueueBacklog
            expr: reactor_jobs_queue_depth > 500
            for: 10m
            labels:
              severity: warning

Health checks
GET /health
Returns 200 only when every mounted capability’s internal health check passes. Returns 503 with the failing capability listed otherwise.

    curl -s http://localhost:8000/health | jq

    {
      "status": "ok",
      "capabilities": {
        "auth": "ok",
        "data": "ok",
        "storage": "ok",
        "functions": "ok",
        "jobs": "ok",
        "sites": "ok"
      }
    }

Use this endpoint for load balancer probes, Fly.io http_checks, and Kubernetes readiness probes.
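If you need a custom probe, the check can be scripted. The Python sketch below reads the /health response and reports which capabilities are failing. It assumes the 503 body has the same shape as the 200 body, with a value other than "ok" for each failing capability; the document does not specify that shape, so treat the "degraded" and "error" strings in the sample as hypothetical.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url="http://localhost:8000/health", timeout=5.0):
    """Return (status_code, body) for GET /health, including 503 responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx statuses; the body is still readable.
        return err.code, err.read().decode("utf-8")


def failing_capabilities(body):
    """List capabilities whose reported state is anything other than "ok"."""
    payload = json.loads(body)
    capabilities = payload.get("capabilities", {})
    return sorted(name for name, state in capabilities.items() if state != "ok")


def is_ready(status_code, body):
    """Ready only on HTTP 200 with no failing capability in the body."""
    return status_code == 200 and not failing_capabilities(body)


# Hypothetical 503 body; the exact failure format is an assumption.
SAMPLE_FAILING = json.dumps({
    "status": "degraded",
    "capabilities": {"auth": "ok", "data": "ok", "jobs": "error"},
})

print(failing_capabilities(SAMPLE_FAILING))
print(is_ready(503, SAMPLE_FAILING))
```

To run it against a live server, call fetch_health() and pass its result to is_ready().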
GET /_admin/doctor
Runs deep diagnostic probes: database connectivity, storage backend reachability, runtime presence, and migration state. Requires an admin bearer token.

    curl -s \
      -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
      http://localhost:8000/_admin/doctor | jq

    {
      "auth": { "status": "ok", "checks": [{ "name": "db_ping", "status": "ok" }] },
      "data": { "status": "ok", "checks": [{ "name": "migrations", "status": "ok" }] },
      "storage": { "status": "ok", "checks": [{ "name": "fs_writable", "status": "ok" }] },
      "functions": { "status": "ok", "checks": [{ "name": "wasmtime", "status": "ok" }] }
    }

CLI equivalent:

    reactor doctor

Logging
Structured tracing
Configure log output in Reactor.toml:

    [tracing]
    filter = "info,reactor_auth=debug,reactor_data=warn,reactor_jobs=debug"
    fmt = "json"  # use "pretty" for local development

Each capability emits spans with its crate name as the tracing target. In the unified binary, all output fans into one stream.
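The filter string combines a default level with per-target overrides. As a rough mental model, the Python sketch below resolves which level applies to a given target. It assumes the filter follows the usual directive rules of the Rust tracing ecosystem (a bare level sets the default, target=level overrides it, the longest matching target prefix wins); it is a simplified illustration covering only the five standard levels, not Reactor's actual implementation.

```python
LEVELS = ["trace", "debug", "info", "warn", "error"]  # least to most severe

SPEC = "info,reactor_auth=debug,reactor_data=warn,reactor_jobs=debug"


def parse_filter(spec):
    """Split a filter string into (default_level, {target_prefix: level})."""
    default = None
    targets = {}
    for directive in spec.split(","):
        directive = directive.strip()
        if not directive:
            continue
        if "=" in directive:
            target, level = directive.split("=", 1)
            targets[target.strip()] = level.strip().lower()
        elif directive.lower() in LEVELS:
            default = directive.lower()
        else:
            # Assumption: a bare target name enables every level for it.
            targets[directive] = "trace"
    return default, targets


def level_for(spec, target):
    """Return the minimum level logged for a target, or None if unset."""
    default, targets = parse_filter(spec)
    matches = [prefix for prefix in targets if target.startswith(prefix)]
    if not matches:
        return default
    return targets[max(matches, key=len)]  # most specific prefix wins


def is_enabled(spec, target, level):
    """True if an event at `level` from `target` would pass the filter."""
    threshold = level_for(spec, target)
    if threshold is None:
        return False
    return LEVELS.index(level.lower()) >= LEVELS.index(threshold)


for name in ("reactor_auth", "reactor_data", "reactor_storage"):
    print(name, level_for(SPEC, name))
```

With the example configuration, reactor_auth and reactor_jobs log at debug, reactor_data only at warn and above, and every other capability falls back to info.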
Example JSON log line:

    {
      "timestamp": "2026-05-29T12:00:00.000Z",
      "level": "INFO",
      "target": "reactor_data",
      "fields": {
        "message": "request completed",
        "method": "GET",
        "path": "/data/v1/users",
        "status": 200,
        "duration_ms": 12
      }
    }

Admin log stream
Tail logs remotely via SSE:

    # All capabilities
    curl -N \
      -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
      "http://localhost:8000/_admin/logs?follow=1"

    # Filter by capability
    curl -N \
      -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
      "http://localhost:8000/_admin/logs?capability=jobs&follow=1&since=2026-05-29T00:00:00Z"

CLI:

    reactor logs --follow
    reactor logs --capability functions --since 1h

On Reactor.cloud:

    reactor cloud logs my-project --follow

Log aggregation

    reactor:
      logging:
        driver: json-file
        options:
          max-size: "50m"
          max-file: "5"

Forward with Fluent Bit, Vector, or Promtail to Loki/Elasticsearch.

    flyctl logs --app my-reactor

Fly ships logs to their aggregator. Export via Vector or a log drain for long-term retention.

Logs go to journald:

    journalctl -u reactor-server -f

Distributed tracing
Reactor v0 uses the Rust tracing ecosystem with structured spans. OpenTelemetry export is on the roadmap; today, correlate requests via:
- Request ID: propagated in X-Request-Id response headers
- JWT sub claim: user-scoped log filtering
- Capability target: filter by reactor_auth, reactor_data, etc.
For shared cluster deployments, tenant ID appears in NATS topic names (reactor.{ref}.data.*) and quota metrics labels.
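Filtering by capability target can be done with a few lines of scripting once logs are in JSON form. The Python sketch below scans JSON log lines and keeps requests from one target that exceeded a duration threshold. It relies on the target and fields.duration_ms keys shown in the example log line earlier; the sample records, paths other than /data/v1/users, and the 500 ms threshold are invented for illustration.

```python
import json

# Hypothetical log lines in the JSON format shown above; values are invented.
SAMPLE_LOGS = """\
{"timestamp": "2026-05-29T12:00:00.000Z", "level": "INFO", "target": "reactor_data", "fields": {"message": "request completed", "method": "GET", "path": "/data/v1/users", "status": 200, "duration_ms": 12}}
{"timestamp": "2026-05-29T12:00:01.000Z", "level": "INFO", "target": "reactor_data", "fields": {"message": "request completed", "method": "GET", "path": "/data/v1/orders", "status": 200, "duration_ms": 950}}
{"timestamp": "2026-05-29T12:00:02.000Z", "level": "INFO", "target": "reactor_auth", "fields": {"message": "request completed", "method": "POST", "path": "/auth/v1/token", "status": 200, "duration_ms": 1400}}
this line is not JSON and is skipped
"""


def slow_requests(lines, target, min_duration_ms):
    """Yield log records from `target` whose duration_ms meets the threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON output mixed into the stream
        if not isinstance(record, dict) or record.get("target") != target:
            continue
        fields = record.get("fields")
        if not isinstance(fields, dict):
            continue
        duration = fields.get("duration_ms")
        if isinstance(duration, (int, float)) and duration >= min_duration_ms:
            yield record


for record in slow_requests(SAMPLE_LOGS.splitlines(), "reactor_data", 500):
    fields = record["fields"]
    print(record["timestamp"], fields["path"], fields["duration_ms"])
```

The same function works on lines piped from journalctl or a log file, as long as each line is one JSON record.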
Debug a slow request

    # Enable debug tracing temporarily
    REACTOR_TRACING__FILTER="debug" reactor-server

    # Or via config reload (restart required in v0)

Dashboards
Recommended Grafana panels for an O2 deployment:
| Panel | Query |
|---|---|
| Request rate | sum(rate(reactor_http_requests_total[5m])) by (capability) |
| Error rate | 5xx / total requests |
| Latency P50/P95/P99 | histogram quantiles |
| DB pool utilization | reactor_db_pool_size - reactor_db_pool_idle |
| Job queue depth | reactor_jobs_queue_depth |
| Function invocations | rate(reactor_fn_invocations_total[5m]) |
For C6@fly shared cluster, add:
- Tenant cache active vs capacity
- Quota exceeded rate by resource type
- Supavisor pool waiting count
- NATS consumer pending messages
Version endpoint

    curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/_admin/version

    {
      "reactor": "0.1.0",
      "capabilities": {
        "auth": "0.1.0",
        "data": "0.1.0",
        "storage": "0.1.0",
        "functions": "0.1.0",
        "jobs": "0.1.0",
        "sites": "0.1.0"
      }
    }

Related
- Configuration: [tracing] settings
- Security: securing admin endpoints used for logs/doctor
- Self-hosting: health check setup for Fly.io and Docker