Observability

Reactor exposes a unified observability surface across all capabilities. Whether you run a single reactor-server on a VPS or a fleet of reactor-shared instances, the same endpoints and metric names apply.

Overview

Signal	Endpoint	Format
Metrics	`GET /metrics`	Prometheus text exposition
Health	`GET /health`	JSON aggregate
Diagnostics	`GET /_admin/doctor`	JSON per-capability probes
Logs	`GET /_admin/logs`	SSE stream (JSON lines)
Tracing	stdout / log aggregator	Structured JSON via `tracing` crate

flowchart TB
  RS[reactor-server] --> M["/metrics"]
  RS --> H["/health"]
  RS --> D["/_admin/doctor"]
  RS --> L["/_admin/logs"]
  M --> Prom[Prometheus]
  Prom --> Graf[Grafana]
  L --> CLI["reactor logs"]
  RS --> Agg["Loki / Datadog / etc."]

Prometheus metrics

Scrape GET /metrics on your server’s bind address. The unified binary aggregates counters and histograms from every mounted capability into a single registry.

Scrape configuration

scrape_configs:
  - job_name: reactor
    scrape_interval: 15s
    static_configs:
      - targets: ["reactor.internal:8000"]
    metrics_path: /metrics

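For Kubernetes, replace `static_configs` with service discovery (`kubernetes_sd_configs`) or rely on a scrape annotation if your Prometheus setup honors one. On Fly.io, declare the endpoint in the `[metrics]` section of `fly.toml` so Fly's managed Prometheus scrapes it.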

Key metrics

HTTP:

Metric	Type	Description
`reactor_http_requests_total`	counter	Requests by capability, method, status
`reactor_http_request_duration_seconds`	histogram	Request latency
`reactor_request_latency_p99`	gauge	P99 latency (shared cluster)

Database pool:

Metric	Type	Description
`reactor_db_pool_size`	gauge	Open pool connections (in use and idle)
`reactor_db_pool_idle`	gauge	Idle connections
`reactor_db_acquire_duration_seconds`	histogram	Connection acquire time

Jobs:

Metric	Type	Description
`reactor_jobs_queue_depth`	gauge	Pending job count
`reactor_jobs_runs_total`	counter	Completed/failed runs
`reactor_jobs_run_duration_seconds`	histogram	Job execution time

Functions:

Metric	Type	Description
`reactor_fn_invocations_total`	counter	Invocations by function, runtime
`reactor_fn_invoke_duration_seconds`	histogram	Cold/warm invoke latency
`reactor_fn_warm_pool_size`	gauge	Bun warm instances

Shared cluster (reactor-shared):

Metric	Type	Description
`reactor_tenant_cache_active`	gauge	Cached tenant adapters
`reactor_quota_exceeded_total`	counter	Quota breaches by tenant
`supavisor_db_pool_waiting`	gauge	Queries waiting for pool connection
`nats_consumer_pending`	gauge	NATS consumer lag
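For a quick look at these metrics without a Prometheus server, for example right after deploying an instance, the text exposition served by `/metrics` can be parsed directly. The sketch below is a simplified Python parser: it skips `# HELP`/`# TYPE` lines and anything it cannot read, and ignores exemplars. `parse_exposition` and `sum_by` are illustrative names, not part of Reactor; the `capability` and `status` labels match the queries on this page, while the `method` label name used in the sample data is an assumption.

```python
import re
from collections import defaultdict

# metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
_ESCAPES = {"n": "\n", '"': '"', "\\": "\\"}


def _unescape(value):
    # Label values escape backslash, double quote, and newline.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), value)


def parse_exposition(text):
    """Return (name, labels, value) tuples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, HELP, or TYPE line
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # this simplified parser skips lines it cannot read
        name, raw_labels, raw_value = match.groups()
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(raw_labels or "")}
        samples.append((name, labels, float(raw_value)))  # float() accepts NaN and +Inf
    return samples


def sum_by(samples, metric, label):
    """Sum one metric's current values grouped by a label (no rate applied)."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)
```

Usage: feed it the body of `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()` and call `sum_by(samples, "reactor_http_requests_total", "capability")`. Because counters are cumulative, these totals are since process start; use Prometheus `rate()` for per-second figures.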

Example queries

# Error rate by capability (5m window)
sum(rate(reactor_http_requests_total{status=~"5.."}[5m])) by (capability)
/ sum(rate(reactor_http_requests_total[5m])) by (capability)

# P95 request latency
histogram_quantile(0.95, sum(rate(reactor_http_request_duration_seconds_bucket[5m])) by (le, capability))

# Job queue backlog alert
reactor_jobs_queue_depth > 100

# Shared cluster tenant cache pressure
reactor_tenant_cache_active > 4500
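`histogram_quantile` does not return an exact percentile: it finds the bucket that contains the target rank and interpolates linearly between that bucket's bounds, assuming observations are spread evenly inside it. The Python sketch below reproduces that arithmetic for classic histograms, which shows why the P95 query can only be as precise as the bucket boundaries. It is a simplified model (no repair of non-monotonic buckets, no native histograms), not Prometheus's own code, and `histogram_quantile` here is just an illustrative function name.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Models the linear interpolation Prometheus applies to classic
    histograms; the bucket list must include the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank falls in +Inf: report the highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    lower = buckets[b - 1][0] if b > 0 else 0.0
    upper = buckets[b][0]
    below = buckets[b - 1][1] if b > 0 else 0
    in_bucket = buckets[b][1] - below  # observations inside the chosen bucket
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With buckets at 0.1, 0.25, 0.5, and 1.0 seconds, every P95 between 0.25 s and 0.5 s is reported somewhere on a straight line between those two bounds, so add finer buckets around the latencies you alert on.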

Alert rules

groups:
  - name: reactor
    rules:
      - alert: ReactorHighErrorRate
        expr: |
          sum(rate(reactor_http_requests_total{status=~"5.."}[5m]))
          / sum(rate(reactor_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Reactor error rate above 5%"

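      - alert: ReactorHealthCheckFailing
        expr: up{job="reactor"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus cannot scrape Reactor /metrics"

      - alert: JobQueueBacklog
        expr: reactor_jobs_queue_depth > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 500 pending jobs for 10 minutes"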
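`ReactorHighErrorRate` divides the increase of 5xx responses by the increase of all responses over the window. The Python sketch below applies the same ratio to two manual snapshots of `reactor_http_requests_total` keyed by status code, treating a counter that went down as a process restart, which is how `rate()` handles counter resets. It is a simplification: Prometheus also extrapolates to the window edges, so its result will differ slightly. `counter_increase` and `error_ratio` are illustrative names.

```python
def counter_increase(before, after):
    """Increase of a counter between two scrapes, treating a drop as a restart."""
    return after if after < before else after - before


def error_ratio(before, after):
    """5xx share of requests between two snapshots keyed by HTTP status code.

    `before` and `after` map a status string (e.g. "200", "503") to the
    cumulative value of reactor_http_requests_total for that status.
    """
    errors = total = 0.0
    for status in set(before) | set(after):
        delta = counter_increase(before.get(status, 0.0), after.get(status, 0.0))
        total += delta
        if len(status) == 3 and status.startswith("5"):  # same as status=~"5.."
            errors += delta
    # No requests in the window: nothing to alert on.
    return errors / total if total else 0.0
```

A result above `0.05` corresponds to the alert's threshold; note that the alert also requires the condition to hold for 5 minutes before it fires.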

Health checks

`GET /health`

Returns 200 only when every mounted capability’s internal health check passes; otherwise it returns 503 and lists the failing capabilities in the body.

curl -s http://localhost:8000/health | jq

{
  "status": "ok",
  "capabilities": {
    "auth": "ok",
    "data": "ok",
    "storage": "ok",
    "functions": "ok",
    "jobs": "ok",
    "sites": "ok"
  }
}

Use this endpoint for load balancer probes, Fly.io http_checks, and Kubernetes readiness probes.
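A probe script (for example a container `HEALTHCHECK` or a cron-driven check) only needs the status code, but the body says which capability failed. The Python sketch below assumes the 503 body has the same shape as the healthy example, with a value other than `ok` for each failing capability; that shape is an assumption, since only the healthy response is shown above. `check_health` and `failing_capabilities` are illustrative names.

```python
import json
import urllib.error
import urllib.request


def failing_capabilities(body):
    """Names of capabilities whose status in a /health body is not "ok"."""
    report = json.loads(body)
    capabilities = report.get("capabilities", {})
    return sorted(name for name, state in capabilities.items() if state != "ok")


def check_health(base_url="http://localhost:8000", timeout=2.0):
    """Return (healthy, failing_capabilities) for a Reactor server."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200, failing_capabilities(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 503; the error object still carries the JSON body
        # (assumed to use the same shape as the 200 response).
        return False, failing_capabilities(err.read())
```

Connection failures surface as `urllib.error.URLError`; a wrapper script can treat that, like a `False` result, as a non-zero exit.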

`GET /_admin/doctor`

Runs deep diagnostic probes: database connectivity, storage backend reachability, runtime presence, and migration state. Requires an admin bearer token.

curl -s \
  -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
  http://localhost:8000/_admin/doctor | jq

{
  "auth": { "status": "ok", "checks": [{ "name": "db_ping", "status": "ok" }] },
  "data": { "status": "ok", "checks": [{ "name": "migrations", "status": "ok" }] },
  "storage": { "status": "ok", "checks": [{ "name": "fs_writable", "status": "ok" }] },
  "functions": { "status": "ok", "checks": [{ "name": "wasmtime", "status": "ok" }] }
}

CLI equivalent:

reactor doctor
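For scripted checks, the doctor report can be reduced to a list of failing probes. This Python sketch assumes each capability entry has the `status` and `checks` keys shown in the example and treats any status other than `ok` as a failure; the failure status strings themselves are not documented here. `fetch_doctor` and `failed_checks` are illustrative names, and the token is read from `REACTOR_ADMIN_TOKEN` as in the curl example.

```python
import json
import os
import urllib.request


def fetch_doctor(base_url="http://localhost:8000"):
    """GET /_admin/doctor with the admin bearer token and decode the JSON."""
    req = urllib.request.Request(
        f"{base_url}/_admin/doctor",
        headers={"Authorization": f"Bearer {os.environ['REACTOR_ADMIN_TOKEN']}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def failed_checks(report):
    """Flatten a doctor report into 'capability/check: status' strings."""
    problems = []
    for capability, result in sorted(report.items()):
        bad = [c for c in result.get("checks", []) if c.get("status") != "ok"]
        problems += [f"{capability}/{c.get('name')}: {c.get('status')}" for c in bad]
        # A capability can report a non-ok status without a failing named check.
        if result.get("status") != "ok" and not bad:
            problems.append(f"{capability}: {result.get('status')}")
    return problems
```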

Logging

Structured tracing

Configure log output in Reactor.toml:

[tracing]
filter = "info,reactor_auth=debug,reactor_data=warn,reactor_jobs=debug"
fmt = "json"    # use "pretty" for local development

Each capability emits spans with its crate name as the tracing target (for example, reactor_auth or reactor_data). In the unified binary, every capability writes to the same output stream.
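The `filter` value uses the directive syntax of `tracing-subscriber`'s `EnvFilter`: a bare level sets the default, `target=level` applies to targets that start with that name, and the most specific matching directive wins. The Python model below illustrates those rules for the example filter. It is a simplified sketch that ignores span and field directives, not the crate's implementation; `parse_filter` and `is_enabled` are illustrative names.

```python
SEVERITY = {"trace": 0, "debug": 1, "info": 2, "warn": 3, "error": 4, "off": 5}


def parse_filter(spec):
    """Turn "info,reactor_auth=debug" into a list of (target or None, level)."""
    directives = []
    for part in filter(None, (p.strip() for p in spec.split(","))):
        target, sep, level = part.partition("=")
        if sep:
            directives.append((target, level.lower()))
        elif part.lower() in SEVERITY:
            directives.append((None, part.lower()))  # bare level: the default
        else:
            directives.append((part, "trace"))  # bare target: enable it fully
    return directives


def is_enabled(spec, target, level):
    """Would an event at `level` from `target` pass the filter?"""
    matching = [(t, lvl) for t, lvl in parse_filter(spec) if t is None or target.startswith(t)]
    if not matching:
        return False  # no directive applies to this target
    # The longest matching target wins; a bare level is the least specific.
    _, allowed = max(matching, key=lambda d: len(d[0] or ""))
    return SEVERITY[level.lower()] >= SEVERITY[allowed]
```

With the example filter, `reactor_data` is limited to `warn` and above even though the default is `info`, while module paths such as `reactor_auth::session` inherit `reactor_auth=debug`.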

Example JSON log line:

{
  "timestamp": "2026-05-29T12:00:00.000Z",
  "level": "INFO",
  "target": "reactor_data",
  "fields": {
    "message": "request completed",
    "method": "GET",
    "path": "/data/v1/users",
    "status": 200,
    "duration_ms": 12
  }
}
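In `json` mode each record is written on a single line; the example above is pretty-printed for readability. A minimal Python filter over such lines, by capability and minimum level, might look like this. It assumes every capability's target starts with its crate name `reactor_<capability>`, which this page shows for `reactor_auth`, `reactor_data`, and `reactor_jobs`, and it tolerates module-path suffixes such as `reactor_data::pool`. `filter_logs` is an illustrative name.

```python
import json

_RANK = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}


def filter_logs(lines, capability=None, min_level="INFO"):
    """Yield parsed JSON log records at or above `min_level`, optionally for one capability."""
    threshold = _RANK[min_level.upper()]
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        record = json.loads(line)
        if _RANK.get(record.get("level", ""), 0) < threshold:
            continue
        # Assumed naming: capability "data" -> crate/target "reactor_data".
        crate = record.get("target", "").split("::")[0]
        if capability and crate != f"reactor_{capability}":
            continue
        yield record
```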

Admin log stream

Tail logs remotely via SSE:

# All capabilities
curl -N \
  -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
  "http://localhost:8000/_admin/logs?follow=1"

# Filter by capability
curl -N \
  -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
  "http://localhost:8000/_admin/logs?capability=jobs&follow=1&since=2026-05-29T00:00:00Z"
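Outside curl and the CLI, any SSE client can consume the stream. The Python sketch below follows the SSE framing rules (a blank line ends an event, lines starting with `:` are comments, consecutive `data:` lines are joined with newlines) and assumes, per the overview table, that each event's data is one JSON log record. `iter_sse_data` and `tail_logs` are illustrative names; the query parameters are the ones used in the curl examples.

```python
import json
import os
import urllib.parse
import urllib.request


def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the pending event
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):  # comment or keep-alive
            continue
        field, _, value = line.partition(":")
        if field == "data":
            data.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, and retry: fields are ignored in this sketch.
    # An unterminated event at end of stream is discarded, as the SSE spec requires.


def tail_logs(base_url="http://localhost:8000", capability=None, since=None):
    """Stream parsed log records from /_admin/logs with follow=1."""
    params = {"follow": "1"}
    if capability:
        params["capability"] = capability
    if since:
        params["since"] = since  # e.g. "2026-05-29T00:00:00Z"
    req = urllib.request.Request(
        f"{base_url}/_admin/logs?{urllib.parse.urlencode(params)}",
        headers={"Authorization": f"Bearer {os.environ['REACTOR_ADMIN_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:  # no timeout: follow mode stays open
        for payload in iter_sse_data(line.decode("utf-8") for line in resp):
            yield json.loads(payload)
```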

CLI:

reactor logs --follow
reactor logs --capability functions --since 1h

On Reactor.cloud:

reactor cloud logs my-project --follow

Log aggregation

Docker Compose, with rotated json-file logs:

reactor:
  logging:
    driver: json-file
    options:
      max-size: "50m"
      max-file: "5"

Forward these logs with Fluent Bit or Vector (to Loki or Elasticsearch) or with Promtail (to Loki).

Fly.io:

flyctl logs --app my-reactor

Fly.io ships logs to its own aggregator; for long-term retention, export them with Vector or a log drain.

On a systemd host (VPS or bare metal), logs go to journald:

journalctl -u reactor-server -f

Distributed tracing

Reactor v0 uses the Rust tracing ecosystem with structured spans. OpenTelemetry export is on the roadmap; until then, correlate requests using:

Request ID: propagated in the X-Request-Id response header
JWT sub claim: user-scoped log filtering
Capability target: filter by reactor_auth, reactor_data, etc.

For shared cluster deployments, the tenant ID appears in NATS subject names (reactor.{ref}.data.*) and in the labels of the quota metrics.
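To slice shared-cluster traffic by tenant, the subject layout above is enough. The Python sketch below implements NATS subject matching (`*` matches exactly one token, `>` matches one or more trailing tokens) and pulls the tenant ref out of the second token. It assumes subjects for other capabilities follow the same `reactor.{ref}.<capability>...` shape, which this page only shows for data; `subject_matches` and `tenant_ref` are illustrative names.

```python
def subject_matches(pattern, subject):
    """NATS-style match: '*' is exactly one token, '>' is one or more trailing tokens."""
    p, s = pattern.split("."), subject.split(".")
    for i, token in enumerate(p):
        if token == ">":
            # '>' must be the last token and needs at least one token to match.
            return i == len(p) - 1 and len(s) > i
        if i >= len(s) or (token != "*" and token != s[i]):
            return False
    return len(p) == len(s)


def tenant_ref(subject):
    """Tenant ref from a subject shaped like reactor.{ref}.<capability>..., else None."""
    tokens = subject.split(".")
    return tokens[1] if len(tokens) >= 3 and tokens[0] == "reactor" else None
```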

Debug a slow request

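# Enable debug tracing temporarily (very verbose: every crate logs at debug)
REACTOR_TRACING__FILTER="debug" reactor-server

# Or raise only the capability you are investigating
REACTOR_TRACING__FILTER="info,reactor_data=debug" reactor-server

# Changes to [tracing] in Reactor.toml take effect only after a restart in v0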
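Once output is captured, the `duration_ms` field on request-completed records (see the example log line) is enough to find the slowest requests. A minimal Python sketch, assuming one JSON record per line and the field names from that example; `slowest_requests` is an illustrative name:

```python
import heapq
import json


def slowest_requests(lines, limit=5):
    """Return (duration_ms, target, method, path) for the slowest logged requests."""
    requests = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        record = json.loads(line)
        fields = record.get("fields", {})
        if "duration_ms" in fields:  # only request-completed records carry a duration
            requests.append(
                (fields["duration_ms"], record.get("target"), fields.get("method"), fields.get("path"))
            )
    return heapq.nlargest(limit, requests, key=lambda r: r[0])
```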

Dashboards

Recommended Grafana panels for an O2 deployment:

Panel	Query
Request rate	`sum(rate(reactor_http_requests_total[5m])) by (capability)`
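Error rate	`sum(rate(reactor_http_requests_total{status=~"5.."}[5m])) by (capability) / sum(rate(reactor_http_requests_total[5m])) by (capability)`
Latency P50/P95/P99	`histogram_quantile(0.99, sum(rate(reactor_http_request_duration_seconds_bucket[5m])) by (le, capability))`, repeated with 0.5 and 0.95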
DB pool utilization	`reactor_db_pool_size - reactor_db_pool_idle`
Job queue depth	`reactor_jobs_queue_depth`
Function invocations	`rate(reactor_fn_invocations_total[5m])`

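For the C6@fly shared cluster, add:

Tenant cache active vs capacity (`reactor_tenant_cache_active`)
Quota exceeded rate by resource type (`rate(reactor_quota_exceeded_total[5m])`)
Supavisor pool waiting count (`supavisor_db_pool_waiting`)
NATS consumer pending messages (`nats_consumer_pending`)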

Version endpoint

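Reports the server version and the version of each mounted capability. Requires an admin bearer token.

curl -s \
  -H "Authorization: Bearer $REACTOR_ADMIN_TOKEN" \
  http://localhost:8000/_admin/version | jq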

{
  "reactor": "0.1.0",
  "capabilities": {
    "auth": "0.1.0",
    "data": "0.1.0",
    "storage": "0.1.0",
    "functions": "0.1.0",
    "jobs": "0.1.0",
    "sites": "0.1.0"
  }
}
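After a partial upgrade, comparing each capability's version with the top-level `reactor` version is a quick sanity check. The Python sketch below assumes the response shape shown above and that your deployment expects all versions to match; `fetch_version` and `version_skew` are illustrative names, and the token is read from `REACTOR_ADMIN_TOKEN`.

```python
import json
import os
import urllib.request


def fetch_version(base_url="http://localhost:8000"):
    """GET /_admin/version using the admin token from REACTOR_ADMIN_TOKEN."""
    req = urllib.request.Request(
        f"{base_url}/_admin/version",
        headers={"Authorization": f"Bearer {os.environ['REACTOR_ADMIN_TOKEN']}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def version_skew(report):
    """Map capability -> version for every capability not on the top-level version."""
    expected = report["reactor"]
    capabilities = report.get("capabilities", {})
    return {name: ver for name, ver in sorted(capabilities.items()) if ver != expected}
```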

See also:

Configuration: [tracing] settings
Security: securing the admin endpoints used for logs and doctor
Self-hosting: health check setup for Fly.io and Docker