Health And Readiness

CloudGrid services expose liveness and readiness probes. Readiness returns unhealthy while required dependencies are unavailable or a service is draining.
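The readiness rule can be sketched as a small state object. This is a minimal illustration, not CloudGrid's implementation: `ReadinessState`, its named dependency checks, and the `draining` flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ReadinessState:
    """Hypothetical readiness model: required dependency checks plus a draining flag."""

    # Each check returns True when its dependency is currently available.
    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    draining: bool = False

    def evaluate(self) -> Tuple[bool, List[str]]:
        """Return (ready, reasons); ready only when not draining and every check passes."""
        reasons: List[str] = []
        if self.draining:
            reasons.append("draining")
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                # A check that raises is treated as an unavailable dependency.
                ok = False
            if not ok:
                reasons.append(f"{name} unavailable")
        return (not reasons, reasons)


# Example: a storage-read-like service that needs SurrealDB.
state = ReadinessState(checks={"surrealdb": lambda: True})
print(state.evaluate())  # (True, [])
state.draining = True
print(state.evaluate())  # (False, ['draining'])
```

Returning the reasons alongside the boolean makes a failing probe easier to diagnose than a bare status code.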

Probe Table

| Service | Default port | Liveness | Readiness |
|---|---|---|---|
| BFF | 3000 | /livez | /readyz, /api/health |
| OTLP collector HTTP | 4318 | /livez | /readyz |
| OTLP collector gRPC | 4317 | Reported by HTTP /readyz | Reported by HTTP /readyz |
| storage-read | 8081 | /livez | /readyz |
| storage-write | 8082 | /livez | /readyz |
| control-plane | 8084 | /livez | /readyz |
| AI eval runner | 8085 | /livez | /readyz |

Local Checks

```shell
curl -fsS http://localhost:3000/readyz
curl -fsS http://localhost:4318/readyz
curl -fsS http://localhost:8081/readyz
curl -fsS http://localhost:8082/readyz
curl -fsS http://localhost:8084/readyz
```
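The same checks can be scripted so a single command reports every service. This sketch assumes the default ports from the probe table and mirrors `curl -fsS` by treating only a 2xx response as ready. The AI eval runner (8085) is left out to match the list above; add it if you run the eval runner locally. `READY_URLS`, `is_ready`, and `main` are illustrative names.

```python
import http.client
import urllib.request

# Readiness endpoints from the local checks above (default ports).
READY_URLS = {
    "BFF": "http://localhost:3000/readyz",
    "OTLP collector HTTP": "http://localhost:4318/readyz",
    "storage-read": "http://localhost:8081/readyz",
    "storage-write": "http://localhost:8082/readyz",
    "control-plane": "http://localhost:8084/readyz",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only for a 2xx response, similar to `curl -fsS`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers HTTPError (4xx/5xx), connection refused, and timeouts.
        return False


def main() -> int:
    """Print one line per service and return a shell-style exit code."""
    exit_code = 0
    for name, url in READY_URLS.items():
        ok = is_ready(url)
        print(f"{'ready    ' if ok else 'NOT READY'}  {name}  {url}")
        if not ok:
            exit_code = 1
    return exit_code
```

Calling `raise SystemExit(main())` from a script makes it fail the same way a chain of `curl -fsS` commands would, which is convenient in local setup scripts.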

Readiness Dependencies

Diagram: readiness dependencies among BFF /readyz, NATS request/reply, collector /readyz, storage-read /readyz, SurrealDB query readiness, storage-write /readyz, and control-plane /readyz.

Common Readiness Failures

| Symptom | Likely cause | Check |
|---|---|---|
| BFF ready but GraphQL times out | Private service not subscribed to its NATS subject | storage-read and control-plane logs |
| storage-read not ready | SurrealDB unavailable or schema readiness failed | SurrealDB logs and storage-read /readyz |
| Collector rejects ingest | Invalid auth, content type, or size, or a bridge publish failure | Collector logs and NATS readiness |
| control-plane not ready | SurrealDB schema validation or configured self-observability project validation failed | control-plane logs |

Health Versus Readiness

Liveness answers whether the process should keep running. Readiness answers whether the process should receive traffic.

Do not treat /livez as a dependency check. Use /readyz for service routing and local troubleshooting.
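The split can be illustrated with a minimal handler where /livez never consults dependencies and /readyz delegates to a readiness function. This is a sketch on Python's standard `http.server`, not CloudGrid's code. `make_probe_handler` is a hypothetical name, and answering 503 when not ready is an assumed choice (any non-2xx status makes `curl -f` fail).

```python
import http.server
import json
from typing import Callable, List, Tuple


def make_probe_handler(readiness: Callable[[], Tuple[bool, List[str]]]):
    """Build a handler where /livez ignores dependencies and /readyz checks them."""

    class ProbeHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/livez":
                # Liveness: the process is up and answering; no dependency checks.
                self._reply(200, {"status": "alive"})
            elif self.path == "/readyz":
                # Readiness: only accept traffic when dependencies are available.
                ready, reasons = readiness()
                body = {"status": "ready" if ready else "not ready", "reasons": reasons}
                self._reply(200 if ready else 503, body)
            else:
                self._reply(404, {"status": "not found"})

        def _reply(self, code: int, body: dict) -> None:
            payload = json.dumps(body).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format: str, *args) -> None:
            pass  # Keep frequent probe requests out of the logs.

    return ProbeHandler
```

For example, `http.server.ThreadingHTTPServer(("127.0.0.1", 8080), make_probe_handler(lambda: (False, ["nats unavailable"]))).serve_forever()` answers 200 on /livez and 503 on /readyz, so a router stops sending traffic without the process being restarted.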

Next Step

Inspect private bridge behavior with Message bridge operations.