Troubleshooting

This page maps common symptoms to the most likely CloudGrid boundary.

First Checks

docker compose --env-file .env ps
curl -fsS http://localhost:3000/readyz
curl -fsS http://localhost:4318/readyz
curl -fsS http://localhost:8081/readyz
curl -fsS http://localhost:8082/readyz
curl -fsS http://localhost:8084/readyz
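
The readiness probes above can be run as one loop that reports every endpoint instead of stopping at the first failure. This is a minimal sketch, assuming `curl` is installed and the ports are the ones listed above. The helper names `probe` and `check_all` and the 5-second timeout are illustrative choices, not part of CloudGrid.

```shell
# Ports copied from the First Checks commands above.
READY_PORTS="3000 4318 8081 8082 8084"

# probe URL: succeeds only when the endpoint answers with a non-error status.
# The 5-second limit is an arbitrary choice for this sketch.
probe() {
  curl -fsS --max-time 5 "$1" >/dev/null 2>&1
}

# check_all: probes every /readyz endpoint, prints one line per endpoint,
# and returns 1 if at least one endpoint is not ready.
check_all() {
  failed=0
  for port in $READY_PORTS; do
    url="http://localhost:${port}/readyz"
    if probe "$url"; then
      echo "ready     $url"
    else
      echo "NOT READY $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `check_all` after loading these functions prints the status of all five endpoints, which makes it easier to see which boundary to investigate first.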

Common Problems

| Problem | What to check |
| --- | --- |
| Frontend shows no telemetry | Confirm a project is selected, ingest returned 200, storage-write is ready, storage-read is ready, and the UI time range includes the data. |
| BFF logs MESSAGE_BRIDGE_TIMEOUT | Confirm storage-read and control-plane are running and connected to the same CLOUDGRID_NATS_URL. |
| Collector returns ERR-015 | Missing auth in deployed mode or local token mode. |
| Collector returns ERR-016 | Token lacks required ingest scope, project is not active, or local token is unknown. |
| Metrics explorer has no series | Confirm /v1/metrics ingest returned 200, selected time range includes points, and storage-read can query metric names and series. |
| Dashboard widget is empty | Run the same metric, log, or trace query in /metrics, /logs, or /traces. Widgets use the same project-scoped GraphQL queries. |
| Live view stalls | Check storage-write post-persist notifications, storage-read live session logs, and BFF WebSocket logs. |
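
For the MESSAGE_BRIDGE_TIMEOUT case, one way to confirm that services share the same CLOUDGRID_NATS_URL is to read the variable from inside each container and compare the values. The sketch below assumes that the Compose service names match the component names used on this page (for example `storage-read` and `control-plane`) and that the images ship a `printenv` binary. Neither is confirmed here, and `nats_url_of` and `same_nats_url` are hypothetical helper names.

```shell
# nats_url_of SERVICE: prints CLOUDGRID_NATS_URL as seen inside the container.
# Assumes SERVICE is a Compose service name and the image provides printenv.
nats_url_of() {
  docker compose --env-file .env exec -T "$1" printenv CLOUDGRID_NATS_URL
}

# same_nats_url SERVICE...: prints each service's value.
# Returns 0 if all values match, 1 on the first mismatch,
# and 2 if a value cannot be read.
same_nats_url() {
  first=""
  for svc in "$@"; do
    url="$(nats_url_of "$svc")" || { echo "cannot read from $svc"; return 2; }
    echo "$svc -> $url"
    if [ -z "$first" ]; then
      first="$url"
    elif [ "$url" != "$first" ]; then
      return 1
    fi
  done
  return 0
}
```

A call such as `same_nats_url storage-read control-plane` would then show both values side by side, under the naming assumption stated above.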

Debug Flow

Start from "No data in UI" and work through the questions in order:

1. Project selected? If no, open /projects and select a project.
2. Collector returned 200? If no, check collector auth, content-type, and size logs.
3. storage-write ready? If no, check SurrealDB and storage-write logs.
4. storage-read ready? If no, check storage-read readiness and NATS.
5. If every answer is yes, check the UI time range and filters.
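
The debug flow can also be written as a small decision function, which is handy in a runbook script. This is only a sketch of the flow described above. The function name `next_step` and its yes/no argument convention are assumptions made for illustration.

```shell
# next_step PROJECT_SELECTED COLLECTOR_200 STORAGE_WRITE_READY STORAGE_READ_READY
# Each argument is "yes" or "no". Prints the action for the first "no",
# or the final check when every answer is "yes".
next_step() {
  if [ "$1" != "yes" ]; then
    echo "Open /projects and select project"
  elif [ "$2" != "yes" ]; then
    echo "Check collector auth/content-type/size logs"
  elif [ "$3" != "yes" ]; then
    echo "Check SurrealDB and storage-write logs"
  elif [ "$4" != "yes" ]; then
    echo "Check storage-read readiness and NATS"
  else
    echo "Check UI time range and filters"
  fi
}
```

For example, `next_step yes yes no yes` points at SurrealDB and the storage-write logs, because the earlier questions are answered before the later ones are considered.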

Error Mapping

CloudGrid public errors use canonical codes from specs/03-contracts/errors.yaml.

| Code | Typical meaning |
| --- | --- |
| ERR-001 | Validation failed. |
| ERR-002 | Unsupported media type. |
| ERR-003 | Invalid cursor. |
| ERR-006 | Storage unavailable. |
| ERR-009 | Runtime configuration invalid. |
| ERR-013 | Message bridge unavailable. |
| ERR-014 | Message bridge timeout. |
| ERR-015 | Authentication required. |
| ERR-016 | Authorization forbidden. |
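
When scanning logs, a quick lookup for the codes listed on this page can save a trip to the contract file. The function below is a sketch that covers only the codes shown above. The canonical list remains specs/03-contracts/errors.yaml, and `describe_error` is a hypothetical helper name.

```shell
# describe_error CODE: prints the typical meaning of a code listed on this page.
# Returns 1 for any code that this page does not list.
describe_error() {
  case "$1" in
    ERR-001) echo "Validation failed." ;;
    ERR-002) echo "Unsupported media type." ;;
    ERR-003) echo "Invalid cursor." ;;
    ERR-006) echo "Storage unavailable." ;;
    ERR-009) echo "Runtime configuration invalid." ;;
    ERR-013) echo "Message bridge unavailable." ;;
    ERR-014) echo "Message bridge timeout." ;;
    ERR-015) echo "Authentication required." ;;
    ERR-016) echo "Authorization forbidden." ;;
    *) echo "Not listed on this page: $1"; return 1 ;;
  esac
}
```

A code that falls through to the last branch is not necessarily invalid. It only means the meaning has to be looked up in the contract file.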

Next Step

For deeper checks, see the "Health and readiness" and "Message bridge operations" pages.
