CloudGrid Documentation
CloudGrid is an OTLP observability workspace for project-scoped traces, logs, metrics, live trace receiving, dashboards, ingest credentials, retention policies, alert management, and optional AI evaluation and optimization workflows.
This handbook follows a product journey rather than a folder dump: start with the product boundary, run a CloudGrid stack, send telemetry, and configure access, then move into daily workflows and operations.
Install AI Skills First
If you use an AI coding assistant with CloudGrid, install the checked-in CloudGrid skills before setup, operations, or extension work:
```shell
npx skills add cloudgrid/cloudgrid --all
```
The command uses the Vercel Labs skills CLI to install the CloudGrid skill catalog. See Install AI skills for local checkout installs, single-skill installs, and the current skill list.
Recommended Path
| Step | Read | Outcome |
|---|---|---|
| 1 | Install AI skills | Give compatible AI assistants the CloudGrid setup, operations, investigation, and extension playbooks. |
| 2 | What CloudGrid is | Understand the product boundary, implemented surfaces, and known production gaps. |
| 3 | Runtime modes | Decide whether you are running local mode or deployed SSO mode. |
| 4 | Release Compose or Local quickstart | Start CloudGrid either from published images or from source. |
| 5 | Send telemetry | Prove the OTLP collector, bridge, writer, reader, and UI path end to end. |
| 6 | Concepts | Understand companies, projects, access, signals, live traces, metrics, retention, and alerts. |
| 7 | Guides | Work with ingest credentials, traces, logs, metrics, dashboards, and AI Chat. |
| 8 | Evaluations | Build datasets, run evaluations, compare candidates, and optimize targets. |
| 9 | Configuration | Add the right local, deployed, SSO, SMTP, storage, and self-observability values. |
| 10 | Operate | Start, stop, monitor, troubleshoot, and assess production readiness. |
| 11 | Architecture | Reason about service boundaries, flows, tenancy, and extension points. |
| 12 | Reference | Look up commands, ports, environment variables, routes, contracts, and errors. |
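Step 5 is where telemetry first crosses the wire. As a minimal sketch, assuming an exporter that speaks OTLP/HTTP with JSON encoding, the TypeScript below builds the request body for a single span in the OTLP `ExportTraceServiceRequest` shape: lowercase-hex trace and span IDs, integer enum values, and nanosecond timestamps as decimal strings. The service name, span name, scope name, and IDs are hypothetical. This page does not state CloudGrid's collector address; the OTLP specification's default for OTLP/HTTP traces is `http://localhost:4318/v1/traces`, but treat that as an assumption and confirm the port in Reference. A real export would also need whatever ingest credential CloudGrid expects, which this page does not specify.

```typescript
// Builds a one-span OTLP/HTTP JSON trace export body (ExportTraceServiceRequest
// shape). Service, span, and scope names below are hypothetical examples; the
// CloudGrid collector endpoint and credential header are NOT shown here.

interface KeyValue {
  key: string;
  value: { stringValue: string };
}

interface OtlpSpan {
  traceId: string; // 32 lowercase hex chars (16 bytes)
  spanId: string; // 16 lowercase hex chars (8 bytes)
  name: string;
  kind: number; // OTLP SpanKind enum as an integer; 2 = SPAN_KIND_SERVER
  startTimeUnixNano: string; // 64-bit integers encoded as decimal strings
  endTimeUnixNano: string;
}

interface ExportTraceServiceRequest {
  resourceSpans: Array<{
    resource: { attributes: KeyValue[] };
    scopeSpans: Array<{ scope: { name: string }; spans: OtlpSpan[] }>;
  }>;
}

// Epoch milliseconds -> decimal nanoseconds. Appending six zeros is exact for
// non-negative safe integers and avoids BigInt.
function msToUnixNano(ms: number): string {
  if (!Number.isSafeInteger(ms) || ms < 0) {
    throw new Error(`expected non-negative integer milliseconds, got ${ms}`);
  }
  return ms === 0 ? "0" : `${ms}000000`;
}

function buildTraceExport(
  serviceName: string,
  spanName: string,
  traceId: string,
  spanId: string,
  startMs: number,
  endMs: number,
): ExportTraceServiceRequest {
  if (!/^[0-9a-f]{32}$/.test(traceId)) throw new Error("traceId must be 32 lowercase hex chars");
  if (!/^[0-9a-f]{16}$/.test(spanId)) throw new Error("spanId must be 16 lowercase hex chars");
  if (endMs < startMs) throw new Error("span ends before it starts");

  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: "handbook-example" },
            spans: [
              {
                traceId,
                spanId,
                name: spanName,
                kind: 2,
                startTimeUnixNano: msToUnixNano(startMs),
                endTimeUnixNano: msToUnixNano(endMs),
              },
            ],
          },
        ],
      },
    ],
  };
}

// Hypothetical 250 ms server span.
const exportBody = JSON.stringify(
  buildTraceExport(
    "checkout-api",
    "GET /health",
    "5b8efff798038103d269b633813fc60c",
    "eee19b7ec3c1b174",
    1700000000000,
    1700000000250,
  ),
);
```

`exportBody` is the kind of payload an OTLP/HTTP exporter would POST with `Content-Type: application/json`. In practice an OpenTelemetry SDK exporter produces this for you; hand-building it mainly helps when debugging the collector path.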
How The Handbook Is Organized
The left navigation mirrors the journey:
| Area | What belongs there |
|---|---|
| Overview | Product scope, runtime modes, and the route tour. |
| Getting started | The two supported local paths and the first telemetry export. |
| Concepts | Companies, projects, access, signals, live traces, metrics, retention, and alerts. |
| Guides | Task guides for day-to-day observability and AI Chat work. |
| Evaluations | End-user workflows for datasets, evaluation runs, comparisons, optimization, and promotion evidence. |
| Configuration | Local mode, deployed mode, SSO, invitations, SMTP, Kubernetes, storage, and environment values. |
| Operations | Health checks, resets, bridge behavior, retention, alerting, troubleshooting, and production readiness. |
| Architecture | Internal service boundaries, flows, and extension boundaries. |
| Reference | Stable lookup tables. |
System Thumbnail
The browser talks only to the TypeScript backend-for-frontend (BFF). Public telemetry reads use GraphQL. The BFF talks to private services through NATS request/reply. The collector publishes ingest commands and never writes SurrealDB directly. Only storage-write mutates telemetry, and only storage-read fetches telemetry.
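The boundaries in the paragraph above can be illustrated with a small in-process TypeScript sketch. It is not CloudGrid's code: an in-memory bus stands in for NATS (synchronous here, whereas real NATS request/reply is asynchronous), an array stands in for SurrealDB, the bridge and GraphQL layer are omitted, and every subject, type, and function name is hypothetical. It expresses the rules as capabilities: the collector can only publish, storage-write is the only service holding the store's `append` method, and storage-read receives only a read-only view that the BFF reaches through request/reply.

```typescript
// Minimal in-process sketch of the service boundaries. Subject names, record
// shapes, and function names are hypothetical, not CloudGrid's API.

interface SpanRecord {
  projectId: string;
  traceId: string;
  name: string;
}

// Read-only capability: the only thing storage-read is given.
interface SpanReader {
  byProject(projectId: string): SpanRecord[];
}

type Handler = (payload: unknown) => unknown;

// Hypothetical subjects standing in for real NATS subjects.
const INGEST_SUBJECT = "telemetry.ingest";
const READ_SUBJECT = "telemetry.read";

// In-memory stand-in for NATS: publish/subscribe plus request/reply.
class InMemoryBus {
  private subscribers = new Map<string, Handler[]>();
  private responders = new Map<string, Handler>();

  // Fire-and-forget delivery to every subscriber, like a NATS publish.
  publish(subject: string, payload: unknown): void {
    for (const handler of this.subscribers.get(subject) ?? []) {
      handler(payload);
    }
  }

  subscribe(subject: string, handler: Handler): void {
    const handlers = this.subscribers.get(subject) ?? [];
    handlers.push(handler);
    this.subscribers.set(subject, handlers);
  }

  // One responder per subject answers requests.
  reply(subject: string, handler: Handler): void {
    this.responders.set(subject, handler);
  }

  request(subject: string, payload: unknown): unknown {
    const handler = this.responders.get(subject);
    if (!handler) throw new Error(`no responder for ${subject}`);
    return handler(payload);
  }
}

// Stand-in for SurrealDB. Only storage-write should hold the store itself.
class TelemetryStore {
  private spans: SpanRecord[] = [];

  append(span: SpanRecord): void {
    this.spans.push(span);
  }

  // Hands out a read-only view that returns copies; holders cannot call append().
  reader(): SpanReader {
    return {
      byProject: (projectId) =>
        this.spans.filter((s) => s.projectId === projectId).map((s) => ({ ...s })),
    };
  }
}

// storage-write: the only component that mutates telemetry.
function startStorageWrite(bus: InMemoryBus, store: TelemetryStore): void {
  bus.subscribe(INGEST_SUBJECT, (payload) => store.append(payload as SpanRecord));
}

// storage-read: the only component that fetches telemetry.
function startStorageRead(bus: InMemoryBus, reader: SpanReader): void {
  bus.reply(READ_SUBJECT, (payload) => reader.byProject((payload as { projectId: string }).projectId));
}

// Collector: publishes ingest commands and never touches the store.
function collectorIngest(bus: InMemoryBus, span: SpanRecord): void {
  bus.publish(INGEST_SUBJECT, span);
}

// BFF: answers a telemetry read (e.g. behind a GraphQL resolver) via request/reply.
function bffQuerySpans(bus: InMemoryBus, projectId: string): SpanRecord[] {
  return bus.request(READ_SUBJECT, { projectId }) as SpanRecord[];
}

const bus = new InMemoryBus();
const store = new TelemetryStore();
startStorageWrite(bus, store);
startStorageRead(bus, store.reader());
collectorIngest(bus, { projectId: "demo", traceId: "t1", name: "GET /health" });
collectorIngest(bus, { projectId: "other", traceId: "t2", name: "POST /orders" });
const demoSpans = bffQuerySpans(bus, "demo");
```

After the two ingests, `demoSpans` holds only the `demo` project's span (`traceId` `t1`). Passing storage-read a `SpanReader` instead of the store is one way to make "only storage-write mutates" structural rather than a convention.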
Source Of Truth
User-facing docs explain how to use and operate CloudGrid. Implementation behavior is defined by the contract and spec sources summarized in Contracts. If a feature is not specified and implemented, the handbook must not present it as available.