CloudGrid Documentation

CloudGrid is an OTLP observability workspace for project-scoped traces, logs, metrics, live trace receiving, dashboards, ingest credentials, retention policies, alert management, and optional AI evaluation and optimization workflows.

This handbook is written as a product journey rather than a folder dump. Start by understanding the product boundary, then run a stack, send telemetry, and configure access before moving into daily workflows and operations.

Install AI Skills First

If you use an AI coding assistant with CloudGrid, install the checked-in CloudGrid skills before setup, operations, or extension work:

npx skills add cloudgrid/cloudgrid --all

The command uses the Vercel Labs skills CLI to install the CloudGrid skill catalog. See Install AI skills for local checkout installs, single-skill installs, and the current skill list.

Recommended Path

Step	Read	Outcome
1	Install AI skills	Give compatible AI assistants the CloudGrid setup, operations, investigation, and extension playbooks.
2	What CloudGrid is	Understand the product boundary, implemented surfaces, and known production gaps.
3	Runtime modes	Decide whether you are running local mode or deployed SSO mode.
4	Release Compose or Local quickstart	Start CloudGrid either from published images or from source.
5	Send telemetry	Prove the OTLP collector, bridge, writer, reader, and UI path end to end.
6	Concepts	Understand companies, projects, access, signals, live traces, metrics, retention, and alerts.
7	Guides	Work with ingest credentials, traces, logs, metrics, dashboards, and AI Chat.
8	Evaluations	Build datasets, run evaluations, compare candidates, and optimize targets.
9	Configuration	Add the right local, deployed, SSO, SMTP, storage, and self-observability values.
10	Operate	Start, stop, monitor, troubleshoot, and assess production readiness.
11	Architecture	Reason about service boundaries, flows, tenancy, and extension points.
12	Reference	Look up commands, ports, environment variables, routes, contracts, and errors.
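
As a rough illustration of what step 5 exercises, the sketch below builds a one-span request body in the standard OTLP/HTTP JSON encoding. It only constructs the payload. The collector address, the ingest credential header, and the names `build_trace_payload`, `smoke-test`, and `hello-cloudgrid` are assumptions for this example and are not taken from CloudGrid; use the values given in Send telemetry when exporting for real.

```python
import json
import secrets
import time


def build_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/HTTP JSON trace export body with one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the span lasted 5 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "handbook-smoke-test"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes, hex-encoded
                    "spanId": secrets.token_hex(8),    # 8 bytes, hex-encoded
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # OTLP JSON carries 64-bit integers as decimal strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


payload = build_trace_payload("smoke-test", "hello-cloudgrid")
body = json.dumps(payload)
print(len(body), "bytes ready to POST as application/json")
```

In standard OTLP/HTTP, a body like this is sent with `Content-Type: application/json` to the receiver's `/v1/traces` path. Whether CloudGrid expects that exact path or additional headers is not stated on this page.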

How The Handbook Is Organized

The left navigation mirrors the journey:

Area	What belongs there
Overview	Product scope, runtime modes, and the route tour.
Getting started	The two supported local paths and the first telemetry export.
Concepts	Companies, projects, access, signals, live traces, metrics, retention, and alerts.
Guides	Task guides for day-to-day observability and AI Chat work.
Evaluations	End-user workflows for datasets, evaluation runs, comparisons, optimization, and promotion evidence.
Configuration	Local mode, deployed mode, SSO, invitations, SMTP, Kubernetes, storage, and environment values.
Operations	Health checks, resets, bridge behavior, retention, alerting, troubleshooting, and production readiness.
Architecture	Internal service boundaries, flows, and extension boundaries.
Reference	Stable lookup tables.

System Thumbnail

The browser talks only to the TypeScript BFF. Public telemetry reads use GraphQL. The BFF talks to private services through NATS request/reply. The collector publishes ingest commands and never writes to SurrealDB directly. Only storage-write mutates telemetry, and only storage-read fetches telemetry.
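
The boundaries described above can be pictured as a small allow-list of hops. This is a minimal sketch for reasoning about the rules, not CloudGrid code: the component labels, the `ROUTES` table, and both helper functions are hypothetical, and the table collapses any intermediate services into single hops (for example, it assumes the BFF reaches storage-read directly and that ingest commands arrive at storage-write).

```python
from typing import Optional

# Hypothetical labels for this sketch; they are not CloudGrid identifiers.
ROUTES = {
    ("browser", "bff"): "HTTP (GraphQL for telemetry reads)",
    ("bff", "storage-read"): "NATS request/reply",
    ("collector", "storage-write"): "published ingest command",
    ("storage-write", "surrealdb"): "telemetry write",
    ("storage-read", "surrealdb"): "telemetry read",
}


def transport(source: str, target: str) -> Optional[str]:
    """Return how source reaches target, or None if the hop is not allowed."""
    return ROUTES.get((source, target))


def reaches_store(component: str) -> bool:
    """True only for components allowed to touch the telemetry store directly."""
    return (component, "surrealdb") in ROUTES


for source in ("browser", "bff", "collector", "storage-write", "storage-read"):
    print(f"{source:14} direct store access: {reaches_store(source)}")
```

Only the two storage services report direct store access; the browser, BFF, and collector all have to go through another component.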

Source Of Truth

User-facing docs explain how to use and operate CloudGrid. Implementation behavior is defined by the contract and spec sources summarized in Contracts. Unless a feature is both specified and implemented, the handbook must not present it as available.

Last updated 2026-05-18.