CloudGrid Documentation

CloudGrid is an OTLP observability workspace for project-scoped traces, logs, metrics, live trace receiving, dashboards, ingest credentials, retention policies, alert management, and optional AI evaluation and optimization workflows.

This handbook is written as a product journey rather than a folder dump. Start by understanding the boundary of the product, run a stack, send telemetry, configure access, then move into daily workflows and operations.

Install AI Skills First

If you use an AI coding assistant with CloudGrid, install the checked-in CloudGrid skills before setup, operations, or extension work:

npx skills add cloudgrid/cloudgrid --all

The command uses the Vercel Labs skills CLI to install the CloudGrid skill catalog. See Install AI skills for local checkout installs, single-skill installs, and the current skill list.

Step | Read | Outcome
1 | Install AI skills | Give compatible AI assistants the CloudGrid setup, operations, investigation, and extension playbooks.
2 | What CloudGrid is | Understand the product boundary, implemented surfaces, and known production gaps.
3 | Runtime modes | Decide whether you are running local mode or deployed SSO mode.
4 | Release Compose or Local quickstart | Start CloudGrid either from published images or from source.
5 | Send telemetry | Prove the OTLP collector, bridge, writer, reader, and UI path end to end.
6 | Concepts | Understand companies, projects, access, signals, live traces, metrics, retention, and alerts.
7 | Guides | Work with ingest credentials, traces, logs, metrics, dashboards, and AI Chat.
8 | Evaluations | Build datasets, run evaluations, compare candidates, and optimize targets.
9 | Configuration | Add the right local, deployed, SSO, SMTP, storage, and self-observability values.
10 | Operate | Start, stop, monitor, troubleshoot, and assess production readiness.
11 | Architecture | Reason about service boundaries, flows, tenancy, and extension points.
12 | Reference | Look up commands, ports, environment variables, routes, contracts, and errors.
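
As a rough illustration of what step 5 involves, the sketch below builds a one-span trace export in the OTLP/HTTP JSON encoding. It is only a sketch under stated assumptions: the URL uses the OTLP default HTTP port and path rather than a documented CloudGrid endpoint, the scope name is made up, and no ingest credential header is shown because this page does not name one. Follow the Send telemetry page for the real values.

```python
import json
import secrets
import time
import urllib.request

# Assumption: a collector accepting OTLP/HTTP JSON on the OTLP default port.
# CloudGrid's actual endpoint and credential header are not stated on this page.
OTLP_TRACES_URL = "http://localhost:4318/v1/traces"


def build_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a one-span OTLP/JSON trace export body."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the span lasted 5 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeSpans": [{
                "scope": {"name": "handbook-smoke-test"},  # hypothetical name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def build_request(payload: dict, url: str = OTLP_TRACES_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request; the caller decides when to send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of build_request(build_trace_payload("demo", "ping")) to urllib.request.urlopen would send the span. Whether it then shows up in the UI depends on the collector, bridge, writer, and reader all being up, which is the point of the end-to-end check.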

How The Handbook Is Organized

The left navigation mirrors the journey:

Area | What belongs there
Overview | Product scope, runtime modes, and the route tour.
Getting started | The two supported local paths and the first telemetry export.
Concepts | Companies, projects, access, signals, live traces, metrics, retention, and alerts.
Guides | Task guides for day-to-day observability and AI Chat work.
Evaluations | End-user workflows for datasets, evaluation runs, comparisons, optimization, and promotion evidence.
Configuration | Local mode, deployed mode, SSO, invitations, SMTP, Kubernetes, storage, and environment values.
Operations | Health checks, resets, bridge behavior, retention, alerting, troubleshooting, and production readiness.
Architecture | Internal service boundaries, flows, and extension boundaries.
Reference | Stable lookup tables.

System Thumbnail

[Diagram: components shown are OTLP sender, Go OTLP collector, NATS JetStream, storage-write, SurrealDB, Browser UI, TypeScript BFF, storage-read, and control-plane.]

The browser talks only to the TypeScript BFF. Public telemetry reads use GraphQL. The BFF talks to private services through NATS request/reply. The collector publishes ingest commands and never writes SurrealDB directly. Only storage-write mutates telemetry, and only storage-read fetches telemetry.
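
One way to keep these boundary rules in mind is to write them down as a small check. The sketch below is illustrative only: the component labels, the action names, and the Call type are hypothetical and are not part of CloudGrid. It encodes just the four prohibitions stated in the paragraph above and says nothing about calls the paragraph does not mention.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    """A hypothetical record of one component calling another."""
    source: str
    target: str
    action: str  # e.g. "graphql-query", "publish", "telemetry-write"


def boundary_violation(call: Call) -> Optional[str]:
    """Return the rule a call breaks, or None if no stated rule forbids it."""
    if call.source == "browser" and call.target != "bff":
        return "the browser talks only to the BFF"
    if call.source == "collector" and call.target == "surrealdb":
        return "the collector never writes SurrealDB directly"
    if call.target == "surrealdb":
        if call.action == "telemetry-write" and call.source != "storage-write":
            return "only storage-write mutates telemetry"
        if call.action == "telemetry-read" and call.source != "storage-read":
            return "only storage-read fetches telemetry"
    return None


# Calls the paragraph allows.
ok = [
    Call("browser", "bff", "graphql-query"),
    Call("collector", "nats", "publish"),
    Call("storage-write", "surrealdb", "telemetry-write"),
    Call("storage-read", "surrealdb", "telemetry-read"),
]

# Calls the paragraph rules out.
bad = [
    Call("browser", "storage-read", "request"),
    Call("collector", "surrealdb", "telemetry-write"),
    Call("bff", "surrealdb", "telemetry-read"),
]
```

Every entry in ok yields None and every entry in bad yields the sentence of the rule it breaks. A check like this could sit in an architecture test, although the authoritative rules remain the contract and spec sources.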

Source Of Truth

User-facing docs explain how to use and operate CloudGrid. Implementation behavior is defined by the contract and spec sources summarized in Contracts. If a feature is not specified and implemented, the handbook must not present it as available.
