CloudGrid

Turn every signal into owned operational evidence.

CloudGrid is the OpenTelemetry-native platform for teams that need traces, logs, metrics, dashboards, alerts, and AI evaluation to work from one project-scoped record. It is self-hosted, source-available, and built so telemetry stays inside the environment you control.

From incident to decision, the evidence stays connected.

Enterprise teams rarely fail because one signal is missing. They lose time when the trace, log line, metric trend, dashboard, alert, dataset row, and evaluation result live in different products with different ownership rules. CloudGrid keeps those pieces in one project workspace and lets the right service own each data path.

01
Capture the operational facts once.

CloudGrid receives OpenTelemetry traces, logs, and metrics into a project boundary. The same project also owns dashboards, alert rules, datasets, evaluation runs, and optimization results.

02
Keep the links that make evidence useful.

A log can point to a trace. A metric exemplar can point to a trace. A failed AI example can keep the source trace beside the dataset row and the evaluation result.
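
To make those links concrete, here is a minimal sketch, assuming each record carries an optional trace identifier. The class and field names are hypothetical illustrations, not CloudGrid's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogRecord:
    body: str
    trace_id: Optional[str] = None  # hypothetical field: log -> trace link


@dataclass(frozen=True)
class MetricExemplar:
    metric: str
    value: float
    trace_id: Optional[str] = None  # hypothetical field: exemplar -> trace link


@dataclass(frozen=True)
class DatasetRow:
    row_id: str
    expected: str
    actual: str
    source_trace_id: Optional[str] = None  # hypothetical field: row -> source trace


def evidence_for_trace(
    trace_id: str,
    logs: List[LogRecord],
    exemplars: List[MetricExemplar],
    rows: List[DatasetRow],
) -> dict:
    """Collect every record that points at the given trace."""
    return {
        "logs": [l for l in logs if l.trace_id == trace_id],
        "exemplars": [e for e in exemplars if e.trace_id == trace_id],
        "rows": [r for r in rows if r.source_trace_id == trace_id],
    }


# Illustrative records; only the row id and trace id echo the sample run on this page.
logs = [LogRecord("intent classified", "trc_c443"), LogRecord("health check")]
exemplars = [MetricExemplar("request.duration_ms", 512.0, "trc_c443")]
rows = [DatasetRow("cls-077", "cancellation", "billing_invoice", "trc_c443")]

evidence = evidence_for_trace("trc_c443", logs, exemplars, rows)
```

With one shared identifier, a failed example, the log line, and the metric exemplar can all be pulled up from the same trace without an export step.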

03
Make different teams work from the same record.

Platform teams investigate incidents, product teams review service behavior, and AI teams validate model or agent changes without exporting evidence into separate tools first.

04
Scale and customize the system around your environment.

The message bridge and adapter boundaries let ingestion, persistence, reads, alert delivery, and white-label customization evolve without turning the public UI into the integration layer.

Wired like a workflow. Audited like infrastructure.

CloudGrid is modular without asking users to operate the product as a pile of separate tools. The public UI talks to the BFF. Private services own telemetry semantics. The message bridge gives each path a contract, so scaling and customization stay controlled.

01
Ingest stays boring on purpose.

Collectors receive OTLP and publish bounded work. Public services do not become storage clients.
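
One way to read "bounded work" is a publisher that refuses new batches once a fixed number are pending, so pressure is pushed back to the sender instead of growing without limit. The sketch below assumes that reading; `BoundedPublisher` and its methods are hypothetical names, and an in-process queue stands in for the message bridge.

```python
import queue
from typing import Optional


class BoundedPublisher:
    """Accept work only while fewer than max_pending batches are waiting."""

    def __init__(self, max_pending: int) -> None:
        self._pending: "queue.Queue[bytes]" = queue.Queue(maxsize=max_pending)

    def publish(self, batch: bytes) -> bool:
        """Return True if the batch was accepted, False if the bound is reached."""
        try:
            self._pending.put_nowait(batch)
            return True
        except queue.Full:
            return False

    def drain_one(self) -> Optional[bytes]:
        """Hand the oldest pending batch to a consumer, or None if idle."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


publisher = BoundedPublisher(max_pending=2)
accepted = [publisher.publish(b) for b in (b"batch-1", b"batch-2", b"batch-3")]
```

In this sketch the third batch is rejected until a consumer drains one, which is the behavior a collector could translate into a retryable response to the sender.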

02
The bridge keeps services independent.

Request/reply reads, durable ingest, live fanout, and delivery dispatch move through explicit message contracts.
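
As an illustration of what an explicit message contract could look like, the sketch below declares one contract per path and rejects any subject that has none. The subject names, the `Contract` fields, and the durability flags are assumptions made for the example, not CloudGrid's published contracts.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    REQUEST_REPLY = "request/reply read"
    DURABLE_INGEST = "durable ingest"
    LIVE_FANOUT = "live fanout"
    DELIVERY_DISPATCH = "delivery dispatch"


@dataclass(frozen=True)
class Contract:
    subject: str         # hypothetical subject name
    pattern: Pattern
    expects_reply: bool  # caller waits for a response
    durable: bool        # assumed: message survives a consumer restart


CONTRACTS = {
    c.subject: c
    for c in (
        Contract("telemetry.read", Pattern.REQUEST_REPLY, expects_reply=True, durable=False),
        Contract("telemetry.ingest", Pattern.DURABLE_INGEST, expects_reply=False, durable=True),
        Contract("telemetry.live", Pattern.LIVE_FANOUT, expects_reply=False, durable=False),
        Contract("alerts.deliver", Pattern.DELIVERY_DISPATCH, expects_reply=False, durable=True),
    )
}


def contract_for(subject: str) -> Contract:
    """Reject any subject that has no declared contract."""
    try:
        return CONTRACTS[subject]
    except KeyError:
        raise ValueError(f"no contract declared for subject {subject!r}") from None
```

Keeping the contracts in one declared table means a new path has to be named and typed before any service can publish on it.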

03
Storage services own telemetry semantics.

Write paths mutate storage. Read paths own filtering, sorting, grouping, counts, live matching, and authorization preparation.

04
The product surface stays focused.

Users see project evidence, dashboards, alerts, and evaluation decisions instead of infrastructure wiring.

Evaluate agents next to the spans they came from.

CloudGrid turns the telemetry you already collect into AI evaluation evidence. Curate datasets from known examples or trace-derived failures, run evaluations against a target, compare expected and actual outputs, optimize prompts or examples, and promote the candidate with validation evidence the team can inspect.

  • Schema-backed datasets with input, expected output, reason, split, and curation state.
  • Per-row results include metrics, trajectory summaries, and links back to trace evidence.
  • Provider profiles and model aliases stay project-scoped and controlled by settings.

Evaluation run · support-quality-validation
dataset: support-intents-v7 · split: validation · model alias: judge-fast · rows: 124 · status: complete

Exact match: 82% (+14pp)
Pass rate: 91% (+9pp)
p95 latency: 420 ms (-18%)
Regressions: 3 (-5)
Improvements: 28 (+11)

row | input | expected | actual | metric | trajectory summary | trace
cls-041 | refund request | billing_refund | billing_refund | pass | Matched refund intent and ignored invoice wording. | trc_92ad
ext-018 | order email | ORD-10482 | ORD-10482 | pass | Extracted order id, total, currency, and two items. | trc_a18f
cls-077 | cancel + VAT | cancellation | billing_invoice | fail | Overweighted VAT detail; needs prompt example. | trc_c443
ext-022 | chat order | FR | FR | pass | Normalized country name and retained item quantity. | trc_d091
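
The comparison of expected and actual outputs can be sketched with the four sample rows shown above. This is a minimal illustration, assuming exact string equality as the metric; `RowResult` and the helper functions are hypothetical names, not CloudGrid's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowResult:
    row_id: str
    input: str
    expected: str
    actual: str
    trace_id: str

    @property
    def passed(self) -> bool:
        # Assumed metric: exact string equality of expected and actual.
        return self.expected == self.actual


# The four sample rows from the run above.
ROWS = [
    RowResult("cls-041", "refund request", "billing_refund", "billing_refund", "trc_92ad"),
    RowResult("ext-018", "order email", "ORD-10482", "ORD-10482", "trc_a18f"),
    RowResult("cls-077", "cancel + VAT", "cancellation", "billing_invoice", "trc_c443"),
    RowResult("ext-022", "chat order", "FR", "FR", "trc_d091"),
]


def exact_match_rate(rows: List[RowResult]) -> float:
    """Fraction of rows whose actual output equals the expected output."""
    return sum(r.passed for r in rows) / len(rows)


def failing_traces(rows: List[RowResult]) -> List[str]:
    """Trace ids to open when reviewing the rows that failed."""
    return [r.trace_id for r in rows if not r.passed]


rate = exact_match_rate(ROWS)
to_review = failing_traces(ROWS)
```

On these four rows the rate is 3 of 4, and the one failing row leads straight to trace trc_c443. The 82% figure in the run summary covers all 124 rows, not this sample.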

Built in the open. Run on your own terms.

Source-available under Apache 2.0 with Commons Clause. One reviewable distribution, visible commercial paths, and telemetry that stays in your network unless you wire it to leave.