
AI Evaluation

Compatibility guide for the AI Eval workspace. The full end-user documentation now lives under Evaluations.


AI evaluation helps teams turn known inputs, expected outputs, and production trace evidence into repeatable quality measurements.

The full end-user documentation now lives in the dedicated Evaluations handbook topic.

This page remains as a short compatibility guide for existing links.

Enable the workspace by setting both environment variables:

CLOUDGRID_AI_EVAL_ENABLED=true
VITE_CLOUDGRID_AI_EVAL_ENABLED=true
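The page lists two flags. A pre-deploy check that reports which of them is not enabled could look roughly like the sketch below. It assumes both flags are read from the process environment and that only the string `true` counts as enabled. That comparison rule is an assumption, not documented behavior, and `missing_ai_eval_flags` is a hypothetical helper, not part of CloudGrid.

```python
import os
from typing import List, Mapping

# The two flags listed on this page. The VITE_ prefix follows the Vite
# convention for variables that are exposed to browser-side code.
REQUIRED_FLAGS = (
    "CLOUDGRID_AI_EVAL_ENABLED",
    "VITE_CLOUDGRID_AI_EVAL_ENABLED",
)


def missing_ai_eval_flags(env: Mapping[str, str]) -> List[str]:
    """Return the flags that are absent or not set to "true".

    Hypothetical helper: it treats "true" (any case, surrounding whitespace
    ignored) as enabled. That rule is an assumption for illustration.
    """
    return [
        name
        for name in REQUIRED_FLAGS
        if env.get(name, "").strip().lower() != "true"
    ]


if __name__ == "__main__":
    missing = missing_ai_eval_flags(os.environ)
    if missing:
        print("AI Eval is disabled; set to true:", ", ".join(missing))
    else:
        print("Both AI Eval flags are set.")
```

Reporting the missing names, rather than a single yes/no, makes it obvious when only the server flag or only the frontend flag was set.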

The AI Eval entry appears in the selected-project sidebar after Dashboards. The primary sections are Datasets and Evaluations.

First Setup

Enable AI Eval in Project Settings.
Open /ai-eval?tab=datasets.
Create a dataset, choosing an input type, an expected-output type, and an optional JSON Schema for each JSON-typed value.
Add rows manually or import JSONL, JSON array, CSV, or ZIP files.
Mark rows ready when the input, expected output, and optional reason have been reviewed.
Open /ai-eval?tab=evaluations.
Create an evaluation that selects the dataset, split, target, metric, and run policy.
Start a run, review metric results, then compare or optimize candidate targets.
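The setup steps name JSONL as one import format but not the field names a row must use. The sketch below produces a JSONL file (one JSON object per line) with the standard library. It assumes hypothetical keys `input`, `expected_output`, and an optional `reason`, so check the Datasets documentation for the real names. The structural check only catches obvious mistakes before upload. It does not replace the workspace's own validation or the manual review before marking rows ready.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical field names; the real import schema may differ.
REQUIRED_KEYS = ("input", "expected_output")
OPTIONAL_KEYS = ("reason",)


def row_problems(row: Dict[str, Any]) -> List[str]:
    """Flag obvious structural problems in one row (illustrative only)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in row]
    unknown = set(row) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS)
    problems += [f"unexpected key {key}" for key in sorted(unknown)]
    return problems


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows as JSONL: one JSON object per line, newline-terminated."""
    lines = []
    for i, row in enumerate(rows):
        problems = row_problems(row)
        if problems:
            raise ValueError(f"row {i}: {', '.join(problems)}")
        lines.append(json.dumps(row, ensure_ascii=False) + "\n")
    return "".join(lines)


if __name__ == "__main__":
    rows = [
        {"input": {"question": "2 + 2?"}, "expected_output": {"answer": "4"}},
        {
            "input": {"question": "Capital of France?"},
            "expected_output": {"answer": "Paris"},
            "reason": "Reviewed by a domain expert.",
        },
    ]
    Path("dataset.jsonl").write_text(to_jsonl(rows), encoding="utf-8")
```

Failing on the first bad row, with its index in the message, keeps a malformed file from being written at all.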

For detailed usage, continue with Datasets, Evaluations, and Optimizations.

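Troubleshooting

AI Eval entry is missing:

Confirm CLOUDGRID_AI_EVAL_ENABLED and VITE_CLOUDGRID_AI_EVAL_ENABLED are both set to true.
Confirm AI Eval is enabled in Project Settings.
Confirm a project is selected; the entry appears in the selected-project sidebar after Dashboards.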

Dataset row is rejected:

Run does not start:

Adapter-backed run times out:

Promotion is disabled:

Confirm a candidate target snapshot is selected.
Confirm a comparison exists.
Confirm full validation evidence exists. Quick-shot evidence alone is not enough.
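The three promotion checks form a gate in which every condition must hold. The sketch below expresses that gate using a hypothetical `PromotionState` record. Its field names and the `"quick"` / `"full"` evidence labels are illustrative and do not reflect the workspace's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PromotionState:
    """Hypothetical view of what the promotion gate checks."""

    candidate_snapshot_id: Optional[str] = None
    comparison_id: Optional[str] = None
    # Kinds of validation evidence recorded, e.g. {"quick"} or {"quick", "full"}.
    evidence_kinds: Set[str] = field(default_factory=set)


def promotion_blockers(state: PromotionState) -> List[str]:
    """Return why promotion is disabled; an empty list means all checks pass."""
    blockers = []
    if state.candidate_snapshot_id is None:
        blockers.append("no candidate target snapshot selected")
    if state.comparison_id is None:
        blockers.append("no comparison exists")
    # Quick-shot evidence alone never satisfies the gate.
    if "full" not in state.evidence_kinds:
        blockers.append("no full validation evidence")
    return blockers
```

Returning every blocker, rather than a single boolean, lets a UI explain exactly why the promote action is disabled.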

Last updated 2026-05-25.