AI Evaluation
Compatibility guide for the AI Eval workspace. The full end-user documentation now lives under Evaluations.
AI evaluation helps teams turn known inputs, expected outputs, and production trace evidence into repeatable quality measurements.
The full end-user documentation now lives in the dedicated Evaluations handbook topic.
This page remains as a short compatibility guide for existing links.
Enable the workspace with:
CLOUDGRID_AI_EVAL_ENABLED=true
VITE_CLOUDGRID_AI_EVAL_ENABLED=true
The AI Eval entry appears in the selected-project sidebar after Dashboards. The primary sections are Datasets and Evaluations.
First Setup
- Enable AI Eval in Project Settings.
- Open /ai-eval?tab=datasets.
- Create a dataset with input type, expected-output type, and optional JSON Schema for each JSON value.
- Add rows manually or import JSONL, JSON array, CSV, or ZIP files.
- Mark rows ready when the input, expected output, and optional reason have been reviewed.
- Open /ai-eval?tab=evaluations.
- Create an evaluation that selects the dataset, split, target, metric, and run policy.
- Start a run, review metric results, then compare or optimize candidate targets.
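For the JSONL import option, a file holds one JSON object per line. The sketch below builds such text from a few rows. The field names (input, expected_output, reason, split) and the helper to_jsonl are assumptions for illustration only; this page does not specify the import schema, so check the Datasets documentation for the real field names.

```python
import json

# Hypothetical rows: these field names are illustrative assumptions,
# not the documented AI Eval import schema.
rows = [
    {
        "input": {"question": "2 + 2"},
        "expected_output": {"answer": "4"},
        "reason": "basic arithmetic",
        "split": "training",
    },
    {
        "input": {"question": "capital of France"},
        "expected_output": {"answer": "Paris"},
        "reason": "simple fact",
        "split": "validation",
    },
]


def to_jsonl(records):
    """Serialize each record as one JSON object per line (JSONL)."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


jsonl_text = to_jsonl(rows)
```

Writing jsonl_text to a file with UTF-8 encoding would give a candidate import file, assuming the field names are adjusted to match the dataset's actual types.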
For detailed usage, continue with Datasets, Evaluations, and Optimizations.
Troubleshooting
AI Eval entry is missing:
- Check CLOUDGRID_AI_EVAL_ENABLED.
- Check VITE_CLOUDGRID_AI_EVAL_ENABLED for frontend builds.
- Confirm a project is selected.
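The two flag checks can be scripted. The following is a minimal sketch that reports which of the two flags are not set to true in a given environment mapping. The helper name missing_flags is hypothetical, and the case-insensitive comparison is an assumption; the page only shows the value true. Note that Vite exposes VITE_-prefixed variables to client code at build time, so the second flag is only meaningful in the environment where the frontend is built.

```python
import os

# Flag names taken from this guide.
FLAGS = ("CLOUDGRID_AI_EVAL_ENABLED", "VITE_CLOUDGRID_AI_EVAL_ENABLED")


def missing_flags(env):
    """Return the flag names that are absent or not set to 'true'.

    Assumption: values are compared case-insensitively after trimming
    whitespace; the application's real parsing may be stricter.
    """
    return [name for name in FLAGS if env.get(name, "").strip().lower() != "true"]


# Check the current process environment.
print(missing_flags(os.environ))
```

An empty list means both flags are set in the environment that was inspected; it does not confirm that a project is selected.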
Dataset row is rejected:
- Confirm raw JSON parses.
- Confirm the value matches the dataset JSON Schema.
- Confirm the row uses training, validation, or test.
- Confirm the curation status is one of the v2 statuses.
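Two of these checks can be run locally before importing. The sketch below, built around a hypothetical helper precheck_row, confirms that a raw value parses as JSON and that the split is one of the three names listed above. It deliberately leaves out JSON Schema validation and the curation status check, because both depend on the dataset's own schema and on the v2 status list, neither of which is given on this page.

```python
import json

# Split names listed in this guide.
ALLOWED_SPLITS = {"training", "validation", "test"}


def precheck_row(raw_value, split):
    """Return a list of problems; an empty list means both local checks passed."""
    problems = []
    try:
        json.loads(raw_value)
    except json.JSONDecodeError as exc:
        problems.append(f"raw JSON does not parse: {exc.msg}")
    if split not in ALLOWED_SPLITS:
        problems.append(f"unknown split: {split!r}")
    return problems


ok = precheck_row('{"answer": "4"}', "training")
bad = precheck_row('{"answer": ', "dev")
```

Here ok is an empty list, while bad holds one message for the truncated JSON and one for the unrecognized split.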
Run does not start:
- Confirm the dataset has ready rows in the selected split.
- Confirm the target ref and metric settings are valid.
- Check AI Eval service health in the local or deployed runtime.
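The first check can be approximated offline if the dataset rows are available as a list of dictionaries. This is a minimal sketch under that assumption; the field names split and status, and the helper ready_rows_by_split, are hypothetical and may not match an actual export.

```python
from collections import Counter


def ready_rows_by_split(rows):
    """Count rows whose status is 'ready', grouped by split."""
    return Counter(row.get("split") for row in rows if row.get("status") == "ready")


# Hypothetical rows; "needs_review" is a placeholder for any non-ready status.
example_rows = [
    {"split": "training", "status": "ready"},
    {"split": "training", "status": "needs_review"},
    {"split": "validation", "status": "ready"},
]

counts = ready_rows_by_split(example_rows)
```

A count of zero for the split selected in the evaluation (test, in this example data) would be consistent with a run that cannot start.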
Adapter-backed run times out:
- Check adapter health and timeout settings.
- Confirm trace context propagation.
- Confirm the adapter returns a final output in the expected shape.
- Treat the timeout as evidence; do not auto-promote a candidate after timeout.
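One way to treat a timeout as evidence is to record it as a result instead of letting it raise or be silently retried. The sketch below uses asyncio.wait_for from the Python standard library. The adapter callables, the result dictionary shape, and the helper run_with_timeout are all assumptions for illustration; they are not the AI Eval adapter interface.

```python
import asyncio


async def run_with_timeout(adapter, payload, timeout_s):
    """Call the adapter once and record a timeout as a result instead of raising."""
    try:
        output = await asyncio.wait_for(adapter(payload), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The timeout is kept as evidence; no output is fabricated.
        return {"status": "timeout", "output": None}
    return {"status": "completed", "output": output}


async def slow_adapter(payload):
    """Hypothetical adapter that takes longer than the allowed time."""
    await asyncio.sleep(0.2)
    return {"answer": payload}


async def fast_adapter(payload):
    """Hypothetical adapter that returns immediately."""
    return {"answer": payload}


slow_result = asyncio.run(run_with_timeout(slow_adapter, "hi", timeout_s=0.05))
fast_result = asyncio.run(run_with_timeout(fast_adapter, "hi", timeout_s=0.05))
```

Because the timeout is stored as an explicit status, any later promotion step can refuse to act on a candidate whose results include it.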
Promotion is disabled:
- Confirm a candidate target snapshot is selected.
- Confirm a comparison exists.
- Confirm full validation evidence exists. Quick-shot evidence alone is not enough.
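The three conditions above can be read as a simple gate. The following sketch lists which conditions are unmet; the helper promotion_blockers and the evidence labels full_validation and quick_shot are hypothetical names chosen for illustration, not identifiers from the product.

```python
def promotion_blockers(candidate_selected, comparison_exists, evidence_kinds):
    """Return the unmet promotion conditions; an empty list means none were found."""
    blockers = []
    if not candidate_selected:
        blockers.append("no candidate target snapshot selected")
    if not comparison_exists:
        blockers.append("no comparison exists")
    # Quick-shot evidence alone is not enough: full validation must be present.
    if "full_validation" not in evidence_kinds:
        blockers.append("full validation evidence is missing")
    return blockers


quick_only = promotion_blockers(True, True, {"quick_shot"})
complete = promotion_blockers(True, True, {"quick_shot", "full_validation"})
```

With only quick-shot evidence, quick_only contains the single blocker about missing full validation, while complete is empty.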