Upgrade And Rollback
Upgrade or roll back CloudGrid Helm deployments using verified release artifacts and digest-pinned values.
CloudGrid production upgrades use the same Helm chart and OCI images as fresh installs. Treat every upgrade as a release promotion: verify artifacts, render manifests, deploy with digest-pinned values, then record benchmark evidence.
Upgrade Inputs
Prepare:
- target release-manifest.json;
- target release-values.yaml;
- target cloudgrid-<version>.tgz or chart OCI version;
- target SBOMs, scan reports, checksums, and signatures;
- environment overlay values;
- current deployed values and Helm revision;
- backup or recovery point for SurrealDB and NATS according to your infrastructure policy.
Do not upgrade production from the latest tag or from an unverified chart/image combination.
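A minimal sketch of enforcing this rule: the shell function below scans manifests rendered by helm template and fails if any container image is not pinned to a sha256 digest. It assumes images appear as standard Kubernetes image: fields in the rendered YAML. The function name check_digest_pinned is hypothetical and not part of CloudGrid tooling.

```shell
# Hypothetical helper: fail if any container image in a rendered manifest
# is referenced by a tag instead of an immutable sha256 digest.
# Usage: check_digest_pinned RENDERED_MANIFEST.yaml
check_digest_pinned() {
  manifest="$1"
  # Extract the value of every "image:" field (list items included),
  # strip quotes, and keep only values that do not end in a full digest.
  unpinned=$(sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$manifest" \
    | tr -d "\"'" \
    | grep -Ev '@sha256:[0-9a-f]{64}$' || true)
  if [ -n "$unpinned" ]; then
    echo "Images not pinned by digest:" >&2
    echo "$unpinned" >&2
    return 1
  fi
  echo "All images are digest-pinned."
}
```

For example, redirect the helm template output to a file and run check_digest_pinned on it before running helm upgrade.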
Preflight
helm -n cloudgrid history cloudgrid
helm -n cloudgrid get values cloudgrid --all > cloudgrid-current-values.yaml
kubectl -n cloudgrid get pods
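To keep the current revision and values at hand for a later rollback, the preflight commands can be captured into one directory. This is a sketch under stated assumptions: the function name record_preflight and the directory layout are hypothetical, and it assumes the release and namespace are both named cloudgrid, as in the commands above.

```shell
# Hypothetical helper: save pre-upgrade state into one directory so the
# current Helm revision and values are available if a rollback is needed.
record_preflight() {
  out_dir="${1:-cloudgrid-preflight-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$out_dir" || return 1

  helm -n cloudgrid history cloudgrid > "$out_dir/history.txt" || return 1
  helm -n cloudgrid get values cloudgrid --all > "$out_dir/values.yaml" || return 1
  kubectl -n cloudgrid get pods -o wide > "$out_dir/pods.txt" || return 1

  # helm history prints a header row, then revisions in ascending order,
  # so the first column of the last row is the current revision.
  awk 'NR > 1 { rev = $1 } END { if (rev != "") print rev }' \
    "$out_dir/history.txt" > "$out_dir/revision.txt"
  [ -s "$out_dir/revision.txt" ] || return 1

  echo "$out_dir"
}
```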
Verify the target release:
sha256sum --check checksums.txt
cosign verify ghcr.io/cloudgrid-dev/cloudgrid-bff@sha256:<target-digest>
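The cosign command above checks a single image. A release ships several services (BFF, collector, storage-read, storage-write, control-plane), so a loop like the one below can verify each digest. It assumes a hypothetical file, target-images.txt, with one digest-pinned image reference per line. Any cosign key or certificate-identity options are passed through unchanged because they depend on how your release is signed.

```shell
# Hypothetical helper: run "cosign verify" for every image reference listed
# one per line in a file. Extra arguments are passed through to cosign.
# Usage: verify_images target-images.txt [cosign options...]
verify_images() {
  list="$1"; shift
  failed=0
  while IFS= read -r ref; do
    case "$ref" in
      ''|'#'*) continue ;;                  # skip blank lines and comments
      *@sha256:*) ;;                        # digest-pinned: verify it
      *) echo "not digest-pinned: $ref" >&2; failed=1; continue ;;
    esac
    # </dev/null keeps cosign from consuming the list on stdin.
    if ! cosign verify "$@" "$ref" </dev/null >/dev/null; then
      echo "signature verification failed: $ref" >&2
      failed=1
    fi
  done < "$list"
  return "$failed"
}
```

For example, with key-based signing: verify_images target-images.txt --key cosign.pub.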
Render the target manifests:
helm template cloudgrid oci://ghcr.io/cloudgrid-dev/charts/cloudgrid \
--namespace cloudgrid \
--version <target-chart-version> \
-f release-values.yaml \
-f charts/cloudgrid/profiles/enterprise.yaml \
-f cloudgrid-prod.yaml
Confirm that BFF and collector are the only public ingress candidates, and that NATS and SurrealDB remain private.
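To make this check repeatable, save the helm template output to a file and list the resources that could receive external traffic. The sketch below is line-based, not a YAML parser. It assumes helm template's usual two-space indentation and treats only Ingress objects and Services of type LoadBalancer or NodePort as public. It also assumes NATS and SurrealDB resource names contain nats or surrealdb. Both function names are hypothetical.

```shell
# Hypothetical helper: print Kind/name for every Ingress and every Service of
# type LoadBalancer or NodePort in rendered manifests. Line-based sketch for
# helm template output; Gateway API resources are not considered.
list_public_candidates() {
  awk '
    function flush() {
      if (kind == "Ingress" || (kind == "Service" && (stype == "LoadBalancer" || stype == "NodePort")))
        print kind "/" name
      kind = ""; name = ""; stype = ""; section = ""
    }
    /^---/ { flush(); next }                 # document boundary
    /^[A-Za-z]/ { section = $1 }             # top-level key, e.g. "metadata:"
    /^kind:/ { kind = $2 }
    section == "metadata:" && /^  name:/ && name == "" { name = $2; gsub(/"/, "", name) }
    section == "spec:" && /^  type:/ { stype = $2; gsub(/"/, "", stype) }
    END { flush() }
  ' "$1"
}

# Fail if a public candidate looks like NATS or SurrealDB (name-based guess).
check_private_backends() {
  exposed=$(list_public_candidates "$1" | grep -Ei 'nats|surrealdb' || true)
  if [ -n "$exposed" ]; then
    echo "Backend exposed publicly: $exposed" >&2
    return 1
  fi
}
```

Review the list_public_candidates output by hand as well: it should contain only BFF and collector resources.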
Upgrade
helm upgrade cloudgrid oci://ghcr.io/cloudgrid-dev/charts/cloudgrid \
--namespace cloudgrid \
--version <target-chart-version> \
-f release-values.yaml \
-f charts/cloudgrid/profiles/enterprise.yaml \
-f cloudgrid-prod.yaml \
--wait \
--timeout 10m
Watch rollout:
kubectl -n cloudgrid rollout status deployment/cloudgrid-bff
kubectl -n cloudgrid rollout status deployment/cloudgrid-otlp-collector
kubectl -n cloudgrid rollout status deployment/cloudgrid-storage-read
kubectl -n cloudgrid rollout status deployment/cloudgrid-storage-write
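The post-upgrade checks also require the control-plane to be ready. A loop such as this waits for every deployment in turn and stops at the first failure. The name cloudgrid-control-plane is an assumption, so confirm your deployment names with kubectl -n cloudgrid get deployments.

```shell
# Hypothetical helper: wait for each CloudGrid deployment and stop at the
# first one that does not finish rolling out within the timeout.
wait_for_rollouts() {
  for d in cloudgrid-bff cloudgrid-otlp-collector cloudgrid-storage-read \
           cloudgrid-storage-write cloudgrid-control-plane; do   # last name assumed
    echo "Waiting for deployment/$d"
    if ! kubectl -n cloudgrid rollout status "deployment/$d" --timeout=5m; then
      echo "Rollout did not complete: $d" >&2
      return 1
    fi
  done
}
```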
Post-Upgrade Checks
- BFF health and readiness are passing.
- Collector health and readiness are passing.
- Storage-read, storage-write, and control-plane are ready.
- Browser SSO login works.
- OTLP emitters can send data.
- GraphQL trace/log/metric reads work.
- No service logs expose session cookies, provider tokens, raw OTLP payloads, or SurrealDB credentials.
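For the last check, a simple pattern scan of recent service logs can catch obvious leaks. The patterns below are illustrative assumptions: cookie headers, bearer tokens, and SurrealDB-style credential variables. They will not detect raw OTLP payloads or every credential format, so extend them for the cookie names and secrets your deployment actually uses.

```shell
# Hypothetical helper: scan log text on stdin for strings that suggest leaked
# cookies, bearer tokens, or database credentials. Matching lines are printed
# with line numbers; returns 1 if anything matched.
scan_logs_for_secrets() {
  if grep -Ein 'cookie:|authorization:[[:space:]]*bearer|surreal(db)?_(user|pass)|password='; then
    echo "Possible secret material found in logs (matches above)." >&2
    return 1
  fi
  echo "No secret patterns matched."
}
```

For example: kubectl -n cloudgrid logs deployment/cloudgrid-bff --since=1h | scan_logs_for_secrets.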
Run benchmark probes before declaring the upgraded environment production-ready:
CLOUDGRID_ENABLE_BENCHMARKS=true \
CLOUDGRID_BENCH_DEPLOYMENT_PROFILE=production-like \
CLOUDGRID_BENCH_REQUIRED=true \
CLOUDGRID_BENCH_ENVIRONMENT_ID=prod-eu-1 \
CLOUDGRID_BENCH_IMAGE_TAG=v1.0.0-beta \
CLOUDGRID_BENCH_GRAPHQL_URL=https://cloudgrid.example.com/graphql \
CLOUDGRID_BENCH_OTLP_TRACES_URL=https://otlp.cloudgrid.example.com/v1/traces \
bun run bench:production
Store the JSON result from tmp/benchmarks/ with the release promotion record.
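A minimal sketch of this step copies the newest JSON file from tmp/benchmarks/ into the promotion record and appends its SHA-256, so the evidence can be checked later. The record directory argument and the benchmark-checksums.txt file name are hypothetical.

```shell
# Hypothetical helper: archive the newest benchmark JSON with a checksum.
# Usage: archive_benchmark_result RECORD_DIR
archive_benchmark_result() {
  record_dir="$1"
  # ls -t lists newest first; errors (no matches) are discarded.
  latest=$(ls -t tmp/benchmarks/*.json 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "No benchmark JSON found in tmp/benchmarks/" >&2
    return 1
  fi
  mkdir -p "$record_dir" || return 1
  cp "$latest" "$record_dir/" || return 1
  ( cd "$record_dir" && sha256sum "$(basename "$latest")" >> benchmark-checksums.txt )
}
```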
Rollback
Use Helm rollback when the previous release is still compatible with the current data state:
helm -n cloudgrid history cloudgrid
helm -n cloudgrid rollback cloudgrid <revision> --wait --timeout 10m
Then verify rollout status and repeat the health checks.
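If you did not note the previous revision during preflight, Helm's history can suggest one. This heuristic picks the newest revision whose status is superseded, meaning it was deployed and later replaced. Confirm the choice against the full history output and the release notes before rolling back. The function name is hypothetical.

```shell
# Hypothetical helper: print the newest revision Helm marks as "superseded".
# Exits non-zero with no output if there is none.
last_superseded_revision() {
  helm -n cloudgrid history cloudgrid \
    | awk 'NR > 1 && $0 ~ /[[:space:]]superseded([[:space:]]|$)/ { rev = $1 }
           END { if (rev == "") exit 1; print rev }'
}
```

For example: rev=$(last_superseded_revision) && helm -n cloudgrid rollback cloudgrid "$rev" --wait --timeout 10m.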
If a release includes data migrations or storage schema changes, use the rollback instructions published with that release. Do not assume that rolling back images is sufficient after irreversible storage changes.
Image-Only Rollback
If the chart version stays the same and only service images need to roll back, reuse the previous release's verified release-values.yaml (saved here as previous-release-values.yaml):
helm upgrade cloudgrid oci://ghcr.io/cloudgrid-dev/charts/cloudgrid \
--namespace cloudgrid \
--version <current-chart-version> \
-f previous-release-values.yaml \
-f charts/cloudgrid/profiles/enterprise.yaml \
-f cloudgrid-prod.yaml \
--wait \
--timeout 10m
The previous values must still pin images by digest, not by mutable tags.
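To confirm that an image-only rollback changes nothing but image references, render both value sets with helm template and compare the images. This sketch assumes two rendered manifest files, and the helper names are hypothetical. In the output, lines prefixed with - appear only in the first file, and lines prefixed with + appear only in the second.

```shell
# Print the sorted, de-duplicated image references from a rendered manifest.
image_refs() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$1" | tr -d "\"'" | sort -u
}

# Hypothetical helper: show image references that differ between two
# rendered manifests ("- ref" only in the first, "+ ref" only in the second).
diff_image_refs() {
  old_refs=$(mktemp) || return 1
  new_refs=$(mktemp) || return 1
  image_refs "$1" > "$old_refs"
  image_refs "$2" > "$new_refs"
  # comm -3 drops common lines; second-file-only lines are tab-indented.
  comm -3 "$old_refs" "$new_refs" \
    | awk -F '\t' '{ if ($1 != "") print "- " $1; else print "+ " $2 }'
  rm -f "$old_refs" "$new_refs"
}
```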
Rollback Record
Record:
- failed release version and digest set;
- rollback revision or replacement digest set;
- reason for rollback;
- health and benchmark results after rollback;
- any manual data recovery action.
Next Step
Follow Release artifact verification before every upgrade, and review Sizing and scaling after capacity-affecting changes.