Standalone Usage
⚠️ Experimental Feature: ChReK is currently in beta/preview. The ChReK DaemonSet runs in privileged mode to perform CRIU operations. Review the security implications before deploying.
This guide explains how to use ChReK (Checkpoint/Restore for Kubernetes) as a standalone component without deploying the full Dynamo platform. This is useful if you want to add checkpoint/restore capabilities to your own GPU workloads.
Table of Contents
- Overview
- Using ChReK Without the Dynamo Operator
- Prerequisites
- Step 1: Deploy ChReK
- Step 2: Build Checkpoint-Enabled Images
- Step 3: Create Checkpoint Jobs
- Step 4: Restore from Checkpoints
- Environment Variables Reference
- Checkpoint Flow Explained
- Troubleshooting
Overview
When using ChReK standalone, you are responsible for:
- Deploying the ChReK Helm chart (DaemonSet + PVC)
- Building checkpoint-enabled container images with the CRIU runtime dependencies
- Creating checkpoint jobs with the correct environment variables
- Creating restore pods that detect and use the checkpoints
The ChReK DaemonSet handles the actual CRIU checkpoint/restore operations automatically once your pods are configured correctly.
Using ChReK Without the Dynamo Operator
When using ChReK with the Dynamo operator, the operator automatically configures workload pods for checkpoint/restore. Without the operator, you must handle this configuration manually. This section documents what the operator normally injects and how to replicate it.
Container Naming
The ChReK DaemonSet needs to identify which container in your pod is the model-serving workload (as opposed to sidecars like istio-proxy or log collectors). It resolves the target container by name:
- If a container is named `main`, it is selected
- Otherwise, the first container in the pod spec is selected
When using the Dynamo operator, the model container is always named `main`. In standalone mode, you must either name your model container `main` or ensure it is the first container listed in your pod spec. All YAML examples in this guide use `name: main`.
Seccomp Profile
The operator sets a seccomp profile on all checkpoint/restore workload pods to block io_uring syscalls. The ChReK DaemonSet deploys the profile file (`profiles/block-iouring.json`) to each node, but you must reference it in your pod specs:
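A pod-level `securityContext` along these lines references the node-local profile. The exact `localhostProfile` path is an assumption here — check where your chart places `block-iouring.json` under the kubelet seccomp directory (`/var/lib/kubelet/seccomp`) on your nodes:

```yaml
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: block-iouring.json   # path relative to the kubelet seccomp dir (assumed)
```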
Without this profile, io_uring syscalls during restore can cause CRIU failures.
Sleep Infinity Command for Restore Pods
The operator overrides the container command to `["sleep", "infinity"]` on restore-target pods. This produces a Running-but-not-Ready placeholder pod that the ChReK DaemonSet watcher detects and restores externally via `nsenter`. Without this override, the container runs its normal entrypoint (cold-starting instead of waiting for restore).
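In standalone mode, you apply the same override yourself in the restore pod spec (image name is illustrative):

```yaml
containers:
  - name: main
    image: my-registry/my-checkpoint-enabled-image:latest   # placeholder image name
    command: ["sleep", "infinity"]   # placeholder until the DaemonSet restores into this pod
```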
Recreate Deployment Strategy
The operator forces Recreate strategy when restore labels are present. This prevents the old and new pods from running simultaneously, which would cause failures — two pods competing for the same GPU checkpoint data. If you are using a Deployment, set this manually:
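A minimal Deployment fragment with the strategy set manually:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate   # old pod is torn down before the new one starts
```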
PVC Volume Mount Consistency
CRIU requires identical mount layouts between checkpoint and restore. The operator ensures the checkpoint PVC is mounted at the same path in both the checkpoint job and restore pod. When configuring manually, make sure your checkpoint job and restore pod use the exact same mountPath for the checkpoint PVC (e.g., /checkpoints).
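A sketch of the shared mount, with the PVC name as a hypothetical placeholder — the key point is that `mountPath` is identical in the checkpoint job and the restore pod:

```yaml
volumeMounts:
  - name: checkpoint-storage
    mountPath: /checkpoints        # must be identical in checkpoint job and restore pod
volumes:
  - name: checkpoint-storage
    persistentVolumeClaim:
      claimName: chrek-checkpoints   # hypothetical PVC name
```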
Downward API Volume (Currently Unused)
The operator injects a Downward API volume at /etc/podinfo for post-restore identity discovery (pod name, namespace, UID). This is not currently consumed by any component — you can skip it for now.
Environment Variables
The following environment variables are normally injected by the operator. They are already documented in the Environment Variables Reference below, but note that without the operator you must set them manually:
- Checkpoint jobs: `DYN_READY_FOR_CHECKPOINT_FILE`, `DYN_CHECKPOINT_LOCATION`, `DYN_CHECKPOINT_STORAGE_TYPE`, `DYN_CHECKPOINT_HASH`
- Restore pods: `DYN_CHECKPOINT_PATH`, `DYN_CHECKPOINT_HASH`
Prerequisites
- Kubernetes cluster with:
  - NVIDIA GPUs with checkpoint support
  - Privileged DaemonSet allowed (⚠️ the ChReK DaemonSet runs privileged — see Security Considerations)
  - PVC storage (ReadWriteMany recommended for multi-node)
- Docker or compatible container runtime for building images
- Access to the ChReK source code: `deploy/chrek/`
Security Considerations
⚠️ Important: The ChReK DaemonSet runs in privileged mode to perform CRIU checkpoint/restore operations. Your workload pods (checkpoint jobs, restore pods) do not need privileged mode — all CRIU privilege lives in the DaemonSet, which performs external restore via nsenter.
- The DaemonSet has `privileged: true`, `hostPID`, `hostIPC`, and `hostNetwork`
- This may violate security policies in production environments
- If the DaemonSet is compromised, it could potentially compromise node security
Recommended for:
- ✅ Development and testing environments
- ✅ Research and experimentation
- ✅ Controlled production environments with appropriate security controls
Not recommended for:
- ❌ Multi-tenant clusters without proper isolation
- ❌ Security-sensitive production workloads without risk assessment
- ❌ Environments with strict security compliance requirements
Technical Limitations
⚠️ Current Restrictions:
- vLLM backend only: Currently only the vLLM backend supports checkpoint/restore. SGLang and TensorRT-LLM support is planned.
- Single-node only: Checkpoints must be created and restored on the same node
- Single-GPU only: Multi-GPU configurations are not yet supported
- Network state: Active TCP connections are closed during restore
- Storage: Only PVC backend currently implemented (S3/OCI planned)
Step 1: Deploy ChReK
Install the Helm Chart
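An illustrative install from a checkout of the ChReK source — the chart path follows the `deploy/chrek/` location above, while the release name, namespace, and any values overrides are assumptions to adapt to your cluster:

```shell
# Install the ChReK chart (DaemonSet + checkpoint PVC)
helm install chrek ./deploy/chrek \
  --namespace chrek-system \
  --create-namespace
```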
Verify Installation
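With kubectl, confirm the DaemonSet pods are Running on every GPU node and the checkpoint PVC is Bound (namespace and resource names are illustrative):

```shell
kubectl get daemonset -n chrek-system
kubectl get pods -n chrek-system -o wide
kubectl get pvc -n chrek-system
```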
Step 2: Build Checkpoint-Enabled Images
ChReK provides a placeholder target in its Dockerfile that layers CRIU runtime dependencies onto your existing container images. The DaemonSet performs restore externally via nsenter, so these dependencies must be present in the image.
Quick Start: Using the Placeholder Target (Recommended)
Example with a Dynamo vLLM image:
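A hypothetical build command layering the CRIU dependencies onto an existing image via the `placeholder` target. The build-arg name for the base image and the image tags are assumptions — check the ChReK Dockerfile for the actual argument names:

```shell
docker build \
  --target placeholder \
  --build-arg BASE_IMAGE=nvcr.io/nvidia/dynamo-vllm:latest \
  -t my-registry/dynamo-vllm-chrek:latest \
  -f deploy/chrek/Dockerfile .
```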
What the Placeholder Target Does
The ChReK Dockerfile’s placeholder stage automatically:
- ✅ Installs CRIU runtime libraries (required by `nsrestore` running inside the pod's namespaces)
- ✅ Copies the `criu` binary to `/usr/local/sbin/criu`
- ✅ Copies `cuda-checkpoint` to `/usr/local/sbin/cuda-checkpoint` (used for CUDA state checkpoint/restore)
- ✅ Copies `nsrestore` to `/usr/local/bin/nsrestore` (invoked by the DaemonSet via `nsenter`)
- ✅ Creates checkpoint directories (`/checkpoints`, `/var/run/criu`, `/var/criu-work`)
- ✅ Preserves your original application image contents
The placeholder image does not override the entrypoint or CMD. For restore pods, the operator (or you, in standalone mode) overrides the command to `sleep infinity`.
💡 Tip: Using the `placeholder` target is the recommended approach, as it's maintained with the ChReK codebase and ensures compatibility.
Step 3: Create Checkpoint Jobs
A checkpoint job loads your application, waits for the ChReK DaemonSet to checkpoint it, and then exits.
Required Environment Variables
Your checkpoint job MUST set these environment variables:
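An illustrative `env` block — the variable names are the ones ChReK requires, but every value shown here is a placeholder to adapt:

```yaml
env:
  - name: DYN_READY_FOR_CHECKPOINT_FILE
    value: /tmp/ready-for-checkpoint   # path your readiness probe checks
  - name: DYN_CHECKPOINT_LOCATION
    value: /checkpoints/my-model       # a path on the checkpoint PVC
  - name: DYN_CHECKPOINT_STORAGE_TYPE
    value: pvc                         # only the PVC backend is currently implemented
  - name: DYN_CHECKPOINT_HASH
    value: my-model-v1                 # must match the restore pod's hash
```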
Required Labels
Add this label to enable DaemonSet checkpoint detection:
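In the pod template metadata:

```yaml
metadata:
  labels:
    nvidia.com/chrek-is-checkpoint-source: "true"
```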
Example Checkpoint Job
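A sketch of a complete checkpoint Job. The image, paths, hash, PVC name, and seccomp profile path are placeholders; the label, environment variables, readiness probe, and PVC mount are the parts ChReK requires:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-checkpoint
spec:
  template:
    metadata:
      labels:
        nvidia.com/chrek-is-checkpoint-source: "true"
    spec:
      restartPolicy: Never
      securityContext:
        seccompProfile:
          type: Localhost
          localhostProfile: block-iouring.json   # assumed profile path
      containers:
        - name: main
          image: my-registry/dynamo-vllm-chrek:latest   # placeholder-target image
          env:
            - name: DYN_READY_FOR_CHECKPOINT_FILE
              value: /tmp/ready-for-checkpoint
            - name: DYN_CHECKPOINT_LOCATION
              value: /checkpoints/my-model
            - name: DYN_CHECKPOINT_STORAGE_TYPE
              value: pvc
            - name: DYN_CHECKPOINT_HASH
              value: my-model-v1
          readinessProbe:
            exec:
              command: ["test", "-f", "/tmp/ready-for-checkpoint"]
            periodSeconds: 5
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: checkpoint-storage
              mountPath: /checkpoints
      volumes:
        - name: checkpoint-storage
          persistentVolumeClaim:
            claimName: chrek-checkpoints   # hypothetical PVC name
```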
Application Code Requirements
Your application must implement the checkpoint flow. The DaemonSet communicates with your application via Unix signals (not files):
- `SIGUSR1`: Checkpoint completed — your process should exit gracefully
- `SIGCONT`: Restore completed — your process should wake up and continue
- `SIGKILL`: Checkpoint failed — process is terminated immediately (unhandleable)
Here’s the pattern used by Dynamo vLLM (see components/src/dynamo/vllm/checkpoint_restore.py):
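Below is a minimal sketch of that pattern, not the actual `checkpoint_restore.py` — the default ready-file path is an assumption, and real code would load the model before writing the ready file:

```python
import os
import signal
import tempfile
import threading

# Events flipped by the signal handlers.
checkpoint_done = threading.Event()
restored = threading.Event()

def _on_sigusr1(signum, frame):
    # DaemonSet says the checkpoint completed: exit gracefully.
    checkpoint_done.set()

def _on_sigcont(signum, frame):
    # DaemonSet says the restore completed: wake the model and continue.
    restored.set()

# 1. Install handlers BEFORE writing the ready file, so there is no
#    window where a DaemonSet signal hits the default disposition.
signal.signal(signal.SIGUSR1, _on_sigusr1)
signal.signal(signal.SIGCONT, _on_sigcont)

# 2. Write the ready file that the readiness probe checks for; the
#    fallback path here is hypothetical.
ready_file = os.environ.get(
    "DYN_READY_FOR_CHECKPOINT_FILE",
    os.path.join(tempfile.gettempdir(), "ready-for-checkpoint"),
)
with open(ready_file, "w") as f:
    f.write("ready\n")

def wait_for_signal():
    # Real code blocks here until SIGUSR1 (exit) or SIGCONT (resume).
    # SIGKILL needs no handler -- it cannot be caught.
    while not (checkpoint_done.is_set() or restored.is_set()):
        signal.pause()
```

Note the ordering: the handlers are installed first, the ready file is written second, matching the race-avoidance rule described above.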
Important Notes:
- **Ready File & Readiness Probe**: The checkpoint job must have a readiness probe that checks for the ready file. The ChReK DaemonSet triggers checkpointing when:
  - The pod has the `nvidia.com/chrek-is-checkpoint-source: "true"` label
  - The pod status is `Ready` (readiness probe passes = ready file exists)
- **Signal handler ordering**: Install signal handlers before writing the ready file. Otherwise there is a race window where the DaemonSet sends a signal while the default disposition (terminate) is still in effect.
- **Signal-based coordination**: The DaemonSet sends `SIGUSR1` after checkpoint completes, `SIGCONT` after restore completes, and `SIGKILL` if checkpoint fails. Your application must handle `SIGUSR1` and `SIGCONT` (not poll for files). `SIGKILL` cannot be caught — the kernel terminates the process immediately.
- **Three exit paths**:
  - `SIGUSR1` received: Checkpoint complete, exit gracefully
  - `SIGCONT` received: Process was restored, wake the model and continue
  - `SIGKILL` received: Checkpoint failed, process terminated immediately (no handler needed)
Step 4: Restore from Checkpoints
The DaemonSet performs restore externally — your restore pod just needs to be a placeholder that sleeps until the DaemonSet restores the checkpointed process into it.
Example Restore Pod
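A sketch of a restore pod: same image and mounts as the checkpoint job, command overridden to `sleep infinity`, plus the restore labels. The readiness-probe contract shown (a file that only exists after restore) is an assumption — what matters is that the probe fails until restore completes, keeping the pod Running but not Ready:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-restore
  labels:
    nvidia.com/chrek-is-restore-target: "true"
    nvidia.com/chrek-checkpoint-hash: my-model-v1   # must match the checkpoint job
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: block-iouring.json   # assumed profile path
  containers:
    - name: main
      image: my-registry/dynamo-vllm-chrek:latest   # same image as the checkpoint job
      command: ["sleep", "infinity"]                # placeholder until restore
      env:
        - name: DYN_CHECKPOINT_PATH
          value: /checkpoints/my-model
        - name: DYN_CHECKPOINT_HASH
          value: my-model-v1
      readinessProbe:
        exec:
          command: ["test", "-f", "/tmp/restored"]  # assumption: fails until restored
        periodSeconds: 5
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        - name: checkpoint-storage
          mountPath: /checkpoints   # must match the checkpoint job's mountPath
  volumes:
    - name: checkpoint-storage
      persistentVolumeClaim:
        claimName: chrek-checkpoints   # hypothetical PVC name
```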
How Restore Works
1. **Pod starts as placeholder**: The `sleep infinity` command keeps the pod Running but not Ready
2. **DaemonSet detects restore pod**: The watcher finds pods with `nvidia.com/chrek-is-restore-target=true` that are Running but not Ready
3. **External restore via nsenter**: The DaemonSet enters the pod's namespaces and performs CRIU restore, including GPU state
4. **Application continues**: Your application resumes exactly where it was checkpointed
Environment Variables Reference
Checkpoint Jobs

| Variable | Purpose |
| --- | --- |
| `DYN_READY_FOR_CHECKPOINT_FILE` | Path of the ready file the application writes once it is ready to be checkpointed |
| `DYN_CHECKPOINT_LOCATION` | Destination for the checkpoint data (a path on the checkpoint PVC) |
| `DYN_CHECKPOINT_STORAGE_TYPE` | Storage backend for the checkpoint (only PVC is currently implemented) |
| `DYN_CHECKPOINT_HASH` | Identifier linking the checkpoint to the restore pods that consume it |

Restore Pods

| Variable | Purpose |
| --- | --- |
| `DYN_CHECKPOINT_PATH` | Path of the checkpoint to restore from |
| `DYN_CHECKPOINT_HASH` | Must match the hash the checkpoint was created with |
Signals (DaemonSet → Application)
The DaemonSet communicates checkpoint/restore completion via Unix signals, not files:

- `SIGUSR1`: checkpoint completed — the process should exit gracefully
- `SIGCONT`: restore completed — the process should wake up and continue
- `SIGKILL`: checkpoint failed — the process is terminated immediately (cannot be caught)

CRIU tuning options are configured via the ChReK Helm chart's `config.checkpoint.criu` values, not environment variables. See the Helm Chart Values for available options.
Checkpoint Flow Explained
1. Checkpoint Creation Flow

1. The checkpoint job starts, loads the model, installs its signal handlers, and then writes the ready file
2. The readiness probe passes, marking the pod Ready
3. The DaemonSet sees the `nvidia.com/chrek-is-checkpoint-source: "true"` label on a Ready pod and triggers a checkpoint
4. GPU state is captured with `cuda-checkpoint` and process state with CRIU, written to the checkpoint PVC
5. The DaemonSet sends `SIGUSR1` and the job exits gracefully (or `SIGKILL` if the checkpoint failed)
2. Restore Flow

1. The restore pod starts as a placeholder running `sleep infinity` (Running but not Ready)
2. The DaemonSet watcher finds the pod via its `nvidia.com/chrek-is-restore-target` and `nvidia.com/chrek-checkpoint-hash` labels
3. The DaemonSet enters the pod's namespaces via `nsenter` and performs CRIU restore, including GPU state
4. The restored process receives `SIGCONT` and resumes exactly where it was checkpointed
Troubleshooting
Checkpoint Not Created
Symptom: Job runs but no checkpoint appears in /checkpoints/
Checks:
1. Verify the pod has the `nvidia.com/chrek-is-checkpoint-source: "true"` label
2. Check pod readiness (checkpointing only triggers once the pod is Ready)
3. Check that the ready file was created
4. Check the DaemonSet logs
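These checks can be run with kubectl; pod names, namespace, and the ready-file path are placeholders:

```shell
kubectl get pod <checkpoint-pod> --show-labels
kubectl get pod <checkpoint-pod> -o wide    # Ready column should show 1/1
kubectl exec <checkpoint-pod> -c main -- test -f /tmp/ready-for-checkpoint && echo ready
kubectl logs -n <namespace> daemonset/chrek-agent
```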
Restore Fails
Symptom: Pod fails to restore from checkpoint
Checks:
1. Verify checkpoint files exist on the checkpoint PVC
2. Check DaemonSet logs for restore errors
3. Check pod events for restore status annotations
4. Ensure checkpoint and restore have the same:
   - Container image (built with the `placeholder` target)
   - GPU count
   - Volume mounts (same `mountPath` for the checkpoint PVC)
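Illustrative kubectl commands for these checks (pod names, namespace, and checkpoint path are placeholders):

```shell
kubectl exec <restore-pod> -c main -- ls -la /checkpoints/
kubectl logs -n <namespace> daemonset/chrek-agent | grep -i restore
kubectl describe pod <restore-pod>    # events and restore status annotations
```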
Restore Pod Not Detected
Symptom: Pod runs `sleep infinity` but the DaemonSet never restores it
Checks:
1. Verify the pod has the required labels: it must have both `nvidia.com/chrek-is-restore-target: "true"` and `nvidia.com/chrek-checkpoint-hash: "<hash>"`
2. Verify the pod is Running but not Ready (this is the trigger)
3. Verify the DaemonSet is running on the same node
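Illustrative kubectl commands for these checks (names and namespace are placeholders):

```shell
kubectl get pod <restore-pod> --show-labels
kubectl get pod <restore-pod> -o wide          # should be Running, 0/1 Ready
kubectl get pods -n <namespace> -o wide | grep chrek   # a DaemonSet pod on the same node
```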
Additional Resources
- ChReK Helm Chart Values
- Dynamo vLLM ChReK Integration - Reference signal handler implementation
- ChReK Dockerfile
- CRIU Documentation
- CUDA Checkpoint Utility
Getting Help
If you encounter issues:
- Check the Troubleshooting section
- Review DaemonSet logs: `kubectl logs -n <namespace> daemonset/chrek-agent`
- Open an issue on GitHub