vLLM-Omni
Dynamo supports multimodal generation through the vLLM-Omni backend. This integration exposes text-to-text, text-to-image, and text-to-video capabilities via OpenAI-compatible API endpoints.
Prerequisites
This guide assumes familiarity with deploying Dynamo with vLLM as described in the vLLM backend guide.
Installation
Dynamo container images include vLLM-Omni pre-installed. If you are using pip install ai-dynamo[vllm], vLLM-Omni is not included automatically because the matching release is not yet available on PyPI. Install it separately from source:
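For example (the repository URL below is an assumption; substitute the actual vLLM-Omni source location):

```shell
# Assumed repository URL -- adjust to wherever vLLM-Omni is hosted
pip install git+https://github.com/vllm-project/vllm-omni.git
```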
Supported Modalities
The --output-modalities flag determines which endpoint(s) the worker registers. When set to image, both /v1/chat/completions (returns inline base64 images) and /v1/images/generations are available. When set to video, the worker serves /v1/videos.
Tested Models
To run a non-default model, pass --model to any launch script:
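For instance (the script name here is an assumption; use whichever launch script matches your modality):

```shell
# --model is forwarded to the worker; the model shown is one of the tested defaults
./agg_omni.sh --model Qwen/Qwen2.5-Omni-7B
```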
Text-to-Text
Launch an aggregated deployment (frontend + omni worker):
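A minimal sketch, assuming the launch scripts live under examples/backends/vllm/launch (the script name is an assumption):

```shell
cd examples/backends/vllm/launch
./agg_omni.sh
```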
This starts Qwen/Qwen2.5-Omni-7B with a single-stage thinker config on one GPU.
Verify the deployment:
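For example, send a simple chat completion to the frontend (port 8000 is an assumption; adjust to your deployment):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-Omni-7B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```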
This script uses a custom stage config (stage_configs/single_stage_llm.yaml) that configures the thinker stage for text generation. See Stage Configuration for details.
Text-to-Image
Launch using the provided script with Qwen/Qwen-Image:
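A sketch, assuming a text-to-image launch script alongside the other examples (the script name is an assumption):

```shell
cd examples/backends/vllm/launch
./agg_omni_image.sh
```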
Via /v1/chat/completions
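A sketch of a request (port assumed; the image prompt is passed as an ordinary chat message):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen-Image",
        "messages": [{"role": "user", "content": "A corgi wearing sunglasses"}]
      }'
```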
The response includes base64-encoded images inline:
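An abbreviated example of what such a response might look like (the exact field layout may differ by version; the base64 payload is truncated):

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "image_url",
            "image_url": { "url": "data:image/png;base64,iVBORw0KGgo..." }
          }
        ]
      }
    }
  ]
}
```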
Via /v1/images/generations
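For example (port and field values assumed; the request follows the OpenAI Images API shape):

```shell
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen-Image",
        "prompt": "A watercolor painting of a lighthouse at dawn",
        "n": 1,
        "size": "1024x1024"
      }'
```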
Text-to-Video
Launch using the provided script with Wan-AI/Wan2.1-T2V-1.3B-Diffusers:
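A sketch, assuming a text-to-video launch script in the same directory as the other examples (the script name is an assumption):

```shell
cd examples/backends/vllm/launch
./agg_omni_video.sh
```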
Generate a video via /v1/videos:
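For example (port assumed; the request body mirrors the other OpenAI-style endpoints):

```shell
curl http://localhost:8000/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
        "prompt": "A paper boat drifting down a rain-soaked street"
      }'
```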
The response returns a video URL or base64 data depending on response_format:
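An illustrative URL-form response (the field names are assumptions; the actual payload may differ):

```json
{
  "data": [
    { "url": "https://cdn.example.com/media/videos/req-id.mp4" }
  ]
}
```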
The /v1/videos endpoint also accepts NVIDIA extensions via the nvext field for fine-grained control:
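For illustration only: the nvext keys below (num_frames, fps) are assumptions, not a documented schema; consult the CLI Reference for the actual diffusion parameters.

```shell
curl http://localhost:8000/v1/videos \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
        "prompt": "A timelapse of clouds over mountains",
        "nvext": {
          "num_frames": 49,
          "fps": 16
        }
      }'
```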
CLI Reference
For the full list of Omni-related flags (including --omni, --output-modalities, --stage-configs-path, --media-output-fs-url, --media-output-http-url, and the --omni-* diffusion flags), run:
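For example, assuming the worker is launched via the dynamo.vllm module (as with the standard vLLM backend):

```shell
python -m dynamo.vllm --help
```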
See also the Argument Reference in the Reference Guide.
Storage Configuration
Generated images and videos are stored via fsspec, which supports local filesystems, S3, GCS, and Azure Blob Storage.
By default, media is written to the local filesystem at file:///tmp/dynamo_media. To use cloud storage:
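A sketch for S3, assuming the dynamo.vllm entry point (the bucket name is a placeholder):

```shell
# Credentials come from the standard AWS env vars or an IAM role
export AWS_ACCESS_KEY_ID=<your-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret>
python -m dynamo.vllm --omni --media-output-fs-url s3://my-bucket/dynamo-media
```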
When --media-output-http-url is set, response URLs are rewritten as {base-url}/{storage-path} (e.g., https://cdn.example.com/media/videos/req-id.mp4). When unset, the raw filesystem path is returned.
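The rewrite itself is plain string concatenation; a minimal sketch using the values from the example above:

```shell
# {base-url}/{storage-path} joined with a single slash
BASE_URL="https://cdn.example.com/media"
STORAGE_PATH="videos/req-id.mp4"
echo "${BASE_URL}/${STORAGE_PATH}"
```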
For S3 credential configuration, set the standard AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or use IAM roles. See the fsspec S3 docs for details.
Stage Configuration
Omni pipelines are configured via YAML stage configs. See examples/backends/vllm/launch/stage_configs/single_stage_llm.yaml for an example. For full documentation on stage config format and multi-stage pipelines, refer to the vLLM-Omni Stage Configs documentation.
Current Limitations
- Only text prompts are supported as input (no multimodal input yet).
- KV cache events are not published for omni workers.
- Each worker supports a single output modality at a time.