Multimodal Model Serving

Deploy multimodal models with image, video, and audio support in Dynamo

Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text.

Security Requirement: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation (vLLM, SGLang, TRT-LLM) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.